Post reordered due to top posting.

> > On Fri, 2005-05-27 at 13:18, bruce wrote:
> > hi..
> >
> > i'm trying to get my hands around the issue of managing servers. in
> > particular, i'm trying to get a better understanding of how people manage
> > their servers when it comes to installing/upgrading apps.
> >
>
> You have to implement a change control process. This process controls
> everything that is modified on your servers. Part of this process is to
> test all changes prior to implementation and to develop back out plans
> in case something does not work as expected. Such a process documents
> everything about your servers.
>
> You will not find a simple tool or set of tools to do this for you. It
> is difficult to do. If it was easy then anyone could do the job. :)
>
> You may want to read up on ITIL. It is a documented method for
> implementing such processes.
>

On Fri, 2005-05-27 at 13:52, bruce wrote:
> i would have thought that some would have said that you used some sort of
> control panel to aid in the overall management process...

A control panel would be just one part of a full change management
process for handling large numbers of servers. There are a variety of
tools that can be used, but implementing a tool and calling it a change
management process is not the right way to do such things.

You need tools to monitor the systems and applications that make up your
environment, tools to manage the systems, and tools to report on your
systems. Many tools combine some or all of these functions. On top of
the tools you need policies that detail how things will be done in your
environment.

For instance, you can implement Big Brother to monitor your systems, MRTG
to report on your environment, and set up your own yum repository for
handling upgrades. But you still need a process around those tools that
details how you test new software being introduced, maintenance windows,
backout plans, escalation paths, etc.

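To make the "process around the tools" point a bit more concrete, here is
a rough Python sketch of what a single change record might check before
anyone is allowed to touch a server. This is not a real tool, and the
class and field names (ChangeRequest, tested_in_staging, backout_plan and
so on) are made up for illustration only; a real process would track far
more than this.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ChangeRequest:
    """One proposed change to one or more servers (hypothetical layout)."""
    summary: str
    hosts: List[str]
    tested_in_staging: bool = False
    backout_plan: str = ""
    window_start: Optional[datetime] = None
    window_end: Optional[datetime] = None
    escalation_contact: str = ""

    def missing_items(self) -> List[str]:
        """Return the process steps that have not been satisfied yet."""
        missing = []
        if not self.tested_in_staging:
            missing.append("regression test in staging")
        if not self.backout_plan.strip():
            missing.append("backout plan")
        if self.window_start is None or self.window_end is None:
            missing.append("maintenance window")
        elif self.window_end <= self.window_start:
            missing.append("valid maintenance window")
        if not self.escalation_contact.strip():
            missing.append("escalation contact")
        return missing

    def approved(self) -> bool:
        """A change may go ahead only when nothing is missing."""
        return not self.missing_items()


if __name__ == "__main__":
    # Hypothetical change: a kernel upgrade on an assumed host name.
    change = ChangeRequest(summary="kernel upgrade", hosts=["vpn-gw1"])
    for item in change.missing_items():
        print("still needed:", item)
```

The point of the sketch is only that the tool (yum, a control panel,
whatever) is never the gate. The gate is the list of things that must be
true before the tool gets run.
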
For the specifics of installing/upgrading apps, regression testing
against all applications on a system needs to be done so you don't have
surprises in your production environment. Upgrading the kernel most of
the time does not present any problems. However, I have seen a kernel
upgrade break the Cisco VPN client (BTW: I have not found a workaround
for that one yet...). Without proper testing, such things would be found
the next morning when the call center gets thousands of calls from
people who cannot connect via VPN. Someone once called that a "resume
generating event". Not something you want to have happen.

Just throwing a few tools onto a system or two and saying you are
managing things is a disservice to your customers at best and could
jeopardize those systems at worst.

-- 
Scot L. Harris
webid@xxxxxxxxxx

Novinson's Revolutionary Discovery: When comes the revolution, things will
be different -- not better, just different.