The joy of data center automation -- and its hidden dangers

Paul Venezia | March 18, 2014
Automating manual or repetitive tasks is a source of great satisfaction for admins, but great disasters may await

What should happen in that case is that the server throws an error via an SNMP trap, or email, and does not bring up the Web services. This problem would be instantly visible and presumably easily fixed after some digging. However, if the server continues to bring up all the services and is joined to the load-balanced group, it may not function properly.
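The "raise an error and stay down" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the author's tooling: the function name `start_if_healthy`, the check names, and the `alert` and `start_services` callables are all hypothetical stand-ins for whatever SNMP trap, email, or service manager is actually in use.

```python
from typing import Callable, List, Tuple

# A check is a (name, function) pair; the function returns True when healthy.
# All names here are hypothetical and exist only for illustration.
Check = Tuple[str, Callable[[], bool]]


def start_if_healthy(
    checks: List[Check],
    alert: Callable[[str], None],
    start_services: Callable[[], None],
) -> bool:
    """Run every check; start services only if all of them pass."""
    failures: List[str] = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:  # a crashing check counts as a failure
            ok = False
            name = f"{name} ({exc!r})"
        if not ok:
            failures.append(name)

    if failures:
        # Halt loudly: send the alert and leave the web services down.
        alert("startup checks failed: " + ", ".join(failures))
        return False

    start_services()
    return True
```

The design choice is that a failed check blocks the start entirely, so a misconfigured server never joins the load-balanced group in a half-working state.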

Depending on the actual problem encountered, this might mean that all services on the new server are broken, which would be fairly obvious after a fashion, but it may mean something much more subtle than that. In fact, it could very well be something that will evade detection by service, content, and application monitoring frameworks. The server may look OK, but it really isn't. That is a much more devious problem to tackle.

It's even more troubling if the impact is relatively minor, meaning that problem reports come in sporadically during certain times of the day when new servers are spawned from that template, or only a subset of all users are affected because servers that are already running do not have the same issue. Those issues are extremely difficult to find. I would much rather have a dozen servers spin up, hit an error, send an alarm, and halt than come up all the way and corrupt an application. An application that's slowing down due to lower capacity is better than a fast app that's broken and potentially damaging a database.

The point is that seemingly minor automation efforts may work flawlessly for a long, long time before they run off the rails. Autopilot is great, but we still want someone looking over the instruments to make sure that things are running as they should. Throwing as much error checking as possible into what are otherwise simple automation tasks may seem somewhat onerous at first, but it is as critical as the automation step itself.

Some of my automation scripts are 25 percent functional code and 75 percent error checking and failure handling. I'm also a big fan of automation scripts that have no output whatsoever if problems are not encountered, but will throw debug info out to STDOUT if they run into problems. When used with mail -E in cron jobs or startup scripts, debug to STDOUT makes for a very simple notification step right from the source.
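One way to get that "silent unless something breaks" behavior is to buffer debug messages and print them only on failure. The sketch below assumes a hypothetical wrapper, `run_quietly`, and a task that accepts a logging callback; neither comes from the original article.

```python
import io
import sys
from typing import Callable, Optional, TextIO


def run_quietly(
    task: Callable[[Callable[[str], None]], None],
    out: Optional[TextIO] = None,
) -> int:
    """Run task(log); emit buffered debug output only if the task raises."""
    out = sys.stdout if out is None else out
    debug = io.StringIO()  # debug lines are held here, not printed

    def log(message: str) -> None:
        debug.write(message + "\n")

    try:
        task(log)
    except Exception as exc:
        # Failure: dump everything collected so far, then the error itself.
        out.write(debug.getvalue())
        out.write(f"FAILED: {exc!r}\n")
        return 1
    return 0  # success: nothing was written to stdout
```

Because a successful run writes nothing, piping such a script's output to mail -E from a cron job should produce a message only when there is something to report, on mail implementations where -E suppresses messages with an empty body.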

Automation truly is a source of great satisfaction. We get to build a clever framework to facilitate some goal, then watch it work. But like that Lego car, if we aren't paying attention, it will run into a wall eventually. It's best to plan for that right from the outset.

Source: InfoWorld

