Element34's Adam Goucher suggests that running micro-experiments on an internal tool now can prevent pain in production later.
To get there, Goucher suggests starting with something other than your production application. Your first cloud application might be a toy project, a sample that does nothing useful. After that, the team might write an internal tool, perhaps something to automate part of the testing process. Building the internal tool bootstraps the framework for reaching cloud production without creating production risk.
Once you're confident with the tools, Goucher recommends a strangler pattern. "Take a small piece of the application, perhaps just the REST API, or just one REST API service. Segment it from the rest of the code and implement an entire end-to-end cloud stack," he says. "Configure as code with Chef or Puppet; automate provisioning with [Amazon Elastic Compute Cloud] or OpenStack. Enable no-touch deployment and automate your database migrations. Do the entire conversion on a tiny sliver of functionality, then extend it, not the other way around."
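The routing decision at the heart of the strangler pattern can be sketched in a few lines. In this hypothetical illustration, one carved-out REST endpoint is served by the new cloud stack while every other path still hits the legacy application; the host names and path prefixes are assumptions for the example, not details from Goucher.

```python
# Strangler-pattern routing sketch: a thin front-end layer decides,
# per request path, whether the new cloud stack or the legacy
# application should serve the request. Backend names are hypothetical.

LEGACY_BACKEND = "https://legacy.example.internal"
CLOUD_BACKEND = "https://api.cloud.example.com"

# Paths already extracted and reimplemented on the end-to-end cloud stack.
MIGRATED_PREFIXES = ("/api/v1/orders",)

def route(path: str) -> str:
    """Return the backend that should serve this request path."""
    if path.startswith(MIGRATED_PREFIXES):
        return CLOUD_BACKEND
    return LEGACY_BACKEND
```

As more slivers of functionality are converted, prefixes move into `MIGRATED_PREFIXES` until the legacy backend serves nothing and can be retired.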
If a Server Falls In the Woods, But No One Hears It, It Makes No Sound
It's been six years since Ed Keyes, a test engineer at Google, proclaimed at a Google Tech Talk, "Production monitoring, sufficiently advanced, is indistinguishable from testing." The "sense and respond" model that Keyes advocates asks this question: If your own operations team notices and fixes a problem in production before most customers do, do you really have a problem at all?
Noah Sussman had that idea in mind when he implemented continuous integration and deployment at Etsy. "A lot of the research on preventing defects comes from air travel and medical software work, where the cost to make a change in production is very expensive and the impact of an error is catastrophic," Sussman says. "A long release tail might make sense for shipping a physical CD, or even pushing a version to an app store. With a website, you don't need to do that; anyone who refreshes the page gets the latest version. This changes the risk profile for the web."
To get to rapid deployment, Sussman continues, "You need to get comfortable pushing to production all the time. That means it has to be safe to make changes on a constant basis. We view a change that doesn't have the desired effect as a learning outcome."
That means changes might not increase revenue. A site might even go down for a few minutes, and revenue might decrease during that time, but the team learns something from the experience with lasting value for next time.
To catch errors fast, Sussman suggests monitoring hooks on every level of the application, perhaps using a tool such as StatsD. This requires open access to the codebase, where any programmer can commit at any level of the architecture at any time and, therefore, monitor every piece of system behavior.
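The StatsD wire format is simple enough to sketch. The following minimal client is an illustration of the idea, not the real StatsD library: each counter or timer becomes a tiny fire-and-forget UDP datagram, so instrumentation can be sprinkled through every layer of the code without blocking requests.

```python
import socket

class MiniStatsd:
    """Minimal sketch of a StatsD-style client. Metrics are emitted as
    fire-and-forget UDP datagrams in StatsD's text format, so a lost
    packet costs one data point, never a stalled request."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(self, metric: str, count: int = 1) -> None:
        # "name:value|c" is a counter increment.
        self._send(f"{metric}:{count}|c")

    def timing(self, metric: str, ms: int) -> None:
        # "name:value|ms" is a timer, in milliseconds.
        self._send(f"{metric}:{ms}|ms")

    def _send(self, payload: str) -> None:
        self.sock.sendto(payload.encode("ascii"), self.addr)
```

A request handler might then call `stats.incr("checkout.success")` or `stats.timing("db.query", elapsed_ms)` at each interesting point, and dashboards built on the aggregated metrics become the "sense" half of sense and respond.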