What do Grand Theft Auto and Obamacare have in common?
by Andrew C. Oliver, President & Founder of Open Software Integrators
Despite the PR, launch day crashes are nearly always avoidable.
Next time Electronic Arts or one of the other game companies says their online content, site, or backend crashed on launch day due to “unanticipated demand,” know that they are lying. It is especially easy to call them on it if they pressed discs. However, even if they didn’t, they had a business plan with certain sales numbers, etc. In fact, instead of having a plugin that replaces “the cloud” with “my butt”, maybe we should have a plugin that replaces “unanticipated demand” with “inadequate testing and development practices”.
The agencies and companies involved with the Affordable Care Act, better known as Obamacare, had also stated how many people they thought would sign up on the first day. Moreover, there are numbers to predict the capacity required for the system. If 15.4% of the population is uninsured and the US population is roughly 316,896,864, then you can anticipate up to 48,802,117 people using the site. Granted, they won’t all come at once, but there is plenty of data and research out there about when people use the Internet. You could also look at patterns in early voting, tax filing or anything else that has a “when can you start” and an “absolute deadline”. Heck, you could look at the launch of the next iPhone or when people install OTA updates and you will see the same between-the-camel-humps curve. So we could “guess” that they needed to build a site that would handle roughly 7M users (15% of that pool) on the first day. Given the “gee whiz” factor and the additional people who would hit it for the “news” factor, I would probably double that number. There are plenty of sites on the Internet built for that kind of load.
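The back-of-the-envelope math above is easy to write down. Here is a minimal sketch in Python that uses only the figures quoted in this article; the function name and the rounding are my own illustrative choices, not anything the agencies published.

```python
US_POPULATION = 316_896_864  # rough US population quoted above
UNINSURED_RATE = 0.154       # 15.4% uninsured, quoted above
FIRST_DAY_SHARE = 0.15       # the article's rough first-day guess
SAFETY_FACTOR = 2            # doubling for the "gee whiz" and "news" traffic


def first_day_estimate(population, uninsured_rate, first_day_share, safety_factor):
    """Return (potential users, first-day users, first-day users with headroom)."""
    potential = round(population * uninsured_rate)
    first_day = round(potential * first_day_share)
    return potential, first_day, first_day * safety_factor


potential, first_day, planned = first_day_estimate(
    US_POPULATION, UNINSURED_RATE, FIRST_DAY_SHARE, SAFETY_FACTOR
)
print(f"potential users:       {potential:,}")
print(f"first-day guess:       {first_day:,}")
print(f"plan for (with slack): {planned:,}")
```

The point is not the precision of the inputs. The point is that every one of them was available long before launch day.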
Rockstar, the maker of Grand Theft Auto (GTA), knows how many copies they sold during the last release. They had to tell their investors or stockholders how many copies they expected to sell. Rockstar even had historical data to know how this would relate to usage patterns. In other words, they knew almost exactly how many people would use the game on the first day and throughout. There is always some random unpredictability that breaks the pattern, but almost always in a way that reduces the amount of traffic (e.g. the game could get a bad review). The Obama Administration actually had less data, since they were launching a new site, and probably lower expectations. Frankly, Rockstar had no excuse. Both probably knew in advance that the site would tank and decided to launch anyhow. Since these crises are clearly avoidable, I am always baffled when they happen.
In fact, my anticipated demand for Healthcare.gov would have overshot the actual first-day mark of 2.5M users. Open Software Integrators has worked on sites that handled several times that many users with probably equally complicated backends. We have also done so on an emergency basis for sites that failed to handle their load. The failures have similar characteristics and the successes have components that can be replicated.
It starts with good practices
I frequently tell our marketing people that I don’t want another “Eat your vegetables and brush your teeth” article. Software development has been researched and analyzed since the 1960s, yet people still seem to disregard the facts. The stuff that makes it successful, predictable, stable and able to handle a reasonable amount of load hasn’t changed since...well, since humanity first picked up tools. Sure, it’s more abstract, but the basics are the same.
To use a dreaded car analogy: if I designed a car without concern for engine life, load, or any of that, built it with the cheapest labor I could find, and then tested it with a single lap around the parking lot before I sold it, I would be assured not only of a pretty bad car but of a daunting lawsuit as well. There are hundreds if not thousands of software development methodologies to pick from and nearly all of the modern ones suggest the following:
Software quality is directly connected to the quality of developers that work on it
Integration should be continuous and iterative
Requirements should be refined throughout the project
Yet many organizations devolve into the good ol’ waterfall model. This assumes that somehow, some way, though it has never worked before, you will have all of the requirements up front and will have fully designed and imagined the system before you start coding it.
The rest devolve into some kind of politically managed anarchy.
Load testing is the last great frontier
Even fairly “agile” shops that have good testing practices and which use “continuous integration” to assure quality often wait until the very end to test the load and capacity of the system. This is far too late for a massive project like Obamacare or even Grand Theft Auto. The best time to start capacity testing is with the production of your first runnable build. Management should be reviewing graphs of whether the system or some fraction of it still handles the expected load.
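To make “capacity testing from the first runnable build” concrete, here is a minimal sketch of a throughput check that could run alongside the rest of a continuous integration suite. Everything in it is an assumption for illustration: `fake_request` is a hypothetical stand-in for a real call to the system under test, and the request count, concurrency and target are made-up numbers, not anyone’s actual requirements.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(_):
    """Hypothetical stand-in for one call to the system under test."""
    time.sleep(0.002)  # pretend the backend takes about 2 ms
    return True        # True means the request succeeded


def measure_throughput(handler, total_requests, concurrency):
    """Run total_requests calls through a thread pool.

    Returns (number of successes, requests per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(handler, range(total_requests)))
    elapsed = time.perf_counter() - start
    return sum(outcomes), total_requests / elapsed


def capacity_check(throughput, target):
    """True when the measured throughput meets the planned target."""
    return throughput >= target


TARGET_RPS = 100  # illustrative target; a real one comes from the business plan

successes, rps = measure_throughput(fake_request, total_requests=200, concurrency=20)
print(f"{successes} ok, {rps:.0f} req/s, meets target: {capacity_check(rps, TARGET_RPS)}")
```

Recording that one number for every build is enough to draw the graph management should be looking at. A real load test would use a proper tool and production-like hardware, but the habit matters more than the tooling.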
Each piece of code uses resources. Each release or hardware decision has different performance and scalability characteristics. The best time to know you have a problem meeting your goals is when you have the greatest flexibility to change direction.
This means making some of your hardware buys (or better yet, cloud choices) incrementally rather than as part of a Big Design Up Front (BDUF). It also means writing load test scenarios that predict scalability impacts (2x the number of servers doesn’t necessarily handle 2x the load of the existing system).
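One common way to reason about why doubling the servers does not double the capacity is Neil Gunther’s Universal Scalability Law. It is my choice of model for illustration, not something either project is known to have used, and the contention and coherency coefficients below are invented. In practice you would fit them to your own load test results.

```python
def relative_capacity(nodes, contention, coherency):
    """Universal Scalability Law: capacity relative to a single node.

    contention: fraction of work that is serialized (e.g. a shared lock or database)
    coherency:  cost of keeping nodes consistent with one another
    """
    return nodes / (1 + contention * (nodes - 1) + coherency * nodes * (nodes - 1))


# Hypothetical coefficients, for illustration only
CONTENTION = 0.05
COHERENCY = 0.001

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:>2} servers -> {relative_capacity(n, CONTENTION, COHERENCY):.2f}x one server")
```

With coefficients like these, each doubling of servers buys noticeably less than double the capacity, which is exactly what a load test scenario needs to measure before the hardware is bought.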
This is just good business
As a technology guy who became an entrepreneur and then a manager of a going concern, I’ve spent a lot of time reading business, finance, and management books. Nearly all recommend having a grand overall plan as well as testing the waters, revising the plan periodically, and making incremental decisions. Nearly all recommend looking at the metrics of success and measuring them against your plan. Nearly all suggest some form of capacity planning. So why would this be an exception? Why wouldn’t your business plan drive your system capacity decisions? Why in the world would you not rate your system until launch day? I don’t know, maybe the Project Manager for healthcare.gov or the CEO of Rockstar can explain that to me.
Disclaimer: Open Software Integrators has worked for multiple game companies that compete with the ones I’m picking on. I strictly chose EA and Rockstar since we haven’t worked with them, so they can’t be mad at me.