Roll out the server...
But not this one. Right now I'm going to talk a bit about the rollout itself:
The Rollout
It actually went pretty well as far as big rollouts go. Most of the big components of the system had changed in one way or another, so there were a fair number of upgrades to coordinate. For upgrades this large, Jenn (testing/QA), Ross (databases/sysadmin), and I usually draw up a detailed, step-by-step plan in advance to minimize downtime. Which we did: the web server was down only about 2 minutes. Well, it would have been 2 minutes except for...
- Runaway squids
- We use the Squid Proxy to cache requests in front of our server, and it uses separate redirector processes to rewrite the incoming URLs. Unfortunately, when we took squid down, the redirector processes didn't die. And more unfortunately, when we restarted squid, the orphaned processes decided to start consuming system resources with reckless abandon. But once we realized what was going on, we killed the offending processes and all was right with the world. Except for...
- Temporary amnesia
- Somehow in all of our planning we managed to omit two of the component upgrades that were scheduled to be performed: some Debian package upgrades and some Zope debug scripts. Fortunately these were relatively minor. We remembered the Zope scripts right after we put the server up, and ran them then. The package upgrades, which we remembered a bit later, were finished today (I think. Right, Ross?)
Who serves when you're not serving?
One of the things that did work really well about the rollout was the temporary webserver we put in place to handle requests while we were doing the upgrade. Much nicer than just refusing connections or hanging. It accepts requests for any URL on the server and returns a temporary message that lets people know the site is down and will be back up shortly. And it's written in 15 lines of Python, using the SimpleHTTPServer module from the standard library, and it probably could have been even shorter. I love Python.
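For the curious, here is a rough sketch of what a stand-in server like that can look like. This is not the script we actually ran: it is written against http.server, the Python 3 successor to SimpleHTTPServer, and the names MaintenanceHandler, make_server, and MESSAGE, the 503 status code, the Retry-After value, and the wording of the page are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical wording; the real message is not reproduced here.
MESSAGE = (b"<html><body><h1>Down for maintenance</h1>"
           b"<p>The site is being upgraded and will be back shortly.</p>"
           b"</body></html>")


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every request, whatever the URL, with the same page."""

    def do_GET(self):
        # 503 is an assumption: it tells clients and crawlers that the
        # outage is temporary. The path is deliberately never inspected.
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(MESSAGE)))
        self.send_header("Retry-After", "300")  # assumed: retry in 5 minutes
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(MESSAGE)

    # Treat HEAD and POST the same way as GET.
    do_HEAD = do_POST = do_GET


def make_server(port=8080):
    """Build the server without starting it; port 8080 is an assumed default."""
    return HTTPServer(("", port), MaintenanceHandler)
```

Calling make_server(8080).serve_forever() would start it. To take over for the real site it has to listen on the site's own port, and binding to port 80 normally needs root privileges.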
