One of our systems administrators at work set up our zero downtime deployment solution with Tomcat, which so far has worked pretty well for us. We needed zero downtime deployment because we strive to practice agile development and therefore launch a new release of our site every two weeks; we also need to be able to launch a bug fix without taking the site down. Lastly, we could already do zero downtime deploys with our old Perl/mod_perl code base, so it wouldn't have helped the case for porting from Perl to Java if we now had to incur downtime to launch a new release.
This is one area where Perl, Ruby, Python, et al. shine, because you can cut over to a new release in a matter of seconds by doing “mv website website.old; mv website.new website; cluster apachectl graceful”. You can also patch a single file in the release branch and move just that file live without having to redeploy the whole release. Lastly, when the fit really hits the shan and you can't reproduce the bug in your dev or staging environments, you can add debug log messages to a live class file just by editing it on the production site. Java has plenty of other merits, though, such as excellent IDEs with extremely powerful refactoring abilities, O/R mapping, and a plethora of MVC frameworks, which is why we chose it to replace Perl.
As it stands, with our current setup we can launch a new release of the site with zero downtime unless the release includes incompatible database schema changes. In those cases we schedule a maintenance window late at night to launch it. Nobody likes staying up late, though, so I also view zero downtime deploys as an advantage for employee retention.
Here’s our current setup for zero downtime deploys:
1. We use replicated sessions in Tomcat so that the load balancer can bounce users around from one app server to another as it sees fit.
2. Our configuration is such that each physical server runs Apache, mod_jk (not mod_jk2), and a Tomcat instance.
3. Each mod_jk is configured to favor the Tomcat on the local machine but can fail over to a Tomcat running on another machine if the local instance is shut down.
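To make the routing rule in this list concrete, here is a small Java model of the choice each Apache/mod_jk front end makes: use the local Tomcat while it is up, otherwise fall back to a healthy Tomcat on another machine. This is only an illustrative sketch of the behavior described, not mod_jk's actual load-balancing algorithm (mod_jk itself is configured in its workers.properties file, not in Java); the class name LocalFirstBalancer and the isUp health check are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

class LocalFirstBalancer {

    /** One Tomcat instance as seen from a single Apache/mod_jk front end. */
    static final class Worker {
        final String name;
        final boolean local; // true if this Tomcat runs on the same machine as this Apache

        Worker(String name, boolean local) {
            this.name = name;
            this.local = local;
        }
    }

    /**
     * Picks the worker for the next request: a healthy local Tomcat if there is one,
     * otherwise the first healthy remote Tomcat. Empty if every instance is down.
     */
    static Optional<Worker> choose(List<Worker> workers, Predicate<Worker> isUp) {
        return workers.stream()
                .filter(isUp)
                // Sort key is "not local": false orders before true, so a local worker wins.
                .min(Comparator.comparing((Worker w) -> !w.local));
    }
}
```

With replicated sessions (item 1), falling back to a remote worker does not log the user out, which is what makes this simple preference rule safe during a deploy.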
To deploy a new WAR, we run a script and pass it the path to the WAR file. In the future we want to extend this script to automatically SCP the WAR over from our staging/QA server, checksum it, and then deploy it. Here’s what the script currently does:
1. It SSHs to each application server sequentially (using public/private key pairs to authenticate without a password).
2. Shuts down Tomcat, at which point Apache’s mod_jk automatically fails over to a still-running Tomcat on another machine.
3. Drops the new WAR into place and starts up Tomcat and then waits until Tomcat is again listening on port 8009. At that point mod_jk on the local server will start using the local Tomcat again.
4. Moves on to the next application server and repeats until it has gone through all the app servers and the upgrade is complete.
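The script's steps can be sketched in Java as a rolling loop: stop Tomcat on one host, copy the WAR over, start Tomcat, and block until the AJP port (8009) accepts connections before moving on to the next host. This is a minimal sketch under several assumptions, not the script described in this post: the Tomcat paths, the ROOT.war target name, the two-minute timeout, and the class and method names are all hypothetical; ssh/scp are assumed to authenticate with keys as in step 1; and port 8009 is assumed to be reachable from the machine running the deploy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

class RollingDeploy {

    /** Runs one local command (such as ssh or scp); swappable so the loop can be tested. */
    interface CommandRunner {
        void run(List<String> argv) throws IOException, InterruptedException;
    }

    /** Default runner: executes the command and fails on a non-zero exit status. */
    static final CommandRunner PROCESS_RUNNER = argv -> {
        Process p = new ProcessBuilder(argv).inheritIO().start();
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException("command failed (" + status + "): " + argv);
        }
    };

    // Hypothetical paths; substitute your own Tomcat layout.
    static final String TOMCAT_BIN = "/opt/tomcat/bin";
    static final String WAR_TARGET = "/opt/tomcat/webapps/ROOT.war";

    /** Polls until host:port accepts a TCP connection; returns false on timeout. */
    static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;
            } catch (IOException notListeningYet) {
                Thread.sleep(500);
            }
        }
        return false;
    }

    /** Upgrades one host at a time so the others keep serving through mod_jk failover. */
    static void deploy(List<String> hosts, String warPath, int ajpPort, CommandRunner runner)
            throws IOException, InterruptedException {
        for (String host : hosts) {
            // Steps 1-2: stop Tomcat; mod_jk on this box fails over to a remote Tomcat.
            runner.run(Arrays.asList("ssh", host, TOMCAT_BIN + "/shutdown.sh"));
            // Step 3: put the new WAR in place, then start Tomcat again.
            runner.run(Arrays.asList("scp", warPath, host + ":" + WAR_TARGET));
            runner.run(Arrays.asList("ssh", host, TOMCAT_BIN + "/startup.sh"));
            // Do not touch the next host until this one accepts AJP connections again.
            if (!waitForPort(host, ajpPort, 120_000)) {
                throw new IOException(host + " did not come back on port " + ajpPort);
            }
        }
    }
}
```

For example, `RollingDeploy.deploy(Arrays.asList("app1", "app2"), "/tmp/site.war", 8009, RollingDeploy.PROCESS_RUNNER)` would upgrade app1 and then app2. The command runner is passed in so the loop can be exercised without real servers. A production version would probably also confirm that the old Tomcat has fully exited before starting the new one, since a port 8009 still held by the old process would otherwise satisfy the wait.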
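The planned checksum step could be as simple as computing a digest of the WAR on the staging/QA server, computing it again after the copy, and refusing to deploy on a mismatch. A minimal sketch follows; the class name WarChecksum and the choice of SHA-256 are assumptions, since the post does not say which checksum it would use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class WarChecksum {

    /** Returns the lowercase hex SHA-256 digest of a file, read in 8 KB chunks. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Throws if the copied WAR does not match the digest computed on staging. */
    static void verify(Path war, String expectedHex) throws IOException {
        String actual = sha256Hex(war);
        if (!actual.equalsIgnoreCase(expectedHex)) {
            throw new IOException("checksum mismatch for " + war + ": " + actual);
        }
    }
}
```

Calling `WarChecksum.verify(war, digestFromStaging)` once, before the WAR is rolled out to the first app server, means a corrupted copy stops the deploy while every Tomcat is still running the old release.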