The May 2017 update for ACCS (17.2.3) brought a cool new feature: zero-downtime updates. Previously, ACCS supported a rolling restart, in which each application instance was brought down and updated in turn. That only gave you zero downtime if the application ran two or more instances and could still meet its performance requirements with one fewer node. Any production system should be able to satisfy those requirements, but many of the utility systems I ran were single-node, and I had to send out an email blast to avoid being disruptive whenever I wanted to push a quick code fix.
The behaviour of the Rolling Restart option has changed in 17.2.3: it now delivers zero downtime even for single-node applications, by leveraging the underlying container framework of ACCS.
It may be worth taking a moment to discuss the ACCS container framework to understand how this works.
The above image shows the ACCS build process. There is a collection of Oracle-controlled base images, which provide the runtime environment for specific languages and versions. During application creation, the code you upload is layered onto one of these images to create a new container image, which is stored in a registry private to your cloud tenant. When you deploy an application instance, the image for that application is pulled down to a runtime host and started. Scaling an application works the same way: the same tenant-specific application image is pulled down to another runtime host and started. The Oracle base image contains all of the smarts required to register with the load balancer, monitor health, and so on.
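The build-and-deploy flow can be sketched roughly as follows. This is an illustrative Python model only; every name in it (the registry dictionary, `build_application`, `deploy_instance`) is hypothetical and is not the ACCS API:

```python
# Illustrative sketch of the ACCS build/deploy flow -- all names are
# hypothetical; this is not the ACCS API.

# Oracle-controlled base images, one per language/version runtime.
BASE_IMAGES = {
    ("node", "6.10"): "oracle/node:6.10",
    ("java", "8"): "oracle/java:8",
}

# Registry private to the cloud tenant.
tenant_registry = {}

def build_application(app_name, runtime, app_code):
    """Layer the uploaded code onto the Oracle base image and store the
    resulting application image in the tenant-private registry."""
    base = BASE_IMAGES[runtime]
    image = {"base": base, "code": app_code, "tag": f"{app_name}:latest"}
    tenant_registry[image["tag"]] = image
    return image["tag"]

def deploy_instance(image_tag, instance_name):
    """Pull the tenant-specific image to a runtime host and start it.
    Scaling out simply repeats this with the same image tag."""
    image = tenant_registry[image_tag]  # pull from the private registry
    return {"name": instance_name, "image": image, "running": True}

tag = build_application("myapp", ("node", "6.10"), "server.js source")
web1 = deploy_instance(tag, "web.1")
web2 = deploy_instance(tag, "web.2")  # scaling reuses the same image
```

The key point the sketch captures is that the build happens once, and every instance, whether from initial deployment or scaling, starts from the same tenant-specific image.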
With this understanding, it is relatively simple to see how zero-downtime updates are implemented. Essentially, the new image is built and stored, version-controlled, in the tenant-specific registry. It is then pulled down and started as a new runtime instance named ‘standby-web.x’ (where x is the number of the application instance currently being updated). Once the standby instance is up and running, it registers itself with the load balancer and traffic cuts across from the previously running web.x instance, which is then removed from its runtime host. This can be seen in the following screenshot, where the extra standby instances are clearly visible.
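The cutover sequence can be modelled in a few lines. Again, this is a sketch under my own assumptions about the mechanism, not the ACCS implementation; only the instance names (web.x, standby-web.x) come from what is observable in the console:

```python
# Illustrative model of the single-node zero-downtime cutover.
# The instance naming (web.x / standby-web.x) matches what the ACCS
# console shows; the LoadBalancer class and update steps are hypothetical.

class LoadBalancer:
    def __init__(self):
        self.backends = set()

    def register(self, name):
        self.backends.add(name)

    def deregister(self, name):
        self.backends.discard(name)

    def serving(self):
        # Traffic is served as long as at least one backend is registered.
        return len(self.backends) > 0

def zero_downtime_update(lb, instance_number):
    """Start a standby from the new image, cut across, remove the old
    instance. Returns a list recording whether traffic was being served
    at each step."""
    old = f"web.{instance_number}"
    standby = f"standby-web.{instance_number}"
    history = [lb.serving()]   # before the update begins

    lb.register(standby)       # standby is up and registers itself
    history.append(lb.serving())

    lb.deregister(old)         # old instance removed from its runtime host
    history.append(lb.serving())
    return history

lb = LoadBalancer()
lb.register("web.1")           # a single-node application
history = zero_downtime_update(lb, 1)
```

Because the standby registers before the old instance deregisters, the load balancer never has an empty backend set, which is exactly why a single-node application sees no outage.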
Zero-downtime updates are a nice new feature, and now that I have seen how they work I cannot help but wonder why things were ever done differently. The approach is simple and elegant: it leverages the advantages of container technology, a flexible networking model, and the smarts baked into a controlled base image to provide automated network configuration.