How we meet uptime requirements and work from anywhere with Terraform

An introduction to Terraform and the benefits it provides our organization.

Nov. 10, 2021


In a previous post, we detailed how we use K3s to deploy our business server as a single-node Kubernetes cluster. Beyond scalability, this approach has given us very high reliability in process management, leaving the node itself as the most likely point of failure. In this post, we address deploying and maintaining that server from the point of view of a small software company that needs to meet stringent uptime goals. Our tool of choice for this task is Terraform, which has proven to be one of the most reliable pieces of software we use.

For those unfamiliar with it, Terraform is an infrastructure provisioning tool that fully automates the creation and destruction of cloud resources. A standard deployment entails declaring all cloud resources in one or more configuration files and then running terraform apply to bring the current state of those resources into conformity with the files. Terraform builds a dependency graph of those resources, which lets it create and modify large deployments intelligently: it only creates or destroys what is necessary, and it does so in the correct order.
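The idea behind that dependency handling can be illustrated with a toy model. The sketch below is not how Terraform is implemented: it uses a hypothetical set of resources (vpc, subnet, security_group, server, dns_record) whose dependencies are written out by hand, whereas Terraform infers most of them from references between resources. It diffs the desired configuration against the current state, skips anything unchanged, and orders the remaining actions with Python's graphlib so that dependents are destroyed before their dependencies and dependencies are created first. The real engine also walks the graph in parallel and handles replacements, so treat this strictly as an illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical resource graph: each resource maps to the resources it depends on.
DEPENDENCIES: dict[str, set[str]] = {
    "vpc": set(),
    "subnet": {"vpc"},
    "security_group": {"vpc"},
    "server": {"subnet", "security_group"},
    "dns_record": {"server"},
}


def plan(
    desired: dict[str, dict],
    current: dict[str, dict],
    deps: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return the (action, resource) pairs needed to make `current` match `desired`.

    Unchanged resources get no action. Destroys run dependents-first and
    creates/updates run dependencies-first (a simplified, sequential model).
    """
    # static_order() yields every resource after all of its dependencies.
    # It raises graphlib.CycleError on circular dependencies, which Terraform
    # likewise rejects.
    order = list(TopologicalSorter(deps).static_order())
    actions: list[tuple[str, str]] = []
    # Remove resources that are no longer wanted, dependents first.
    for name in reversed(order):
        if name in current and name not in desired:
            actions.append(("destroy", name))
    # Create missing resources and update changed ones, dependencies first.
    for name in order:
        if name in desired and name not in current:
            actions.append(("create", name))
        elif name in desired and desired[name] != current[name]:
            actions.append(("update", name))
    return actions


if __name__ == "__main__":
    current = {
        "vpc": {"cidr": "10.0.0.0/16"},
        "subnet": {"zone": "a"},
        "security_group": {"ports": [22, 443]},
        "server": {"size": "small"},
    }
    # Resize the server and add a DNS record; everything else is untouched.
    desired = {**current, "server": {"size": "medium"}, "dns_record": {"name": "app"}}
    for action, name in plan(desired, current, DEPENDENCIES):
        print(action, name)
    # update server
    # create dns_record
```

Only two actions come out of the example: the unchanged network resources are left alone, and the DNS record is created only after the server it points at has been updated.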

In practice, Terraform is extremely reliable and executes even large changes to our deployment quickly, usually completing all operations within a few minutes. This lets us aim for 99.99% uptime in a given year while providing a firm 99.9% guarantee to all clients. Meeting those guarantees by hand, or with scripts or playbooks, would be far more difficult.
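To put those percentages in concrete terms, the allowed downtime is the length of the period multiplied by the fraction of time the service may be unavailable. The sketch below assumes a 365-day year and uses an illustrative 5-minute redeploy time (an assumption for the example, not a measured figure) to show how many full redeploys a yearly budget could absorb.

```python
# Length of a (non-leap) year in minutes: 365 * 24 * 60 = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(uptime_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime (in minutes) permitted over a period at the given uptime."""
    return period_minutes * (1 - uptime_percent / 100)


def redeploys_within_budget(uptime_percent: float, redeploy_minutes: float) -> int:
    """How many full outages of `redeploy_minutes` fit inside the yearly budget."""
    return int(downtime_budget_minutes(uptime_percent) // redeploy_minutes)


if __name__ == "__main__":
    for target in (99.9, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows {budget:.1f} min ({budget / 60:.2f} h) of downtime per year")
    # 99.9% uptime allows 525.6 min (8.76 h) of downtime per year
    # 99.99% uptime allows 52.6 min (0.88 h) of downtime per year

    # Assumed, illustrative redeploy time of 5 minutes.
    print(redeploys_within_budget(99.99, 5))  # 10
```

Under these assumptions, a 99.99% year leaves under an hour of total downtime, which is why a redeploy that finishes in minutes rather than hours matters.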

Additionally, Terraform supports remote state, which keeps the full state of the deployment in a centrally accessible location such as an S3 bucket. Combined with keeping all deployment files in version control, this allows any team member to make changes and redeploy from anywhere, while every other team member can immediately sync with those changes. Beyond enabling remote work, this approach also increases the number of team members who can play a productive role in redeploying things should problems arise. In a genuine emergency, even a relatively nontechnical team member should be able to run terraform destroy followed by terraform apply to redeploy the entire stack, changing the availability zone if necessary.
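The emergency procedure described above is short enough to wrap in a small script. The following is a minimal sketch, not our actual tooling: it assumes the terraform binary is on the PATH, that the script runs from the directory holding the deployment files (with the remote state backend already configured in them), and that the configuration declares a hypothetical availability_zone variable with a default value. Because the sequence is destructive, the sketch only prints the commands unless --execute is passed.

```python
from __future__ import annotations

import argparse
import shlex
import subprocess


def redeploy_commands(availability_zone: str | None = None) -> list[list[str]]:
    """Build the command sequence for a full teardown and rebuild of the stack."""
    apply = ["terraform", "apply", "-auto-approve"]
    if availability_zone:
        # Assumes the configuration declares a hypothetical `availability_zone` variable.
        apply.append(f"-var=availability_zone={availability_zone}")
    return [
        # Initialize the working directory so Terraform can reach the remote state.
        ["terraform", "init"],
        # -auto-approve skips the interactive "yes" confirmation.
        ["terraform", "destroy", "-auto-approve"],
        apply,
    ]


def main() -> None:
    parser = argparse.ArgumentParser(description="Tear down and rebuild the stack with Terraform.")
    parser.add_argument("--zone", help="value for the hypothetical availability_zone variable")
    parser.add_argument("--execute", action="store_true", help="run the commands instead of printing them")
    args = parser.parse_args()
    for cmd in redeploy_commands(args.zone):
        print("$", shlex.join(cmd))
        if args.execute:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Running the script with --zone=us-east-1b (an example value) prints the three commands for review; adding --execute runs them in order and aborts if any step fails, so a failed destroy never proceeds to a half-configured apply.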

Finally, because Terraform is open source and backed by a large community of users, we know we can offer this extremely reliable tool to our clients for no cost. This allows us to provide them with the same reliability and uptime guarantees that much larger organizations enjoy, at a fraction of the price.