Architecting for Kubernetes

Image courtesy Jonathan Leung (https://www.flickr.com/photos/jonathan-leung/) under license CC BY-SA 2.0

Docker is taking over the world. Developers love it because it allows them to experiment without polluting their machines. It’s easy to create docker images for your apps and only moderately harder to bundle a stack together (using compose and swarm). It’s also pretty easy to create a Continuous Integration process that builds and tests the developer output. Docker has made life a lot more pleasant for developers.

Getting dockerized applications to production has proven more difficult. This post explains how Kubernetes (k8s), a container orchestration system originally created by Google and offered as a managed service on Google Cloud Platform, helps solve that problem.

Sidebar: Concepts

Kubernetes has a variety of different component types. Here’s a brief description of the ones we use (both explicitly and implicitly):

  • Namespace: a namespace allows you to have multiple implementations of a stack or set of components which share hardware or virtual machines without interfering with each other. Examples: dev, test, qa, prod. Kubernetes creates a few namespaces out of the box, including default and kube-system. kube-system is used for the Kubernetes dashboard and other internals.
  • Service: something that presents an interface to users or other components. Our wordpress service both exposes an interface (http port) and uses a service (cloudsqlproxy).
  • Deployment: the mechanism for deploying an application or other long-running component. The wordpress deployment is used by the wordpress service.
  • Stateful Set: a component like a deployment, except that each replica keeps a stable identity and its own persistent storage across restarts or failures. We deploy Elasticsearch as a stateful set so that the index database is not lost if an Elasticsearch container fails.
  • Job: a component that runs to completion and exits. If the job exits with a non-zero exit code, Kubernetes retries it.
  • Replica Sets: deployments create replica sets when they are deployed into Kubernetes. Each replica set corresponds to one version of a deployment and keeps the requested number of pods running. Beyond that, we do not touch them directly.
  • Pods: pods contain docker containers, and there may be more than one container in a pod. (The sidecar pattern is a good example of this.) Pods come and go. When a container or a pod fails, Kubernetes ensures it is restarted or replaced.
  • Secrets: secrets allow sensitive settings such as passwords to be stored for use by other components without having to store them in the docker image. Each secret belongs to a namespace, so values can differ per namespace or cluster.

Components get into k8s via the kubectl utility (which in turn uses a configuration to know which cluster to communicate with and how to do so). kubectl uses YAML or JSON files to deploy. It has many sub-commands so setting up command completion in your shell is a good idea. For our example, all files are YAML files.
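As a rough illustration of what such a file looks like, here is a minimal sketch of a namespace definition. The namespace name dev and the file name namespace.yaml are assumptions chosen for this example, not taken from the client project.

```yaml
# namespace.yaml: a hypothetical "dev" namespace (name is an assumption)
apiVersion: v1
kind: Namespace
metadata:
  name: dev
# Create it with:            kubectl apply -f namespace.yaml
# Deploy other files into it: kubectl apply -f <file>.yaml --namespace dev
```

The same kubectl apply command works for every other component type described above; only the contents of the YAML file change.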

A Real World Example

Our example is lifted from some work Avalon did for one of our clients. This client wanted to set up a site for publishing news, providing searchable links to publications, posting articles and otherwise sharing information with interested users. So the basic requirements include: storing articles, searching publications, navigating content. A combination of WordPress and Elasticsearch fit nicely.

In addition to WordPress and Elasticsearch, we also needed a database for persisting the WordPress content. WordPress plays well with MySQL so we went with that.

We also needed a way to index and search several sets of externally hosted publications related to the site. Thankfully those sites provided REST interfaces to allow us to index their content. We also had some local content to index (a relatively static set of documents). And we wanted the WordPress content to be searchable. Each of these types of content resulted in a batch style container (using python or logstash) called a content loader. These processes have to run periodically. Some are long running and re-poll for new content (logstash), while others are only run on-demand.

Our only public interface is WordPress, so that will require an HTTP listener. Everything else is private and only needs to be accessed internally. However, we don’t want to hard-wire any connections between components. We also want to be able to scale out and up for each component or for the system as a whole.

An Initial Docker Approach

Our first iteration, focused on getting developers developing and testing, used docker and docker-compose. Docker-compose allows a set of docker containers to run as a unit, with connectivity between components specified in a YAML file. Creating a docker image for each component (wordpress, elasticsearch, mysql, content loaders) and running them (in part or in whole) via docker-compose works well for the development and test phase. Developers can quickly tear down and rebuild their environments. They can also work on one component at a time (or a couple of them).

But docker-compose makes deployment to production environments difficult. If I only want to update one container or image, it’s not easy to do that. I also have to pay close attention to ensure I don’t accidentally wipe out the data volume backing mysql or elasticsearch. It also makes live updates tricky.

Mapping to Kubernetes

MySQL

The best first step is to pull MySQL out of the docker component list and move it to an external location, like Amazon RDS or Google Cloud SQL. While you can replicate and manage MySQL inside docker and k8s, it’s generally a lot simpler to let a Database as a Service (DBaaS) handle that function. We also get the benefit of having the DBaaS perform backups, and we can scale up as needed.

Given that, we need some way to find that DBaaS. In k8s, there are a couple of ways to handle this. Using Google Cloud Platform (GCP), Google Cloud SQL for MySQL and the Google Container Engine (GKE), we can use the cloudsql-proxy image to define a k8s service that acts as a proxy to the Cloud SQL database. This means we have a service and a deployment: the service routes traffic to the deployment's pods, and the deployment can scale out by running multiple replicas. A single proxy deployment can provide an interface to multiple databases. By putting a service in front of each database, we can reference each one by a unique name and the default port. The information about how to connect to Cloud SQL is stored in a k8s secret that the deployment pulls into its environment at startup.
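To make the service-plus-deployment pairing concrete, here is a minimal sketch of what the proxy definition might look like. It is not the client's actual configuration: the image tag, the replica count, the secret name cloudsql-instance and its key connection-name are all assumptions, and the credentials the proxy needs to authenticate are left out.

```yaml
# Service: gives the proxy a stable name (cloudsqlproxy) on the default MySQL port.
apiVersion: v1
kind: Service
metadata:
  name: cloudsqlproxy
spec:
  selector:
    app: cloudsqlproxy
  ports:
    - port: 3306
      targetPort: 3306
---
# Deployment: runs the proxy containers that the service routes to.
apiVersion: apps/v1          # may differ on older clusters
kind: Deployment
metadata:
  name: cloudsqlproxy
spec:
  replicas: 2                # assumption
  selector:
    matchLabels:
      app: cloudsqlproxy
  template:
    metadata:
      labels:
        app: cloudsqlproxy
    spec:
      containers:
        - name: cloudsqlproxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11   # image tag is an assumption
          command: ["/cloud_sql_proxy"]
          args:
            # $(VAR) is expanded by Kubernetes from the env entries below
            - "-instances=$(INSTANCE_CONNECTION_NAME)=tcp:0.0.0.0:3306"
          env:
            - name: INSTANCE_CONNECTION_NAME
              valueFrom:
                secretKeyRef:
                  name: cloudsql-instance      # hypothetical secret
                  key: connection-name         # hypothetical key
          ports:
            - containerPort: 3306
```

With something like this in place, any other pod in the namespace can reach MySQL at cloudsqlproxy:3306 without knowing where the database actually lives.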

Elasticsearch

Next we need to support Elasticsearch (ES). ES needs to maintain data (the set of indexes) and can’t lose it because an individual container has crashed. We would also like to have multiple copies available. This maps onto a Stateful Set. ES is long running, and each instance (1 through N) has its own data volume for persisting the indexes, mounted at the same location in the filesystem every time that instance starts up. We used the kubernetes plugin for ES to allow the instances to discover each other on startup so that we have a real ES cluster and not N individual nodes. We also define a service interface so that other components can find ES by name.
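A stateful set along these lines could be sketched as follows. The image name, replica count, storage size and data path are assumptions, the service is written as a headless one (clusterIP: None) as one possible choice, and the discovery plugin configuration is omitted entirely.

```yaml
# Service: lets other components find ES by name (http://elasticsearch:9200).
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  clusterIP: None            # headless; an assumption for this sketch
  selector:
    app: elasticsearch
  ports:
    - name: http
      port: 9200
---
apiVersion: apps/v1          # may differ on older clusters
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3                # assumption
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: example.com/avalon/elasticsearch:1.0   # hypothetical image
          ports:
            - name: http
              containerPort: 9200
            - name: transport
              containerPort: 9300
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data  # assumed data directory
  # One persistent volume claim is created per replica and reattached on restart.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi    # assumption
```

Kubernetes names the pods elasticsearch-0, elasticsearch-1 and so on, and each keeps its own claim (data-elasticsearch-0, ...), which is what lets an instance come back with its indexes intact.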

WordPress

For wordpress, we need 3 things: the ability to scale out; access to ES and MySQL; and the ability to serve somewhat static content. We store the content in a gzipped tarball in a bucket on Google Cloud Storage (like AWS S3) and use gcsfuse to mount the bucket on the local filesystem in a post-start hook. We then unpack the tarball onto the local filesystem to give Apache HTTPD fast access to the content while running. Serving directly from the bucket via FUSE proved to be too slow, while unpacking the 22MB tarball only takes a second or so. When the content changes, we simply delete the wordpress pods and let kubernetes restart them, ensuring they get the new content. We do this in a rolling update manner, deleting each pod in sequence and waiting for its replacement to be ready before deleting the next. WordPress is deployed as a deployment with multiple replicas and a service with a defined endpoint (external URL). It uses the service definitions for ES and MySQL.
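The post-start hook and the public endpoint might be expressed roughly like this. Treat it as a sketch under several assumptions: the image, bucket name, tarball name, paths, environment variable names and the use of a LoadBalancer service are all illustrative, and the privileged setting reflects the fact that FUSE mounts usually need elevated permissions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
spec:
  type: LoadBalancer         # one way to get an external endpoint; an assumption
  selector:
    app: wordpress
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: apps/v1          # may differ on older clusters
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 2                # assumption
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
        - name: wordpress
          image: example.com/avalon/wordpress:1.0   # hypothetical image
          ports:
            - containerPort: 80
          env:
            - name: WORDPRESS_DB_HOST     # variable names are assumptions
              value: cloudsqlproxy:3306
            - name: ELASTICSEARCH_URL
              value: http://elasticsearch:9200
          securityContext:
            privileged: true              # assumption: needed for the FUSE mount
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - >
                    mkdir -p /mnt/content &&
                    gcsfuse example-content-bucket /mnt/content &&
                    tar -xzf /mnt/content/content.tar.gz -C /var/www/html
          # A pod only counts as ready once HTTP responds, which is what makes
          # "wait for the replacement before deleting the next pod" checkable.
          readinessProbe:
            httpGet:
              path: /
              port: 80
```

Because the content is fetched in the hook rather than baked into the image, deleting a pod is enough to pick up a new tarball; no image rebuild is involved.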

Loading and Indexing

The last step is to support the indexing processes. Most of these are defined as k8s jobs, meaning they run once until successful completion. If any of them fails partway through (i.e. exits with a non-zero code), kubernetes restarts it. If we need to run one again at a later time (new remote content to index), we simply resubmit the job.
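A content loader defined as a job could look something like the sketch below. The job name, image, retry limit and environment variable are assumptions made for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: publications-loader        # hypothetical name
spec:
  backoffLimit: 4                  # assumption: give up after this many retries
  template:
    spec:
      restartPolicy: OnFailure     # restart the container on a non-zero exit
      containers:
        - name: loader
          image: example.com/avalon/publications-loader:1.0   # hypothetical image
          env:
            - name: ELASTICSEARCH_URL                          # hypothetical variable
              value: http://elasticsearch:9200
# To run it again later, remove the finished job and resubmit the same file:
#   kubectl delete job publications-loader
#   kubectl apply -f publications-loader.yaml
```

A completed job is not re-run by applying the same file again, so resubmitting generally means deleting the old job first or giving the new one a different name.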

The final indexer for wordpress content runs as a deployment with 1 replica. It uses the service definitions for MySQL and ES and uses logstash with a JDBC connector to monitor the WordPress MySQL database for updates. If the container crashes for any reason, kubernetes restarts it for us.

Each of these components has a single YAML file defining it to k8s (one for each deployment, one for each service, one for each job, one for the stateful set). In the YAML we define the number of replicas, the docker image to use, the environment, the secrets used to populate the environment or volumes, and a variety of other items that are not a part of the functionality of the container, but are key to the operation of the container.
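To show where those items sit in a file, here is a sketch of what the single-replica wordpress indexer might look like. Only the name wp-loader comes from this post; the image, variable names, secret name and mount path are assumptions.

```yaml
apiVersion: apps/v1          # may differ on older clusters
kind: Deployment
metadata:
  name: wp-loader
spec:
  replicas: 1                               # number of replicas
  selector:
    matchLabels:
      app: wp-loader
  template:
    metadata:
      labels:
        app: wp-loader
    spec:
      containers:
        - name: logstash
          image: example.com/avalon/wp-loader:1.0   # docker image (hypothetical)
          env:                                      # environment
            - name: MYSQL_HOST                      # hypothetical variable
              value: cloudsqlproxy
            - name: MYSQL_PASSWORD                  # populated from a secret
              valueFrom:
                secretKeyRef:
                  name: wordpress-db                # hypothetical secret
                  key: password
          volumeMounts:                             # secret exposed as files
            - name: db-secrets
              mountPath: /etc/loader/secrets
              readOnly: true
      volumes:
        - name: db-secrets
          secret:
            secretName: wordpress-db
```

None of these settings change what the container does; they only tell kubernetes how many copies to run and what to hand each copy at startup.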

We tell k8s to create these components in a cluster (within a namespace) using the kubectl command. Kubernetes manages the rest. We can view status via the kubernetes dashboard. For consolidated logging and monitoring, we can use Stackdriver, or we can implement the EFK stack (Elasticsearch-Fluentd-Kibana) and create our own dashboards.

(Figure: kubernetes architecture example)

Summary

By mapping our application onto kubernetes, we’ve added a very simple way to manage and scale our full stack application. We can add wordpress replicas. We can add Elasticsearch replicas. We can run our batch jobs as needed and have restarts managed by kubernetes. We can update the docker image used by each component independently and without service interruption. And we can do all of this on a fixed set of nodes in our kubernetes cluster. If we need to, we can scale up, down, in or out at the cluster level by changing the number and size of the nodes in the cluster. All of that is done on the fly without downtime. Note: none of the content above even mentions virtual machines!

Benefits

  1. Better use of virtual machines. The cluster can be as few as 2 VMs plus 1 for the MySQL service (which is billed by instance size).
  2. No need to manage mysql. Google or Amazon handles backups.
  3. The lifecycle of each component is now independent. You can update wordpress without touching Elasticsearch or mysql. You can push new content without any reinitialization or new images.
  4. Updates are rolling – no downtime.
  5. Replication – there are at least 2 of every long-running component where it makes sense (wp-loader is the exception – it only needs a single instance).
  6. Scale out via more replication
  7. Scale out via more instances in the k8s cluster
  8. Scale up via bigger instances in the k8s cluster
  9. No nasty script to understand and debug
  10. Multiple namespaces to share hardware without clashes
  11. Auto-deploy to one namespace without affecting others
  12. A CI process can create namespaces on the fly for automated testing

Sean Dowd is VP and Chief Architect at Avalon Consulting, LLC where he focuses on apps dev, open source, automation and enterprise architecture.  He has over 25 years experience in enterprise applications development and management.