Stateful containers in production, is this a thing?

Source: Veeam

As the debate over containers vs virtual machines continues, another debate is raging about stateful vs stateless containers. Is this really a thing? Is this really happening in production environments? Do we really need to back up containers, or can we just back up the data sets they access? Containers are not meant to be stateful, are they? This debate rages daily on Twitter, on Reddit and in pretty much every conversation I have with customers.

The debate typically starts with the question: why run a stateful container? To answer that, we first need to understand the difference between stateful and stateless containers and the purpose behind each.

What is a container?

“Containers enable abstraction of resources at the operating system level, enabling multiple applications to share binaries while remaining isolated from each other.” (Quote from Actual Tech Media)

A container is an application and its dependencies bundled together so that they can be deployed as an image on a container host. This makes deploying the application quick and easy, without the need to worry about the underlying operating system. The diagram below helps explain this:

[Diagram: stateful containers]

When you look at the diagram above, you can see that each application is deployed with its own libraries.

What about the application state?

Every application has persistent data and application state data. It doesn’t matter what the application is: it has to store data somewhere, otherwise what would be the point of the application? Take a CRM application, where all that customer data needs to be kept somewhere. Traditionally these applications use database servers to store all the information, and nothing has changed in that regard. But when we think about the application state, this is where the discussion about stateful containers comes in. Typically, an application has five state types:

  1. Connection
  2. Session
  3. Configuration
  4. Cluster
  5. Persistent

For the purposes of this post, we won’t go into depth on each of these states. In applications that are written today, native to containers, these states are all offloaded to a database somewhere. The challenge comes when existing applications are containerized. This is the process of taking a traditional application that is installed on top of an OS and turning it into a containerized application so that it can be deployed in the model shown earlier. These applications save their state locally, and exactly where depends on the application and the developer. It is also increasingly common to run databases themselves as containers, and as a consequence, those containers hold many of the state types listed above.

Stateful containers

In a stateful container, state data is typically either written to persistent storage or kept in memory, and this is where the challenges come in. Being able to recover the applications in the event of an infrastructure failure is important. If all of the state is stored in databases, then as mentioned earlier, recovery is straightforward. If it’s not, how do you orchestrate the recovery of these applications without interruption to users? If you have load-balanced applications running and you have to restore one of them, but it doesn’t know the connection or session state, the end user is going to face issues.

If we look at the diagram, we can see that App 1 has been deployed twice across different hosts. We have multiple users accessing these applications through a load balancer. If App 1 on the right crashes and is then restarted without any application state awareness, User 2 will not simply reconnect to that application. That application won’t understand the connection and will more than likely ask the user to re-authenticate. Really frustrating for the user, and terrible for the company providing that service to the user. Now of course this can be mitigated with different types of load balancers and other software, but the challenge is real. This is the challenge for stateful containers. It’s not just about backing up data in the event of data corruption, it’s how to recover and operate a continuous service.
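To make this failure mode concrete, here is a minimal sketch in Python. It assumes a hypothetical `StatefulApp` class that keeps session state in its own process memory; the class, its methods and the token format are illustrative only and do not come from any real framework.

```python
class StatefulApp:
    """Hypothetical app instance that keeps sessions in its own memory."""

    def __init__(self):
        # Session state lives only inside this process (assumption for illustration).
        self.sessions = {}

    def login(self, user):
        # Record the session locally and hand a token back to the user.
        token = f"token-{user}"
        self.sessions[token] = user
        return token

    def handle(self, token):
        # A request succeeds only if this instance remembers the session.
        user = self.sessions.get(token)
        if user is None:
            return "please re-authenticate"
        return f"hello {user}"


app = StatefulApp()
token = app.login("user2")
before_crash = app.handle(token)

# Simulate a crash and restart: the new instance starts with empty memory.
app = StatefulApp()
after_restart = app.handle(token)

print(before_crash)
print(after_restart)
```

The restarted instance has no record of the token, so the user is sent back to authenticate, which is the behaviour described above.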

Stateless containers

With stateless containers, it’s extremely easy. Taking the diagram above, the session data would be stored in a database somewhere. In the event of a failure, the application is simply redeployed and picks up where it left off, which is exactly how containers were designed to work.
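As a rough sketch of the stateless pattern, the Python below moves the session data out of the application and into a shared store. `SessionStore` is a hypothetical in-memory stand-in for an external database or cache, and `StatelessApp` is an illustrative class, not a real API.

```python
class SessionStore:
    """Hypothetical stand-in for an external database or cache."""

    def __init__(self):
        self._data = {}

    def put(self, token, user):
        self._data[token] = user

    def get(self, token):
        return self._data.get(token)


class StatelessApp:
    """App instance that holds no session state of its own."""

    def __init__(self, store):
        # The store outlives any single app instance.
        self.store = store

    def login(self, user):
        token = f"token-{user}"
        self.store.put(token, user)
        return token

    def handle(self, token):
        user = self.store.get(token)
        if user is None:
            return "please re-authenticate"
        return f"hello {user}"


store = SessionStore()
first_instance = StatelessApp(store)
token = first_instance.login("user2")

# Simulate a failure and redeployment: a new instance, same external store.
redeployed_instance = StatelessApp(store)
result = redeployed_instance.handle(token)

print(result)
```

Because the session lives in the store rather than in the instance, the redeployed application can serve the same user without asking for credentials again.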

So, are stateful containers really happening?

When we think of containerized applications, we typically think of the new-age, cloud-native, born-in-the-cloud, serverless [insert latest buzzword here] application. But when we dive deeper and look at the simplicity containers bring, we can see that businesses are using containers to reduce the complex infrastructure required to run their applications. It makes sense, then, that lots of existing applications that require consistent state data are appearing in production as containers.

Understanding how to orchestrate the recovery of stateful containers is what needs to be focused on, not whether they are happening or not.

The post Stateful containers in production, is this a thing? appeared first on Veeam Software Official Blog.



Considerations in a multi-cloud world

Source: Veeam

With the infrastructure world in constant flux, more and more businesses are adopting a multi-cloud deployment model. The challenges that come with it are becoming more complex and, in some cases, cumbersome. Consider the impact on the data alone. Ten years ago, all anyone worried about was whether the SAN would stay up and, if it didn’t, whether their data would be protected. Fast forward to today, and even a small business can have data scattered across the globe. Maybe they have a few vSphere hosts in an HQ, with branch offices using workloads running in the cloud or Software as a Service-based applications. Maybe backups are stored in an object storage repository (somewhere, but only one guy knows where). This is happening in the smallest of businesses, so as a business grows and scales, the challenges become even more complex.

Potential pitfalls

This post is not about how Veeam manages data in a multi-cloud world. It’s about understanding the challenges and the potential pitfalls. Take a look at the diagram below:

Veeam supports a number of public clouds and different platforms. This is a typical scenario in a modern business. Picture the scene: workloads are running on top of a hypervisor like VMware vSphere or Nutanix AHV, with some services running in AWS. The company is leveraging Microsoft Office 365 for its email services (people rarely build Exchange environments anymore) with Active Directory extended into Azure. Throw in some SAP or Oracle workloads, and your data management solution has just gone from “I back up my SAN every night to tape” to “where is my data now, and how do I restore it in the event of a failure?” If worrying about business continuity didn’t keep you awake 10 years ago, it surely does now. This is the impact of modern life. The more agility we provide on the front end for an IT consumer, the more complexity there has to be on the back end.

With the ever-growing complexity, global reach and scale of public clouds, as well as a more hands-off approach from IT admins, protecting a business is a real challenge, not only from an outage, but from a full-scale business failure.

Managing a multi-cloud environment

When looking to manage a multi-cloud environment, it is important to understand these complexities and how to avoid costly mistakes. The simplest approach to any environment, whether it is running on premises or in the cloud, is to consider all the options. That sounds obvious, but it has not always been the case. Where or how you deploy a workload is becoming irrelevant, but how you protect that workload still matters. Think about the public cloud: if you deploy a virtual machine and set the firewall ports to any:any (that would never happen, would it?), you can be pretty sure someone will gain access to that virtual machine at some point. Making sure that workload is protected and recoverable is critical in this instance. The same considerations and requirements always apply, whether running on premises or off premises. How do you protect the data, and how do you recover the data in the event of a failure or security breach?
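As an illustration of catching the any:any mistake before it reaches production, here is a minimal Python sketch. The `FirewallRule` structure and the `is_any_any` helper are hypothetical and not tied to any cloud provider's API. The sketch assumes that rules are expressed as a source CIDR plus an inclusive port range.

```python
from dataclasses import dataclass


@dataclass
class FirewallRule:
    """Hypothetical inbound rule: a source CIDR and an inclusive port range."""
    source: str
    port_range: tuple


def is_any_any(rule):
    # "Any source" means the whole IPv4 or IPv6 address space.
    any_source = rule.source in ("0.0.0.0/0", "::/0")
    # "Any port" means the full 0-65535 range.
    any_port = rule.port_range == (0, 65535)
    return any_source and any_port


rules = [
    FirewallRule("10.0.0.0/8", (22, 22)),     # SSH from an internal network
    FirewallRule("0.0.0.0/0", (0, 65535)),    # any:any, the risky case
    FirewallRule("0.0.0.0/0", (443, 443)),    # HTTPS from anywhere
]

risky = [rule for rule in rules if is_any_any(rule)]
for rule in risky:
    print(f"Rule open to any source on any port: {rule}")
```

A check like this only flags the exposure. The workload still needs to be protected and recoverable in case something slips through.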

What to consider when choosing a cloud platform?

This is often overlooked, but it has become clear in recent years that organizations do not choose a cloud platform for a single, specific reason like cost savings, higher performance or quicker service times, but rather because the cloud is the right platform for a specific application. Sure, individual benefits may come into play, but you should always question the “why” behind any platform selection.

When you’re looking at data management platforms, consider not only what your environment looks like today, but also what it will look like tomorrow. Does the platform you’re purchasing today have a roadmap for the future? If you can see that the vendor has a clear vision and understanding of what is happening in the industry, then you can feel safer trusting that platform to manage your data anywhere in the world, on any platform. If a roadmap is not forthcoming, or the vendor just doesn’t get the vision you are sharing about your own environment, perhaps it’s time to look at other vendors. It’s definitely something to think about next time you’re choosing a data management solution or platform.

The post Considerations in a multi-cloud world appeared first on Veeam Software Official Blog.

