Learn the challenges and best practices for deploying and managing Kubernetes across multiple cloud providers including orchestration, storage, security, observability and more.

Kubernetes Multi-Cloud: Guide & Best Practices


Kubernetes has long been the standard for container orchestration. It's used in various settings, from home labs to production setups spanning private and public clouds. Organizations often need to provide specific levels of availability for their Kubernetes applications to meet SLAs (service-level agreements). Applications must therefore run in a highly available manner, which entails replicating them across multiple data centers.

Furthermore, compliance and security requirements (like data localization) often force organizations to run operations from specific locations. Translated into Kubernetes terminology, this typically means running Kubernetes across multiple cloud environments, whether loosely coupled or federated.

This article will cover key considerations (summarized below) that organizations should address when planning to deploy multi-cloud Kubernetes environments.

  • Infrastructure provisioning: tools and approaches that support the multi-cloud concept
  • Orchestration and Kubernetes distributions: managing multiple Kubernetes clusters running in different clouds
  • Storage and networking: establishing connectivity and storage persistence between various cloud providers
  • Application deployment: deploying Kubernetes applications into multi-cloud setups
  • Compliance and security: establishing an abstraction layer that translates compliance and security requirements into vendor-specific controls
  • Observability: tracking system and application metrics, logs, and events centrally
  • Cost of operations: visualizing and controlling multi-cloud deployment costs


A closer look at multi-cloud Kubernetes

Running Kubernetes clusters locally or within a single cloud provider is often straightforward, particularly given the variety of tools and levels of automation available to consumers. However, things take a different turn when you run applications spanning several clouds. These deployments are described as cloud-agnostic or multi-cloud architectures, and it's important to note that the two terms are not interchangeable.

Cloud-agnostic vs. multi-cloud

Cloud-agnostic systems are those that you can deploy to any cloud independently. In other words, your applications and underlying infrastructure are “lift-and-shift” and independent of the cloud platform. Organizations often pursue this to avoid so-called vendor lock-in. In reality, achieving complete independence from a cloud provider is impossible since the computing infrastructure, networking, and storage are tightly bound to the provider.

Consider the scenario of deploying a typical set of applications onto Kubernetes:

  • Applications are built and packaged into Docker containers, then pushed into a registry. This process is cloud-agnostic
  • Applications are run on Kubernetes and, from the perspective of application developers, are also cloud-agnostic
  • Applications leverage auxiliary services, like databases, message queues, or storage buckets. These can be made cloud-agnostic by running them on Kubernetes
  • Auxiliary services often rely on storage and networking, which may be abstracted but are still implemented by cloud-specific services like Amazon EBS volumes or Azure Storage volumes
  • Kubernetes clusters are provisioned in every cloud environment. Still, they must rely on cloud-provided compute instances (AWS EC2, Azure Virtual Machines, Google Compute Engine, etc.) to run their control planes and worker nodes

The graphic below shows a simplified, top-level representation of the processes involved in a cloud-agnostic stack.

Image shows a cloud-agnostic architecture (source)

Multi-cloud systems share some properties with cloud-agnostic ones, namely the need for the abstraction of applications, networking, and storage resources. The critical difference is that infrastructure and applications span multiple cloud environments simultaneously.

There are many reasons for needing a multi-cloud deployment, including:

  • Disaster recovery and increased availability: having your applications run on a multi-cloud platform means there is no single point of failure. Applications are unlikely to go down simultaneously, and traffic can be shifted away from a failing platform
  • Regulatory and compliance requirements: data localization laws might mean that you must store data in a specific geographic location (which includes the cloud provider)
  • Services at the edge: running services close to end users provides low latency, improved user experience, and reliability
  • Business reasons: avoiding vendor lock-in, reducing costs, or driving innovation. The ability to run infrastructure in multiple clouds may also provide leverage for contract negotiations with individual cloud providers

Running multi-cloud Kubernetes almost always means having several federated or centrally managed control planes. Node pools are typically local, and traffic is distributed across the cloud environments.
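To make this concrete, a single kubeconfig file can hold one context per cloud, which is how many central management tools and operators address member clusters. The cluster names, server addresses, and users below are hypothetical, and real entries would also carry credentials and CA data:

```yaml
# Hypothetical kubeconfig spanning clusters in two clouds.
# Credentials and certificate data are omitted for brevity.
apiVersion: v1
kind: Config
clusters:
  - name: eks-us-east-1
    cluster:
      server: https://example-eks.us-east-1.eks.amazonaws.com  # hypothetical endpoint
  - name: gke-europe-west1
    cluster:
      server: https://203.0.113.10  # hypothetical endpoint
contexts:
  - name: aws
    context:
      cluster: eks-us-east-1
      user: aws-user
  - name: gcp
    context:
      cluster: gke-europe-west1
      user: gcp-user
current-context: aws
users:
  - name: aws-user
    user: {}
  - name: gcp-user
    user: {}
```

With a file like this in place, an operator or automation tool can target either environment with a flag such as `kubectl --context gcp get nodes`.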

Both cloud-agnostic and multi-cloud deployment options have their challenges, which we will discuss below.


Challenges

Infrastructure provisioning

Provisioning cloud resources across one or multiple cloud providers in a predictable and repeatable way is a significant challenge. Infrastructure-as-Code (IaC) tools are helpful when addressing problems like these but can be complicated. Most IaC tools, like HashiCorp's Terraform or cloud-specific ones like AWS CloudFormation, excel within a single provider but can struggle with multi-cloud.

Instead, solutions like Pulumi come closer to the multi-cloud ideal by allowing engineers to describe infrastructure abstracted from the specifics of individual cloud providers. Regardless of your choice, implementing and managing code for each cloud provider remains necessary.
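As a sketch of what single-program, multi-provider infrastructure code can look like, the fragment below uses Pulumi's YAML runtime to declare one cluster in AWS and one in GCP. The resource properties shown are illustrative, not a complete, deployable configuration, and the exact type tokens should be checked against Pulumi's documentation:

```yaml
# Hypothetical Pulumi YAML program: clusters in two clouds,
# declared in one piece of infrastructure code.
name: multi-cloud-k8s
runtime: yaml
resources:
  awsCluster:
    type: eks:Cluster            # from the pulumi-eks package
    properties:
      desiredCapacity: 3         # illustrative sizing
  gcpCluster:
    type: gcp:container:Cluster
    properties:
      location: europe-west1
      initialNodeCount: 3        # illustrative sizing
outputs:
  awsKubeconfig: ${awsCluster.kubeconfig}
```

Even with this abstraction, the two resources remain provider-specific; the gain is a single language, state model, and deployment workflow across clouds.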

Orchestration and Kubernetes distributions

Once organizations deploy their resources and Kubernetes clusters, the challenge of managing (or orchestrating) them becomes relevant. Orchestration of multi-cloud clusters is not provided out of the box, so you must use a third-party solution.

Third-party orchestration solutions often come with custom Kubernetes distributions that implement or alter container runtimes, storage, and networking layers. Some of these orchestration solutions and distributions are tooled for multi-cloud environments (e.g., Rancher, Red Hat OpenShift, or Mirantis Kubernetes Engine). Others come from a single vendor (AWS EKS, GCP GKE, Azure AKS). Each has its benefits and drawbacks, which we will examine in future articles, so watch this space!

Storage and networking

The transient nature of containers can present challenges for application and infrastructure developers—for example, the necessity to persist storage across Kubernetes clusters spread over multiple clouds.

Thanks to the Container Storage Interface (CSI), solutions like Portworx, Rook, or OpenEBS can provide users with ways to spread storage across cloud environments. Keep in mind that these solutions come with strict networking and storage performance prerequisites and complex configurations.
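As an illustration of how such a storage layer surfaces to applications, the hedged sketch below defines a StorageClass backed by a replicated storage provider (Portworx is used as the example) and a claim against it. The parameter names follow Portworx conventions but should be verified against the vendor's documentation:

```yaml
# Hypothetical StorageClass for a multi-cloud storage layer
# (Portworx-style parameters; verify against vendor docs).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: replicated-multicloud
provisioner: pxd.portworx.com
parameters:
  repl: "3"                  # keep three replicas of each volume
allowVolumeExpansion: true
---
# Applications consume the class through an ordinary claim,
# staying unaware of the clouds underneath.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: replicated-multicloud
  resources:
    requests:
      storage: 10Gi
```

The application-facing side (the PersistentVolumeClaim) is identical to a single-cloud setup; the multi-cloud complexity is concentrated in the provisioner behind the StorageClass.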

Multi-cloud requires Kubernetes resources to have network connectivity across the cloud environments. There are a few ways to extend networks beyond one cloud provider:

  • VPN tunneling: involves setting up tunnels between every cloud environment. VPN can create issues for egress traffic on both sides of the tunnel, such as congestion and security concerns (BGP vulnerabilities like prefix hijacking and route leaking)
  • Private connectivity: relies on telecom providers and on-premise routing, which involves connecting cloud providers in a hub. It's a costly solution with potential performance issues due to traffic being backhauled via data center infrastructure
  • Software-defined networks and virtual routers: virtual devices at the edge of cloud provider networks are probably the optimal solution but add complexity and maintenance overhead

The right multi-cloud networking approach depends on the organization's existing infrastructure and its financial and engineering capabilities; beyond that, it depends on the requirements of the business itself.

Application deployment

Depending on the number of applications and their dependencies, deployment can become complex and require orchestration. Many approaches can be taken, but they vary based on your application packaging strategy and existing CI/CD processes.

Approaches can be split into two categories:

  • Use existing CI/CD pipelines to deliver applications: allows easy integration into the current development lifecycle. Key drawbacks are limited scalability and the maintenance burden it places on engineers
  • GitOps: offers continuous deployment of applications to destination clusters, controlled by source code and operated by tools like ArgoCD, FluxCD, or GitLab. The key benefit here is the ability to control everything from one source of truth while automatically keeping deployments consistent. The main drawbacks are the need for yet another tool, its integration with existing CI/CD processes, and the diversity of cloud providers to account for

Of course, both of the above suggestions assume your engineering teams are using Helm or Kustomize, have established code and package versioning, and have existing CI/CD processes in place.
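As a sketch of the GitOps approach in a multi-cluster setting, Argo CD's ApplicationSet controller can template one Application per cluster registered with Argo CD, regardless of which cloud hosts it. The repository URL and paths below are hypothetical:

```yaml
# Hypothetical Argo CD ApplicationSet: deploy the same app to every
# cluster Argo CD knows about, across all clouds.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: guestbook-multicloud
  namespace: argocd
spec:
  generators:
    - clusters: {}               # one entry per registered cluster
  template:
    metadata:
      name: '{{name}}-guestbook'
    spec:
      project: default
      source:
        repoURL: https://example.com/org/deployments.git  # hypothetical repo
        targetRevision: main
        path: guestbook
      destination:
        server: '{{server}}'
        namespace: guestbook
```

Adding a new cloud then reduces to registering its cluster with Argo CD; the controller reconciles the same source of truth everywhere.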

Compliance and security

Addressing compliance and security in single-cloud environments can be a difficult task. Organizations must have authentication and authorization in place, keep up with security vulnerabilities and patches, enforce security policies, and harden environments. Multi-cloud environments require even more effort due to the sheer diversity of security and compliance implementations.

Multi-cloud setups can add further complications:

  • Authentication and authorization: the necessity to integrate with each cloud provider's authentication model. It's best to have a centralized solution decoupled from the cloud provider for account, role, and policy creation
  • Vulnerability patching: the necessity to keep infrastructure updated using the procedures from each provider
  • Security policies and environment hardening: enforcing the restriction of unsecured ports or traffic, securing APIs and establishing least privilege. Controls must be defined centrally, translated, and distributed to different cloud platforms
  • Multi-cloud storage and networking: the configuration and use of storage may involve encrypting data at rest and developing data loss prevention procedures. Secure multi-cloud networking requires data encryption, advanced routing solutions, and network policy management
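One piece of this that is genuinely portable is Kubernetes-native network policy, since the API is the same on every conforming cluster. The sketch below shows a common least-privilege pattern: deny all ingress in a namespace by default, then allow only a named path. The namespace and labels are hypothetical:

```yaml
# Default-deny: no pod in the namespace accepts ingress traffic
# unless a more specific policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production        # hypothetical namespace
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes: ["Ingress"]
---
# Least privilege: only frontend pods may reach the backend,
# and only on its service port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
  policyTypes: ["Ingress"]
```

Policies like these can be defined centrally and distributed unchanged to clusters in every cloud, which is exactly the "define once, translate and distribute" model described above.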

Observability

Observability is an essential part of a multi-cloud platform. The observability stack must be able to consume metrics, logs, events, and outages from different platforms while remaining scalable.

There are open-source (Prometheus, Thanos, Grafana) and enterprise (Datadog, New Relic, Sematext) solutions that fit the bill to varying degrees, but engineering teams must still build custom dashboards and alerting mechanisms that meet their business and operational needs.
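As one hedged example of centralizing metrics with the open-source stack, each cluster's Prometheus can ship its samples to a central Thanos receiver, with external labels identifying the cloud and cluster of origin. The endpoint URL and label values below are hypothetical:

```yaml
# Hypothetical Prometheus configuration fragment for one cluster.
# external_labels let the central store distinguish metric origins.
global:
  external_labels:
    cloud: aws                 # hypothetical label values
    cluster: eks-us-east-1
remote_write:
  - url: https://thanos-receive.example.com/api/v1/receive  # hypothetical endpoint
```

Repeating this fragment per cluster (with its own labels) yields a single queryable view of metrics across all clouds.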

Cost of operations

A crucial topic when implementing multi-cloud systems is budget. Costs can be significantly higher than those of single-cloud topologies, so keeping track of spending is a must.

SaaS solutions like Kubecost can integrate with diverse cloud platforms and consume rich metrics from Kubernetes clusters. Kubecost can then present cost and spending data, segregated per cluster or resource, via user-friendly dashboards and reports.

Kubecost is an ideal tool to keep track of overheads and is a must-have for financial stakeholders.

Image shows Kubecost's main dashboard, which summarizes K8s costs, efficiency, and health (source)

Conclusion

Creating multi-cloud platforms and successfully running applications on them is no easy task. “Going multi-cloud” requires meticulous planning, technical proficiency, and collaboration, but the resulting application platform is often worth the effort.

