
10 Jun 2021

Optimizing Costs: Clumio’s Use of Kubernetes Ingress in Cloud-Native Architecture

Pavel Chekin

For the Clumio Protect Backup Service, we use Amazon Elastic Kubernetes Service (EKS) to manage our Kubernetes clusters. The main benefit of using a managed Kubernetes service is the tight integration with other services from your cloud provider. Specifically, when you create a new Kubernetes Service of type LoadBalancer, a service controller provided by AWS and installed in the cluster creates a new Elastic Load Balancer (ELB) instance and configures it to forward traffic to the Service. The same controller manages the whole life cycle of the corresponding load balancer: it configures its health checks correctly, keeps the load balancer target group up to date as you add or remove nodes from the cluster, and finally deletes the load balancer when you delete the Kubernetes Service from the cluster.
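For example, a Service of type LoadBalancer is all it takes to get a dedicated ELB; the names and ports below are illustrative, not taken from our configuration:

    apiVersion: v1
    kind: Service
    metadata:
      name: orders              # hypothetical service name
    spec:
      type: LoadBalancer        # the AWS service controller provisions an ELB for this Service
      selector:
        app: orders
      ports:
        - port: 443
          targetPort: 8443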

While it is very convenient to use Kubernetes Services backed by AWS load balancers, there are several considerations to be aware of.

The Clumio Cloud-Native Architecture

Our architecture is based on microservices, and each of our Kubernetes clusters runs several dozen of these services. The number of services is especially noticeable in our development clusters, where we host as many isolated development environments as we can in a single cluster. In some development clusters, we have more than 300 services and counting. AWS load balancers are not free, and it would be unreasonable for us to run 300+ of them just for a single development cluster. Another issue is the limit on how many load balancers each EKS cluster can support, which derives from the maximum number of rules in an EC2 security group. In practice, our average EKS cluster could have no more than 60 classic load balancers. To solve these issues, we decided to use Kubernetes Ingress for most of our services. The Ingress mechanism gives us a way to share one ELB across multiple Services in an EKS cluster, which reduces both resource utilization and cost.

Why Add Kubernetes Ingress

Creating a new AWS load balancer typically takes some time. This delay is acceptable for most operations, but for certain changes, such as deploying multiple microservices at the same time, we wanted to minimize the time required for them to become ready. Adding a new Kubernetes Ingress addresses this use case because the operation is nearly instant: depending on the implementation, the Ingress controller either reloads its configuration (as the NGINX ingress controller does) or applies the new configuration at runtime (as Envoy-based ingress controllers do).

When working with services, you expect each service to have a DNS name, potentially an external one. With the ELB-per-service approach, you cannot create the DNS name until the ELB has been created. With a single pre-allocated ELB for the Ingress controller, you can assign as many DNS aliases to that ELB as you need in advance, so all our services become available as soon as we create Ingress definitions for them.
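As a sketch of this setup (the hostnames are hypothetical): assuming a wildcard DNS alias has already been pointed at the shared ELB, a service becomes reachable the moment its Ingress is created:

    # Assumes a pre-created wildcard alias, e.g.
    #   *.dev.example.com -> <shared-ingress-elb>.elb.amazonaws.com
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: orders
    spec:
      rules:
        - host: orders.dev.example.com   # resolves immediately via the wildcard alias
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: orders
                    port:
                      number: 8080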

Some of our services expose a REST API, but the vast majority have gRPC endpoints. We needed an Ingress controller that supports both HTTPS and gRPC at the same time, and we wanted to terminate TLS on the Ingress controller. The latter was not a strict requirement, and we considered using Istio sidecars to terminate TLS directly in a microservice pod. However, we decided to terminate TLS at the Ingress controller because it is simpler to manage and, in general, adds less overhead to the cluster. Another requirement was the ability to deploy several instances of the Ingress controller to the same cluster; a specific Ingress definition can carry the kubernetes.io/ingress.class annotation to specify which controller should handle it.
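A hedged sketch of such an Ingress definition for the NGINX Ingress controller (hosts, service names, and the secret are illustrative): the ingress.class annotation selects the controller instance, the ingress-nginx backend-protocol annotation switches proxying to gRPC, and the tls section terminates TLS at the controller:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: inventory-grpc
      annotations:
        kubernetes.io/ingress.class: nginx-internal            # which controller instance handles this Ingress
        nginx.ingress.kubernetes.io/backend-protocol: "GRPC"   # proxy to the backend over gRPC
    spec:
      tls:
        - hosts:
            - inventory.dev.example.com
          secretName: inventory-tls                            # TLS terminated at the controller
      rules:
        - host: inventory.dev.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: inventory
                    port:
                      number: 9090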

We tested several Ingress controllers and found it surprisingly hard to find one that met all the requirements. Some controllers did not work correctly with gRPC; some did not support multiple controller instances in the same cluster. Finally, we found that the NGINX Ingress controller works for us. The only adjustment we had to make to its configuration was increasing the gRPC buffer size to make it work with our gRPC stack.
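We are not reproducing our exact change here, but as one hedged way to apply such a tweak, the controller's ConfigMap accepts an http-snippet key that can carry nginx's grpc_buffer_size directive (the value shown is illustrative):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller   # must match the ConfigMap the controller was started with
      namespace: ingress-nginx
    data:
      # Raise the buffer nginx uses for gRPC responses above its 4k default.
      http-snippet: |
        grpc_buffer_size 64k;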

We constantly improve our infrastructure-level services to scale our infrastructure and optimize maintenance costs. Initially, we used a dedicated load balancer per service, but we soon hit the per-cluster load balancer limit. At that point we updated our infrastructure to use hosted Ingress controllers for external connectivity to our services, which significantly reduced the number of AWS classic load balancers required for the product. Using an AWS Application Load Balancer (ALB) was not an option at the time, because ALB did not support gRPC. At the end of 2020, AWS announced support for end-to-end HTTP/2 and gRPC on the Application Load Balancer. Now we can use the AWS Load Balancer controller, formerly known as the "AWS ALB Ingress Controller", which was donated to the Kubernetes AWS Special Interest Group (SIG-AWS) so that AWS and other SIG-AWS contributors can officially maintain the project. We are looking forward to using this Ingress controller in our infrastructure (sketched below).

Another focus of our interest is the Kubernetes Gateway API, a new Kubernetes API under development by the Network Special Interest Group (SIG-Network). Compared to the Ingress API, the Gateway API exposes a more general API for proxying that can be used for more protocols than just HTTP, and it models more infrastructure components, giving cluster operators better deployment and management options. We look forward to trying future Kubernetes controllers that implement the Gateway API.
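As a concrete, hedged sketch of the ALB option mentioned above, based on the AWS Load Balancer controller's documented annotations (hosts and service names are again illustrative):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: inventory-grpc-alb
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internal
        alb.ingress.kubernetes.io/target-type: ip                  # route directly to pod IPs
        alb.ingress.kubernetes.io/backend-protocol-version: GRPC   # uses the end-to-end HTTP/2 support added in late 2020
        # gRPC on ALB requires an HTTPS listener; the certificate is
        # configured via additional annotations, omitted here.
    spec:
      rules:
        - host: inventory.dev.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: inventory
                    port:
                      number: 9090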

Reducing Resource Utilization and Cost

The Kubernetes Ingress mechanism helps us reduce resource utilization and cost, and we are no longer constrained by EC2 and EKS limits on the number of services deployed to a single managed Kubernetes cluster. Some open source Ingress controllers work with gRPC services out of the box, but it is still worth verifying that they work for your specific use case. The Kubernetes community provides and constantly improves Kubernetes APIs, such as the Ingress API and the Gateway API, while commercial and open source vendors provide various implementations of those APIs. This makes it possible to choose the right implementation for a specific use case and to replace implementations without major changes at the application level.