Argo CD is a deployment tool for Kubernetes that follows GitOps methodology. Besides being the best implementation of a GitOps controller, Argo CD also stands out with its enterprise and multi-tenancy features. SSO integration, flexible RBAC, and multi-cluster management allow running a single Argo CD instance to serve multiple development teams. This is extremely attractive since it reduces the operational overhead and provides great visibility for all the deployments inside the organization. However, no software can scale infinitely, and Argo CD is no exception.
The most popular scalability-related question we get from our customers is: how much is too much? At which point will Argo CD start to struggle? The question
is indeed important, since the answer lets us define an architecture that accounts for future growth (there's even a whole Special Interest Group dedicated to the question of Argo scalability on CNCF Slack:
#argo-sig-scalability). The answer can be tricky since it depends on usage factors
and will differ from one organization to another. Let's walk through the most important factors that affect Argo CD scalability.
The application controller is the heart of Argo CD. It's responsible for reconciling the desired state stored in Git against the actual state of the cluster. The number of managed Kubernetes clusters and, most importantly, the number of resources in those clusters determine how much memory and CPU the application controller requires. In our experience, the default configuration is enough to handle a dozen mid-size clusters, which is pretty good. If the number of clusters grows, you might need to give the controller more memory and CPU. When the number of clusters reaches hundreds, you will have to utilize sharding to run multiple controller instances and fine-tune some settings to save money on the compute required to run them.
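As a rough sketch, sharding in the open-source distribution involves raising the controller StatefulSet's replica count and telling every replica how many shards exist; `ARGOCD_CONTROLLER_REPLICAS` is the mechanism upstream Argo CD uses for this, and the replica count below is illustrative:

```yaml
# Run three controller shards; each shard takes ownership of a
# subset of the registered clusters.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # Must match spec.replicas so every shard agrees on the split.
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "3"
```

A specific cluster can also be pinned to a particular shard by setting the `shard` field in its cluster secret.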
Accessing manifests stored in Git and, more importantly, generating manifests is another resource-intensive operation. This work is performed by the repo server.
The most expensive operation is the generation of manifests since it usually requires running
a tool such as Helm to render the final manifests. As the number of repositories grows,
you might need to increase the number of repo server replicas to handle the load. The good news is that the repo server is stateless and can be scaled horizontally.
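Because the repo server is a plain stateless Deployment, scaling it horizontally is a matter of raising its replica count, for example with a patch like this (assuming the default `argocd` namespace and component name):

```yaml
# Scale the stateless repo server to absorb manifest-generation load;
# three replicas here is an arbitrary starting point, not a recommendation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  replicas: 3
```

The same effect can be achieved imperatively with `kubectl -n argocd scale deployment argocd-repo-server --replicas=3`.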
Finally, the number of Argo CD applications affects the performance of the presentation layer, the Argo CD UI, and the API server. The more applications you have, the more time it takes to load the UI. Argo CD comfortably handles hundreds of applications, gets a little slower when the number of applications reaches ~3,000, and starts to struggle when the number of applications is more than 5,000.
Given the above factors, we usually recommend running multiple Argo CD instances to account for future growth. The most typical approach is to run one Argo CD instance per team or department. This approach allows us to isolate teams from each other and provide a dedicated dashboard for each team. However, this is still a compromise since it introduces some management headaches and requires running multiple instances of Argo CD.
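Before splitting into separate instances, teams are usually isolated inside a single instance with AppProjects and RBAC. A minimal sketch of a per-team project follows; the team name, repository pattern, and namespaces are placeholders:

```yaml
# Hypothetical AppProject restricting one team to its own repos
# and namespaces within a shared Argo CD instance.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments          # placeholder team name
  namespace: argocd
spec:
  description: Payments team applications
  sourceRepos:
    - https://git.example.com/payments/*   # placeholder repo pattern
  destinations:
    - server: https://kubernetes.default.svc
      namespace: payments-*                # restrict to the team's namespaces
```

Combined with RBAC policies scoped to the project, this keeps teams from seeing or touching each other's applications without running a separate instance.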
Akuity was designed by the creators of the Argo Project to address the above challenges and unlock the ultimate scalability of Argo CD. Our goal is to significantly push the limits of Argo CD and make it possible to run a single Argo CD instance for a huge organization. We tackled the backend bottlenecks first by introducing a unique agent-based architecture that allows running a dedicated application controller and repo server in each managed cluster. This approach significantly simplifies the scalability challenge since work is now naturally distributed between clusters. It's also cheaper since the resource requests of each controller can be tuned to match the size of the cluster. Often, running the agent in each cluster is free since the components utilize spare resources available in the cluster.
What about the frontend bottlenecks? We're happy to announce that we've found a solution. Akuity-hosted Argo CD got long-awaited server-side pagination, which largely solves the problem and allows users to comfortably have tens of thousands of applications in a single Argo CD instance. To enable the feature, use the Akuity Portal to upgrade your Akuity-managed Argo CD instance to the Akuity image version (e.g. v2.7.6-ak.2) matching your desired Argo CD version. The Akuity image is available for all Argo CD versions starting from v2.7.3.
Congrats! Server-side pagination is enabled, and you can take advantage of the enhanced Argo CD user experience.
Scaling any software is a complex task and requires testing in a production-like environment. This is exactly what we did. We gathered the requirements of our customers and the open-source community and came up with numbers that would satisfy the most ambitious use cases: 1,000 clusters and 50,000 Argo CD applications. We used K3s to simulate managed Kubernetes clusters while keeping compute costs down, which worked perfectly. We used a Kustomize-based set of manifests to exercise the repo server. Finally, we deployed 50 applications in each cluster, which gave us 50,000 applications in total. The results are pretty impressive, and we are happy to conclude that Akuity can comfortably handle 1,000 clusters and 50,000 applications in a single Argo CD instance.
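A fleet like the one described (50 applications on every registered cluster) can be stamped out declaratively with an ApplicationSet that combines the cluster generator with a list of app indices via the matrix generator. The sketch below illustrates the shape of such a harness; the names and repository URL are placeholders, not the actual test setup:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: scale-test
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - clusters: {}          # one element per registered cluster
          - list:
              elements:           # 50 entries -> 50 apps per cluster
                - index: "1"
                - index: "2"
                # ... continue up to index: "50"
  template:
    metadata:
      name: 'app-{{name}}-{{index}}'    # {{name}} comes from the cluster generator
    spec:
      project: default
      source:
        repoURL: https://git.example.com/scale-test.git   # placeholder repo
        path: kustomize/app
      destination:
        server: '{{server}}'
        namespace: 'app-{{index}}'
```

Registering a new cluster then automatically fans out another 50 applications to it, which is what makes a 1,000-cluster test tractable.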
We are very pleased with the initial results and happy to offer the enhanced Argo CD UI to our customers. Server-side pagination is a very valuable feature, and we are committed to contributing it back to the open-source project. Once we feel comfortable with the feature, we will open-source it and make it available to the community. Please give it a try and provide your feedback! Do you have a use case that requires pushing the limits of Argo CD even further? We would love to hear from you and help you unlock the ultimate scalability of Argo CD!