Using Argo CD to implement GitOps for Kubernetes appears simple. However, like any system, the ability to scale GitOps practices is highly dependent on the architecture you choose. This blog post will explore the three most common architectures when implementing Argo CD: a single instance, an instance per cluster, and a compromise between the two. I will break down the benefits and drawbacks of each one to help you decide what is appropriate for your situation.
The single control plane approach has one instance of Argo CD managing all clusters. This is a popular approach as it allows users to have a single view of their applications, providing a great developer experience.
In this architecture, there's only one server URL. This simplifies logging into the argocd CLI and setting up API integrations. It also simplifies the operator experience with just one location for managing the configuration (e.g., repo credentials, users, API keys, CRDs, and RBAC policies).
If your organization delegates access based on the environment and you are concerned about having all of your applications under one instance, don't worry: you can create the same boundaries using RBAC policies and AppProjects. The project defines what can be deployed where, and the RBAC policy defines who can access the project.
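To make the split between "what can be deployed where" and "who can access it" more concrete, here is a minimal Python sketch that builds an AppProject manifest and matching RBAC policy lines. The project name, repo URL, cluster URL, group name, and the `argocd` namespace are all hypothetical values chosen for illustration, and the helper functions are my own, not part of Argo CD.

```python
import json


def app_project(name, repos, destinations):
    """Build an AppProject manifest as a plain dict.

    `destinations` is a list of (cluster API URL, namespace) pairs.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        # Assumes Argo CD is installed in a namespace called "argocd".
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "sourceRepos": list(repos),
            "destinations": [
                {"server": server, "namespace": namespace}
                for server, namespace in destinations
            ],
        },
    }


def rbac_policy(project, group, actions=("get", "sync")):
    """Return policy lines granting `group` the given actions on a project."""
    role = f"role:{project}-dev"
    lines = [
        f"p, {role}, applications, {action}, {project}/*, allow"
        for action in actions
    ]
    # Bind the (hypothetical) SSO group to the role defined above.
    lines.append(f"g, {group}, {role}")
    return lines


# Hypothetical values for illustration only.
staging = app_project(
    "staging",
    repos=["https://git.example.com/org/apps.git"],
    destinations=[("https://staging.example.com:6443", "*")],
)
policy = rbac_policy("staging", "org:staging-devs")

print(json.dumps(staging, indent=2))
print("\n".join(policy))
```

The project restricts the staging team to one repo and one cluster, while the policy lines decide which group may view and sync Applications inside that project.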
However, there are downsides to this architecture. With a single control plane, you have a single point of failure. Your production deployments could be affected by the actions of other environments. Suppose the staging cluster became unresponsive, causing timeouts on the kube-apiserver. This could lead to a high load on the application controller, affecting Argo CD performance for production Applications.
This architecture also requires you to stand up and maintain a dedicated management cluster that will host the Argo CD control plane and have direct access to all your other clusters. Depending on the location of the management cluster, this could involve publicly exposing the API servers of those clusters, which may raise security concerns.
The admin credentials (i.e., the kubeconfig) for all clusters are stored centrally on this single Argo CD instance (as secrets in the cluster). If a threat actor manages to compromise the management cluster or Argo CD instance, this will grant them widespread access.
It's also worth mentioning that the application controller must perform Kubernetes watches to gather data from each cluster. You may incur significant costs due to network traffic if the management cluster is in a different region than the other clusters.
Running a single instance means only one controller handles all of this load. Scaling will require tuning the individual Argo CD components (i.e., repo-server, application controller, api-server) as the number of clusters, applications, and repositories increases. Managing Application controller shards is an unpleasant experience that requires manually matching shard size to clusters.
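To give a feel for what matching shards to clusters by hand involves, here is a small Python sketch that balances clusters across controller shards by Application count. This is a simple greedy heuristic of my own for illustration, not the algorithm Argo CD uses, and the cluster names and counts are made up.

```python
import heapq


def assign_shards(app_counts, shard_count):
    """Greedily place clusters on the least-loaded shard, largest cluster first.

    `app_counts` maps cluster name -> number of Applications.
    Returns a dict of cluster name -> shard index.
    """
    # Heap of (current load, shard index); the least-loaded shard pops first.
    heap = [(0, shard) for shard in range(shard_count)]
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(app_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for cluster, count in ordered:
        load, shard = heapq.heappop(heap)
        assignment[cluster] = shard
        heapq.heappush(heap, (load + count, shard))
    return assignment


def shard_loads(app_counts, assignment, shard_count):
    """Sum the Application count that lands on each shard."""
    loads = [0] * shard_count
    for cluster, shard in assignment.items():
        loads[shard] += app_counts[cluster]
    return loads


# Hypothetical clusters and Application counts.
clusters = {"prod-us": 400, "prod-eu": 350, "staging": 120, "dev": 80, "sandbox": 50}
assignment = assign_shards(clusters, shard_count=2)
loads = shard_loads(clusters, assignment, shard_count=2)

print(assignment)
print(loads)
```

Even in this toy case the result has to be recomputed whenever a cluster is added or its Application count shifts, which is the maintenance burden described above.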
Another typical architecture is to deploy Argo CD to each cluster. It's most common in organizations where each environment has one Kubernetes cluster. Effectively, you end up with an Argo CD instance for each environment which can simplify security and control.
This method is more secure since Argo CD runs within the cluster, meaning you don't need to expose the cluster API server to the external control plane. Beyond that, there is no central instance containing admin credentials for all the clusters. The security domain is limited to a single Argo CD instance. Any other credentials that Argo CD requires (e.g., repo credentials, notifications API keys) can be scoped to the cluster they are in instead of shared among them.
There's no longer a significant amount of network traffic leaving the cluster to the application controller in the management cluster. This could greatly reduce cloud costs for network traffic. However, you may incur additional costs due to the additional compute resources required to run every Argo CD component in each cluster.
Scaling is improved because each Argo CD instance only handles a single cluster, and the load is effectively distributed among the environments. However, when the cluster reaches a certain scale (number of applications and repositories), you may still need to tune the individual Argo CD components.
In the same way that the security domain is limited to a single Argo CD instance, the blast radius for outages is also contained. If one cluster is experiencing a significant load to the point that it could prevent application deployments, it will not go on to affect other clusters.
This architecture negatively affects the developer experience. There's the additional cognitive load of knowing which control plane to point to when using the Argo CD CLI or web interface. You can minimize this with a solid naming strategy and consistency in the server URLs (i.e., each cluster has its FQDN that matches its name, with Argo CD as a subdomain under that).
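As a rough illustration of such a naming convention, the Python sketch below derives each instance's URL from the cluster name. The base domain `k8s.example.com`, the `argocd` subdomain label, and the cluster names are assumptions for the example; use whatever convention your organization already follows.

```python
def argocd_url(cluster, base_domain):
    """Return the Argo CD URL for a cluster under the assumed naming scheme.

    The cluster's FQDN is <cluster>.<base_domain>, and Argo CD lives on an
    "argocd" subdomain beneath it.
    """
    cluster_fqdn = f"{cluster}.{base_domain}"
    return f"https://argocd.{cluster_fqdn}"


# Hypothetical clusters under a hypothetical base domain.
urls = {name: argocd_url(name, "k8s.example.com") for name in ["dev", "staging", "prod"]}

for name, url in urls.items():
    print(f"{name:8} {url}")
```

With a predictable pattern like this, a developer who knows the cluster name can work out the server URL without looking it up.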
It's a painful experience for operators to manage many Argo CD instances. There is a different location to log in to for each cluster, which requires maintenance of RBAC policies and API keys for each one. For that matter, any configuration of Argo CD will need to be copied for each cluster. Be careful of drift; consistency is important. Lower environments should be as production-like as possible to represent the "real" production deployments.
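One way to keep an eye on drift is to compare each instance's settings against a baseline. The Python sketch below does this over plain dictionaries. The setting keys and values are hypothetical placeholders, and how you export the settings from each instance is left out, so treat it as a starting point only.

```python
def find_drift(configs, baseline):
    """Compare each instance's settings to a baseline instance.

    `configs` maps instance name -> flat dict of settings.
    Returns {instance: {key: (baseline value, instance value)}} for every
    setting that differs or is missing on either side.
    """
    reference = configs[baseline]
    drift = {}
    for instance, settings in configs.items():
        if instance == baseline:
            continue
        diffs = {
            key: (reference.get(key), settings.get(key))
            for key in sorted(reference.keys() | settings.keys())
            if reference.get(key) != settings.get(key)
        }
        if diffs:
            drift[instance] = diffs
    return drift


# Hypothetical settings exported from three Argo CD instances.
configs = {
    "prod": {"rbac_default_role": "role:readonly", "reconciliation_timeout": "180s"},
    "staging": {"rbac_default_role": "role:admin", "reconciliation_timeout": "180s"},
    "dev": {"rbac_default_role": "role:readonly"},
}

drift = find_drift(configs, baseline="prod")
for instance, diffs in drift.items():
    for key, (expected, actual) in diffs.items():
        print(f"{instance}: {key} is {actual!r}, expected {expected!r}")
```

Here staging has a more permissive default role than production and dev is missing a setting entirely, both of which are easy to overlook when copying configuration by hand.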
This final architecture balances the previous two, running one Argo CD instance per logical group of clusters. This grouping could be per team, region, environment, or whatever makes sense for your situation. You probably already have a way of grouping applications internally; this is a great place to start.
This architecture is beneficial when running multiple clusters per environment. It takes away the pain of maintaining too many instances of Argo CD. The RBAC, AppProject, and other configurations will likely be similar for all of the clusters managed by an instance. So the configuration duplication is reduced compared to running an instance for each cluster.
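To show how the choice of grouping changes the number of instances, here is a short Python sketch that buckets a cluster inventory by an attribute such as environment or region. The inventory, its field names, and the cluster names are invented for the example.

```python
from collections import defaultdict


def group_clusters(clusters, key):
    """Group cluster names by the given attribute (e.g. "env" or "region").

    Each resulting group would be managed by one Argo CD instance.
    """
    groups = defaultdict(list)
    for cluster in clusters:
        groups[cluster[key]].append(cluster["name"])
    return {group: sorted(names) for group, names in groups.items()}


# Hypothetical cluster inventory.
inventory = [
    {"name": "prod-us-1", "env": "prod", "region": "us"},
    {"name": "prod-eu-1", "env": "prod", "region": "eu"},
    {"name": "stage-us-1", "env": "stage", "region": "us"},
    {"name": "dev-us-1", "env": "dev", "region": "us"},
    {"name": "dev-us-2", "env": "dev", "region": "us"},
]

by_env = group_clusters(inventory, "env")
by_region = group_clusters(inventory, "region")

print(f"{len(by_env)} instances when grouping by environment: {by_env}")
print(f"{len(by_region)} instances when grouping by region: {by_region}")
```

The same five clusters need three instances when grouped by environment but only two when grouped by region, so it is worth trying a few groupings against your real inventory before committing to one.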
The groups partition the load, which distributes the burden on the application controller, repo server, and API server. It also allows you to limit the blast radius of what can be impacted by Argo, which is a great approach for security and reliability. The grouping isn't a perfect solution, though, since depending on the size of the clusters, it may still require tuning the individual Argo CD components.
The developer experience is improved compared to the instance per cluster architecture. Following an understood convention for the grouping will reduce the cognitive burden on developers of knowing where to point their CLI and API integrations for Argo CD.
This method still requires a management cluster to host the Argo CD instances. Fortunately, you can get away with using one cluster for all Argo CD instances by installing them as namespace-scoped.
The Akuity Platform provides many of the benefits of the instance per cluster model and the single instance architecture while eliminating most of the drawbacks. The hybrid agent architecture provided by the Akuity Platform enables you to scale to large enterprise use cases encompassing thousands of clusters and applications using a single control plane.
The agent runs inside the cluster and has outbound access back to the control plane. This architecture significantly reduces network traffic between the control plane and the cluster. The security concerns are gone since it does not require direct cluster access or admin credentials. It even enables an external Argo CD instance to connect to clusters where exposing them is challenging, like a cluster on your laptop.
The Akuity Platform simplifies the experience of operating Argo CD. There's no longer a need for a dedicated management cluster to host it. The Akuity Platform will host the instance and the custom resources. The automatic snapshotting and disaster recovery features of the Akuity Platform eliminate the single point of failure concern.
It provides the same visibility benefits as the single Argo CD instance architecture with a central location to view all of the organization's instances. The platform goes beyond the open-source offering by adding a dashboard for each instance that provides metrics on application health and sync histories. You can manage settings using wizards to craft configurations typically represented in complex YAML files, like notification services. It adds an audit log of all the activity across the Argo CD instances in your organization, making compliance reporting significantly more manageable.
No solution is really without compromise. When using a SaaS product, you opt to give up some control over the platform. At the same time, you can take advantage of Argo CD without maintaining the underlying infrastructure.
Of course, there's also the cost. It may seem like a big difference when comparing the cloud resource cost directly to the cost of an Argo CD SaaS offering. But considering the engineering hours spent maintaining, tuning, and securing the open-source offering, the difference may not be as significant as you think. Fortunately, any time spent learning to use Argo CD will not be lost. The Akuity Platform provides the same familiar interface (and APIs).
Open-source installations of Argo CD will typically be set up to manage themselves with an Application. Like any other Application, it will define how to deploy Argo CD into the cluster with all its configurations from a Git repo. A SaaS offering may limit how you can manage these settings, preventing Argo CD from managing itself. You may need to use the platform's dashboard, an API, or an infrastructure-as-code (IaC) tool. Akuity has solved this challenge with the introduction of Declarative Management for the Akuity Platform and the launch of the Akuity Terraform provider.
The number of Argo CD instances you will need depends on several factors, including the size and complexity of your environment, the number of users accessing the system, and the workloads the instances will handle. You can determine the optimal architecture and appropriate compromises by carefully considering these factors and your organization's situation.
As a rule of thumb, I'd recommend using a single Argo CD instance if you are new to Kubernetes and have a small number of clusters (3 or fewer) and applications (100 or fewer). If your organization implements a cluster per environment (e.g., dev, stage, prod), it typically makes sense to use one Argo CD instance per cluster.
Suppose you already have Argo CD in your environment and are hitting limitations on scaling. In that case, it may be time to transition to an instance per logical group or consider the Akuity Platform, which re-architects Argo CD to be enterprise-ready and scalable by default.
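The rule of thumb above can be written down as a small Python function. The thresholds come from this post, but the order in which the checks are applied and the fallback to an instance per logical group are my own assumptions, so treat the output as a conversation starter and not a verdict.

```python
def suggest_architecture(
    clusters,
    applications,
    cluster_per_environment=False,
    hitting_scaling_limits=False,
):
    """Suggest an Argo CD architecture from a few coarse inputs.

    The ordering of the checks is an assumption made for this sketch.
    """
    if hitting_scaling_limits:
        # An existing setup that no longer scales should be split up.
        return "instance per logical group"
    if clusters <= 3 and applications <= 100:
        return "single instance"
    if cluster_per_environment:
        return "instance per cluster"
    # Assumed fallback for larger setups that fit neither case above.
    return "instance per logical group"


print(suggest_architecture(clusters=2, applications=40))
print(suggest_architecture(clusters=5, applications=300, cluster_per_environment=True))
print(suggest_architecture(clusters=40, applications=2000, hitting_scaling_limits=True))
```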
Akuity aims to truly enhance the experience of using Argo. If you want insights on where to start with Akuity or Argo CD, please get in touch with me (Nicholas Morey) on the CNCF Slack. You can find me in the #argo-* channels, and don't hesitate to send me a direct message.