December 5, 2023
Hong Wang
Why We Created the Argo Project
Seeing the Argo Project reach graduated status in the Cloud Native Computing Foundation in 2022, and seeing it now move at incredible velocity for an open source project, I wanted to take this opportunity to remind myself and the community of the initial “Why?” behind our brainchild.
Long before Argo became one of the top open source CNCF projects in terms of velocity, there were substantial challenges in operationalizing application deployments on Kubernetes.
Kubernetes is pretty great out of the box, but back in 2016 we all felt we would have to develop custom tooling that fills the gaps in Kubernetes and adds a deployment-focused layer of abstraction to it. We wanted to be able to deploy and manage all Kubernetes objects that make up an "application" as a single, atomic unit. We already knew then that this would be really valuable for all businesses running on Kubernetes, since it would let developers spend more time writing code and less time figuring out how to deploy and manage infrastructure.
The Argo Project was exactly this, and it was born out of practicality at Applatix. Being acquired by Intuit, an end user company, gave us a real-world use case and the opportunity to perfect the solution by developing it in the open. Since then, the open source approach has been a win-win situation for both Intuit and the community around Argo.
Let’s take a trip down memory lane and see the initial ‘whys’ and ‘hows’ behind the Argo Project.
How it all started - Applatix and Argo Workflows
Jesse, Alex, and I (the Akuity founders) met at Applatix - a fresh start-up at the forefront of building scalable production systems with containers and Kubernetes in both public and private clouds.
It was 2016 - the year of containers, microservices, and championing the public cloud. Kubernetes had been officially released the year before. We saw all these things on the table and felt that a big shift was coming in how applications are deployed and maintained. Jenkins already felt anachronistic back then, and the term DevOps was gaining enormous traction.
At Applatix we wanted to build a full DevOps solution. Think of a better-than-Jenkins experience, but with containers and on a public cloud. To do that, we had to decide which container orchestration solution to bet on. We could choose from Mesosphere, Docker Swarm, and Kubernetes.
We started with Mesosphere and had so many issues with it that when we learned about Kubernetes we knew that our solution had to be Kubernetes-native. Our second observation was that Kubernetes was created for stateless workloads and doesn’t support workflows out of the box.
At that time the Kubernetes community was discussing a workflow engine - they couldn’t make it into a primitive since it’s complex (e.g. passing artifacts, working with data, etc.) and Kubernetes is more focused on computation (e.g. CPU and memory, scheduling, etc.).
We had all the building blocks in place, so that’s how we started Argo Workflows. However, we felt that we could gather feedback faster to iterate on the product and make it as Kubernetes-native as possible. In 2017 we decided to open-source Argo Workflows. This was also the year in which the Kubernetes community introduced the concept of a Custom Resource Definition (CRD). We decided to use Kubernetes CRDs to introduce a new type of container-native job. This led to a complete rewrite, Argo Workflows 2.0, as a CRD, which made it possible to orchestrate parallel jobs, define workflows where each step is a task for a separate container, and model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a graph (DAG).
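To make the DAG idea concrete, here is a minimal Python sketch of how dependencies between tasks determine what can run in parallel. It is not how Argo Workflows is implemented (workflows there are declared as Kubernetes custom resources); the task names and the `execution_waves` helper are made up for illustration, and the standard-library `graphlib` module requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks it depends on. The names are illustrative only.
workflow = {
    "build": set(),
    "unit-test": {"build"},
    "lint": {"build"},
    "publish": {"unit-test", "lint"},
}


def execution_waves(dag):
    """Group tasks into waves. Every task in a wave has all of its
    dependencies satisfied, so the tasks of one wave could run in
    parallel, each in its own container."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as finished to unlock dependents
    return waves


print(execution_waves(workflow))
# [['build'], ['lint', 'unit-test'], ['publish']]
```

A plain sequence of steps is just the special case where every task depends on exactly one predecessor, so each wave contains a single task.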
Argo CD Comes Into Being at Intuit
After meeting with Intuit at KubeCon 2017 we felt that something big was coming. They were looking for a team that could help them move their product portfolio onto Kubernetes and make it happen as seamlessly as possible. When they decided to buy Applatix we couldn’t wait to test Argo Workflows at such a large scale.
Still, we felt that there was a big piece of the puzzle missing. We wanted to solve another practical problem. Looking at the Intuit landscape alone, we had so many Kubernetes clusters and so many namespaces to manage, but no tools to do that. It’s a common pattern across organizations that one application has several environments spread across different clusters, reflecting its promotion lifecycle and the team's development practices.
We figured out that we needed a tool that would handle this and at the same time onboard engineering teams to Kubernetes. At that time there simply was no open source tool that let you manage multiple clusters from a single control plane, especially for deployments. That’s how Argo CD came to life.
What we focused on first was a developer experience that would:
- help engineers (ops and developers) understand Kubernetes despite its complexity
- teach end users only what they need, without requiring them to go in too deep
We needed multi-cluster support from day one to manage many environments across different clusters: development, staging, production (and for production, you may have three different regions to deploy to) - and that’s just for one app or microservice. You will definitely need a single control plane to make it easier to orchestrate all of these. That was the number one reason behind coming up with Argo CD.
The second one was to empower teams to work together - the platform and the application teams. This is why we put emphasis on the GUI of the tool and decided to implement an application-centric view. The Application resource (rather than a namespace or cluster view) provides the best granularity for app developers, SREs, and DevOps engineers to work together and improve the application as well as its deployment and maintenance processes.
With Argo CD you can run your application deployments on auto-pilot, without the need to manually inspect every change - the GitOps way. However, to keep the risk of failed deployments to a minimum, you can still apply changes one by one after reviewing the real-time diff against the live cluster state.
What’s also important is that in case of an incident in production, you can roll back to the previous version of the application with confidence and with a short turnaround time. This protects business continuity and customer satisfaction while the technical team inspects what went wrong and ships the fixed version when ready.
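The idea of reviewing a diff between the desired state and the live cluster state can be sketched in a few lines of Python. This is a simplified illustration and not Argo CD's actual diffing logic: the `diff_state` helper, the resource names, and the flat dictionary representation of manifests are all assumptions made for the example.

```python
def diff_state(desired, live):
    """Compare the desired state (e.g. what is declared in Git) with the
    live state and return the action needed per resource."""
    actions = {}
    for name, spec in desired.items():
        if name not in live:
            actions[name] = "create"  # declared but not running yet
        elif live[name] != spec:
            actions[name] = "update"  # running, but drifted from the declaration
    for name in live:
        if name not in desired:
            actions[name] = "delete"  # running, but no longer declared
    return actions


# Hypothetical resources, keyed by "kind/name".
desired = {
    "deployment/web": {"image": "web:1.1", "replicas": 3},
    "service/web": {"port": 80},
}
live = {
    "deployment/web": {"image": "web:1.0", "replicas": 3},
    "configmap/old": {"key": "value"},
}

print(diff_state(desired, live))
# {'deployment/web': 'update', 'service/web': 'create', 'configmap/old': 'delete'}
```

Running on auto-pilot corresponds to applying every action as soon as it appears, while the more cautious mode corresponds to a person reviewing this output before anything is applied.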
How Argo CD Improved Business Metrics and Delivered Business Value
What we were aiming at business-wise was great development velocity metrics (nowadays often referred to as the DORA metrics). Innovation has to happen frequently, and to do that it has to be automated and visual, because if we didn’t onboard developers quickly, it would take around a year just to teach Kubernetes and then migrate everything to it.
To name just a few things Argo CD helps with:
- increasing visibility into Kubernetes infrastructure and helping teams navigate it
- visualizing how Kubernetes actually works
- introducing various views to understand what’s happening under the hood (e.g. the networking view)
- showing the connections between Kubernetes pods, clusters, and apps
- improving business metrics such as
  - release frequency
  - MTTD, MTTR (with Intuit being a fintech company, every downtime period actually costs millions of dollars)
- handling crisis situations without “throwing logs over the fence” (e.g. with notifications to inform relevant teams and all the views in one place)
Just as Argo CD was crucial for a big team at Intuit, we now see that it is equally crucial for smaller teams that simply want to adopt Kubernetes and improve their business metrics.
The missing piece - Argo Rollouts
As we shipped more and more to Kubernetes across the Intuit microservices landscape, using generic deployment workflows, we noticed two things:
- a big percentage of incidents (around 50%) happened around the software release periods
- paying millions of dollars for observability tools led only to finding the causes of issues, not to actively preventing them from happening or to shortening the mean time to recovery (MTTR)
This is where creating a fast feedback loop after triggering a release became really crucial. We talked with our application team about the idea of using the data matrix pattern for dry-running our software releases and they really loved it.
We also decided to introduce two key deployment strategies:
- blue-green (run a nearly identical new release alongside the previous version of an app and switch user traffic over to it once it has been verified)
- canary (split the users into two groups where a small percentage will go to the canary while the rest stay on the old version)
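The canary split described above, where a small percentage of users reaches the new version, can be illustrated with a short Python sketch. This is not how Argo Rollouts shifts traffic; the `route` function and the hash-based bucketing are assumptions chosen only to show the idea of a stable, percentage-based split.

```python
import hashlib


def route(user_id, canary_percent):
    """Deterministically assign a user to 'canary' or 'stable'.

    Hashing the user id places each user in one of 100 buckets, so the
    same user always lands on the same version for a given percentage."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"


# Send roughly 10% of 10,000 hypothetical users to the canary.
users = [f"user-{i}" for i in range(10_000)]
canary_users = [u for u in users if route(u, 10) == "canary"]
print(f"{len(canary_users) / len(users):.1%} of users routed to the canary")
```

Raising `canary_percent` step by step moves more users to the new version, and setting it back to 0 returns everyone to the stable version, which is the rollback path.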
With Argo Rollouts, a business or product team can tiptoe into production and make sure everything works. If something goes wrong, it can quickly roll back, find and fix the issue in staging/QA, and avoid losing money or hurting customer satisfaction.
Argo Rollouts was first championed by the payment product team at Intuit, who wanted to make sure every single release was stable enough to handle the millions of transactions taking place inside their products.
After a successful implementation at Intuit (TurboTax, QuickBooks), Argo Rollouts is now powering a lot of large-scale application releases in the cloud - at Salesforce and Spotify, to name just a few.
Innovation on Top of Argo - Akuity Platform
While the Argo Project has greatly improved the lives of both business and platform teams, there are still many challenges that we want to tackle by building innovative solutions on top of this recently CNCF-graduated project.
With the Akuity Platform and its unique architecture we want to increase the velocity of teams even more and take it to a completely new level. Our “Argo CD as a managed service” offering is already meeting its early adopters and enabling them to quickly scale applications to enterprise scale.
Still, we think that Argo is just another layer on top of Kubernetes. We believe there are many use cases that will become best practices as quickly as Argo became the recommended solution for Kubernetes application delivery.
Akuity is perfectly positioned to bring innovation on top of Argo and focus on what’s next. This includes creating new open source projects (have you heard about Kargo yet?) as well as delivering premium enterprise features to bridge the gap between GitOps and multi-step processes needed to satisfy business, compliance, security, and testing requirements.
Don’t hesitate to get in touch with me to share your pains, bottlenecks, and ideas on how to level up cloud native deployment (and improve the sleep quality of every professional involved 😉). I’m also available on the CNCF Slack (argo-* channels) and LinkedIn, or you can just drop us a message.