Beyond Dashboards: AI Agents for GitOps Operations
Alexander Matyushentsev

Kubernetes is data-rich and cognitively demanding. Every cluster resource has its own YAML, logs, and events, and engineers spend significant time gathering that data, correlating it, and drawing actionable insights from it. The sheer amount of operational signal Kubernetes generates can be difficult to navigate and troubleshoot.
So when a deployment breaks, the data connecting that failure to its cause (a recent commit, a misconfigured resource limit, a Kargo promotion gone wrong) already exists. It just requires an engineer to find it, across multiple tools, under pressure, often in the middle of the night.
Better dashboards haven't solved this. More runbooks haven't solved this. The issue is structural: existing tools surface data, but leave the reasoning entirely to the engineer.
How AI Agents Reduce Cognitive Load in K8s Troubleshooting
AI agents introduce something traditional observability and alerting tools never could: autonomous reasoning. Unlike dashboards or alert tools that surface data for humans to interpret, an AI agent reasons over that data directly, taking actions, investigating issues, and driving toward resolution without waiting to be asked at every step.
Teams adopting AI-powered SRE tooling are already reporting 50–70% faster incident resolution and 3x faster root cause identification. For GitOps teams, the opportunity is even greater. Every change is versioned. Every promotion is traceable. The deployment lineage is already there; it just needs an intelligence layer that knows how to use it.
Akuity Intelligence: Bringing AI Agents to GitOps
Akuity Intelligence brings powerful new AI capabilities to Platform Engineering and DevOps teams through two core offerings:
Akuity AI Agents: Automated troubleshooting and remediation to accelerate and safeguard deployments.
Insights Dashboards: Real-time visibility into the health and state of your GitOps operations.
The Akuity Platform manages customers' Argo CD and Kargo instances, providing platform teams with a complete view of applications, clusters, histories, metrics, and users. Because intelligence capabilities are embedded into real-time system state, the AI Agents can answer complex operational questions immediately, without asking engineers to copy-paste logs or resource manifests. This makes DevOps work conversational, but with the precision and context of the underlying platform.
This post is a technical deep dive into Akuity AI Agents only: how the AI Agents work, their observability and control plane capabilities, and what they bring to Kubernetes operations at scale. We will cover Insights Dashboards in a future post.
How Akuity AI Agents Work
Built-In Observability Awareness
Because the AI Agent is tightly integrated with Argo CD and Kargo, it has access to a continuous stream of observability data - application statuses, workloads, events, audit logs, deployment histories, and promotion states. This isn’t a snapshot - it’s a living model continuously updated by the Akuity Platform, as Argo CD and Kargo sync, deploy, and promote.
When a workload becomes unhealthy, the AI Agent immediately knows:
Which Deployment degraded.
Which ReplicaSet failed to scale.
Whether the failure correlates with a recent Git commit or Kargo promotion.
There’s no setup, no manual data collection, and no context-switching between tools. This means engineers can skip straight to debugging and decision-making, rather than spending time gathering and correlating data across tools.
If an Argo CD application shows a degraded Deployment, the Agent is already aware of that state. The user can simply type “help,” and the Agent knows what’s wrong without needing explanation. It understands dependencies, related workloads, and recent configuration changes.
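The correlation step described in this section can be illustrated with a small sketch. This is not Akuity's implementation: the `ChangeEvent` type, the `recent_changes_before` helper, the sample data, and the 30-minute window are all hypothetical, chosen only to show how a degradation timestamp might be matched against recent commits and promotions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of something that changed the deployed state."""
    kind: str      # e.g. "git-commit" or "kargo-promotion"
    ref: str       # commit SHA or promotion name (illustrative values)
    at: datetime   # when the change took effect


def recent_changes_before(degraded_at, changes, window=timedelta(minutes=30)):
    """Return changes that landed within `window` before the degradation,
    most recent first, as candidate causes to review."""
    candidates = [
        c for c in changes
        if timedelta(0) <= degraded_at - c.at <= window
    ]
    return sorted(candidates, key=lambda c: c.at, reverse=True)


degraded_at = datetime(2025, 1, 10, 2, 15)
changes = [
    ChangeEvent("git-commit", "abc123", datetime(2025, 1, 9, 18, 0)),
    ChangeEvent("kargo-promotion", "promo-42", datetime(2025, 1, 10, 2, 5)),
    ChangeEvent("git-commit", "def456", datetime(2025, 1, 10, 1, 50)),
]
suspects = recent_changes_before(degraded_at, changes)
print([c.ref for c in suspects])  # ['promo-42', 'def456']
```

A time window is only a first filter. A real system would presumably also check whether each change actually touched the degraded workload.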

Scoped Reasoning: The Agent Knows What You’re Working On
Every interaction happens within a defined scope: an Argo CD application, a Kubernetes namespace, or a Kargo project.
Once context is set, the Agent automatically narrows the scope of reasoning to what matters. It knows which resources belong to that application, their recent history, and what issues are likely to arise.
An engineer working inside a specific Kargo project, for example, gets analysis scoped entirely to that project's promotion history, associated applications, and relevant cluster state, without manually filtering out noise from the rest of the platform.
The Akuity Platform extends this reasoning further by providing access to historical and behavioral data: how often certain failures occur, what resolved them in the past, and who made recent changes. That historical continuity lets the Agent reason about patterns over time, not just current state.
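To make the scoping idea concrete, here is a minimal sketch of filtering a platform-wide resource list down to one scope. The `Scope` and `Resource` types, the `narrow` helper, and the sample names are assumptions made for illustration, not the Akuity Platform's data model.

```python
from dataclasses import dataclass

SCOPE_KINDS = {"application", "namespace", "project"}


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope: an Argo CD application, a namespace, or a Kargo project."""
    kind: str
    name: str


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource record carrying the fields a scope filter needs."""
    name: str
    application: str
    namespace: str
    project: str


def narrow(scope, resources):
    """Keep only the resources that belong to the selected scope."""
    if scope.kind not in SCOPE_KINDS:
        raise ValueError(f"unknown scope kind: {scope.kind}")
    return [r for r in resources if getattr(r, scope.kind) == scope.name]


resources = [
    Resource("checkout-api", "checkout", "shop-prod", "shop"),
    Resource("checkout-worker", "checkout", "shop-prod", "shop"),
    Resource("billing-api", "billing", "finance-prod", "finance"),
]
scoped = narrow(Scope("project", "shop"), resources)
print([r.name for r in scoped])  # ['checkout-api', 'checkout-worker']
```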
An Autonomous Collaborator, Not a Chatbot
The Akuity AI Agent is not a chatbot. It doesn’t operate in a request-response loop that ends after each message. Instead, it acts as an autonomous collaborator, capable of analyzing, acting, and following up asynchronously until a problem is resolved.
The Agent can perform everything an engineer can do via kubectl, Argo CD, or Kargo, including:
Fetching logs and inspecting resources.
Reviewing audit histories.
Modifying configurations.
Adjusting a Deployment’s memory limits or updating an image.
Every action the Agent takes is tracked through the Akuity Platform’s audit trail.

This agency transforms the workflow. Engineers can start a conversation, let the Agent investigate in the background, and come back to recommendations or completed actions. It’s a new operational mode where AI works side by side with engineers, not as an interface, but as a teammate.
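As an illustration of one such action, adjusting a Deployment's memory limit amounts to a small patch against the Deployment spec. The sketch below only builds the patch body. The `memory_limit_patch` helper, the container name `app`, and the value `512Mi` are assumptions, and how the Agent applies changes internally is not described here.

```python
import json


def memory_limit_patch(container, memory):
    """Build a Kubernetes strategic merge patch that sets one container's
    memory limit. Containers are matched by name, so other containers and
    other fields of the Deployment are left untouched."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"limits": {"memory": memory}},
                        }
                    ]
                }
            }
        }
    }


patch = memory_limit_patch("app", "512Mi")
print(json.dumps(patch))
```

An engineer could apply the same JSON by hand with `kubectl patch deployment <name> --type strategic -p '<json>'`.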
Specialized AI Agents for Different Workflows
Rather than a single, general-purpose assistant, the Akuity Platform introduces a collection of specialized agents, each focused on a distinct operational domain. This modular design mirrors how DevOps teams work: different tasks require different reasoning, data, and different levels of autonomy.
All agents share the same core framework and access to observability data. What differentiates them is their specialized user interface: each is tailored to a specific workflow, whether deployment optimization, incident response, or release promotion.
Deployment Advisor: Day-to-Day Operations
The Deployment Advisor assists engineers with day-to-day infrastructure tasks. It analyzes the state of an Argo CD application or Kubernetes namespace, identifies issues, and recommends configuration changes. Common use cases include tuning readiness probes, adjusting resource limits, or reviewing rollout health.
Beyond one-time analysis, the Advisor can also take on lightweight operational tasks.
Engineers can delegate actions such as “monitor this Deployment for the next thirty minutes and confirm it’s stable.”
The Advisor continuously monitors logs, events, and metrics, then reports back with a summary. By combining observability with task-oriented interaction, it makes troubleshooting more efficient, closing the feedback loop traditional dashboards can’t.
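A delegated task like "monitor for thirty minutes and confirm stability" boils down to collecting observations over a window and summarizing them. The sketch below shows one way such a summary could be computed. The `Sample` fields and the stability criteria (no under-replication, no new restarts, full window covered) are assumptions, not the Advisor's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical point-in-time observation of a Deployment."""
    minute: int    # minutes since monitoring started
    ready: int     # ready replicas
    desired: int   # desired replicas
    restarts: int  # cumulative container restarts


def summarize_stability(samples, window_minutes=30):
    """Summarize whether the Deployment stayed healthy for the whole window."""
    ordered = sorted(
        (s for s in samples if 0 <= s.minute <= window_minutes),
        key=lambda s: s.minute,
    )
    if not ordered:
        return {"stable": False, "reason": "no samples collected"}
    degraded_minutes = [s.minute for s in ordered if s.ready < s.desired]
    new_restarts = ordered[-1].restarts - ordered[0].restarts
    window_covered = ordered[-1].minute >= window_minutes
    return {
        "stable": window_covered and not degraded_minutes and new_restarts == 0,
        "degraded_minutes": degraded_minutes,
        "new_restarts": new_restarts,
        "window_covered": window_covered,
    }


healthy = [Sample(m, 3, 3, 2) for m in (0, 10, 20, 30)]
flaky = [Sample(0, 3, 3, 2), Sample(10, 2, 3, 3), Sample(30, 3, 3, 3)]
print(summarize_stability(healthy)["stable"])  # True
print(summarize_stability(flaky))
```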
On-Call Agent: Incident Response
If the Deployment Advisor helps during diagnosis, the On-Call Agent helps during incidents. It continuously monitors infrastructure and responds proactively when issues occur.
When an incident is detected (for example, pods crashing due to out-of-memory errors), the Agent can immediately investigate, correlate causes, and start remediation. Its behavior is governed by runbooks defined within the Akuity Platform.
Akuity runbooks specify how to handle known classes of problems under different conditions. For example, if the incident occurs in a development environment, the Agent might be permitted to act freely - increasing resource limits or patching manifests automatically. In production, it might instead escalate to the on-call team, summarize findings in Slack, and request approval before taking action.

This flexible behavior model lets organizations define the right balance between autonomy and oversight, while keeping the engineer in control of intent and accountability.
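The environment-dependent behavior described above can be pictured as a lookup from incident type and environment to a response. The following is a sketch only. It does not reflect the actual format of Akuity runbooks, and the `RunbookRule` type, the `decide` function, and the incident names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    REMEDIATE = "remediate automatically"
    ESCALATE = "escalate and request approval"


@dataclass(frozen=True)
class RunbookRule:
    """Hypothetical rule: how to respond to an incident type in an environment."""
    incident_type: str
    environment: str
    response: Response


RUNBOOK = [
    RunbookRule("oom-killed", "development", Response.REMEDIATE),
    RunbookRule("oom-killed", "production", Response.ESCALATE),
]


def decide(incident_type, environment, rules=RUNBOOK):
    """Return the first matching rule's response. With no match, fall back
    to escalation so that unknown situations always reach a human."""
    for rule in rules:
        if (rule.incident_type, rule.environment) == (incident_type, environment):
            return rule.response
    return Response.ESCALATE


print(decide("oom-killed", "development").value)  # remediate automatically
print(decide("oom-killed", "production").value)   # escalate and request approval
```

Defaulting to escalation is a design choice of this sketch: it keeps the engineer in the loop whenever the runbook is silent.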
Promotion Advisor: Release Intelligence
Kargo already orchestrates the promotion of code from development to production, connecting infrastructure and application layers. The Promotion Advisor builds on that relationship by bringing intelligence to the release process itself.
Before a promotion, the Advisor can:
Enumerate all commits associated with the release.
Analyze commit messages and code diffs.
Assess the potential impact of each change.
Produce a high-level summary and an inferred risk score, incorporating promotion history from other stages.
By analyzing promotions before they occur, the Advisor gives engineers a clear view of what’s about to be deployed and the associated risk.

This turns Kargo promotions into explainable, data-driven decisions. Engineers can ask the Advisor why a change might be risky, and get reasoning grounded in the actual code and deployment history. It’s a step toward automated change intelligence - where promotions are informed by context, not just status.
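This post does not describe how the risk score is inferred. Purely as an illustration of combining per-commit signals with promotion history, here is a toy heuristic. The `Commit` fields, the weights, and the 0 to 100 scale are all invented for this sketch and should not be read as the Advisor's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical summary of one commit in a release."""
    sha: str
    message: str
    files_changed: int
    touches_config: bool  # e.g. resource limits, probes, environment settings


def risk_points(commits, failed_earlier_promotions):
    """Toy risk score from 0 to 100. All weights are illustrative only."""
    points = 0
    for commit in commits:
        points += min(commit.files_changed, 20)  # larger diffs add more risk
        if commit.touches_config:
            points += 15                          # config changes weigh extra
    points += 25 * failed_earlier_promotions      # trouble in earlier stages
    return min(points, 100)


commits = [
    Commit("a1", "bump memory limit", files_changed=1, touches_config=True),
    Commit("b2", "refactor handlers", files_changed=12, touches_config=False),
]
print(risk_points(commits, failed_earlier_promotions=1))  # 53
```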
Akuity Platform’s Safety and Control Mechanisms
The use of AI agents in infrastructure management is only valuable if it can be trusted. The same autonomy that makes the Agent effective can also cause damage if misused. The Akuity Platform enforces two layers of safety mechanisms to ensure that AI actions remain secure, auditable, and aligned with human intent.
Layer 1: Inherited Access Controls
The Agents never operate outside the permissions of the user who invoked them. They inherit the existing access-control model of Argo CD and Kargo. Users can only set the context of an application or project they already have permission to manage. The Agent operates under the same RBAC scope - it never exceeds the privileges of the requesting user.
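The inheritance rule can be sketched as a session object that holds no permissions of its own and always defers to the invoking user's grants. This is a simplified model, not Argo CD's or Kargo's RBAC format. The `User` and `AgentSession` types and the `(scope, action)` pairs are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user with a set of (scope, action) grants."""
    name: str
    grants: frozenset


class AgentSession:
    """Hypothetical agent session bound to one user and one scope."""

    def __init__(self, user, scope):
        # A user cannot even set a context they hold no grants for.
        if not any(granted_scope == scope for granted_scope, _ in user.grants):
            raise PermissionError(f"{user.name} has no access to {scope}")
        self.user = user
        self.scope = scope

    def allowed(self, action):
        """The agent may do exactly what the invoking user may do."""
        return (self.scope, action) in self.user.grants


dana = User("dana", frozenset({("app/checkout", "get"), ("app/checkout", "sync")}))
session = AgentSession(dana, "app/checkout")
print(session.allowed("sync"))    # True
print(session.allowed("delete"))  # False
```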
Layer 2: Tool Policies
The platform adds a second layer of protection through tool policies. Every Agent action, from retrieving logs to patching a Deployment, is implemented as a tool. Tool policies define which tools are automatically allowed, which require explicit approval, and which are disabled altogether.

For example, read-only actions might be auto-approved, while state-changing actions like delete pod or scale deployment require human confirmation. This model ensures that organizations can tailor the Agent’s behavior to their risk tolerance, environment, and compliance requirements while maintaining the efficiency benefits of AI assistance.
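The three policy outcomes (allowed, approval required, disabled) map naturally onto a small gate in front of every tool call. The sketch below assumes hypothetical tool names and a hypothetical `authorize` function. The platform's real policy configuration is not shown in this post.

```python
from enum import Enum


class Policy(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require-approval"
    DISABLED = "disabled"


# Hypothetical policy table; tool names are illustrative.
TOOL_POLICIES = {
    "get-logs": Policy.ALLOW,
    "delete-pod": Policy.REQUIRE_APPROVAL,
    "scale-deployment": Policy.REQUIRE_APPROVAL,
    "delete-namespace": Policy.DISABLED,
}


def authorize(tool, approved_by_human=False, policies=TOOL_POLICIES):
    """Decide whether a tool call may proceed.
    Tools missing from the table are treated as disabled."""
    policy = policies.get(tool, Policy.DISABLED)
    if policy is Policy.ALLOW:
        return True
    if policy is Policy.REQUIRE_APPROVAL:
        return approved_by_human
    return False


print(authorize("get-logs"))                            # True
print(authorize("delete-pod"))                          # False
print(authorize("delete-pod", approved_by_human=True))  # True
```

Treating unlisted tools as disabled is a choice made in this sketch so that new capabilities stay off until someone opts in.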
Get Started with Akuity Intelligence
Akuity Intelligence is available now in the Akuity Platform and can be explored through a Free Trial.
If you are new to Argo CD or Akuity, you can try Akuity Intelligence by signing up for a Free Trial or booking a demo with an Akuity engineer.
If you are an Akuity customer, contact your CSM or email sales@akuity.io to book a demo today.
This blog was written by Alexander Matyushentsev, Argo Co-Creator, Co-founder and Chief Architect at Akuity.

