Troubleshooting with Agent Skills
Kueue includes experimental agent skills that help an AI agent follow repeatable troubleshooting
runbooks for common workload investigations. The skills live in cmd/experimental/agent/skills
and are intended for agents that can read repository instructions such as AGENTS.md.
Experimental
Agent skills and AGENTS.md support are experimental. They are not Kueue APIs, CLI commands,
or released binaries, and they do not provide backwards or forwards compatibility guarantees.
Agent behavior is non-deterministic, so a human must supervise the investigation and validate
the results before taking action.
Before you begin
Before using these skills, make sure that:
- You have added an @AGENTS.md reference to your agent's configuration file.
- The agent can read AGENTS.md and the files under cmd/experimental/agent/skills.
- You have a local copy of the Kueue repository that includes the skills.
- kubectl is configured for the cluster you want to inspect.
- Your Kubernetes RBAC permissions allow reading the Workloads, parent jobs, Pods, events, and namespaces needed for the investigation.
RBAC affects the result. For example, an agent might be able to trace resources in one namespace but not resolve a preemptor Workload that lives in another namespace.
Available skills
| Skill | Use it when | Typical input | Typical output |
|---|---|---|---|
| kueue-lineage | You need to trace ownership across Workload, Job, JobSet, Pod, Ray, Kubeflow, LeaderWorkerSet, Deployment, StatefulSet, or another supported job layer. | A resource kind, name, and namespace at any level of the ownership chain. | A tree from the Kueue Workload down to intermediate resources and Pods, with the starting resource marked when possible. |
| kueue-who-preempted | You need to understand why a Workload was evicted or preempted, and which Workload or parent job caused it. | The victim Workload name and namespace. | The preemptor Workload, parent job, preemption reason, and preemptor/preemptee ClusterQueue paths when RBAC allows resolving them. |
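To make the kueue-lineage output concrete, here is a minimal Go sketch of how an agent might print such a tree once it has collected the links between objects. The Ref and Lineage types, the method names, and the sample Pod name are hypothetical, not Kueue APIs. The sketch also assumes the links are already arranged for display with the Workload at the top: in a cluster, the Workload carries an ownerReference to its top-level job, while Pods and intermediate resources point up the chain toward that job, so an agent has to reorder the links before rendering them this way.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Ref identifies a namespaced object. Hypothetical type for this sketch.
type Ref struct {
	Kind, Namespace, Name string
}

func (r Ref) String() string { return r.Kind + " " + r.Namespace + "/" + r.Name }

// Lineage maps each child to its parent in the display tree. The caller
// (hypothetically, the agent) has already resolved and reordered these links.
type Lineage map[Ref]Ref

// Root follows parent links upward from start until it reaches an object
// with no parent. The seen set stops the walk if the links contain a cycle.
func (l Lineage) Root(start Ref) Ref {
	seen := map[Ref]bool{}
	cur := start
	for !seen[cur] {
		seen[cur] = true
		parent, ok := l[cur]
		if !ok {
			return cur
		}
		cur = parent
	}
	return cur
}

// children inverts the links for top-down rendering, sorted for stable output.
func (l Lineage) children() map[Ref][]Ref {
	out := map[Ref][]Ref{}
	for child, parent := range l {
		out[parent] = append(out[parent], child)
	}
	for _, kids := range out {
		sort.Slice(kids, func(i, j int) bool { return kids[i].String() < kids[j].String() })
	}
	return out
}

// Render prints the tree from the root of start downward and marks start.
func (l Lineage) Render(start Ref) string {
	kids := l.children()
	visited := map[Ref]bool{}
	var b strings.Builder
	var walk func(r Ref, depth int)
	walk = func(r Ref, depth int) {
		if visited[r] {
			return
		}
		visited[r] = true
		b.WriteString(strings.Repeat("  ", depth) + r.String())
		if r == start {
			b.WriteString("  <- start")
		}
		b.WriteString("\n")
		for _, k := range kids[r] {
			walk(k, depth+1)
		}
	}
	walk(l.Root(start), 0)
	return b.String()
}

func main() {
	// Sample objects; the Pod name is made up for illustration.
	wl := Ref{"Workload", "my-namespace", "job-my-job-19797"}
	job := Ref{"Job", "my-namespace", "my-job"}
	pod := Ref{"Pod", "my-namespace", "my-job-abcde"}
	l := Lineage{job: wl, pod: job}
	fmt.Print(l.Render(pod))
}
```

Run as written, it prints the Workload, then the Job indented beneath it, then the Pod marked as the starting resource. Starting from the Job instead would print the same tree with the marker moved to the Job line.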
Example prompts
To trace a resource lineage, ask the agent for the full Kueue lineage and provide the resource you are starting from:
Trace the Kueue lineage for Pod my-namespace/my-pod.
Trace the Kueue lineage for Job my-namespace/my-job and show the related Pods.
To investigate preemption, provide the victim Workload:
Find what preempted Workload my-namespace/job-my-job-19797.
Why was Workload team-b/job-job-b-victim-54490 evicted?
The agent should use the skill's runbook to choose the right kubectl queries, parse the
resulting Workload status or events, and report the relevant resources back to you.
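For the preemption prompts, parsing the Workload status mostly means reading the victim's status.conditions. The following Go sketch assumes the Kueue convention of an Evicted condition with reason Preempted and a Preempted condition whose reason (for example InClusterQueue) names the kind of preemption. It also assumes, as a guess you should verify against your Kueue version, that the condition message embeds the preemptor as "(UID: ..., JobUID: ...)". The Condition, Preemption, and FindPreemption names are hypothetical; Condition is a trimmed stand-in for metav1.Condition.

```go
package main

import (
	"fmt"
	"regexp"
)

// Condition keeps only the metav1.Condition fields this sketch reads.
type Condition struct {
	Type, Status, Reason, Message string
}

// Preemption is a hypothetical summary of what the sketch extracts.
type Preemption struct {
	Reason       string // reason on the Preempted condition, e.g. "InClusterQueue"
	PreemptorUID string // preemptor Workload UID, if the message includes it
	JobUID       string // preemptor's job UID, if the message includes it
}

// ASSUMPTION: the message embeds "(UID: <uid>, JobUID: <uid>)". Verify this
// against the message format of your Kueue version before relying on it.
var uidPattern = regexp.MustCompile(`UID: ([^,)\s]+), JobUID: ([^,)\s]+)`)

// FindPreemption reports whether the conditions show an active preemption
// and extracts the preemption reason and preemptor identifiers.
func FindPreemption(conds []Condition) (Preemption, bool) {
	var p Preemption
	found := false
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch {
		case c.Type == "Preempted":
			p.Reason = c.Reason
		case c.Type == "Evicted" && c.Reason == "Preempted":
			// Evicted only says that it was preempted; keep scanning for details.
		default:
			continue
		}
		found = true
		if m := uidPattern.FindStringSubmatch(c.Message); m != nil && p.PreemptorUID == "" {
			p.PreemptorUID, p.JobUID = m[1], m[2]
		}
	}
	return p, found
}

func main() {
	// Illustrative conditions; UIDs and message text are made up.
	msg := "Preempted to accommodate a workload (UID: 1111, JobUID: 2222) due to prioritization in the ClusterQueue"
	conds := []Condition{
		{Type: "Evicted", Status: "True", Reason: "Preempted", Message: msg},
		{Type: "Preempted", Status: "True", Reason: "InClusterQueue", Message: msg},
	}
	if p, ok := FindPreemption(conds); ok {
		fmt.Printf("reason=%s preemptorUID=%s jobUID=%s\n", p.Reason, p.PreemptorUID, p.JobUID)
	}
}
```

Keeping the UID extraction separate from the condition check means the sketch still reports a preemption when the message format differs; only the preemptor identifiers come back empty.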
RBAC behavior
The kueue-lineage skill requires permission to read the resources in the lineage. Depending on
the starting point, this can include Workloads, Pods, Jobs, JobSets, Ray resources, Kubeflow
training jobs, Deployments, ReplicaSets, StatefulSets, or other supported job types in the same
namespace.
The kueue-who-preempted skill has two lookup paths:
- If you can list Workloads across all namespaces, the skill can search for the preemptor by the
kueue.x-k8s.io/job-uid label or by Workload UID.
- If you only have namespace-scoped access, the skill can search only the namespaces visible to you. When the preemptor lives in a namespace you cannot access, the skill should explain that an administrator needs to run the investigation with broader permissions.
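The two lookup paths can be modeled in a few lines of Go. This is a minimal sketch under stated assumptions: the cluster slice stands in for the Workloads that exist, the namespace filter stands in for what RBAC lets you list, and WorkloadMeta, Access, and FindPreemptor are hypothetical names. Only the kueue.x-k8s.io/job-uid label comes from this page.

```go
package main

import "fmt"

// jobUIDLabel is the Workload label named on this page.
const jobUIDLabel = "kueue.x-k8s.io/job-uid"

// WorkloadMeta is a hypothetical, trimmed view of Workload metadata.
type WorkloadMeta struct {
	Namespace, Name, UID string
	Labels               map[string]string
}

// Access models what RBAC lets the caller list: Workloads in all
// namespaces, or only in the listed namespaces.
type Access struct {
	AllNamespaces bool
	Namespaces    []string
}

// FindPreemptor returns the first visible Workload whose UID equals
// preemptorUID or whose job-uid label equals jobUID. Empty IDs are ignored.
func FindPreemptor(cluster []WorkloadMeta, acc Access, preemptorUID, jobUID string) (WorkloadMeta, error) {
	visible := map[string]bool{}
	for _, ns := range acc.Namespaces {
		visible[ns] = true
	}
	for _, wl := range cluster {
		if !acc.AllNamespaces && !visible[wl.Namespace] {
			continue // RBAC hides this namespace from the caller
		}
		if (preemptorUID != "" && wl.UID == preemptorUID) ||
			(jobUID != "" && wl.Labels[jobUIDLabel] == jobUID) {
			return wl, nil
		}
	}
	if acc.AllNamespaces {
		return WorkloadMeta{}, fmt.Errorf("no Workload matches UID %q or job UID %q; it may no longer exist", preemptorUID, jobUID)
	}
	return WorkloadMeta{}, fmt.Errorf("preemptor not found in visible namespaces %v; an administrator with cluster-wide read access should rerun the investigation", acc.Namespaces)
}

func main() {
	cluster := []WorkloadMeta{{
		Namespace: "team-a", Name: "job-job-a-12345", UID: "wl-uid-a",
		Labels: map[string]string{jobUIDLabel: "job-uid-a"},
	}}
	// Namespace-scoped caller who can only list Workloads in team-b.
	if _, err := FindPreemptor(cluster, Access{Namespaces: []string{"team-b"}}, "wl-uid-a", "job-uid-a"); err != nil {
		fmt.Println(err)
	}
	// Caller with cluster-wide list permission.
	if wl, err := FindPreemptor(cluster, Access{AllNamespaces: true}, "", "job-uid-a"); err == nil {
		fmt.Println(wl.Namespace + "/" + wl.Name)
	}
}
```

In practice the agent issues the equivalent kubectl queries, such as listing Workloads with -A and a label selector on kueue.x-k8s.io/job-uid, rather than filtering in memory. The distinct error messages mirror the behavior described above: a namespace-scoped miss is reported as a permissions problem, not as proof that no preemptor exists.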
Source of truth
The repository AGENTS.md directs compatible agents to the skills index. The detailed runbooks are maintained with the
skills:
- cmd/experimental/agent/skills/README.md
- cmd/experimental/agent/skills/kueue-lineage.md
- cmd/experimental/agent/skills/kueue-who-preempted.md