The Problem Nobody Talks About: AI Agents Have No Shared Security Vocabulary
Cybersecurity teams deploying AI agents in 2024 face a problem that rarely surfaces in vendor pitches: every agent speaks a different dialect. A threat detection agent built on one framework defines “lateral movement analysis” differently than an incident response agent built on another. A vulnerability scanner trained on one skill set cannot hand off a clean, machine-readable assessment to a remediation agent that organizes its capabilities around an entirely different schema. The result is a stack of AI tools that cannot reliably talk to each other, cannot be audited against a common standard, and cannot be trusted when the alerts are real.
This is not a theoretical concern about future AI deployments. Security operations centers are running agentic systems right now. Enterprises pull in tools from multiple vendors, layer them over open-source projects, and expect them to operate as a coherent defense. Without a shared skill taxonomy, a CISO cannot answer a basic question: does our AI incident-response agent actually cover the skills our threat-detection agent assumes it covers? The gap between those two assumptions is where attacks survive.
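Once both agents declare skills from the same taxonomy, the CISO's question reduces to a set comparison. The sketch below is a minimal illustration under stated assumptions: the skill identifiers are hypothetical and are not taken from the actual catalogue.

```python
# Hypothetical skill identifiers; the real taxonomy's naming may differ.
detection_assumes = {
    "lateral-movement-analysis",
    "credential-dump-triage",
    "host-isolation",
}
response_declares = {
    "host-isolation",
    "credential-dump-triage",
    "malware-quarantine",
}


def coverage_gap(assumed: set, declared: set) -> set:
    """Return the skills one agent assumes that the other never declared."""
    return assumed - declared


gap = coverage_gap(detection_assumes, response_declares)
print(sorted(gap))
```

Any non-empty result is a handoff the detection agent expects and the response agent has not claimed it can perform.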
The auditing problem compounds the operational one. Regulators and legal teams increasingly demand accountability for automated security decisions. When an AI agent misses a breach or escalates a false positive into a costly lockdown, the organization needs to demonstrate what the agent was designed to do and whether it performed to that design. A taxonomy of skills provides that audit trail. Without one, liability is murky and post-incident reviews produce arguments, not answers.
Mahipal, a Berlin-based cybersecurity researcher with an M.Sc. in AI Security, has been building at this exact intersection — developing multi-agent AI defense frameworks and tools like EmailGuard for phishing detection and TSAF for protocol vulnerability analysis. His work surfaces the same friction repeatedly: agents designed to think like security teams still lack the shared vocabulary that would let them coordinate like one. A standardized, granular skill taxonomy — one detailed enough to cover the real operational breadth of AI in security environments — is the missing foundation. The 754-skill framework he has assembled represents a serious attempt to build it.
What mukul975 Actually Built — and Why 754 Is a Meaningful Number
Mahipal, who publishes under the GitHub handle mukul975, has built something with no obvious direct competitor: a catalogue of 754 discrete cybersecurity skills structured explicitly for assignment to and execution by AI agents.
The number 754 is not arbitrary padding. Each skill maps to at least one of five authoritative industry frameworks — MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, and NIST AI RMF. That cross-mapping decision is what separates this taxonomy from a personal notes project. Security teams, compliance officers, and AI system architects can trace every catalogued skill back to frameworks their organizations already recognize and, in many cases, are already required to follow. The taxonomy inherits institutional credibility without inventing new terminology.
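One way to picture the cross-mapping is as a record that carries its framework references with it. The following is a sketch only: the `Skill` class, its field names, and the example skill are assumptions for illustration, not the catalogue's actual schema. The one real reference used is ATT&CK technique T1566 (Phishing).

```python
from dataclasses import dataclass, field

# The five frameworks named in the text.
FRAMEWORKS = {"MITRE ATT&CK", "NIST CSF 2.0", "MITRE ATLAS", "D3FEND", "NIST AI RMF"}


@dataclass
class Skill:
    skill_id: str  # hypothetical identifier
    title: str
    # framework name -> reference IDs within that framework
    mappings: dict = field(default_factory=dict)

    def is_traceable(self) -> bool:
        """True if the skill cites at least one reference in a recognised framework."""
        return any(fw in FRAMEWORKS and refs for fw, refs in self.mappings.items())


mapped = Skill(
    "phishing-header-analysis",
    "Analyze email headers for spoofing indicators",
    {"MITRE ATT&CK": ["T1566"]},
)
unmapped = Skill("ad-hoc-notes", "Unmapped personal note")

print(mapped.is_traceable(), unmapped.is_traceable())
```

A check like `is_traceable` is what would distinguish a catalogued skill from a personal note: an entry with no framework reference has no anchor for compliance teams to follow.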
The inclusion of MITRE ATLAS and NIST AI RMF is the detail that matters most for anyone thinking about where AI security is heading. Both address AI systems themselves: ATLAS catalogues adversarial techniques against machine learning systems, such as model poisoning and input manipulation at inference time, while the AI RMF provides a structure for managing AI risk. MITRE ATT&CK, by contrast, was built around human-operated attack chains against conventional infrastructure. It was not designed to describe what happens when one AI agent attacks another, or when a large language model becomes the attack surface rather than the analyst. Mahipal’s taxonomy covers that ground explicitly.
Mahipal’s broader body of work reinforces that this taxonomy emerges from operational thinking, not academic abstraction. His projects include TSAF, a Translation Security Analysis Framework aimed at detecting protocol vulnerabilities before exploitation; multi-agent AI systems built to replicate the reasoning patterns of security teams; and EmailGuard, an AI-driven phishing detection tool. The 754-skill catalogue is not a side project — it reflects the same core question running through all of his work: what does a machine need to know to operate competently and safely inside a security environment?
The agentskills.io Standard: An Attempt to Do for AI Agents What OpenAPI Did for APIs
Mahipal is proposing agentskills.io as a machine-readable standard for declaring exactly what security capabilities an AI agent possesses. The concept mirrors what OpenAPI accomplished for web services: instead of guessing what an API can do, developers read a specification. With agentskills.io, an orchestration layer reads a structured skill profile and learns what a given agent can handle across a taxonomy of 754 defined cybersecurity competencies.
The parallel to OpenAPI is functional, not decorative. OpenAPI eliminated the chaos of ad-hoc API integration by giving systems a shared vocabulary. Agentskills.io targets the same problem in multi-agent security environments, where today’s common alternative is prompt engineering — writing natural language instructions and hoping the agent interprets them correctly. A machine-readable skill declaration replaces that guesswork with something verifiable and comparable across agents from different vendors or teams.
The standard already declares compatibility with Claude, Anthropic’s AI platform, which signals that Mahipal is building for real deployment conditions rather than theoretical use. Compatibility with a major production LLM means developers can begin testing the standard against actual agent pipelines without waiting for some future ideal environment.
The practical payoff, if adoption follows, is dynamic task routing based on verified capability profiles. An orchestration layer managing a fleet of security agents — some specializing in network intrusion detection, others in phishing analysis or protocol vulnerability scanning — could assign incoming tasks by reading each agent’s agentskills.io declaration and matching it against task requirements. This removes the current dependency on human operators manually configuring which agent handles what, or on systems making assumptions based on model names alone.
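The routing idea can be sketched in a few lines. This is a minimal illustration, assuming each agent's declaration has already been parsed into a set of skill identifiers. The `AgentProfile` class, the agent names, and the skill names are all hypothetical and do not reflect the agentskills.io format itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentProfile:
    name: str
    skills: frozenset  # skill identifiers the agent declares (hypothetical names)


def route(required: set, agents: list) -> Optional[AgentProfile]:
    """Return the first agent whose declared skills cover every required skill."""
    for agent in agents:
        if required <= agent.skills:
            return agent
    return None


fleet = [
    AgentProfile("net-ids", frozenset({"network-intrusion-detection", "pcap-triage"})),
    AgentProfile("mail-guard", frozenset({"phishing-analysis", "header-inspection"})),
]

chosen = route({"phishing-analysis"}, fleet)
print(chosen.name if chosen else "no capable agent")
```

Returning `None` when no agent qualifies is a deliberate choice in this sketch: an unroutable task should surface to a human operator rather than fall through to a best guess.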
The rest of Mahipal’s portfolio shows why he is building this. TSAF, the multi-agent defense systems, and EmailGuard each occupy a distinct skill domain, and agentskills.io is, in part, an attempt to give that kind of domain specificity a standard format that any agent ecosystem can read.
What Most Coverage Is Missing: This Is Governance Infrastructure, Not Just a Dev Tool
Most coverage of Mahipal’s taxonomy treats it as an interesting GitHub release and moves on. That framing misses the actual significance.
A shared skill taxonomy for AI security agents is governance infrastructure. Before regulators, insurers, or enterprise CISOs authorize autonomous agents to operate in live security environments, they need a common vocabulary for what those agents can do, what boundaries they respect, and how their capabilities are audited. Right now, that vocabulary doesn’t exist in any standardized form. A 754-skill taxonomy, structured and publicly available, is the foundation that audit frameworks and compliance checklists get built on top of. The organizations writing AI governance policy in the next two years will need exactly this kind of artifact to point to.
Mahipal’s work also isn’t a standalone experiment. TSAF, the multi-agent defense systems, and EmailGuard are not loosely connected side projects; together they form a coherent research agenda focused on AI-native security tooling. The taxonomy sits inside that agenda as the connective tissue: the layer that gives different tools a shared language for communicating about capabilities and risks.
The open-source, sponsor-funded structure matters for a different reason. Proprietary vendor standards arrive with a business model attached, which limits adoption and invites skepticism about whose interests the standard actually serves. A community-developed taxonomy that evolves through public contribution carries legitimacy that no single vendor can manufacture. Contributors can propose additions, flag gaps, and pressure-test definitions against real operational scenarios. That process, run transparently on GitHub, is how de facto standards actually form in technical communities.
The combination — right problem, right timing, right development model — is why this deserves attention beyond the developer community. Security teams, compliance officers, and AI governance bodies should be watching it now, before the standard they end up using gets written for them by someone with a commercial stake in the outcome.
Risks and Open Questions the Community Should Be Asking
The taxonomy’s openness is its greatest strength and its most obvious liability. A structured, machine-readable catalog of 754 offensive and defensive cybersecurity skills — complete with attack patterns, tool mappings, and technique relationships — hands defenders a shared vocabulary. It hands adversaries a training curriculum. Anyone fine-tuning a malicious AI agent today faces the problem of assembling coherent, labeled security knowledge at scale. A well-organized public taxonomy solves that problem directly. The security community has not seriously reckoned with this tradeoff, and the conversation needs to start before adoption accelerates.
Mapping quality is the second pressure point. The taxonomy’s value to AI agents depends entirely on whether its linkages to authoritative frameworks like MITRE ATT&CK are accurate and deep. A shallow mapping — one that connects a skill to an ATT&CK technique by keyword proximity rather than semantic precision — can send an autonomous agent down the wrong response path with no visible warning. Security orchestration systems built on bad mappings don’t fail loudly. They fail quietly, executing the wrong playbook while logs show nominal operation. Validating 754 skill entries against ATT&CK, NIST controls, and CVE data is not a weekend task. Independent researchers produce rigorous work, but peer review at this scope requires institutional participation that hasn’t materialized yet.
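Part of that validation can be automated cheaply, though only the shallow part. The sketch below assumes each skill's ATT&CK mappings are available as a list of technique IDs; the skill names are hypothetical. It flags missing and malformed IDs using the public ATT&CK technique format (`T` plus four digits, with an optional three-digit sub-technique suffix). It cannot judge whether a well-formed ID is the semantically right one, which is the part that still needs human reviewers.

```python
import re

# ATT&CK technique IDs look like T1566 or T1566.001.
ATTACK_ID = re.compile(r"T\d{4}(\.\d{3})?")


def lint_attack_mappings(skills: dict) -> dict:
    """Return problems per skill; skills with no problems are omitted."""
    problems = {}
    for skill_id, technique_ids in skills.items():
        issues = []
        if not technique_ids:
            issues.append("no ATT&CK mapping")
        issues += [
            f"malformed id: {t}" for t in technique_ids if not ATTACK_ID.fullmatch(t)
        ]
        if issues:
            problems[skill_id] = issues
    return problems


catalogue = {
    "phishing-triage": ["T1566", "T1566.001"],
    "lateral-movement-analysis": ["T1021x"],
    "log-review": [],
}

report = lint_attack_mappings(catalogue)
for skill_id, issues in report.items():
    print(skill_id, issues)
```

A lint like this makes the quiet failures slightly louder, but a clean report only shows that the references are syntactically valid, not that the mapping is correct.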
The coordination problem is the hardest obstacle. For the taxonomy to function as a true lingua franca, AI platform providers like Anthropic, Google DeepMind, and OpenAI need to reference it. Security vendors such as CrowdStrike, Palo Alto Networks, and Microsoft need to map their tool capabilities against it. Government and standards organizations like NIST and CISA need to acknowledge it. A single independent researcher based in Berlin, however technically rigorous, cannot compel any of those organizations to align. The history of cybersecurity standards is littered with technically sound frameworks that stalled because no anchor institution committed early. STIX and TAXII took years to gain traction despite strong technical foundations and explicit government backing. This taxonomy does not yet have institutional backing of that kind, and its technical foundations have not been independently validated. That gap between technical quality and institutional adoption is where promising infrastructure goes to be quietly ignored.
Why Timing Matters: The Window to Shape This Standard Is Right Now
The window to establish a skill standard for AI security agents is open right now — and it will not stay open long.
Agentic AI frameworks are already inside enterprise security stacks. AutoGPT-style autonomous agents, vendor copilots from Microsoft and CrowdStrike, and custom multi-agent orchestration systems are being handed real security workflows — threat triage, vulnerability scanning, incident response — before the industry has agreed on what a capable, trustworthy AI security agent should actually know. That gap between deployment speed and governance maturity is exactly where premature standards get locked in. Whoever defines the skill vocabulary first shapes how security teams evaluate, procure, and constrain these systems for years.
Mahipal’s 754-skill taxonomy lands at a precise regulatory moment. NIST released the AI Risk Management Framework 1.0 in January 2023 and Cybersecurity Framework 2.0 in February 2024. Both are current documents that organizations are actively mapping their practices against. A taxonomy built to reflect those frameworks, rather than older compliance thinking anchored to CSF 1.1 or pre-AI threat models, gives practitioners a direct bridge between what regulators expect and what AI agents need to demonstrate. That alignment is not accidental; it is what makes the taxonomy usable as an evaluation tool rather than just a research artifact.
The project lives on GitHub, which matters for a reason beyond convenience. Security standards earn credibility through adversarial scrutiny, not committee approval. An open repository invites the exact pressure a security-critical specification requires: researchers can fork it, red-team its categorizations, surface missing skill domains, and submit pull requests. That public iteration cycle compresses the validation timeline that normally takes years inside standards bodies. The security research community — the same community that pressure-tests CVEs, MITRE ATT&CK mappings, and protocol implementations in public — can apply that same rigor here.
Within months, the major agentic AI platforms are likely to have made architectural decisions that bake in their own implicit skill models. The time to insert a rigorous, open, framework-aligned alternative is before those defaults calcify.