HiddenLayer APE

About the APE Taxonomy

"Prompt injection” has become a catch-all term for a wide range of attacks against generative AI systems, especially large language models. But taxonomically, it does too much work. It can refer to a delivery mechanism, an adversarial technique, a model behavior, an attacker objective, or an impact.

The HiddenLayer Adversarial Prompt Engineering (APE) Taxonomy was created to bring more structure to this space. It gives security practitioners, researchers, and organizations a shared language for describing adversarial prompts with more precision.

Community efforts like OWASP Top 10 for LLMs and MITRE ATLAS have helped establish prompt injection and jailbreaks as important AI security risks. APE complements those frameworks by going deeper at the prompt level: identifying the specific tactics, techniques, and prompt patterns used to manipulate generative AI systems.

How the Taxonomy Works

The taxonomy is built around a familiar abstraction model from cybersecurity: Tactics, Techniques, and Procedures. In APE, concrete procedures are represented as prompts or prompt sequences.

Prompts are the concrete artifacts: the strings of text, instructions, or input sequences presented to the system. Techniques are repeatable patterns abstracted from those prompts. Tactics are higher-level groupings of techniques based on the mechanism they exploit.

APE intentionally separates this observable layer from the objective layer. Tactics, techniques, and prompts describe what can be seen in the interaction. Impacts and objectives describe what the adversary appears to be trying to accomplish, which usually has to be inferred from the surrounding context and resulting system behavior.

This separation avoids brittle one-to-one mappings between method and motive. The same technique may be used to expose a system prompt, manipulate a workflow, trigger an unauthorized tool call, or produce prohibited content. Analysts can first tag the observable behavior, then infer objectives when the context supports it.
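The tag-first, infer-later workflow described above can be sketched in Python. This is a minimal illustration, not part of the APE Taxonomy itself: the `TaggedPrompt` class, the `infer_objective` helper, and the technique name `example-technique` are all hypothetical. Only the tactic and objective names come from the text.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaggedPrompt:
    """One observed prompt plus its tags (hypothetical record layout)."""
    prompt: str
    tactic: str                      # observable layer
    technique: str                   # observable layer (placeholder name below)
    objective: Optional[str] = None  # objective layer; None until inferred


def infer_objective(tagged: TaggedPrompt, evidence: Optional[str]) -> TaggedPrompt:
    """Return a copy with the objective set only when context supports it."""
    if evidence is None:
        return tagged
    return replace(tagged, objective=evidence)


# Step 1: tag what is observable in the interaction.
base = TaggedPrompt("<prompt text>", "Context Manipulation", "example-technique")

# Step 2: infer the objective from context. The same technique can end up
# paired with different objectives, or with none at all.
exposure = infer_objective(base, "System Prompt Exposure")
redirect = infer_objective(base, "Task Redirection")
unknown = infer_objective(base, None)
```

Keeping `objective` optional means an analyst can record the observable tags without being forced to guess at motive.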

To learn more about the Adversarial Prompt Engineering (APE) Taxonomy, you can read the introductory blog and the version 3 update blog. To explore how the taxonomy is structured and maintained, visit https://github.com/hiddenlayerai/ape-taxonomy.

Key Concepts

Impacts & Objectives

Impacts

Broad categories of AI security consequences. APE maps impacts to the traditional CIA triad: Confidentiality, Integrity, and Availability.

Objectives

Specific outcomes the adversary is trying to achieve, such as System Prompt Exposure, Task Redirection, or Denial of Wallet.

Subtypes

More granular variations of specific objectives. APE currently uses subtypes under Content Policy Violation to distinguish common policy-bound behaviors, such as Offensive Cyber Assistance, Phishing Assistance, and Self-Harm Facilitation.
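One way to picture the impact, objective, and subtype hierarchy is as a small lookup table. The sketch below is illustrative only. The objective and subtype names come from the text, but the impact assigned to each objective is an assumption made for this example, and the `OBJECTIVES` table and `label` function are hypothetical names, not the taxonomy's published schema.

```python
from enum import Enum


class Impact(Enum):
    CONFIDENTIALITY = "Confidentiality"
    INTEGRITY = "Integrity"
    AVAILABILITY = "Availability"


# objective -> (impact, allowed subtypes)
# ASSUMPTION: the impact chosen for each objective is illustrative only.
OBJECTIVES = {
    "System Prompt Exposure": (Impact.CONFIDENTIALITY, ()),
    "Task Redirection": (Impact.INTEGRITY, ()),
    "Denial of Wallet": (Impact.AVAILABILITY, ()),
    "Content Policy Violation": (
        Impact.INTEGRITY,
        ("Offensive Cyber Assistance", "Phishing Assistance", "Self-Harm Facilitation"),
    ),
}


def label(objective, subtype=None):
    """Build an 'Impact / Objective / Subtype' label, validating the subtype."""
    impact, subtypes = OBJECTIVES[objective]
    if subtype is not None and subtype not in subtypes:
        raise ValueError(f"{subtype!r} is not a subtype of {objective!r}")
    parts = [impact.value, objective]
    if subtype is not None:
        parts.append(subtype)
    return " / ".join(parts)


full = label("Content Policy Violation", "Phishing Assistance")
short = label("Denial of Wallet")
```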

Tactics, Techniques & Prompts

Tactics, techniques, and prompts describe the observable structure of an adversarial prompt attack.

Tactics

High-level categories that group techniques by the mechanism they exploit, such as Obfuscation, Context Manipulation, or Token Manipulation.

Techniques

Specific, repeatable methods used within adversarial prompts. Techniques capture patterns that can be observed across concrete examples.

Prompts

Concrete strings of text or sequences of inputs presented to a generative AI system. They represent tangible evidence of an adversarial interaction.
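Because each technique belongs to a tactic, technique-level tags can be rolled up into tactic-level counts. The following is a rough sketch under stated assumptions: the tactic names come from the text, while the technique names are hypothetical placeholders rather than entries from the APE Taxonomy, and `tactic_counts` is an invented helper.

```python
from collections import Counter

# technique -> tactic
# ASSUMPTION: technique names are placeholders, not real taxonomy entries.
TECHNIQUE_TO_TACTIC = {
    "example-encoding-trick": "Obfuscation",
    "example-fake-history": "Context Manipulation",
    "example-token-splitting": "Token Manipulation",
}


def tactic_counts(observed_techniques):
    """Count how often each tactic appears across tagged techniques."""
    return Counter(TECHNIQUE_TO_TACTIC[name] for name in observed_techniques)


counts = tactic_counts([
    "example-encoding-trick",
    "example-encoding-trick",
    "example-fake-history",
])
```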

AI Runtime Security

The same structure that makes the taxonomy useful for testing also makes it useful for defense. By mapping observed prompts and model behaviors to specific tactics, techniques, impacts, and objectives, security solutions can produce more meaningful detections and reduce ambiguity in alerts.

Instead of simply flagging “prompt injection,” organizations can understand what kind of adversarial behavior is being attempted and what outcome it may be trying to produce. That context enables more targeted mitigations, stronger policy enforcement, and better alignment between security operations and AI development teams.
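To illustrate the difference between a generic flag and a taxonomy-aware alert, here is a minimal sketch. The `Alert` class and its fields are hypothetical and do not describe any HiddenLayer product output. The technique name is a placeholder, and pairing System Prompt Exposure with Confidentiality is an assumption for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record carrying taxonomy context."""
    tactic: str
    technique: str
    objective: Optional[str] = None  # inferred, so it may be absent
    impact: Optional[str] = None

    def summary(self) -> str:
        text = f"Adversarial prompt: {self.tactic} / {self.technique}"
        if self.objective is None:
            return text + " (objective not inferred)"
        text += f"; likely objective: {self.objective}"
        if self.impact is not None:
            text += f" [{self.impact}]"
        return text


# Observable tags only: still more specific than a bare "prompt injection".
observed_only = Alert("Obfuscation", "example-technique").summary()

# Observable tags plus an inferred objective and impact.
enriched = Alert(
    "Obfuscation",
    "example-technique",
    objective="System Prompt Exposure",
    impact="Confidentiality",  # ASSUMPTION: illustrative pairing
).summary()
```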

Applied in AI Red Team Engagements

The HiddenLayer AI Red Team uses the APE Taxonomy as a foundation for rigorous and repeatable testing. Rather than relying only on ad hoc prompt crafting or automated tooling, the team uses the taxonomy to define success criteria through impacts and objectives, and to guide exploration across a broad range of adversarial behaviors through tactics and techniques.

This helps ensure that testing covers not only obvious attack paths, but also more subtle combinations of tactics and techniques that may expose deeper vulnerabilities. The taxonomy also standardizes how findings are documented and communicated, allowing organizations to understand the nature of the risks identified and how they map to attacker objectives.
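Coverage of this kind can be tracked as a simple grid of tactics against objectives. The sketch below is an assumption about how one might do that, not a description of the Red Team's actual tooling. The tactic and objective names come from the text, `untested_pairs` is a hypothetical helper, and the set of tested pairs is made-up sample data.

```python
from itertools import product

TACTICS = ["Obfuscation", "Context Manipulation", "Token Manipulation"]
OBJECTIVES = ["System Prompt Exposure", "Task Redirection", "Denial of Wallet"]


def untested_pairs(tested):
    """Return every (tactic, objective) pair not yet exercised, in grid order."""
    return [pair for pair in product(TACTICS, OBJECTIVES) if pair not in tested]


# Made-up sample data: pairs already covered in an engagement.
tested = {
    ("Obfuscation", "System Prompt Exposure"),
    ("Context Manipulation", "Task Redirection"),
}

gaps = untested_pairs(tested)
```

With three tactics and three objectives there are nine pairs, so the two tested pairs above leave seven gaps to plan further testing around.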