All posts

DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work

DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, how its versions evolved, and what defenders need to know.
June 12, 2026
Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack

These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust boundary. They have different root causes, different blast radii, and different defenses.
May 22, 2026
Why Jailbreaks Work: Competing Objectives and Mismatched Generalization

Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched generalization explains why scaling alone won't fix them, and where the defender's leverage actually is.
May 22, 2026
ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked

A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak: where it slipped past safety training, which model families patched it and how, and which encoding-class variants are still landing in 2026.
May 10, 2026
Garak in 2026: what it's actually good for, what it isn't

An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real red-team workflow.
May 10, 2026
Indirect Prompt Injection in LLM Agents: Shipped Failures

Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production agents, and the containment patterns that actually limit blast radius.
May 10, 2026
Model Behavior Fingerprinting: Identifying a Wrapped LLM

Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting techniques that reliably identify base models, and the implications for both attackers and defenders.
May 10, 2026
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe

Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where the defenses are honestly weak.
May 10, 2026
Multimodal jailbreaks: image and audio attack surfaces in 2026

Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic prompts, perturbed images, audio steganography — and what defenders are actually doing about them.
May 10, 2026
PAIR vs GCG vs TAP: Which Automated Jailbreak Framework to Run

A practitioner comparison of the three most-cited automated jailbreak frameworks: PAIR, GCG, and TAP. Threat model fit, compute cost, transferability, and what each is honestly good for in 2026.
May 10, 2026
Prompt Injection in IDE Coding Agents: Copilot and Cursor

Coding assistants read everything in your repo and increasingly act on it. A red-team walkthrough of the prompt-injection variants that have shipped against Copilot, Cursor, Continue, and Windsurf — and the patterns that actually limit blast radius.
May 10, 2026
Prompt Injection via Retrieved Documents: The RAG Attack Surface

How attacker-controlled content reaches the model through retrieval pipelines, the variants that still land against production RAG stacks, and the defender's realistic options.
May 10, 2026
Scoping an AI Red-Team Engagement: The Questions That Matter

A working methodology for scoping LLM red-team engagements — the threat-model conversation, surface inventory, success criteria, and the four scoping mistakes that produce useless deliverables. From a practitioner who's seen it go wrong.
May 10, 2026
System prompt extraction: the techniques that still leak in 2026

A red-team walkthrough of how system prompts get exfiltrated from production LLM apps — direct extraction, indirect inference, behavioral fingerprinting — and what actually keeps them hidden.
May 10, 2026
Jailbreak Technique Catalog: Working as of 2026 Q2

Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.
May 6, 2026
What this site is for

Jailbreaks FYI covers offensive AI security from a working practitioner's perspective. Here's what we publish.
May 2, 2026