Working LLM jailbreak techniques, sourced and dated.
A practitioner reference for LLM jailbreak techniques: working bypasses, the model behaviors they exploit, and the patches that did and didn't fix them. Written for AI red teamers who need to know what's still landing today.
DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work
DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, how its versions evolved, and what defenders need to know.
Recently updated
-
Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack
These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust boundary. They have different root causes, different blast radii, and different defenses.
-
Why Jailbreaks Work: Competing Objectives and Mismatched Generalization
Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched generalization explains why scaling alone won't fix them, and where the defender's leverage actually is.
-
ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked
A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak: where it slipped past safety training, which model families patched it and how, and the encoding-class variants still landing in 2026.
-
Garak in 2026: what it's actually good for, what it isn't
An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real red-team workflow.
-
Indirect Prompt Injection in LLM Agents: Shipped Failures
Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production agents, and the containment patterns that actually limit blast radius.
-
Model Behavior Fingerprinting: Identifying a Wrapped LLM
Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting techniques that reliably identify base models, and the implications for both attackers and defenders.
-
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where the defenses are honestly weak.
-
Multimodal jailbreaks: image and audio attack surfaces in 2026
Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic prompts, perturbed images, audio steganography — and what defenders are actually doing about them.
-
PAIR vs GCG vs TAP: Which Automated Jailbreak Framework to Run
A practitioner comparison of the three most-cited automated jailbreak frameworks: PAIR, GCG, and TAP. Threat model fit, compute cost, transferability, and what each is honestly good for in 2026.
Trusted by researchers across the AI security community
Jailbreaks FYI is part of a 26-site editorial network covering adversarial ML, AI governance, defensive tooling, and ops engineering — all open access.