Tag

#jailbreaks

8 posts tagged jailbreaks.

technique-analysis

How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses

A practitioner's breakdown of how LLM jailbreaks work — from roleplay conditioning and encoding tricks to multi-turn manipulation — with attack success rates from peer-reviewed research.
June 20, 2026

technique-analysis

DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work

DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution...
June 12, 2026

analysis

Why Jailbreaks Work: Competing Objectives and Mismatched Generalization

Jailbreaks aren't a grab-bag of tricks; they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched...
May 22, 2026

red-team

ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked

A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and...
May 10, 2026

red-team

Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe

Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where...
May 10, 2026

red-team

Multimodal jailbreaks: image and audio attack surfaces in 2026

Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026: typographic...
May 10, 2026

red-team

System prompt extraction: the techniques that still leak in 2026

A red-team walkthrough of how system prompts get exfiltrated from production LLM apps: direct extraction, indirect inference, behavioral fingerprinting...
May 10, 2026

red-team

Jailbreak Technique Catalog: Working as of 2026 Q2

Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.
May 6, 2026