What this site is for
Jailbreaks FYI covers offensive AI security from a working practitioner's perspective. Here's what we publish.
Jailbreaks FYI exists to cover offensive AI security with the same rigor a working AI red teamer would expect — and the same honesty about what does and doesn’t land in production.
What we publish:
Technical writeups of working attacks. Prompt injection variants, jailbreak techniques and the model behaviors they exploit, indirect injection through retrieved content, multi-modal attack chains, agent and tool-use abuse. Where possible, reproducible PoCs against open models. Closed models get attack patterns and behavioral analysis.
Adversarial ML, applied. Membership inference, model extraction, evasion attacks, training-data extraction, backdoors — focused on what’s exploitable in deployed systems, not theoretical bounds.
Red team methodology. Scoping AI engagements, building attack libraries, communicating findings to a model team that doesn’t speak security and a security team that doesn’t speak ML.
Tooling reviews. Honest takes on the offensive AI security tooling landscape — Garak, PyRIT, promptmap, the LLM-specific scanners — and what each is actually good for.
What we don’t publish:
- Press release rewrites
- Listicles
- Anything we can’t source to primary material
Bylines are pseudonymous. The work is the point. Tips, attack reports, and corrections to the editor.
Real content starts shortly.
Jailbreaks FYI — in your inbox
Working LLM jailbreak techniques, sourced and dated. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work
DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution looked like, and what defenders need to know.
Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack
These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust boundary. They have different root causes, different blast radii, and different defenses.
Why Jailbreaks Work: Competing Objectives and Mismatched Generalization
Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched generalization explains why scaling alone won't fix them, and where the defender's leverage actually is.