Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #red-team 11
- #llm-security 10
- #jailbreaks 6
- #prompt-injection 3
- #agents 2
- #indirect-injection 2
- #alignment 1
- #artprompt 1
- #ascii-art 1
- #audio-attacks 1
- #automated-jailbreaks 1
- #catalog 1
- #competing-objectives 1
- #consulting 1
- #copilot 1
- #crescendo 1
- #current-techniques 1
- #cursor 1
- #defense-in-depth 1
- #encoding-attacks 1
- #engagement-design 1
- #fingerprinting 1
- #frameworks 1
- #garak 1
- #gcg 1
- #generalization 1
- #guardrails 1
- #ide-agents 1
- #jailbreak 1
- #many-shot 1
- #meta 1
- #methodology 1
- #model-identification 1
- #multi-turn 1
- #multimodal 1
- #pair 1
- #post-mortem 1
- #prompt-extraction 1
- #rag 1
- #reconnaissance 1
- #red-team-tooling 1
- #safety-training 1
- #scanners 1
- #scoping 1
- #supply-chain 1
- #system-prompt-leak 1
- #tap 1
- #tool-use 1
- #tooling 1
- #vision-attacks 1
Categories
red-team 10 posts
- ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked
  A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and the encoding-class variants still landing in 2026.
- Indirect Prompt Injection in LLM Agents: Shipped Failures
  Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production agents, and the containment patterns that actually limit blast radius.
- Model Behavior Fingerprinting: Identifying a Wrapped LLM
  Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting techniques that reliably identify base models, and the implications for both attackers and defenders.
- Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
  Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where the defenses are honestly weak.
- Multimodal jailbreaks: image and audio attack surfaces in 2026
  Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 (typographic prompts, perturbed images, audio steganography) and what defenders are actually doing about them.
- Prompt Injection in IDE Coding Agents: Copilot and Cursor
  Coding assistants read everything in your repo and increasingly act on it. A red-team walkthrough of the prompt-injection variants that have shipped against Copilot, Cursor, Continue, and Windsurf, and the patterns that actually limit blast radius.
analysis 2 posts
- Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack
  These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust boundary. They have different root causes, different blast radii, and different defenses.
- Why Jailbreaks Work: Competing Objectives and Mismatched Generalization
  Jailbreaks aren't a grab-bag of tricks; they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched generalization explains why scaling alone won't fix them, and where the defender's leverage actually is.
tooling 2 posts
- Garak in 2026: what it's actually good for, what it isn't
  An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner: what its probes catch, where the noise is, and where it slots into a real red-team workflow.
- PAIR vs GCG vs TAP: Which Automated Jailbreak Framework to Run
  A practitioner comparison of the three most-cited automated jailbreak frameworks: PAIR, GCG, and TAP. Threat model fit, compute cost, transferability, and what each is honestly good for in 2026.