Topics

Browse posts by category and tag — every topic we cover, with the latest pieces under each.

Tags

Categories

red-team 10 posts

ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked

A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and

Indirect Prompt Injection in LLM Agents: Shipped Failures

Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production

Model Behavior Fingerprinting: Identifying a Wrapped LLM

Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting

Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe

Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where

Multimodal jailbreaks: image and audio attack surfaces in 2026

Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026: typographic

Prompt Injection in IDE Coding Agents: Copilot and Cursor

Coding assistants read everything in your repo and increasingly act on it. A red-team walkthrough of the prompt-injection variants that have shipped

analysis 2 posts

technique-analysis 2 posts

tooling 2 posts

Defensive AI 1 post

Best LLM Guardrail Tools 2026: A Practitioner's Comparison

A technical comparison of the best LLM guardrail tools of 2026 (NeMo Guardrails, LLM Guard, Lakera, Guardrails AI, Azure Content Safety, and more), with real benchmark data.