Tag
#red-team
11 posts tagged red-team.
- red-team
ArtPrompt Post-Mortem: Why ASCII-Art Bypasses Worked
A defender-vs-attacker walkthrough of the ArtPrompt ASCII-art jailbreak. Where it slipped past safety training, which model families patched and how, and the encoding-class variants still landing in 2026.
- tooling
Garak in 2026: what it's actually good for, what it isn't
An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real red-team workflow.
- red-team
Indirect Prompt Injection in LLM Agents: Shipped Failures
Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production agents, and the containment patterns that actually limit blast radius.
- red-team
Model Behavior Fingerprinting: Identifying a Wrapped LLM
Before you can attack an LLM app effectively, you need to know which model is under the hood. A practitioner walkthrough of behavioral fingerprinting techniques that reliably identify base models, and the implications for both attackers and defenders.
- red-team
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where the defenses are honestly weak.
- red-team
Multimodal jailbreaks: image and audio attack surfaces in 2026
Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic prompts, perturbed images, audio steganography — and what defenders are actually doing about them.
- red-team
Prompt Injection in IDE Coding Agents: Copilot and Cursor
Coding assistants read everything in your repo and increasingly act on it. A red-team walkthrough of the prompt-injection variants that have landed against Copilot, Cursor, Continue, and Windsurf, and the patterns that actually limit blast radius.
- red-team
Prompt Injection via Retrieved Documents: The RAG Attack Surface
How attacker-controlled content reaches the model through retrieval pipelines, the variants that still land against production RAG stacks, and the defender's realistic options.
- red-team
Scoping an AI Red-Team Engagement: The Questions That Matter
A working methodology for scoping LLM red-team engagements — the threat-model conversation, surface inventory, success criteria, and the four scoping mistakes that produce useless deliverables. From a practitioner who's seen it go wrong.
- red-team
System prompt extraction: the techniques that still leak in 2026
A red-team walkthrough of how system prompts get exfiltrated from production LLM apps — direct extraction, indirect inference, behavioral fingerprinting — and what actually keeps them hidden.
- red-team
Jailbreak Technique Catalog: Working as of 2026 Q2
Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.