Tag

#llm-security

13 posts tagged llm-security.

Defensive AI

Best LLM Guardrail Tools 2026: A Practitioner's Comparison

A technical comparison of the best LLM guardrail tools of 2026: NeMo Guardrails, LLM Guard, Lakera, Guardrails AI, Azure Content Safety, and more, with real benchmark data.
June 21, 2026

technique-analysis

How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses

A practitioner's breakdown of how LLM jailbreaks work — from roleplay conditioning and encoding tricks to multi-turn manipulation — with attack success rates from peer-reviewed research.
June 20, 2026

technique-analysis

DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work

DAN (Do Anything Now) is the most replicated persona-injection jailbreak in LLM history. Here's the mechanism, why it worked, what version evolution
June 12, 2026

analysis

Prompt Injection vs. Jailbreak: The Distinction and the Defender's Stack

These two terms get used interchangeably and they shouldn't. A jailbreak attacks the model's safety; prompt injection attacks the application's trust
May 22, 2026

analysis

Why Jailbreaks Work: Competing Objectives and Mismatched Generalization

Jailbreaks aren't a grab-bag of tricks — they exploit two structural failure modes of safety training. Understanding competing objectives and mismatched
May 22, 2026

tooling

Garak in 2026: what it's actually good for, what it isn't

An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real
May 10, 2026

red-team

Indirect Prompt Injection in LLM Agents: Shipped Failures

Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production
May 10, 2026

red-team

Model Behavior Fingerprinting: Identifying a Wrapped LLM

Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting
May 10, 2026

red-team

Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe

Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where
May 10, 2026

red-team

Multimodal jailbreaks: image and audio attack surfaces in 2026

Vision and audio inputs are a separate attack channel from text. A practitioner survey of multimodal jailbreaks that still land in 2026 — typographic
May 10, 2026

red-team

Prompt Injection via Retrieved Documents: The RAG Attack Surface

How attacker-controlled content reaches the model through retrieval pipelines, the variants that still land against production RAG stacks, and the
May 10, 2026

red-team

System prompt extraction: the techniques that still leak in 2026

A red-team walkthrough of how system prompts get exfiltrated from production LLM apps — direct extraction, indirect inference, behavioral fingerprinting —
May 10, 2026

red-team

Jailbreak Technique Catalog: Working as of 2026 Q2

Which jailbreak technique classes still work against current production LLMs, what's been hardened, and the cost-of-attack trend. Indexed for practitioners.
May 6, 2026