Garak in 2026: what it's actually good for, what it isn't
An honest practitioner review of NVIDIA's Garak LLM vulnerability scanner — what its probes catch, where the noise is, and where it slots into a real red-team workflow.
Garak (NVIDIA’s open-source LLM vulnerability scanner) is the tool most teams reach for first when someone asks “do we have an LLM scanner.” That’s not a complaint — it’s earned its place — but the gap between “we ran Garak” and “we red-teamed our LLM application” is wider than the marketing implies.
This is a practitioner review based on running Garak across a portfolio of production LLM apps and integrating its output into engagement reports. Where it shines, where it noises, and where it slots into a real workflow.
What Garak is
Garak is a probe-based scanner. It runs a library of canned attack templates (“probes”) against a target LLM endpoint, parses the responses with detectors, and reports which probes scored as hits. The probe library covers most named jailbreak families — DAN variants, encoding attacks, prompt injection ↗ patterns, leakage probes, refusal-bypass templates — plus generic functional checks for things like profanity output, PII leakage, and known-CVE-style behaviors.
Architecturally it’s a pytest for LLM-side vulns. Probes are plugins. Detectors are plugins. Reports are JUnit-style XML plus an HTML summary. This shape is exactly right; it’s the part that’s held up best as the LLM-security space has churned.
Where it earns the install
Coverage of named techniques. Garak’s probe library is the easiest way to test whether a target model resists the named, well-publicized attack families. If you need a one-line answer to “is this model vulnerable to PromptInject-style overrides?” or “does it leak under known continuation attacks?”, Garak gets you there with one command. For internal red teams supporting product orgs that ship multiple models, this is real value.
CI-friendly output. The JUnit XML output drops cleanly into CI pipelines. Several teams I’ve worked with run Garak nightly against staging endpoints, gating model upgrades on probe pass rates. The output isn’t precise enough to be a release gate on its own, but it’s useful as a regression detector — if today’s run hits 12 probes the previous version didn’t, something changed in the alignment.
Probe authoring. Writing new probes is straightforward. You inherit a base class, supply a prompt template, and pick or write a detector. Most internal red teams I know end up with a private fork or a custom probes directory holding the team’s working attack catalog. The framework’s value is, at that point, the harness around your own attacks.
Honest defaults. Garak doesn’t oversell. The README says “this is a scanner; it finds known things; novel attacks are not its job.” That’s right and refreshing.
Where it gets in the way
Detector false positives. The detectors are the soft underbelly. Many are substring-based (“does the output contain the phrase X”). Modern frontier models often refuse a probe while including the forbidden phrase as part of the refusal text (“I can’t help with how to X”). Garak scores this as a hit. The result is reports with a 20–40% false positive rate against frontier models on certain probe classes (continuation, encoding, profanity) unless you go in and tune detectors. We spent more time tuning detectors than authoring probes on our last engagement.
Static probe corpus. The probe library is updated when the maintainers update it. The frontier of jailbreak research is updated daily on Discord. A Garak run in 2026 is testing against jailbreaks that were public in 2024. This is fine for a baseline check; it is not red-teaming. The team that runs Garak quarterly and ships is doing baseline due diligence, not security testing.
No multi-turn modeling. Garak’s probe model is single-turn. Most of what’s actually landing in 2026 is multi-turn — Crescendo, role-play escalation, gradual context manipulation. We covered this class in multi-turn role-play attacks and it is genuinely Garak’s weakest area. Some probes attempt to simulate multi-turn but the conversation state model is thin. You will not catch multi-turn vulnerabilities with Garak alone.
RAG and agent harnesses are out of scope. Garak tests the model. The injection vectors that matter in production today — RAG ingestion, tool-use redirection, indirect injection — require harnessing the application as the test surface, not the model endpoint. You can stitch Garak into that with effort; out of the box it doesn’t do it. For the application-side classes, see prompt injection via retrieved documents and indirect prompt injection in LLM agents.
Cost discipline. A full Garak run with all probes against a frontier model endpoint is many thousands of requests. We’ve seen single-run API bills in the high hundreds of dollars against gpt-4-class endpoints when someone forgot to scope the probe set. There’s a --probes filter; use it religiously.
Where Garak slots into a real workflow
A workable pattern I’ve seen across multiple engagements:
-
Baseline pass at engagement kickoff. Run a scoped Garak probe set against the target. The output is a sanity check: does this model resist the named, well-known attacks? Surprises here (“the customer’s model is vulnerable to vanilla DAN”) inform scoping and reset client expectations.
-
Regression harness for the engagement window. As you find novel attacks during manual red-teaming, write them as Garak probes. By end of engagement you have a CI-runnable corpus that the client can ship into their own pipeline.
-
Detector tuning is part of the deliverable. The tuned detector set for the client’s specific stack (their system prompt, their model, their output filters) is more valuable to them long-term than the raw findings. Hand over the tuned
garak.configand the custom detectors with the report. -
Pair with application-layer harnesses. Don’t try to test RAG injection or agent behavior with Garak alone. Use it for the model-side baseline; build separate harnesses for the application layer. PyRIT, promptfoo, and bespoke pytest harnesses each fill specific niches that Garak doesn’t.
The honest verdict
Garak is the right tool for one specific job: “does this model resist the named, public jailbreak families when tested in isolation?” It is not the right tool for “is this LLM application secure.” A team that conflates the two is delivering a checkbox, not a security assessment.
For the broader landscape — what’s actually landing in production, where the open research is, what attack classes Garak doesn’t cover — see our Q2 2026 technique catalog and the framework comparison post for the optimizer-driven attack frameworks (which Garak does not implement).
Garak is a baseline. Keep using it. Don’t ship without doing the rest.
For more context, adversarial ML research ↗ covers related topics in depth.
Sources
Jailbreaks FYI — in your inbox
Working LLM jailbreak techniques, sourced and dated. — delivered when there's something worth your inbox.
No spam. Unsubscribe anytime.
Related
Indirect Prompt Injection in LLM Agents: Shipped Failures
Tool-using LLM agents amplify every indirect prompt injection vector. A red-team walkthrough of the exploit classes that have landed against production agents, and the containment patterns that actually limit blast radius.
Model Behavior Fingerprinting: Identifying a Wrapped LLM
Before you can attack an LLM app effectively, you need to know what model is under the hood. A practitioner walkthrough of behavioral fingerprinting techniques that reliably identify base models, and the implications for both attackers and defenders.
Multi-Turn Role-Play Attacks: Why One Safe Turn Gets Unsafe
Crescendo, Many-Shot, and gradual context manipulation. How multi-turn jailbreaks evade single-turn classifiers, what's still landing in 2026, and where the defenses are honestly weak.