Garak in 2026: what it's actually good for, what it isn't

Garak (NVIDIA’s open-source LLM vulnerability scanner) is the tool most teams reach for first when someone asks “do we have an LLM scanner.” That’s not a complaint — it’s earned its place — but the gap between “ran Garak” and “red-teamed the LLM application” is wider than the marketing implies.

This review draws on Garak’s documentation, the project’s research paper, and published practitioner reports covering its use against production LLM apps. Where it shines, where it noises, and where it slots into a real workflow.

What Garak is

Garak is a probe-based scanner. It runs a library of canned attack templates (“probes”) against a target LLM endpoint, parses the responses with detectors, and reports which probes scored as hits. The probe library covers most named jailbreak families — DAN variants, encoding attacks, prompt injection ↗ patterns, leakage probes, refusal-bypass templates — plus generic functional checks for things like profanity output, PII leakage, and known-CVE-style behaviors.

Architecturally it’s a pytest for LLM-side vulns. Probes are plugins. Detectors are plugins. Reports are JUnit-style XML plus an HTML summary. This shape is exactly right; it’s the part that’s held up best as the LLM-security space has churned.

Where it earns the install

Coverage of named techniques. Garak’s probe library is the easiest way to test whether a target model resists the named, well-publicized attack families. If you need a one-line answer to “is this model vulnerable to PromptInject-style overrides?” or “does it leak under known continuation attacks?”, Garak gets you there with one command. For internal red teams supporting product orgs that ship multiple models, this is real value.

CI-friendly output. The JUnit XML output drops cleanly into CI pipelines. Published write-ups describe teams running Garak nightly against staging endpoints, gating model upgrades on probe pass rates. The output isn’t precise enough to be a release gate on its own, but it’s useful as a regression detector — if today’s run hits more probes than the previous version did, something changed in the alignment.

Probe authoring. Writing new probes is straightforward. You inherit a base class, supply a prompt template, and pick or write a detector. Per the documentation and community accounts, many internal red teams end up with a private fork or a custom probes directory holding the team’s working attack catalog. The framework’s value is, at that point, the harness around your own attacks.

Honest defaults. Garak doesn’t oversell. The README says “this is a scanner; it finds known things; novel attacks are not its job.” That’s right and refreshing.

Where it gets in the way

Detector false positives. The detectors are the soft underbelly. Many are substring-based (“does the output contain the phrase X”). Modern frontier models often refuse a probe while including the forbidden phrase as part of the refusal text (“I can’t help with how to X”). Garak scores this as a hit. Reviewers report elevated false positive rates against frontier models on certain probe classes (continuation, encoding, profanity) unless the detectors are tuned, and note that detector tuning can consume more effort than authoring the probes themselves.

Static probe corpus. The probe library is updated when the maintainers update it. The frontier of jailbreak research is updated daily on Discord. A Garak run in 2026 is testing against jailbreaks that were public in 2024. This is fine for a baseline check; it is not red-teaming. The team that runs Garak quarterly and ships is doing baseline due diligence, not security testing.

No multi-turn modeling. Garak’s probe model is single-turn. Most of what’s actually landing in 2026 is multi-turn — Crescendo, role-play escalation, gradual context manipulation. This class is covered in multi-turn role-play attacks and it is genuinely Garak’s weakest area. Some probes attempt to simulate multi-turn but the conversation state model is thin. You will not catch multi-turn vulnerabilities with Garak alone.

RAG and agent harnesses are out of scope. Garak tests the model. The injection vectors that matter in production today — RAG ingestion, tool-use redirection, indirect injection — require harnessing the application as the test surface, not the model endpoint. You can stitch Garak into that with effort; out of the box it doesn’t do it. For the application-side classes, see prompt injection via retrieved documents and indirect prompt injection in LLM agents.

Cost discipline. A full Garak run with all probes against a frontier model endpoint is many thousands of requests. Practitioner reports describe single-run API bills reaching the high hundreds of dollars against frontier-class endpoints when the probe set is left unscoped. There’s a --probes filter; use it religiously.

Where Garak slots into a real workflow

A workable pattern reported across multiple engagements:

Baseline pass at engagement kickoff. Run a scoped Garak probe set against the target. The output is a sanity check: does this model resist the named, well-known attacks? Surprises here (“the customer’s model is vulnerable to vanilla DAN”) inform scoping and reset client expectations.
Regression harness for the engagement window. As you find novel attacks during manual red-teaming, write them as Garak probes. By end of engagement you have a CI-runnable corpus that the client can ship into their own pipeline.
Detector tuning is part of the deliverable. The tuned detector set for the client’s specific stack (their system prompt, their model, their output filters) is more valuable to them long-term than the raw findings. Hand over the tuned garak.config and the custom detectors with the report.
Pair with application-layer harnesses. Don’t try to test RAG injection or agent behavior with Garak alone. Use it for the model-side baseline; build separate harnesses for the application layer. PyRIT, promptfoo, and bespoke pytest harnesses each fill specific niches that Garak doesn’t.

The honest verdict

Garak is the right tool for one specific job: “does this model resist the named, public jailbreak families when tested in isolation?” It is not the right tool for “is this LLM application secure.” A team that conflates the two is delivering a checkbox, not a security assessment.

For the broader landscape — what’s actually landing in production, where the open research is, what attack classes Garak doesn’t cover — see our Q2 2026 technique catalog and the framework comparison post for the optimizer-driven attack frameworks (which Garak does not implement).

Garak is a baseline. Keep using it. Don’t ship without doing the rest.

For more context, adversarial ML research ↗ covers related topics in depth.

Garak in 2026: what it's actually good for, what it isn't

What Garak is

Where it earns the install

Where it gets in the way

Where Garak slots into a real workflow

The honest verdict

Sources

Jailbreaks FYI — in your inbox

Related

How LLM Jailbreaks Work: Techniques, Success Rates, and Defender Responses

DAN Prompt Jailbreak Explained: How 'Do Anything Now' Attacks Work

Indirect Prompt Injection in LLM Agents: Shipped Failures

Comments