Where engines agree.
Where they don't.
Sixteen scanners, 769 repos, 890 scans. Pairs of security-relevant engines flagged the same scan in only 19–26% of those 890 scans. That is the case for never trusting a single tool's "all clear".
We ran 16 security scanners against 769 MCP server repositories, 890 scans in total. The engines agreed on almost nothing. Only six of the sixteen produced any findings at all. Among those six, each pair of security-relevant engines flagged the same scan in only 19–26% of the 890 scans.
How the comparison was set up.
For every submitted URL, the system clones the repo and dispatches it to engines running three at a time, in isolated Docker containers. No engine has access to another engine's output. Results are combined into a weighted trust score.
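The dispatch pattern described above can be sketched with a semaphore that caps concurrency at three. This is a minimal illustration, not the system's actual code: `run_engine`, `scan_repo`, and the engine names passed in are hypothetical, and the container launch is replaced by a short sleep.

```python
import asyncio

MAX_CONCURRENT = 3  # the article says engines run three at a time


async def run_engine(name: str, repo_path: str) -> dict:
    """Hypothetical stand-in for running one engine in its own container."""
    await asyncio.sleep(0.01)  # placeholder for the real container run
    return {"engine": name, "findings": []}


async def scan_repo(repo_path: str, engines: list[str]) -> list[dict]:
    """Run every engine against repo_path, at most MAX_CONCURRENT at once."""
    limit = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(name: str) -> dict:
        async with limit:
            # Each engine receives only the repo path, never another
            # engine's output.
            return await run_engine(name, repo_path)

    # gather() returns results in the same order as the engines list.
    return await asyncio.gather(*(guarded(name) for name in engines))


if __name__ == "__main__":
    names = ["semgrep", "osv-scanner", "bandit", "detect-secrets"]
    results = asyncio.run(scan_repo("/tmp/repo", names))
    print([r["engine"] for r in results])
```

Results are collected only after every engine has finished, so combining them into a score happens in a separate step.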
Engine weight buckets
- 1.0: High-signal security tools (Semgrep, OSV Scanner)
- 0.7: Solid general-purpose tools (Bandit, detect-secrets)
- 0.5: Moderate signal (custom YARA)
- 0.3: Noisy tools
- 0.0: Informational only (Syft, ScanCode); these produce findings but don't reduce the score
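To make the effect of the buckets concrete, here is a small sketch of how weights might be applied. The weights come from the buckets above; the aggregation itself (weight times finding count, summed) is an assumption for illustration only, since the exact scoring formula is not stated here. The function name `weighted_finding_total` is hypothetical.

```python
# Weights taken from the buckets above. Engines that are not placed in a
# named bucket in the article are deliberately left out.
ENGINE_WEIGHTS = {
    "Semgrep": 1.0,
    "OSV Scanner": 1.0,
    "Bandit": 0.7,
    "detect-secrets": 0.7,
    "Custom YARA": 0.5,
    "Syft": 0.0,
    "ScanCode": 0.0,
}


def weighted_finding_total(finding_counts: dict[str, int]) -> float:
    """Sum of weight x finding count per engine.

    ASSUMPTION: this linear aggregation is illustrative; the real trust
    score formula is not described in the article. Raises KeyError for
    an engine with no known weight.
    """
    return sum(
        ENGINE_WEIGHTS[engine] * count
        for engine, count in finding_counts.items()
    )


if __name__ == "__main__":
    # Syft's 40 findings contribute nothing because its weight is 0.0.
    print(weighted_finding_total({"OSV Scanner": 2, "Bandit": 1, "Syft": 40}))
```

Under this assumption, an informational engine can report any number of findings without moving the total, which matches the 0.0 bucket's description.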
Six engines did all the flagging.
| Engine | Scans flagged | What it detects |
|---|---|---|
| Syft | 671 | SBOM (informational, weight 0.0) |
| OSV Scanner | 485 | Known CVEs in dependencies |
| detect-secrets | 351 | Hardcoded secrets, API keys, tokens |
| Custom YARA | 329 | MCP-specific threat patterns |
| MCP Guardian | 296 | MCP protocol abuse patterns |
| Bandit | 118 | Python security anti-patterns |
The remaining 10 engines found nothing. Zero. Not because the repos are clean — because most engines were built for general-purpose code, not for the specific threat model of AI tool-calling systems. The MCP-specific tooling layer is young and shallow.
Three categories, partial overlap.
[Figure: overlap diagram with regions labeled Scanner, secrets, Guardian, and shared]
Engine co-occurrence
| Engine pair | Co-flagged scans | % of 890 |
|---|---|---|
| OSV Scanner + Syft | 442 | 49.7% |
| detect-secrets + Syft | 301 | 33.8% |
| Custom YARA + Syft | 265 | 29.8% |
| MCP Guardian + Syft | 254 | 28.5% |
| detect-secrets + OSV Scanner | 230 | 25.8% |
| Custom YARA + MCP Guardian | 202 | 22.7% |
| Custom YARA + detect-secrets | 181 | 20.3% |
Syft appears in the top four pairs because it flags nearly everything (SBOM generation runs on any repo with dependencies). Remove Syft and the real pattern emerges: pairs of security-relevant engines (OSV Scanner, detect-secrets, Custom YARA, MCP Guardian) co-flag the same scan roughly 170–230 times out of 890, or 19 to 26% agreement.
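The co-occurrence figures are straightforward to reproduce: for each scan, take the set of engines that flagged it, count every pair within that set, and divide by the total number of scans. The sketch below assumes each scan is represented as a set of engine names; the function name `co_occurrence` and the toy data are illustrative, not the real dataset.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(scans: list[set[str]]) -> dict[tuple[str, str], tuple[int, float]]:
    """Map each engine pair to (co-flagged scan count, percent of all scans)."""
    total = len(scans)
    if total == 0:
        return {}
    counts: Counter = Counter()
    for flagged in scans:
        # Sort so each pair always appears in the same order.
        for pair in combinations(sorted(flagged), 2):
            counts[pair] += 1
    return {pair: (n, round(100 * n / total, 1)) for pair, n in counts.items()}


# Toy data: four scans, each listing the engines that flagged it.
scans = [
    {"Syft", "OSV Scanner"},
    {"Syft", "OSV Scanner", "detect-secrets"},
    {"Custom YARA", "MCP Guardian"},
    {"Syft"},
]
table = co_occurrence(scans)
print(table[("OSV Scanner", "Syft")])  # (2, 50.0)
```

The denominator is all scans, not only flagged ones, which is how the "% of 890" column is defined: 202 co-flagged scans out of 890 gives 22.7%.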
The Custom YARA + MCP Guardian pair is the most telling. These are the two engines specifically built for MCP threat patterns, and they agree on 22.7% of scans. In a further 221 scans (127 flagged by Custom YARA but not MCP Guardian, 94 the other way around), one caught what the other missed. That is exactly why multi-engine scanning matters.
Real scores for widely-deployed tools.
All data from static analysis of public GitHub repositories.
| Repository | Score | Engines flagged | Notes |
|---|---|---|---|
| Cline | 0.9 | 5 | VS Code AI coding assistant |
| Zed | 2.3 | 4 | Code editor with AI features |
| Letta AI | 2.5 | 4 | Long-term memory for agents |
| Block's Goose | 2.7 | 5 | Open-source agent framework |
| LiteLLM | 3.2 | 5 | LLM API proxy |
| TruffleHog | 3.3 | 5 | Secret-scanning tool |
| Continue | 3.5 | 4 | VS Code AI extension |
| Trivy | 4.5 | 4 | Container security scanner |
Two caveats. Larger repos score worse — more code and more dependencies mean more findings. And these scores reflect supply-chain risk as much as code quality: a repo can score 2.0 because three transitive dependencies have unpatched CVEs the maintainers may not even know about.
Constraints we'd like to lift.
- Static analysis only. No runtime behavior, no protocol interaction, no prompt-injection testing.
- Point-in-time snapshots. Scores change as deps get patched and engines improve.
- GitHub repos only. A deployed MCP server may have different configurations or network policies.
- Engine coverage is uneven. 6 of 16 engines produced findings. As MCP-specific tooling matures, the picture shifts.