What 16 engines found in 2,900 MCP servers.
Six thousand four hundred ninety-four scans. Sixteen independent engines. One uncomfortable headline number: 91% of repos in our index were flagged by at least one tool. Here is the full breakdown — distribution, detection rates, and what cross-engine agreement actually tells you.
91.4% of MCP server repositories have at least one security finding. We ran 6,494 scans across 2,896 unique repos with 16 independent engines over the past month. The average trust score was 7.54 / 10. Thirteen repos scored below 5.0. Half landed between 5 and 7.
This is not a malware problem. It is a hygiene problem. Most repos in the yellow zone are maintained, functional, and used in production. They carry known CVEs in their dependency chains, secrets in commit history, or MCP tool permissions broader than what the tool actually needs. One scanner would call most of them clean.
How the numbers were generated.
Sample. 2,896 unique repositories from public MCP registries, official MCP directories on GitHub, and community submissions. Scanned between March 1 and April 2, 2026. Roughly 13× larger than our first report.
Engines. 16 independent security engines, each in its own Docker container. No engine sees another engine's output. The full engine list covers five categories:
- Vulnerability scanning — Trivy, Grype, OSV Scanner, npm audit, pip-audit
- Secret detection — detect-secrets, Gitleaks
- Static analysis — Bandit, Semgrep, Checkov
- MCP-specific — Custom YARA rules, MCP Guardian
- Supply chain (informational) — Syft, ScanCode, Cisco AIBOM
Scoring formula
# For each finding f produced by engine e, add to that engine's deduction:
deduction[e] += severity_weight[f] * engine_weight[e]
# Cap each engine's total so no single tool dominates:
deduction[e] = min(deduction[e], 3.0)
# Final score, clamped to [0.0, 10.0]:
final = max(0.0, min(10.0, 10.0 - sum(deduction)))
Three informational engines (Syft, ScanCode, Cisco AIBOM) produce findings but do not reduce the trust score. Full methodology lives at /docs/scoring.
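As a minimal runnable sketch of the formula above: the severity weights, engine weights, and finding format below are hypothetical placeholders, not the published values (those live at /docs/scoring). Informational engines are modeled here with a weight of 0.0, which is one assumed way to make them produce findings without reducing the score.

```python
from collections import defaultdict

# Hypothetical weights for illustration only; real values are at /docs/scoring.
SEVERITY_WEIGHT = {"low": 0.1, "medium": 0.3, "high": 0.6, "critical": 1.0}
ENGINE_WEIGHT = {"trivy": 1.0, "gitleaks": 0.8, "syft": 0.0}  # 0.0 = informational (assumed)
PER_ENGINE_CAP = 3.0  # cap taken from the formula above


def trust_score(findings):
    """Score one scan. `findings` is an iterable of (engine, severity) pairs."""
    deduction = defaultdict(float)
    for engine, severity in findings:
        deduction[engine] += SEVERITY_WEIGHT[severity] * ENGINE_WEIGHT.get(engine, 1.0)
    # Apply the per-engine cap, then clamp the final score to [0.0, 10.0].
    total = sum(min(d, PER_ENGINE_CAP) for d in deduction.values())
    return max(0.0, min(10.0, 10.0 - total))


# Five critical Trivy findings hit the 3.0 cap; forty Syft findings cost nothing.
example = [("trivy", "critical")] * 5 + [("gitleaks", "high")] + [("syft", "low")] * 40
score = trust_score(example)
print(round(score, 2))
```

The cap is what keeps a noisy engine from sinking a repo on its own: in the example, Trivy's raw deduction of 5.0 counts as 3.0.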
Where the scans landed.
| Zone | Score range | Scans | % of scans |
|---|---|---|---|
| Red | 0.0–4.0 | 13 | 0.2% |
| Yellow | 5.0–7.0 | 3,231 | 49.7% |
| Green | 8.0–10.0 | 3,250 | 50.1% |
Counts are per scan and sum to the 6,494 scans in the dataset. The yellow zone carries the main finding. The repos behind these 3,231 scans are not abandoned or broken. Most are actively maintained. They carry CVEs with published advisories, secrets committed to source, or MCP-specific configuration issues. (From the dataset notes.)
Detection rates, per engine.
Three vulnerability scanners (Trivy, OSV Scanner, and Grype) have the highest detection rates. Dependency vulnerabilities are common, CVE databases are extensive, and most repos pull in dozens of transitive dependencies.
| Engine | Detection rate | Category |
|---|---|---|
| Trivy | 76.5% | Vulnerability scanning |
| OSV Scanner | 54.9% | Vulnerability scanning |
| Grype | 49.8% | Vulnerability scanning |
| detect-secrets | 43.6% | Secret detection |
| Custom YARA | 31.8% | MCP-specific threats |
| MCP Guardian | 22.6% | MCP-specific checks |
| Gitleaks | 22.3% | Secret detection |
| Bandit | 12.9% | Static analysis (Python) |
detect-secrets flagged 43.6% of repos, with an average of 99.9 findings per flagged repo. The count is high because detect-secrets uses entropy-based matching, which catches non-standard secret formats but also produces many false positives. Gitleaks (22.3%, pattern-based rules) is more precise but misses atypical formats. Running both tells you more than running either alone.
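One way to use both secret scanners together is to compare their findings by location. The sketch below assumes each tool's output has already been normalized to (file path, line number) pairs; that normalization step and the sample data are hypothetical, and the tools' native report formats differ.

```python
# Hypothetical normalized findings: one (file path, line number) pair per reported secret.
detect_secrets_hits = {("config.py", 12), ("README.md", 40), ("test/fixtures.json", 3)}
gitleaks_hits = {("config.py", 12), (".env.example", 1)}

# Reported by both tools: the highest-confidence candidates.
confirmed = detect_secrets_hits & gitleaks_hits
# Entropy-only hits: may be atypical secrets, may be false positives; review manually.
entropy_only = detect_secrets_hits - gitleaks_hits
# Pattern-only hits: matched a known rule but were not flagged by the entropy scan.
pattern_only = gitleaks_hits - detect_secrets_hits

print(f"{len(confirmed)} confirmed, {len(entropy_only)} entropy-only, {len(pattern_only)} pattern-only")
```

Triage can then start with the confirmed set and work outward, rather than wading through every entropy match.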
Six engines returned zero findings on this sample. This is expected. npm audit only runs on Node.js projects with a package-lock.json. pip-audit requires Python dependency manifests. Semgrep needs matching code patterns. A zero detection rate means the engine's rules did not match this sample, not that the engine failed.
The strongest signal: tools agreeing.
| Engines flagging | Scans | % of flagged |
|---|---|---|
| 1 engine only | 891 | 15.0% |
| 2 engines | 791 | 13.3% |
| 3 engines | 1,333 | 22.5% |
| 4 engines | 1,272 | 21.4% |
| 5 engines | 752 | 12.7% |
| 6 engines | 593 | 10.0% |
| 7 engines | 268 | 4.5% |
| 8–9 engines | 40 | 0.7% |
85% of flagged scans are flagged by two or more engines. That is the argument for multi-engine scanning. A single tool catches what it was built to catch. Run sixteen and the blind spots become visible.
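The 85% figure can be checked directly from the scan counts in the table. The dictionary below simply copies those counts; nothing else is assumed.

```python
# Scan counts copied from the agreement table, keyed by number of engines flagging.
scans_by_engine_count = {
    "1": 891, "2": 791, "3": 1333, "4": 1272,
    "5": 752, "6": 593, "7": 268, "8-9": 40,
}

flagged = sum(scans_by_engine_count.values())      # all scans with at least one finding
multi_engine = flagged - scans_by_engine_count["1"]  # scans flagged by two or more engines
share = multi_engine / flagged

print(f"{multi_engine} of {flagged} flagged scans ({share:.1%}) were flagged by 2+ engines")
```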
Top co-occurring engine pairs: OSV + Trivy (3,167 scans), Grype + Trivy (3,154), Grype + OSV (2,877). These three form the backbone of high-confidence vulnerability detection across the dataset.
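Pair counts like these can be derived from per-scan engine sets. The sketch below is one plausible way to do it, not the pipeline's actual code; the `scans` list is made-up sample data and the engine names are shortened for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each scan, the set of engines that reported findings.
scans = [
    {"trivy", "osv", "grype"},
    {"trivy", "osv"},
    {"trivy", "detect-secrets"},
    {"gitleaks"},
    set(),  # a clean scan: no engine flagged it
]

agreement = Counter()    # number of engines flagging -> number of scans
pair_counts = Counter()  # (engine_a, engine_b) -> scans where both flagged

for engines in scans:
    agreement[len(engines)] += 1
    # Sort so each pair is always counted under the same key order.
    pair_counts.update(combinations(sorted(engines), 2))

top_pairs = pair_counts.most_common(3)
print(agreement)
print(top_pairs)
```

Sorting before pairing matters: without it, ("osv", "trivy") and ("trivy", "osv") would be tallied as different pairs.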
Two repos worth flagging.
Cline · score 3.5
Eight engines flagging independently — the highest engine count in the dataset. Findings span dependency vulnerabilities across multiple package ecosystems, detected secrets in source, and MCP-specific configuration issues. Cline is one of the most popular AI coding assistants. Developers using it daily should be aware.
Apache APISIX MCP · score 3.5
Seven engines flagging. Known CVEs in the dependency chain, configuration patterns that expose internal endpoints, broad permission scopes in MCP tool definitions. APISIX is a widely deployed API gateway; its MCP integration inherits the parent project's large attack surface.
What this report does not say.
- Static analysis only. We scan source, dependencies, configuration. We do not execute tools or observe runtime behavior.
- Sample bias. Repos from public registries tend to be better maintained than internal or ad-hoc MCP servers. The broader ecosystem likely scores lower.
- Three LLM-powered engines (requiring external API keys) were disabled during this scan period. Adding them would likely shift some scores down.
- Engine weights are our judgment call. A different weighting produces different scores. The full weights are at /docs/scoring.