We scanned 216 MCP server repositories with 16 independent security engines. The average trust score was 7.15 out of 10. Five repos scored in the red zone (4.0 or below). Ninety-one scored green (8.0 or above). The remaining 125, a 57% majority, landed in the yellow zone between 5.0 and 7.0. These repos are functional and maintained. They also carry security issues that single-engine scanners miss.

Key findings

  • 57% of repos scored yellow (5.0-7.0), 41% green (8.0-10.0), 2.3% red (0-4.0)
  • Only 7.2% passed all 16 engines with zero findings
  • 72.4% were flagged by 3 or more engines independently
  • detect-secrets flagged 41.6% of repos for hardcoded credentials or API keys
  • Custom YARA rules (MCP-specific) flagged 38.1% for threats general-purpose scanners do not check for

Methodology

Sample. 216 unique repositories sourced from public MCP registries, official MCP directories on GitHub, and community submissions. Scanned between March 1 and 9, 2026.

Engines. 16 independent security engines, each running in its own sandboxed Docker container. No engine has access to another engine's output. The full engine list covers five categories:

  • Vulnerability scanning: Trivy, Grype, OSV Scanner, npm audit, pip-audit
  • Secret detection: detect-secrets, Gitleaks
  • Static analysis: Bandit, Semgrep
  • MCP-specific: Custom YARA rules, MCP Guardian
  • Supply chain (informational): Syft (SBOM), ScanCode (licenses), Cisco AIBOM
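The isolation described above can be sketched as a container invocation. The mount layout and flags here are assumptions for illustration, not MCPAmpel's actual setup:

```python
def engine_cmd(image: str, repo_path: str) -> list[str]:
    """Build a docker run command for one engine.

    Each engine gets a read-only mount of the repo and no network,
    so it cannot read another engine's output or exfiltrate data.
    The image name is supplied by the caller; --rm discards the
    container after the scan.
    """
    return [
        "docker", "run", "--rm",
        "--network=none",               # no cross-engine or outbound traffic
        "-v", f"{repo_path}:/scan:ro",  # read-only view of the repo
        image,
    ]
```

In practice each command would run via a subprocess with its stdout captured separately, so no engine ever sees another's findings.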

Scoring. Each scan starts at 10.0. For each finding, the score is reduced by severity_weight × engine_weight. The maximum deduction per engine is capped at 3.0, which prevents a single noisy tool from dominating the result. Three informational engines (Syft, ScanCode, Cisco AIBOM) produce findings for documentation but do not reduce the trust score. Final score is clamped to 0.0-10.0. Full details at docs/scoring.
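As a rough sketch, the scoring rule reads like this in Python. The findings schema and the weight values are illustrative assumptions; only the 3.0 per-engine cap, the 0.0-10.0 clamp, and the informational-engine exclusion come from the description above:

```python
INFORMATIONAL = {"syft", "scancode", "cisco-aibom"}  # never reduce the score

def trust_score(findings: dict) -> float:
    """findings maps engine name -> list of
    (severity_weight, engine_weight) pairs; this shape is a guess."""
    score = 10.0
    for engine, hits in findings.items():
        if engine in INFORMATIONAL:
            continue  # documented, but excluded from scoring
        deduction = sum(sev * wt for sev, wt in hits)
        score -= min(deduction, 3.0)   # cap each engine's impact at 3.0
    return max(0.0, min(10.0, score))  # clamp to 0.0-10.0
```

For example, two findings from one engine worth 2.0 each would deduct 4.0 uncapped, but the cap limits that engine to 3.0, yielding a score of 7.0.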

Score distribution

Zone Score range Repos Percentage
Red 0.0 - 4.0 5 2.3%
Yellow 5.0 - 7.0 125 56.6%
Green 8.0 - 10.0 91 41.2%

The yellow zone carries the main finding. These 125 repos are not abandoned or obviously dangerous. Most are actively maintained, several by well-known organizations. They carry dependency vulnerabilities, exposed secrets in commit history, or MCP-specific configuration issues. A single security scanner would classify most of these as passing.

At the other end, only 16 repos (7.2%) passed all 16 engines with zero findings. That number sets the ceiling for how clean the ecosystem is today. It includes repos with minimal codebases where there was simply less surface area to scan.

Engine detection rates

Engine Detection rate Category
Trivy 81.0% Vulnerability scanning
Syft 73.5% SBOM generation (informational)
OSV Scanner 55.8% Vulnerability scanning
Grype 50.4% Vulnerability scanning
detect-secrets 41.6% Secret detection
Custom YARA 38.1% MCP-specific threats
MCP Guardian 35.4% MCP-specific checks
Gitleaks 27.9% Secret detection
Bandit 16.8% Static analysis (Python)

The three vulnerability scanners (Trivy, Grype, OSV Scanner) have the highest detection rates. Dependency vulnerabilities are common and well-cataloged. CVE databases are extensive, and most repos pull in dozens of transitive dependencies.

detect-secrets flagged 41.6% of repos; Gitleaks flagged 27.9%. The gap reflects different detection methods: detect-secrets combines entropy analysis with pattern matching, which catches non-standard secret formats, while Gitleaks relies primarily on known patterns and regex rules.
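The entropy side of that difference can be illustrated with Shannon entropy over a candidate string. This is a simplified stand-in for what high-entropy detectors like those in detect-secrets do, not its actual implementation:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character: high for random-looking tokens,
    low for natural-language text with repeated letters."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

# An entropy-based scanner flags strings above a tuned threshold,
# even when the token matches no known provider format:
shannon_entropy("AKIAIOSFODNN7EXAMPLE")  # random-looking token: higher
shannon_entropy("hello world")           # ordinary words: lower
```

A regex-only scanner needs a rule for each known key format; the entropy check trades precision for coverage of formats nobody wrote a rule for.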

The MCP-specific engines find problems that general tools cannot. Custom YARA rules, written for MCP threat patterns like overly broad tool permissions and suspicious tool descriptions, flagged 38.1%. MCP Guardian caught 35.4%. These findings do not appear in any general-purpose security scanner because those tools were not designed to look for them.

Seven engines returned zero findings across the entire sample. This is expected, not a failure. npm audit only runs on Node.js projects with a package-lock.json. pip-audit requires Python dependency manifests. Semgrep needs matching code patterns. A zero detection rate means the sample did not trigger that engine's rules.

Cross-engine patterns

When the three vulnerability scanners agree (Trivy, OSV Scanner, and Grype all flagging the same repo), the finding is almost always a confirmed supply chain issue with published CVE advisories. These are the highest-confidence results in the dataset.

When Custom YARA flags a repo that vulnerability scanners do not, the issue is MCP-specific. Common patterns include tools that request filesystem or network access beyond what their description implies, tool descriptions containing instruction-like language that could influence agent behavior, and configuration that exposes internal state to tool callers.
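A minimal stand-in for the second pattern above, checking tool descriptions for instruction-like language. The phrases matched here are hypothetical examples written for illustration, not the actual YARA rule set:

```python
import re

# Hypothetical phrases an MCP-focused rule might look for; the real
# MCPAmpel rules are not published in this report.
INSTRUCTION_LIKE = re.compile(
    r"ignore (all|previous) instructions"
    r"|always (call|use) this tool"
    r"|do not (tell|inform) the user",
    re.IGNORECASE,
)

def suspicious_description(desc: str) -> bool:
    """Flag tool descriptions that read like instructions to the agent
    rather than documentation for the human."""
    return INSTRUCTION_LIKE.search(desc) is not None
```

A description is metadata to most scanners, so nothing general-purpose inspects it; an MCP-aware rule treats it as an injection surface.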

72.4% of repos were flagged by three or more independent engines. This cross-engine agreement is the core value of multi-engine scanning. A single tool catches what it is designed to catch. Running 16 in parallel surfaces patterns that no individual engine would flag on its own.
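The two agreement signals above can be expressed over a per-repo set of flagging engines. The data shape is an assumption for illustration, not MCPAmpel's output format:

```python
VULN_SCANNERS = {"trivy", "grype", "osv-scanner"}

def triage(results: dict, min_engines: int = 3):
    """results maps repo name -> set of engines with findings.

    Returns (high_confidence, multi_flagged): repos where all three
    vulnerability scanners agree, and repos flagged by at least
    min_engines engines of any kind.
    """
    high_confidence = {r for r, engines in results.items()
                       if VULN_SCANNERS <= engines}
    multi_flagged = {r for r, engines in results.items()
                     if len(engines) >= min_engines}
    return high_confidence, multi_flagged
```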

Notable findings

Apache APISIX MCP server scored 3.5 out of 10 with 7 engines flagging. Findings include known CVEs in the dependency chain, configuration patterns that expose internal endpoints, and broad permission scopes in MCP tool definitions. APISIX is a widely deployed API gateway. Its MCP integration inherits the parent project's large attack surface.

Cline scored 3.5 out of 10 with 8 engines flagging independently, the highest engine flag count in the sample. Findings span dependency vulnerabilities across multiple package ecosystems, detected secrets in source, and MCP-specific configuration issues. Cline is a popular AI coding assistant, making this finding directly relevant to developers who use it daily.

Microsoft Semantic Kernel MCP integration scored 4.4 with 7 engines flagging. The primary driver is a large dependency tree with known vulnerabilities. Larger codebases with more dependencies have more exposure by construction. The score reflects dependency risk, not code quality.

Damn Vulnerable MCP Server scored 4.5. This is an intentionally vulnerable project built for security research and training. It appears in public MCP registries without a clear warning label. We included it deliberately: if a multi-engine security scanner does not flag an intentionally vulnerable project, the scanner has a problem.

Limitations

  • 216 repos is a sample, not the full MCP ecosystem. Results reflect what is listed on public registries and GitHub, not privately deployed servers.
  • All analysis is static. We scan source code, dependencies, and configuration. We do not execute tools or observe runtime behavior.
  • Three informational engines (Syft, ScanCode, Cisco AIBOM) produce findings for documentation but do not affect the trust score.
  • Three LLM-powered engines (requiring external API keys from OpenAI, Groq, and Anthropic) were not active during this scan period. Adding them would likely shift some scores downward.
  • The sample skews toward repos on public registries, which tend to be better maintained than internal or one-off MCP server implementations. The real ecosystem may score lower.

What you can do

Scan before connecting. Paste any MCP server URL, GitHub repo, npm package, or PyPI package into MCPAmpel. Results in under 60 seconds. Free, no account required.

Monitor your dependencies. If your product connects to MCP servers, set up continuous monitoring to catch score changes when new vulnerabilities are disclosed or dependencies are updated.

Automate in CI/CD. The MCPAmpel GitHub Action scans repositories on every pull request. Set a minimum trust score threshold and fail builds that drop below it.
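In spirit, the gate is just a threshold check. This standalone sketch is not the actual MCPAmpel GitHub Action, and the 7.0 threshold is an example value:

```python
import sys

def gate(score: float, threshold: float = 7.0) -> int:
    """Exit code for a CI step: 0 passes, 1 fails the build."""
    if score < threshold:
        print(f"FAIL: trust score {score:.1f} < threshold {threshold:.1f}")
        return 1
    print(f"PASS: trust score {score:.1f}")
    return 0

# e.g. python gate.py 6.2  (score produced by a prior scan step)
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(gate(float(sys.argv[1])))
```

Wiring this into a pipeline means the build breaks the moment a new disclosure drops a dependency's score below your floor, rather than whenever someone next checks the dashboard.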

Map to compliance frameworks. For companies under NIS2 or similar regulations, MCP servers are part of your software supply chain. MCPAmpel generates SBOM exports and compliance documentation. See OWASP MCP Top 10 coverage.

Scan an MCP server now, for free.

Start scanning