The scoring math, published in full

01 — Premise

A scoring methodology that lives only in a marketing PDF is not a methodology. It is a brand. This article is the source of truth. Every number on the site, every badge, every bar in every chart traces back to the formula below.

"If you can't reproduce a score on your own machine, it doesn't count as a verdict." · Creed 01 / 03

02 — Formula

The whole thing in nine lines.

# Per-engine deduction (capped at 3.0)
def engine_deduction(findings, engine):
    raw = sum(severity_w[f.severity] * engine_w[engine] for f in findings)
    return min(raw, 3.0)

# Final trust score
def trust_score(scan):
    deductions = sum(engine_deduction(scan[e], e) for e in ENGINES)
    return max(0.0, min(10.0, 10.0 - deductions))

That's it. There is no machine-learning model, no per-server tuning, no learned ensemble. We trust auditable arithmetic over benchmark-tuned vibes.
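The same two functions as equations. The symbols are ours, not the spec's: $F_e$ is the set of findings from engine $e$, $E$ is the set of engines, $w_s$ is the severity weight and $w_e$ the engine weight from the tables in section 03.

```latex
D_e = \min\Bigl(3.0,\; \sum_{f \in F_e} w_s\bigl(\mathrm{sev}(f)\bigr)\, w_e\Bigr),
\qquad
S = \max\Bigl(0,\; \min\bigl(10,\; 10 - \textstyle\sum_{e \in E} D_e\bigr)\Bigr)
```

Because every weight is non-negative, each $D_e$ lies in $[0, 3]$, so no single engine can lower $S$ by more than 3 points.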

03 — Weights

Severity weights (CVSS-aligned).

Severity	CVSS range	Weight
Critical	9.0 — 10.0	1.00
High	7.0 — 8.9	0.60
Medium	4.0 — 6.9	0.30
Low	0.1 — 3.9	0.10
Info	—	0.00
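The CVSS ranges above partition the CVSS base-score scale, so any finding that carries a CVSS score can be bucketed mechanically. A minimal sketch, assuming lowercase severity labels and that a finding with no CVSS score (or a score of 0.0) counts as Info, since the table gives Info no range. `severity_from_cvss` and `severity_weight` are hypothetical helper names, not part of the published code.

```python
from typing import Optional

# Severity weights from the table above (labels lowercased).
SEVERITY_W = {
    "critical": 1.00,
    "high": 0.60,
    "medium": 0.30,
    "low": 0.10,
    "info": 0.00,
}


def severity_from_cvss(cvss: Optional[float]) -> str:
    """Bucket a CVSS base score using the ranges in the table above.

    Assumption: no score (None) or a score of 0.0 counts as "info",
    since the table gives Info no CVSS range.
    """
    if cvss is None:
        return "info"
    if not 0.0 <= cvss <= 10.0:
        raise ValueError(f"CVSS base score out of range: {cvss}")
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    if cvss >= 0.1:
        return "low"
    return "info"  # a base score of 0.0


def severity_weight(cvss: Optional[float]) -> float:
    """Severity weight for a finding with the given CVSS base score."""
    return SEVERITY_W[severity_from_cvss(cvss)]
```

For example, `severity_weight(7.5)` returns 0.6 and `severity_weight(None)` returns 0.0.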

Engine weights (signal quality).

Bucket	Weight	Engines
High signal	1.0	OSV Scanner · Semgrep · Trivy
Solid general-purpose	0.7	Bandit · detect-secrets · Grype · Gitleaks
Moderate	0.5	Custom YARA · MCP Guardian · Checkov
Noisy	0.3	npm audit · pip-audit (verbose modes)
Informational	0.0	Syft · ScanCode · Cisco AIBOM
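Plugging both tables into the nine-line formula gives a scorer you can run on its own. This is a minimal sketch, not the mcpampel source: the engine identifier strings, the `Finding` class, and the `.get(e, [])` default for engines that returned no findings are our assumptions.

```python
from dataclasses import dataclass

# Severity weights (section 03).
SEVERITY_W = {"critical": 1.00, "high": 0.60, "medium": 0.30, "low": 0.10, "info": 0.00}

# Engine weights (section 03). The identifier strings are assumptions.
ENGINE_W = {
    "osv-scanner": 1.0, "semgrep": 1.0, "trivy": 1.0,
    "bandit": 0.7, "detect-secrets": 0.7, "grype": 0.7, "gitleaks": 0.7,
    "custom-yara": 0.5, "mcp-guardian": 0.5, "checkov": 0.5,
    "npm-audit": 0.3, "pip-audit": 0.3,
    "syft": 0.0, "scancode": 0.0, "cisco-aibom": 0.0,
}
ENGINES = list(ENGINE_W)
ENGINE_CAP = 3.0


@dataclass(frozen=True)
class Finding:
    severity: str  # one of the SEVERITY_W keys (hypothetical finding shape)


def engine_deduction(findings, engine):
    # Weighted sum of one engine's findings, capped at ENGINE_CAP.
    raw = sum(SEVERITY_W[f.severity] * ENGINE_W[engine] for f in findings)
    return min(raw, ENGINE_CAP)


def trust_score(scan):
    # scan maps engine name -> list of Finding.
    # Assumption: an engine missing from scan deducts nothing.
    deductions = sum(engine_deduction(scan.get(e, []), e) for e in ENGINES)
    return max(0.0, min(10.0, 10.0 - deductions))
```

For example, a scan where Semgrep reports one High finding and detect-secrets reports 100 High findings deducts 0.6 + 3.0 (the second engine hits its cap) and scores about 6.4.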

04 — Light

From score to light.

Figure: the score scale from 0 to 10, marked at 0, 5, 7 and 10 and split into RED, AMBER and GREEN bands.

Light	Score	Meaning · README badge
Red	0.0 — 4.9	Do not connect without remediation. Critical findings present.
Amber	5.0 — 7.0	Connect with caution. Known issues; review before granting credentials.
Green	7.1 — 10.0	No high-severity findings detected by 16 independent engines.
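The table lists one-decimal ranges, which leave gaps (4.9 to 5.0, 7.0 to 7.1) if a score is not rounded first. A minimal sketch of the mapping, assuming the boundaries are read as red < 5.0 ≤ amber ≤ 7.0 < green; `light` is a hypothetical helper name.

```python
def light(score: float) -> str:
    """Map a 0-10 trust score to the README badge colour.

    Assumption: red < 5.0 <= amber <= 7.0 < green, so unrounded
    scores such as 4.95 or 7.05 still land in exactly one band.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score < 5.0:
        return "red"
    if score <= 7.0:
        return "amber"
    return "green"
```

Reading the boundaries this way keeps every score in the table's stated band and assigns the in-between values to the nearer stated range.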

05 — Why cap

Why the per-engine cap is 3.0.

Without a cap, a single noisy engine could drive any score to zero. detect-secrets averages ~100 findings per flagged repo, most of them entropy-based false positives. Letting it deduct without limit would push every repo containing a base64 string to 0/10. With the cap, no single engine decides the outcome: a repo only goes red when multiple, independent tools agree.
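The detect-secrets case in numbers. A hypothetical illustration, assuming each of the 100 findings is rated Medium (the severity detect-secrets findings actually receive is not stated here) and using the 0.7 weight from the engine table.

```python
SEVERITY_W = {"critical": 1.00, "high": 0.60, "medium": 0.30, "low": 0.10, "info": 0.00}
DETECT_SECRETS_W = 0.7  # "Solid general-purpose" bucket
CAP = 3.0


def score_from(deduction: float) -> float:
    # Clamp 10 - deduction into [0, 10], as trust_score does.
    return max(0.0, min(10.0, 10.0 - deduction))


# Assumption: 100 findings, each rated "medium".
findings = ["medium"] * 100
raw = sum(SEVERITY_W[s] * DETECT_SECRETS_W for s in findings)
capped = min(raw, CAP)

uncapped_score = score_from(raw)
capped_score = score_from(capped)
print(f"raw deduction {raw:.1f} -> score {uncapped_score:.1f} uncapped")
print(f"capped deduction {capped:.1f} -> score {capped_score:.1f}")
# raw deduction 21.0 -> score 0.0 uncapped
# capped deduction 3.0 -> score 7.0
```

Capped, one engine on its own leaves the repo at 7.0, the top of the amber band.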

The cap of 3.0 is calibrated so that:

Five engines flagging at full severity drive a score to red.
Two engines flagging at high severity put a repo in amber.
One noisy engine cannot move a repo more than 30% down the scale (3.0 of 10 points).
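How many engines sitting at the cap it takes to cross each light. A sketch that assumes each listed engine deducts exactly the 3.0 cap and reads the light boundaries as red < 5.0 ≤ amber ≤ 7.0 < green; `score_after` and `light` are hypothetical helper names.

```python
CAP = 3.0


def score_after(capped_engines: int) -> float:
    # Each engine at its cap deducts exactly CAP points; clamp into [0, 10].
    return max(0.0, min(10.0, 10.0 - capped_engines * CAP))


def light(score: float) -> str:
    # Assumed boundaries: red < 5.0 <= amber <= 7.0 < green.
    if score < 5.0:
        return "red"
    if score <= 7.0:
        return "amber"
    return "green"


for n in range(5):
    s = score_after(n)
    print(n, s, light(s))
# 0 10.0 green
# 1 7.0 amber
# 2 4.0 red
# 3 1.0 red
# 4 0.0 red
```

A single engine, however many findings it reports, bottoms out in amber; red always takes at least two engines contributing deductions.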

06 — Reproduce

Reproduce any score locally.

# Clone, install, score
git clone https://github.com/diemoeve/mcpampel
cd mcpampel
uv sync
uv run mcpampel score --repo <url> --json

The output is a JSON document with every engine's findings, every applied weight, and the final score arithmetic shown step-by-step. If your number differs from the website's, that is a bug — file it.


Author: Nikita Frikh-Khar · Dresden

Last updated: 2026-04-08 · v1.4

Reproduce: github.com/diemoeve/mcpampel

Cite as: Frikh-Khar, N. (2026). The MCPAmpel scoring spec.