POST · 2026-02-18 · 6 min read · Methodology spec

The scoring math, published in full.

Every weight, every cap, every nonlinearity. If you cannot reproduce a score on your own machine, it does not count as a verdict — so here is the spec, the table, and the code.

FIG. 05 · TRUST-SCORE FORMULA — RED · AMBER · GREEN
01 — Premise

A scoring methodology that lives only in a marketing PDF is not a methodology. It is a brand. This article is the source of truth. Every number on the site, every badge, every bar in every chart traces back to the formula below.

"If you can't reproduce a score on your own machine, it doesn't count as a verdict." (Creed 01 / 03)

02 — Formula

The whole thing in nine lines.

# Per-engine deduction (capped at 3.0)
def engine_deduction(findings, engine):
    raw = sum(severity_w[f.severity] * engine_w[engine] for f in findings)
    return min(raw, 3.0)

# Final trust score
def trust_score(scan):
    deductions = sum(engine_deduction(scan[e], e) for e in ENGINES)
    return max(0.0, min(10.0, 10.0 - deductions))

That's it. There is no machine-learning model, no per-server tuning, no learned ensemble. We trust auditable arithmetic over benchmark-tuned vibes.

03 — Weights

Severity weights (CVSS-aligned).

| Severity | CVSS range | Weight |
| --- | --- | --- |
| Critical | 9.0 to 10.0 | 1.00 |
| High | 7.0 to 8.9 | 0.60 |
| Medium | 4.0 to 6.9 | 0.30 |
| Low | 0.1 to 3.9 | 0.10 |
| Info | | 0.00 |

Engine weights (signal quality).

| Bucket | Weight | Engines |
| --- | --- | --- |
| High signal | 1.0 | OSV Scanner · Semgrep · Trivy |
| Solid general-purpose | 0.7 | Bandit · detect-secrets · Grype · Gitleaks |
| Moderate | 0.5 | Custom YARA · MCP Guardian · Checkov |
| Noisy | 0.3 | npm audit · pip-audit (verbose modes) |
| Informational | 0.0 | Syft · ScanCode · Cisco AIBOM |
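
To see the formula and the two weight tables working together, here is a minimal Python sketch. The dictionary keys (`semgrep`, `bandit`, and so on), the `Finding` dataclass, and the sample scan are illustrative assumptions, not the project's actual data model. Only the weights, the 3.0 cap, and the arithmetic come from this spec, and only one engine per bucket is included to keep it short.

```python
from dataclasses import dataclass

# Severity weights, copied from the severity table.
SEVERITY_W = {"critical": 1.00, "high": 0.60, "medium": 0.30, "low": 0.10, "info": 0.00}

# Engine weights: one representative engine per bucket (key names are hypothetical).
ENGINE_W = {
    "semgrep": 1.0,    # high signal
    "bandit": 0.7,     # solid general-purpose
    "checkov": 0.5,    # moderate
    "npm-audit": 0.3,  # noisy
    "syft": 0.0,       # informational
}

CAP = 3.0  # per-engine deduction cap from the spec


@dataclass
class Finding:
    """Hypothetical finding record; only the severity matters for scoring."""
    severity: str


def engine_deduction(findings, engine):
    # Sum severity weight times engine weight, then apply the per-engine cap.
    raw = sum(SEVERITY_W[f.severity] * ENGINE_W[engine] for f in findings)
    return min(raw, CAP)


def trust_score(scan):
    # Engines missing from the scan are treated as having no findings (assumption).
    deductions = sum(engine_deduction(scan.get(e, []), e) for e in ENGINE_W)
    return max(0.0, min(10.0, 10.0 - deductions))


scan = {
    "semgrep": [Finding("critical"), Finding("high")],  # (1.00 + 0.60) * 1.0 = 1.60
    "bandit": [Finding("medium")] * 2,                   # (0.30 + 0.30) * 0.7 = 0.42
    "syft": [Finding("high")] * 5,                       # engine weight 0.0 deducts nothing
}

score = trust_score(scan)
print(round(score, 2))  # 10.0 - (1.60 + 0.42) = 7.98
```

Note that the informational engine contributes nothing regardless of how many findings it reports, because its engine weight is 0.0.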
04 — Light

From score to light.

[Figure: score scale from 0 to 10 with band boundaries at 5 and 7. Bands: RED, AMBER, GREEN.]
| Light | Score | Meaning · README badge |
| --- | --- | --- |
| Red | 0.0 to 4.9 | Do not connect without remediation. Critical findings present. |
| Amber | 5.0 to 7.0 | Connect with caution. Known issues; review before granting credentials. |
| Green | 7.1 to 10.0 | No high-severity findings detected by 16 independent engines. |
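
The band table can be read as a small lookup function. The sketch below is one possible reading, not the project's code: the function name `light` is hypothetical, and it assumes the score is rounded to one decimal place before the bands are applied, since the table leaves the values between 4.9 and 5.0 (and between 7.0 and 7.1) unassigned.

```python
def light(score: float) -> str:
    """Map a trust score in [0, 10] to a traffic-light label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    s = round(score, 1)  # assumption: bands apply to the one-decimal score
    if s < 5.0:          # 0.0 to 4.9
        return "red"
    if s <= 7.0:         # 5.0 to 7.0
        return "amber"
    return "green"       # 7.1 to 10.0


for value in (0.0, 4.9, 5.0, 7.0, 7.1, 10.0):
    print(value, light(value))
```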
05 — Why cap

Why the per-engine cap is 3.0.

Without a cap, a single noisy engine could drive any score to zero. detect-secrets averages ~100 findings per flagged repo, most of them entropy false positives. Letting it deduct without limit would make every repo containing a base64 string score 0/10. The cap forces engine results to compete: a repo goes red only when multiple independent tools agree.

The cap of 3.0 is calibrated so that:

  • Five engines flagging at full severity drive a score to red.
  • Two engines flagging at high severity put a repo in amber.
  • One noisy engine cannot move a repo more than 30% down the scale: its deduction stops at 3.0 of 10 points, so on its own it can pull a clean repo no lower than 7.0.
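
The effect of the cap can be checked with a few lines of arithmetic. In this sketch the 100 findings are assumed to be Medium severity (weight 0.30), which is an illustrative choice and not a measured figure; the engine weight 0.7 is the detect-secrets bucket weight from the table, and the helper signature is hypothetical.

```python
CAP = 3.0


def engine_deduction(severity_weights, engine_weight, cap=CAP):
    # Same arithmetic as the spec: weighted sum, then the per-engine cap.
    raw = sum(w * engine_weight for w in severity_weights)
    return min(raw, cap)


# Assumed workload: 100 Medium findings from detect-secrets (engine weight 0.7).
noisy = [0.30] * 100

uncapped = engine_deduction(noisy, 0.7, cap=float("inf"))  # about 21.0, enough to zero any score
capped = engine_deduction(noisy, 0.7)                      # held at 3.0

single_engine_floor = 10.0 - capped  # 7.0: one engine alone stops at the top of amber
two_capped_engines = 10.0 - 2 * CAP  # 4.0: red requires at least two engines

print(round(uncapped, 2), capped, single_engine_floor, two_capped_engines)
```

Because a single engine can remove at most 3.0 points, a score below 5.0 is only reachable when at least two engines report findings.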
06 — Reproduce

Reproduce any score locally.

# Clone, install, score
git clone https://github.com/diemoeve/mcpampel
cd mcpampel
uv sync
uv run mcpampel score --repo <url> --json

The output is a JSON document with every engine's findings, every applied weight, and the final score arithmetic shown step-by-step. If your number differs from the website's, that is a bug — file it.

Author: Nikita Frikh-Khar · Dresden
Last updated: 2026-04-08 · v1.4
Cite as: Frikh-Khar, N. (2026). The MCPAmpel scoring spec.