Public calibration

Results you can inspect, not a victory chart.

ShipVitals is run against twenty public repositories to test two things: that it executes real project commands, and that it does not treat missing visual or independent evidence as grounds for confidence.

Current result

20 repositories audited
0 / 0 P0 and P1 findings
74-89 evidence-capped scores

The repositories are mature public projects, so zero release blockers is plausible. The useful signal is that documentation examples and vendored files no longer become false P0 findings, while missing UI proof still limits confidence.

Coverage

Category                  Count  Observed cap
CLI, API, library         11     Independent review
Landing or content        2      Visual and independent proof
SaaS or Next.js           2      Visual and independent proof
Marketplace or extension  3      Visual and independent proof
Agent or automation       2      Independent review

What the scores mean

A score of 74 does not mean a UI project is poor. It means static files and deterministic commands cannot establish visual behavior. A score of 89 does not mean a library is nearly perfect. It means the available checks passed while independent review remains absent.
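One way to picture an evidence cap is as a ceiling applied to whatever score the checks would otherwise produce. The sketch below is illustrative only: the function name, the evidence flags, and the two ceilings (74 and 89, borrowed from the endpoints of the reported range) are assumptions, not ShipVitals' actual scoring rules.

```typescript
// Hypothetical sketch of evidence capping. All names and ceilings are assumptions.
type Evidence = {
  needsVisualProof: boolean;   // true for projects with a user interface
  visualProof: boolean;        // rendered-UI evidence is available
  independentReview: boolean;  // someone other than the tool reviewed the result
};

// Assumed ceilings, taken from the endpoints of the published 74-89 range.
const CAP_MISSING_VISUAL_PROOF = 74;
const CAP_MISSING_INDEPENDENT_REVIEW = 89;

function cappedScore(rawScore: number, evidence: Evidence): number {
  let cap = 100;
  if (!evidence.independentReview) {
    cap = Math.min(cap, CAP_MISSING_INDEPENDENT_REVIEW);
  }
  if (evidence.needsVisualProof && !evidence.visualProof) {
    cap = Math.min(cap, CAP_MISSING_VISUAL_PROOF);
  }
  // The cap only lowers a score; it never raises one.
  return Math.min(rawScore, cap);
}

// A UI project and a library whose checks all pass, neither with extra evidence.
const uiProject = cappedScore(97, {
  needsVisualProof: true, visualProof: false, independentReview: false,
}); // 74
const library = cappedScore(97, {
  needsVisualProof: false, visualProof: false, independentReview: false,
}); // 89
```

Under these assumptions, both projects pass the same checks and differ only in which evidence is missing, which is why the two scores should not be compared as quality rankings.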

ShipVitals records these boundaries in each report. Projects are listed for calibration only and are not affiliated with or endorsed by ShipVitals.

Reproduce the suite

npm run benchmark:real

The manifest stores repository URLs, categories, expected commands, and audit context. Each result directory contains report.json, run.json, and a Markdown summary. Inspect the full benchmark table.
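After a run, a small check can confirm that a result directory holds the three artifacts named above. This is a sketch under stated assumptions: the function name is hypothetical, and because the Markdown summary's file name is not given here, any file ending in .md is treated as the summary.

```typescript
// Hypothetical completeness check for the file listing of one result directory.
function missingArtifacts(fileNames: string[]): string[] {
  const missing: string[] = [];
  for (const required of ["report.json", "run.json"]) {
    if (!fileNames.includes(required)) {
      missing.push(required);
    }
  }
  // Assumption: any file ending in .md counts as the Markdown summary.
  if (!fileNames.some((name) => name.toLowerCase().endsWith(".md"))) {
    missing.push("Markdown summary (*.md)");
  }
  return missing;
}

const complete = missingArtifacts(["report.json", "run.json", "summary.md"]);
const partial = missingArtifacts(["report.json"]);
```

In a Node script, the listing could come from `fs.readdirSync(resultDir)`, where `resultDir` is whichever path the suite writes to.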