# Hypnex Bench leaderboard

Public eval for the Morpheus AI inference network.
A reproducible eval suite that runs each LLM on the public Morpheus inference API (api.mor.org/api/v1) against a fixed set of coding, math, and JSON-adherence probes. MIT-licensed, with a full audit trail in latest.json.
| # | Model | Pass | Coding | Math | JSON | p50 | p95 | Tokens |
|---|---|---|---|---|---|---|---|---|
| – | *First canonical bench run pending; this table will populate after the first run* | | | | | | | |
## Run it yourself

```shell
pip install hypnex-bench
HYPNEX_API_KEY=mor_xxx hypnex-bench run
hypnex-bench leaderboard
```
A full run is ~19 probes per model (~$0.20 worth of MOR for the live LLM set). Get an API key at app.mor.org.
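If you want to fire a single probe by hand, a request body can be built like this. This is a minimal sketch that assumes the Morpheus API exposes an OpenAI-compatible chat-completions route; the model name and prompt below are placeholders, and the real hypnex-bench runner may shape its requests differently.

```python
def build_probe_request(model: str, prompt: str) -> dict:
    # Assumed OpenAI-compatible chat payload for api.mor.org/api/v1.
    # temperature=0 so that scoring is deterministic across runs.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "stream": False,
    }

# Hypothetical model name, for illustration only.
payload = build_probe_request("llama-3.3-70b", 'Reply with {"ok": true} as JSON.')
```

POST the payload to the chat-completions endpoint with your `HYPNEX_API_KEY` as a bearer token, then score the reply text.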
## What's measured
- Coding — 6 HumanEval-style probes; we exec the model's Python and assert correctness.
- Math — 8 GSM8K-style word problems; deterministic numeric extraction.
- JSON — 5 strict-schema probes; parseability + key/value match.
- Latency — p50 / p95 wall-clock, including network.
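The three text checks above can be sketched as follows. Helper names are hypothetical and the real hypnex-bench scorers may differ; in particular, the real runner would sandbox `exec` rather than run model code in-process.

```python
import json
import re


def check_coding(code: str, call: str, expected) -> bool:
    # Execute model-generated Python in a scratch namespace,
    # then evaluate a call expression and assert the result.
    ns: dict = {}
    exec(code, ns)  # untrusted code: sandbox this in a real runner
    return eval(call, ns) == expected


def check_math(answer_text: str, expected: float) -> bool:
    # Deterministic numeric extraction: strip thousands separators,
    # take the last number in the reply, compare to the expected value.
    nums = re.findall(r"-?\d+(?:\.\d+)?", answer_text.replace(",", ""))
    return bool(nums) and float(nums[-1]) == expected


def check_json(reply: str, required: dict) -> bool:
    # Parseability plus key/value match against required pairs.
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(
        obj.get(k) == v for k, v in required.items()
    )
```

Each probe in the suite boils down to one of these pass/fail calls, which is what keeps the results auditable from latest.json alone.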
Hypnex is community-built and not affiliated with the Morpheus AI Foundation. Probe sets are intentionally small and verifiable; for canonical claims, swap in the official suites. The runner architecture is the same.