# Hypnex Bench leaderboard

Public eval for the Morpheus AI inference network.
A reproducible eval suite that runs each LLM on the public Morpheus inference API (api.mor.org/api/v1) against a fixed set of coding, math, and JSON-adherence probes. MIT-licensed, with a full audit trail in latest.json.
| # | Model | Pass | Coding | Math | JSON | p50 | p95 | Tokens |
|---|---|---|---|---|---|---|---|---|
| – | *First canonical bench run pending; this table will populate after the first run* | | | | | | | |
## Run it yourself

```shell
pip install hypnex-bench
HYPNEX_API_KEY=mor_xxx hypnex-bench run
hypnex-bench leaderboard
```
A full run is ~19 probes per model (~$0.20 worth of MOR for the live LLM set). Get an API key at app.mor.org.
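If you want to fire a single probe by hand, a request body can be built like this. This is a minimal sketch that assumes the Morpheus API exposes an OpenAI-compatible chat-completions route; the model name and prompt below are placeholders, and the real hypnex-bench runner may shape its requests differently.

```python
def build_probe_request(model: str, prompt: str) -> dict:
    # Assumed OpenAI-compatible chat payload for api.mor.org/api/v1.
    # temperature=0 so that scoring is deterministic across runs.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "stream": False,
    }

# Hypothetical model name, for illustration only.
payload = build_probe_request("llama-3.3-70b", 'Reply with {"ok": true} as JSON.')
```

POST the payload to the chat-completions endpoint with your `HYPNEX_API_KEY` as a bearer token, then score the reply text.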
## What's measured
- Coding — 6 HumanEval-style probes; we exec the model's Python and assert correctness.
- Math — 8 GSM8K-style word problems; deterministic numeric extraction.
- JSON — 5 strict-schema probes; parseability + key/value match.
- Latency — p50 / p95 wall-clock, including network.
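The three text checks above can be sketched as follows. Helper names are hypothetical and the real hypnex-bench scorers may differ; in particular, the real runner would sandbox `exec` rather than run model code in-process.

```python
import json
import re


def check_coding(code: str, call: str, expected) -> bool:
    # Execute model-generated Python in a scratch namespace,
    # then evaluate a call expression and assert the result.
    ns: dict = {}
    exec(code, ns)  # untrusted code: sandbox this in a real runner
    return eval(call, ns) == expected


def check_math(answer_text: str, expected: float) -> bool:
    # Deterministic numeric extraction: strip thousands separators,
    # take the last number in the reply, compare to the expected value.
    nums = re.findall(r"-?\d+(?:\.\d+)?", answer_text.replace(",", ""))
    return bool(nums) and float(nums[-1]) == expected


def check_json(reply: str, required: dict) -> bool:
    # Parseability plus key/value match against required pairs.
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(
        obj.get(k) == v for k, v in required.items()
    )
```

Each probe in the suite boils down to one of these pass/fail calls, which is what keeps the results auditable from latest.json alone.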
Hypnex is community-built and not affiliated with the Morpheus AI Foundation. Probe sets are intentionally small and verifiable; for canonical claims, swap in the official suites. The runner architecture is the same.