A benchmark job moves like a storm front. It builds, it rages, it passes — and it leaves a record.

    pending → running → done
                      ↘ error

| State | What it means |
|---|---|
| pending | The herald has been dispatched. The storm has not yet begun. |
| running | The trial is underway. progress updates after each test completes. |
| done | The storm has passed. result holds the full chronicle. |
| error | Something broke the trial. error contains what went wrong. |
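
The four states above can be modeled on the client side as a small enum. This is a minimal sketch, not part of the API: the names `JobState` and `is_terminal` are hypothetical helpers, and the only fact taken from the table is the set of status strings.

```python
from enum import Enum


class JobState(str, Enum):
    """Status strings listed in the state table."""
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"


# Assumption: done and error are final, since the table describes
# no transition out of either state.
TERMINAL_STATES = {JobState.DONE, JobState.ERROR}


def is_terminal(status: str) -> bool:
    """Return True once a job can no longer change state.

    Raises ValueError for a status string that is not in the table.
    """
    return JobState(status) in TERMINAL_STATES
```

A polling loop can then stop on `is_terminal(job["status"])` instead of comparing against string literals in several places.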

    import time

    import requests

    BASE = "http://127.0.0.1:8000"

    # Dispatch the herald
    resp = requests.post(f"{BASE}/jobs/run", params={"mode": "quick"})
    resp.raise_for_status()  # fail early if the job could not be created
    job_id = resp.json()["job_id"]
    print(f"Job started: {job_id}")

    # Wait for the storm to pass
    while True:
        job = requests.get(f"{BASE}/jobs/{job_id}").json()
        # progress may be empty before the test plan exists, so read it defensively
        p = job.get("progress") or {}
        print(f"  [{p.get('done', 0)}/{p.get('total', '?')}] {p.get('current_test', '')} ({job['status']})")
        if job["status"] in ("done", "error"):
            break
        time.sleep(15)

    if job["status"] == "done":
        benchmarks = job["result"]["benchmarks"]
        for b in benchmarks:
            print(f"  {b['name']}: {b['avg_tps']} TPS")
    else:
        print(f"Failed: {job['error']}")

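
The loop above waits forever if a job never reaches done or error. A possible variant with an overall deadline is sketched below. The function `wait_for_job` and its parameters are hypothetical; it takes any callable that returns the job dictionary, so it makes no assumption about how the job is fetched.

```python
import time
from typing import Callable


def wait_for_job(
    fetch: Callable[[], dict],
    interval_s: float = 15.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch() until the job status is done or error.

    Raises TimeoutError if timeout_s elapses first. The default
    timeout is an arbitrary illustrative value, not an API limit.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch()
        if job.get("status") in ("done", "error"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

With the names from the example above, a call might look like `wait_for_job(lambda: requests.get(f"{BASE}/jobs/{job_id}").json())`.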
While status == "running", the progress field is updated after each benchmark test completes:

    {
      "done": 3,
      "total": 7,
      "current_test": "Context sweep — 8,192 tokens"
    }

total is set once the test plan is forged — after scan, before the first inference. done increments by 1 after each test finishes.
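
Because total is only set once the test plan exists, a client that shows a percentage has to cope with it being absent. The helper below is a small sketch under that assumption; `percent_complete` is a hypothetical name, and treating a missing or zero total as "unknown" is a client-side choice rather than documented behavior.

```python
from typing import Optional


def percent_complete(progress: Optional[dict]) -> Optional[float]:
    """Return completion as a percentage, or None if it cannot be computed yet."""
    if not progress:
        return None
    total = progress.get("total")
    if not total:  # missing, None, or 0: the plan is not ready
        return None
    return 100.0 * progress.get("done", 0) / total


# With the sample payload above, 3 of 7 tests is roughly 42.9 percent.
print(percent_complete({"done": 3, "total": 7, "current_test": "..."}))
```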
When status == "done", result carries the full structured chronicle:

    {
      "run_at": "2025-06-01T14:22:01Z",
      "mode": "quick",
      "model": "hermes-3-llama-3.1-8b-q8_0",
      "server": {
        "name": "llama.cpp (llama-server)",
        "port": 8080,
        "api_base": "http://localhost:8080"
      },
      "benchmarks": [
        {
          "test_id": "baseline_short",
          "name": "Baseline (short prompt)",
          "category": "baseline",
          "ctx_size": 2048,
          "max_tokens": 64,
          "n_runs": 3,
          "successful_runs": 2,
          "avg_tps": 82.5,
          "best_tps": 83.0,
          "avg_elapsed_s": 0.812,
          "avg_ttft_s": 0.118,
          "gpu_stats": {
            "peak_vram_used_mb": 8240,
            "avg_vram_used_mb": 8180,
            "peak_temp_c": 71.0,
            "avg_temp_c": 70.2,
            "peak_power_w": 218.0,
            "avg_util_pct": 94.0
          },
          "warmup_run": { "success": true, "tps": 31.2, "elapsed_s": 2.6 },
          "runs": [
            { "success": true, "tps": 82.1, "elapsed_s": 0.814, "ttft_s": 0.121 },
            { "success": true, "tps": 83.0, "elapsed_s": 0.810, "ttft_s": 0.114 }
          ]
        }
      ],
      "profile_snapshot": {
        "gpus": [ "..." ],
        "cpu": { "..." },
        "ram": { "..." }
      }
    }

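
One way to condense the chronicle into a per-test summary is sketched below. It reads only fields that appear in the sample above (test_id, avg_tps, n_runs, successful_runs, gpu_stats.peak_vram_used_mb). The function name `summarize_benchmarks` and the derived success_rate column are illustrative additions, not part of the result schema.

```python
from typing import Dict, List


def summarize_benchmarks(result: dict) -> List[Dict]:
    """Build one compact row per benchmark entry in a result payload."""
    rows = []
    for b in result.get("benchmarks", []):
        n_runs = b.get("n_runs") or 0
        ok = b.get("successful_runs", 0)
        gpu = b.get("gpu_stats") or {}
        rows.append({
            "test_id": b.get("test_id"),
            "avg_tps": b.get("avg_tps"),
            # Share of runs that succeeded; None when no runs were planned.
            "success_rate": ok / n_runs if n_runs else None,
            "peak_vram_used_mb": gpu.get("peak_vram_used_mb"),
        })
    return rows
```

For the sample entry, success_rate comes out as 2/3, which flags that one of the three planned runs did not succeed even though avg_tps looks healthy.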
Jobs live in memory — a server restart clears the list. But the storm’s record is always written to disk before a job transitions to done:

- data/last_run.json: the full result
- data/chronicle.jsonl: flattened rows appended to the growing record

The herald list may be gone. The chronicle is not.
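
Since the chronicle survives restarts, it can be read back directly from disk. The sketch below assumes data/chronicle.jsonl holds one JSON object per line (the usual JSON Lines layout) and that the path is relative to the directory the server runs in. The row schema is not specified here, so the helper `read_chronicle` (a hypothetical name) returns the rows untouched.

```python
import json
from pathlib import Path
from typing import List


def read_chronicle(path: str = "data/chronicle.jsonl") -> List[dict]:
    """Load every row from a JSON Lines file; return [] if the file is absent."""
    p = Path(path)
    if not p.exists():
        return []
    rows = []
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between appended rows
                rows.append(json.loads(line))
    return rows
```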
If the trial breaks — no server detected, OOM on the first test, exception in the engine — the job falls to error:

    {
      "status": "error",
      "error": "No LLM server detected or benchmark produced no results.",
      "finished_at": "2025-06-01T14:22:04Z"
    }

Common causes:
| Error | Cause |
|---|---|
| No LLM server detected | Server not running, or answering on an unexpected port |
| connection_refused | Server crashed during the trial, likely OOM |
| timeout | Inference took > 120 seconds; context too large or model too slow for that context |
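
A client could turn the table above into a lookup that attaches a hint to a failed job. This is a rough sketch: it assumes the error field contains the strings from the Error column as substrings, which the sample payload suggests for the first row but which is not confirmed for the other two. `hint_for` and `ERROR_HINTS` are hypothetical names.

```python
# Substring to look for in job["error"], paired with the likely cause.
# Assumption: the error text contains these markers verbatim.
ERROR_HINTS = [
    ("No LLM server detected", "Server not running, or answering on an unexpected port"),
    ("connection_refused", "Server crashed during the trial, likely OOM"),
    ("timeout", "Inference took > 120 seconds; context too large or model too slow"),
]


def hint_for(error: str) -> str:
    """Return the first matching cause, using a case-insensitive search."""
    lowered = error.lower()
    for marker, hint in ERROR_HINTS:
        if marker.lower() in lowered:
            return hint
    return "Unrecognized error; inspect the full message."
```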