
Benchmarks

Measure before you switch

Benchmark retrieval models and code embedding candidates against fixed corpora, real GNO code slices, and pinned public OSS slices before changing defaults. GNO's research loops are built around measurable wins, not vibes.
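The core of a regression-first evaluation is reducing each model's retrieval run to a metric on a pinned corpus and gating the candidate against the incumbent. A minimal sketch of that idea, assuming runs are already reduced to ranked document IDs per query (the corpus, queries, and relevance labels below are hypothetical stand-ins for a pinned benchmark slice, not GNO's actual data):

```typescript
// A retrieval run: query ID -> document IDs, best first.
type Run = Record<string, string[]>;

// Recall@k averaged over queries: what fraction of each query's
// relevant docs appear in the top k results.
function recallAtK(
  run: Run,
  relevant: Record<string, Set<string>>,
  k: number,
): number {
  const queries = Object.keys(relevant);
  let total = 0;
  for (const q of queries) {
    const hits = (run[q] ?? [])
      .slice(0, k)
      .filter((d) => relevant[q].has(d)).length;
    total += hits / relevant[q].size;
  }
  return total / queries.length;
}

// Pinned labels: which docs are actually relevant per query (illustrative).
const relevant = {
  q1: new Set(["a", "b"]),
  q2: new Set(["c"]),
};

const incumbent: Run = { q1: ["a", "x", "b"], q2: ["y", "c"] };
const candidate: Run = { q1: ["a", "b", "x"], q2: ["c", "y"] };

const base = recallAtK(incumbent, relevant, 2); // 0.75
const cand = recallAtK(candidate, relevant, 2); // 1.0

// Regression gate: the candidate only ships if it beats the baseline.
console.log(cand >= base ? "candidate passes" : "regression");
```

Because the corpus and labels are pinned, the same comparison can be re-run against any future candidate and the numbers stay comparable.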

Use cases

  • Evaluating a new embedding or reranker before shipping
  • Comparing candidates against a stable baseline
  • Generating per-collection model recommendations with receipts

What it gives you

  • Regression-first retrieval evaluation
  • Canonical benchmark corpora for stable comparisons
  • Real GNO code slice benchmark for product-shaped signal
  • Public OSS slices for generalization checks
  • Autonomous candidate search with bounded budgets
  • Per-collection model recommendations backed by results
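"Recommendations backed by results" means each per-collection pick carries the scores that justify it. A hypothetical sketch of that shape (the result rows, model names, and collection names are illustrative, not GNO's schema):

```typescript
// One benchmark result row: a model's score on one collection.
interface Result {
  collection: string;
  model: string;
  score: number;
}

// Pick the best model per collection, keeping every row as a receipt.
function recommend(
  results: Result[],
): Record<string, { model: string; receipts: Result[] }> {
  const byCollection: Record<string, Result[]> = {};
  for (const r of results) (byCollection[r.collection] ??= []).push(r);

  const out: Record<string, { model: string; receipts: Result[] }> = {};
  for (const [collection, rows] of Object.entries(byCollection)) {
    const best = rows.reduce((a, b) => (b.score > a.score ? b : a));
    out[collection] = { model: best.model, receipts: rows };
  }
  return out;
}

const picks = recommend([
  { collection: "gno-code", model: "bge-m3", score: 0.81 },
  { collection: "gno-code", model: "candidate-x", score: 0.84 },
  { collection: "oss-slices", model: "bge-m3", score: 0.77 },
]);
console.log(picks["gno-code"].model); // candidate-x
```

Keeping the losing rows alongside the winner is what makes the recommendation auditable: anyone can re-check why a model was picked for a given collection.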

Try it yourself

Representative commands and entry points. Full reference lives in the documentation.

bun run eval:hybrid
bun run bench:code-embeddings --candidate bge-m3-incumbent --write
bun run research:embeddings:autonomous:search --dry-run
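The autonomous search entry point is budget-bounded: it stops evaluating candidates once a spend limit is hit rather than exhausting the pool. In essence it is a loop like the following sketch, where the candidate list, scorer, and budget are illustrative assumptions:

```typescript
// Evaluate candidates until the budget (max evaluations) runs out,
// tracking the best scorer seen so far.
function searchCandidates(
  candidates: string[],
  evaluate: (model: string) => number,
  budget: number,
): { model: string; score: number } | null {
  let best: { model: string; score: number } | null = null;
  for (const model of candidates.slice(0, budget)) {
    const score = evaluate(model);
    if (!best || score > best.score) best = { model, score };
  }
  return best;
}

const best = searchCandidates(
  ["bge-m3", "candidate-a", "candidate-b"],
  (m) => m.length / 20, // stand-in scorer; real runs use benchmark metrics
  2, // budget: only the first two candidates get evaluated
);
console.log(best?.model);
```

A `--dry-run` flag in this shape would list which candidates fall inside the budget without calling the scorer at all.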

Keep reading

Related features and docs.