Troubleshooting

Common issues and their fixes, from a SQLite build that cannot load extensions to model download failures.

First stop: gno doctor

gno doctor

gno doctor runs every health check GNO knows how to run and prints a report. Most problems can be diagnosed from this report, so read its output carefully before trying anything else.

For embedding freshness, look for the embedding-fingerprint check. It reports the current fingerprint, pending/stale chunks, legacy empty-fingerprint vectors, and mixed stored fingerprint groups.

Vector search fails on macOS

Symptom: gno vsearch or hybrid queries fail with a SQLite extension loading error.

Cause: the stock Apple SQLite doesn’t support loading extensions. GNO needs the Homebrew build.

brew install sqlite3
gno doctor
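Before reinstalling, it can help to confirm whether Homebrew's SQLite is already present. The following is a minimal sketch and not part of GNO: check_homebrew_sqlite is a hypothetical helper name, and the sketch assumes that sqlite3 is an alias for the Homebrew formula sqlite. It only reports whether the formula is installed; gno doctor remains the authoritative check.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNO): reports whether Homebrew's SQLite
# formula is installed. It does not verify which SQLite GNO actually loads.
check_homebrew_sqlite() {
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "not macOS; the stock Apple SQLite limitation does not apply"
    return 0
  fi
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew not found; install it before running brew install sqlite3"
    return 1
  fi
  # Assumption: "sqlite" is the formula name and "sqlite3" is an alias of it.
  if brew list --versions sqlite >/dev/null 2>&1; then
    echo "Homebrew SQLite is installed; rerun gno doctor"
  else
    echo "Homebrew SQLite is missing; run: brew install sqlite3"
    return 1
  fi
}
```

Source the file and call check_homebrew_sqlite to see the one-line result.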

Models fail to download

Symptom: the first gno query or gno ask stalls during a model pull.

Solutions:

Results feel stale after editing

Re-scan the index:

gno update

Or run gno daemon in the background for continuous indexing.

Results feel off after switching models

Stored vectors come from the embedding model that was active when the index was built, so they may no longer match queries once you switch models. gno doctor makes this visible through the embedding-fingerprint check. After switching embedding models, re-embed:

gno doctor
gno embed # re-embed stale/pending chunks
gno embed notes # or one collection

gno embed retries transient embedding failures within the same run. If doctor still reports stale, legacy, or mixed vectors after a normal embed, force a full refresh:

gno embed --force

gno embed --force also retries transient failures within the same run. If a run still fails, rerun with verbose output so GNO prints sample failures and the retry hint:

gno --verbose embed --force
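The last two steps can be chained so that the verbose rerun happens only when the forced refresh fails. This is a minimal sketch under one assumption that the text above does not confirm: that gno exits with a non-zero status when an embed run fails. force_reembed is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: force a full re-embed, and rerun verbosely on failure.
# Assumption: gno returns a non-zero exit status when the embed run fails.
force_reembed() {
  if gno embed --force; then
    return 0
  fi
  echo "forced embed failed; rerunning with verbose output" >&2
  # The verbose rerun prints sample failures and the retry hint.
  gno --verbose embed --force
}
```

Call force_reembed after gno doctor still reports stale, legacy, or mixed vectors following a normal gno embed.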

Check which GPU backend is detected

Before forcing a backend, see what node-llama-cpp actually detects. This works on Linux, Windows, and macOS and reports the active GPU backend (CUDA, Vulkan, or Metal), available VRAM, and which prebuilt binary is in use:

bunx --bun node-llama-cpp inspect gpu

npx --no node-llama-cpp inspect gpu works too. If CUDA or Vulkan shows as available here but GNO still runs on CPU, the backend is being selected or initialized incorrectly; see below. If the backend is not available here, the GPU driver or toolchain is the problem, not GNO.
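To skim a long report, the output can be filtered down to the lines that mention a backend. This is a minimal sketch: gpu_backend_lines is a hypothetical helper name, and it assumes only that the report names the backends literally (CUDA, Vulkan, Metal); the exact report layout is not something this guide specifies.

```shell
#!/bin/sh
# Hypothetical helper: keep only report lines that mention a GPU backend.
# Assumption: the report contains the backend names CUDA, Vulkan, or Metal.
gpu_backend_lines() {
  # grep exits non-zero when nothing matches, which triggers the fallback.
  grep -Ei 'cuda|vulkan|metal' || echo "no GPU backend mentioned"
}
```

Pipe the report through it: bunx --bun node-llama-cpp inspect gpu | gpu_backend_lines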

Windows model startup hangs

Symptom: gno index, gno embed, or gno doctor appears stuck while loading node-llama-cpp, often around a Vulkan backend load test.

GNO now defaults to prebuilt local-model backends, times out backend initialization, and retries on CPU on Windows when automatic GPU backend selection fails. Run diagnostics first:

gno doctor

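To force CPU mode on a constrained Windows machine, run the command below. The VAR=value prefix is POSIX shell syntax (for example Git Bash); in PowerShell, set $env:GNO_LLAMA_GPU = "false" first, then run gno embed --yes: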

GNO_LLAMA_GPU=false gno embed --yes

If CPU embedding still consumes too much memory, keep the adaptive default or set an explicit small context pool. On CPU-only systems, GNO defaults to one context on low-memory Windows machines and at most two contexts elsewhere:

GNO_EMBED_CONTEXTS=1 gno embed --yes

Advanced CPU tuning is available when you want to trade off throughput against memory and per-context parallelism:

GNO_EMBED_CONTEXTS=2 GNO_EMBED_THREADS=4 gno embed --yes
GNO_EMBED_CONTEXT_SIZE=512 gno embed --yes

Only opt into source builds when you intentionally have Visual Studio Build Tools and a working native toolchain:

GNO_LLAMA_BUILD=autoAttempt gno doctor

Still stuck?