Common issues and their fixes — from missing SQLite to model download failures.
gno doctor
gno doctor runs every health check GNO knows how to run and prints a report. Most problems are diagnosed right here, so read its output carefully before trying anything else.
For embedding freshness, look for the embedding-fingerprint check. It reports the current fingerprint, pending/stale chunks, legacy empty-fingerprint vectors, and mixed stored fingerprint groups.
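The report's categories are easier to read with a rough mental model. The sketch below is a conceptual model only, not GNO's implementation. It assumes each embedded chunk stores the fingerprint of the model that produced its vector, and it uses a hypothetical input format: one stored fingerprint per line, with an empty line standing for a legacy vector that has no fingerprint. The hypothetical `classify_fingerprints` function counts fresh vectors, stale vectors from another model, and legacy vectors. It also counts distinct stored fingerprints; more than one group means the index is mixed. Pending chunks, which have no vector yet, are outside this sketch.

```shell
# Conceptual model only, not GNO's implementation.
# Input (hypothetical format): one stored fingerprint per embedded chunk on
# stdin; an empty line stands for a legacy vector with no fingerprint.
# Usage: classify_fingerprints CURRENT_FINGERPRINT < stored.txt
classify_fingerprints() {
  awk -v current="$1" '
    $0 == ""      { legacy++; next }   # legacy vector: empty fingerprint
    $0 == current { fresh++ }          # made by the active embedding model
    $0 != current { stale++ }          # made by some other model: needs re-embed
    { groups[$0] = 1 }                 # track each distinct stored fingerprint
    END {
      n = 0
      for (g in groups) n++
      mixed = (n > 1) ? "yes" : "no"   # more than one group means a mixed index
      printf "fresh=%d stale=%d legacy=%d groups=%d mixed=%s\n", fresh, stale, legacy, n, mixed
    }
  '
}
```

For example, `printf 'abc\nold\n\n' | classify_fingerprints abc` prints `fresh=1 stale=1 legacy=1 groups=2 mixed=yes`. An index built entirely by one old model reports `mixed=no` but is still fully stale, which is why both numbers matter.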
Symptom: gno vsearch or hybrid queries fail with a SQLite extension loading error.
Cause: the stock Apple SQLite doesn’t support loading extensions. GNO needs the Homebrew build.
brew install sqlite3
gno doctor
Symptom: the first gno query or gno ask stalls during a model pull.
Solutions:
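- Run gno models pull explicitly to see real-time progress and errors.
- Set GNO_NO_AUTO_DOWNLOAD=1 and download the model files by hand into the cache.
- Remove a preset's cached model files with gno models clean <preset>.
- Force a fresh download with gno models pull --force.
If search results are missing new or changed files, re-scan the index: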
gno update
Or run gno daemon in the background for continuous indexing.
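One common way to keep the daemon running after the terminal closes is `nohup`. The sketch below assumes `gno daemon` stays in the foreground until it is stopped; the helper name `start_gno_daemon` and the default log path are hypothetical.

```shell
# Hypothetical helper: keep `gno daemon` running after the terminal closes.
# Assumes `gno daemon` stays in the foreground until it is stopped.
start_gno_daemon() {
  log_file=${1:-"$HOME/gno-daemon.log"}
  # nohup ignores the hangup signal sent when the terminal closes;
  # all daemon output goes to the log file instead of the terminal.
  nohup gno daemon >"$log_file" 2>&1 &
  echo "gno daemon started with PID $!; logging to $log_file"
}
```

Stop it later with `kill` and the printed PID. A service manager such as launchd or systemd is the sturdier choice for a daemon that should survive reboots.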
Vector-based search uses the embedding model that was active when the index was built, so vectors from an older model are not comparable with queries embedded by a new one. gno doctor makes this visible through the embedding-fingerprint check. After switching embedding models, re-embed:
gno doctor
gno embed # re-embed stale/pending chunks
gno embed notes # or just one collection (here, notes)
gno embed retries transient embedding failures within the same run. If gno doctor still reports stale, legacy, or mixed vectors after a normal embed, force a full refresh:
gno embed --force
gno embed --force retries transient failures within the same run in the same way. If a run still fails, rerun with verbose output so GNO prints sample failures and the retry hint:
gno --verbose embed --force
If local models run on the CPU when you expected GPU acceleration, check what node-llama-cpp actually detects before forcing a backend. The following command works on Linux, Windows, and macOS. It reports the active GPU backend (CUDA, Vulkan, or Metal), the available VRAM, and which prebuilt binary is in use:
bunx --bun node-llama-cpp inspect gpu
(npx --no node-llama-cpp inspect gpu works too.) If CUDA or Vulkan shows as available here but GNO still runs on the CPU, GNO is selecting or initializing the backend incorrectly; see below. If no GPU backend shows as available here, the problem lies with the GPU driver or toolchain, not with GNO.
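If you switch between machines with different JavaScript runtimes, a small wrapper can pick whichever of the two inspection commands above is available. The helper name `inspect_gpu` is hypothetical. It prefers `bunx` and falls back to `npx`.

```shell
# Hypothetical helper: run node-llama-cpp's GPU inspection with whichever
# runner is installed, preferring bunx and falling back to npx.
inspect_gpu() {
  if command -v bunx >/dev/null 2>&1; then
    bunx --bun node-llama-cpp inspect gpu
  elif command -v npx >/dev/null 2>&1; then
    # --no declines npx's offer to install the package if it is missing
    npx --no node-llama-cpp inspect gpu
  else
    echo "inspect_gpu: neither bunx nor npx is on PATH" >&2
    return 127
  fi
}
```

The exit status 127 mirrors the shell's usual "command not found" code, so scripts calling the helper can tell a missing runner apart from a failed inspection.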
Symptom: gno index, gno embed, or gno doctor appears stuck while loading node-llama-cpp, often around a Vulkan backend load test.
GNO defaults to prebuilt local-model backends and times out backend initialization. On Windows, it falls back to CPU when automatic GPU backend selection fails. Run diagnostics first:
gno doctor
To force CPU mode on a constrained Windows machine, run:
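GNO_LLAMA_GPU=false gno embed --yes
(The VAR=value prefix works in POSIX shells such as Git Bash or WSL. In PowerShell, run $env:GNO_LLAMA_GPU = "false" first, then gno embed --yes.)
If CPU embedding still uses too much memory, keep the adaptive default or set a small explicit context pool. On CPU-only systems, GNO defaults to one context on low-memory Windows machines and at most two contexts elsewhere. To pin it to a single context: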
GNO_EMBED_CONTEXTS=1 gno embed --yes
For advanced CPU tuning, these variables let you trade off throughput, memory use, and per-context parallelism:
GNO_EMBED_CONTEXTS=2 GNO_EMBED_THREADS=4 gno embed --yes
GNO_EMBED_CONTEXT_SIZE=512 gno embed --yes
Opt into source builds only if you have deliberately installed Visual Studio Build Tools and have a working native toolchain:
GNO_LLAMA_BUILD=auto
Still stuck? Run gno doctor again. When asking for help, include its output and the exact command you ran.
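The two most useful things in a problem report are the gno doctor output and the exact command you ran. The hypothetical helper `gno_report` below bundles them into one paste-ready block. It also records the command's own output and exit status.

```shell
# Hypothetical helper: capture the exact command, its output and exit
# status, and the gno doctor report in one paste-ready block.
# Usage: gno_report embed --force > gno-report.txt
gno_report() {
  printf '== command: gno %s\n' "$*"
  status=0
  gno "$@" 2>&1 || status=$?   # keep going even if the command fails
  printf '== exit status: %s\n' "$status"
  printf '== gno doctor\n'
  gno doctor 2>&1
}
```

Note that `$*` joins the arguments with spaces, so arguments containing spaces lose their quoting in the header line. If the command hangs, as in the Vulkan case above, interrupt it and run gno doctor on its own instead.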