AI Answers Without the Cloud

Get cited, grounded answers from your own documents using local language models. GNO runs everything on your machine - no API keys, no data sharing, no subscriptions.

Key Benefits

  • 100% local processing
  • No API keys required
  • Cited answers from your docs
  • Multiple model presets (slim, balanced, quality)

Example Commands

gno ask 'your question' --answer
gno models use balanced
gno models pull

How It Works

GNO uses local language models via node-llama-cpp to generate answers grounded in your documents.
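
Under the hood this is the standard node-llama-cpp loading flow. A minimal sketch of what that looks like in TypeScript; the model path and prompt are illustrative, not GNO's actual internals:

import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Load a local GGUF model (path is hypothetical; GNO manages this for you)
const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/qwen3-4b.gguf"});

// Create a context and a chat session, then generate entirely on-device
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const answer = await session.prompt("What was decided about the API design?");
console.log(answer);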

Ask Questions, Get Cited Answers

gno ask "What was decided about the API design?" --answer

GNO will (see the sketch after this list):

  1. Search your documents using hybrid search
  2. Retrieve relevant chunks
  3. Generate an answer citing specific documents
  4. Return the answer with source references
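
Put together, the flow looks roughly like this. A hedged sketch: hybridSearch, buildPrompt, generate, and the Chunk shape are illustrative stand-ins, not GNO's real API:

// Hypothetical types and helpers standing in for GNO's internals
interface Chunk {
  docPath: string; // source document, used for the citation
  text: string;    // retrieved passage
  score: number;   // hybrid (keyword + vector) relevance
}

declare function hybridSearch(q: string, opts: {topK: number}): Promise<Chunk[]>;
declare function buildPrompt(q: string, chunks: Chunk[]): string;
declare function generate(prompt: string): Promise<string>; // local model call

async function ask(question: string): Promise<string> {
  // Steps 1-2: hybrid search returns the most relevant chunks
  const chunks = await hybridSearch(question, {topK: 8});

  // Step 3: the prompt tells the model to answer only from these chunks
  const answer = await generate(buildPrompt(question, chunks));

  // Step 4: attach the source documents the answer was grounded in
  const sources = [...new Set(chunks.map(c => c.docPath))];
  return answer + "\n\nSources:\n" + sources.map(s => "- " + s).join("\n");
}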

Model Presets

Choose the right balance of speed and quality:

Preset     Speed    Quality   Use Case
slim       Fast     Good      Default, quick lookups
balanced   Medium   Good      Slightly larger model
quality    Slower   Best      Complex questions

gno models use slim
gno models pull
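
Switching presets simply updates the active entry in your config file. Assuming the same schema as the remote example below, the result is roughly:

# ~/.config/gno/config.yaml
models:
  activePreset: slim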

Remote GPU Server Support

Run on lightweight machines by offloading inference to a GPU server on your network:

# ~/.config/gno/config.yaml
models:
  activePreset: remote-gpu
  presets:
    - id: remote-gpu
      name: Remote GPU Server
      embed: "http://192.168.1.100:8081/v1/embeddings#bge-m3"
      rerank: "http://192.168.1.100:8082/v1/completions#reranker"
      gen: "http://192.168.1.100:8083/v1/chat/completions#qwen3-4b"

Works with any OpenAI-compatible server (llama-server, Ollama, LocalAI, vLLM). No CORS configuration needed; just point to your server.
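
Before pointing GNO at a server, you can sanity-check an endpoint directly. A small sketch using the embed URL from the config above; the request and response shapes follow the OpenAI embeddings API:

// Verify the embeddings endpoint responds (Node 18+ global fetch)
const res = await fetch("http://192.168.1.100:8081/v1/embeddings", {
  method: "POST",
  headers: {"Content-Type": "application/json"},
  body: JSON.stringify({model: "bge-m3", input: "hello"}),
});
const {data} = await res.json();
console.log("embedding dimensions:", data[0].embedding.length);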

Configuration guide →

No Cloud Required

Everything runs on your machine (or your network):

  • Models downloaded once, run locally
  • Optional: offload to GPU server on LAN
  • No API keys or subscriptions
  • Works completely offline
  • Your data never leaves your network