What it gives you
- Markdown with frontmatter extraction
- PDF text extraction
- Office documents (DOCX, XLSX, PPTX)
- Plain text files
- Automatic structural first-pass chunking for TypeScript, JavaScript, Python, Go, and Rust
- Automatic language detection across 30+ languages
- Content deduplication via mirror hashing
- Incremental indexing with SHA-256 change detection
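
To illustrate the last item, here is a minimal sketch of how SHA-256 change detection can drive incremental indexing. It is not this project's actual implementation: the names `sha256_of`, `changed_files`, and the in-memory `seen` mapping are hypothetical, and a real indexer would most likely persist the digests between runs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(paths, seen):
    """Return the paths whose content differs from the digest stored in `seen`.

    `seen` is a hypothetical dict mapping str(path) -> hex digest.
    It is updated in place so the next call skips unchanged files.
    """
    changed = []
    for path in paths:
        path = Path(path)
        digest = sha256_of(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

On the first call with an empty `seen`, every file is reported as changed. On later calls, only files whose bytes differ are returned, so only those need to be re-indexed.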