acumenix/haystackdb

HaystackDB

Minimal but performant Vector DB

Features

  • Binary embeddings by default (int8 reranking coming soon)
  • JSON filtering for queries
  • Scalable, distributed architecture for multi-replica deployments
  • Durable (write-ahead log), persistent data, memory-mapped for fast access in the client
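
As a rough illustration of what binary embeddings involve, here is a minimal sketch that assumes sign-based quantization (one bit per dimension) and Hamming distance as the similarity measure. The function names `binary_quantize` and `hamming_distance` are hypothetical and are not taken from HaystackDB's code, which may quantize and compare differently.

```rust
/// Quantize a float embedding to one bit per dimension (1 if the value is
/// positive, 0 otherwise), packed into u64 words.
/// Hypothetical helper for illustration, not HaystackDB's actual API.
pub fn binary_quantize(embedding: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (embedding.len() + 63) / 64];
    for (i, &x) in embedding.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance: the number of bit positions where two packed
/// binary vectors differ. Lower means more similar.
pub fn hamming_distance(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same packed length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Under this scheme a 1024-dimensional embedding packs into 16 `u64` words (128 bytes), and a comparison takes 16 XOR and popcount operations.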

Benchmarks

Measured on an M2 MacBook with 1024-dimensional, binary-quantized vectors.

FAISS uses a flat index, so it searches by brute force, but it holds everything in memory. Haystack also searches by brute force, but stores its data on disk.

TL;DR: Haystack is roughly 10x faster (about 9x to 13x across the runs below) despite reading from disk.

Vectors      Haystack    FAISS
100,000      3.44ms      29.67ms
500,000      11.98ms     146.50ms
1,000,000    22.65ms     293.91ms
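
The brute-force search that both systems perform in this benchmark can be sketched as an exhaustive scan that keeps the best `k` results in a bounded heap. This is an illustrative sketch only: it assumes packed binary vectors compared by Hamming distance, the name `brute_force_top_k` is hypothetical, and it scans an in-memory slice rather than the memory-mapped on-disk data Haystack uses.

```rust
use std::collections::BinaryHeap;

/// Hamming distance between two packed binary vectors.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Exhaustively scan `vectors` and return the `k` nearest to `query`
/// as (distance, index) pairs, sorted by ascending distance.
/// Hypothetical function for illustration, not HaystackDB's actual API.
pub fn brute_force_top_k(query: &[u64], vectors: &[Vec<u64>], k: usize) -> Vec<(u32, usize)> {
    // Max-heap of the best k candidates so far; the worst one sits on top.
    let mut heap: BinaryHeap<(u32, usize)> = BinaryHeap::with_capacity(k + 1);
    for (idx, v) in vectors.iter().enumerate() {
        heap.push((hamming(query, v), idx));
        if heap.len() > k {
            heap.pop(); // evict the current worst candidate
        }
    }
    heap.into_sorted_vec()
}
```

Every query touches every vector, so the cost grows linearly with collection size, which matches the roughly linear growth in the timings above.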

Roadmap

  • Quickstart Guide
  • Quality benchmarks (this is in progress)
  • Int8 reranking
  • Better queries with more than simple equality (this is done now)
  • Full text search
  • Better insertion performance with batch B+Tree insertion (could probably be further improved, but good for now)
  • Point in time backups/rollback
    • Currently this is destructive (i.e., you cannot roll forward again after rolling back), so a nondestructive version is next on the to-do list.
  • Cursor based pagination
  • Schema migrations
  • K-means clustering of vectors, with centroid similarity for improved search performance
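
The last roadmap item is not implemented yet, so the following is only a sketch of the general idea behind centroid-based search: compare the query against each cluster's centroid first, then scan only the vectors in the closest clusters. The `Cluster` type, the `candidates` function, and the `n_probe` parameter are all hypothetical, and the sketch assumes clusters have already been built. It covers the query side only, not the k-means training step.

```rust
/// Hamming distance between two packed binary vectors.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// A cluster: a centroid plus the indices of the vectors assigned to it.
/// Hypothetical type for illustration only.
pub struct Cluster {
    pub centroid: Vec<u64>,
    pub members: Vec<usize>,
}

/// Return the indices of all vectors in the `n_probe` clusters whose
/// centroids are closest to `query`. Only these candidates then need an
/// exact distance computation.
pub fn candidates(query: &[u64], clusters: &[Cluster], n_probe: usize) -> Vec<usize> {
    let mut order: Vec<&Cluster> = clusters.iter().collect();
    // Stable sort: clusters at equal distance keep their original order.
    order.sort_by_key(|c| hamming(query, &c.centroid));
    order
        .into_iter()
        .take(n_probe)
        .flat_map(|c| c.members.iter().copied())
        .collect()
}
```

The trade-off is between speed and recall: a smaller `n_probe` scans fewer vectors but can miss neighbours that fall in a cluster that was not probed.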
