HaystackDB

Minimal but performant Vector DB

Features

Binary embeddings by default (soon int8 reranking)
JSON filtering for queries
Scalable, distributed architecture for use with multi replica deployments
Durable (WAL), persistent data, mem mapped for fast access in the client

Benchmarks

On a MacBook with an M2, 1024 dimension, binary quantized.

FAISS is using a flat index, so brute force, but it's in memory. Haystack is storing the data on disk, and also brute forces.

TLDR is Haystack is ~10x faster despite being stored on disk.

100,000 Vectors
Haystack — 3.44ms
FAISS    — 29.67ms

500,000 Vectors
Haystack — 11.98ms
FAISS    - 146.50ms

1,000,000 Vectors
Haystack — 22.65ms
FAISS    — 293.91ms

Roadmap

Quickstart Guide
Quality benchmarks (this is in progress)
Int8 reranking
~~Better queries with more than simple equality~~ (this is done now)
Full text search
~~Better insertion performance with batch B+Tree insertion~~ (could probably be further improved, but good for now)
~~Point in time backups/rollback~~
- currently this is destructive (ie you cannot return forward after you go backwards), so a nondestructive version is next on the todo list.
Cursor based pagination
Schema migrations
Vector Kmeans clustering with centroid similarity for improved search perf

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
benches		benches
src		src
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HaystackDB

Features

Benchmarks

Roadmap

About

Releases

Packages

Languages

License

acumenix/haystackdb

Folders and files

Latest commit

History

Repository files navigation

HaystackDB

Features

Benchmarks

Roadmap

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages