A natural language search engine for your personal notes, transactions and images
- Features
- Demo
- Architecture
- Setup
- Use
- Upgrade
- Troubleshoot
- Miscellaneous
- Performance
- Development
- Credits
- Natural: Advanced natural language understanding using Transformer-based ML models
- Local: Your personal data stays local. All search and indexing is done on your machine*
- Incremental: Incremental search for a fast, search-as-you-type experience
- Pluggable: Modular architecture makes it easy to plug in new data sources, frontends and ML models
- Multiple Sources: Search your Org-mode and Markdown notes, Beancount transactions and Photos
- Multiple Interfaces: Search using a Web Browser, Emacs or the API
Demo video: Khoj_Incremental_Search_Demo_0.1.5.mp4
- Install Khoj via pip
- Start Khoj app
- Add this readme and khoj.el readme as org-mode for Khoj to index
- Search "Setup editor" on the Web and Emacs. Re-rank the results for better accuracy
- Top result is what we are looking for, the section to Install Khoj.el on Emacs
- The results do not have any words used in the query
- Based on the top result it seems the re-ranking model understands that Emacs is an editor?
- The results incrementally update as the query is entered
- The results are re-ranked, for better accuracy, once user hits enter
pip install khoj-assistant
khoj
- Enable content types and point to files to search in the First Run Screen that pops up on app start
- Click configure and wait. The app will load the ML model, generate embeddings and expose the search API
- Khoj via Web
- Open http://localhost:8000/ via desktop interface or directly
- Khoj via Emacs
- Khoj via API
- See the Khoj FastAPI Swagger Docs, ReDocs
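For scripted use, a search request can be assembled with the Python standard library. This is a minimal sketch, not a documented Khoj client: the `/api/search` path and the `q`, `t` and `n` parameter names are assumptions made for illustration, so confirm them against the Swagger docs of your running instance before relying on them.

```python
from urllib.parse import urlencode


def build_search_url(query, content_type="org", max_results=5,
                     base_url="http://localhost:8000"):
    """Build a search URL.

    The endpoint path and parameter names are assumed, not confirmed.
    """
    params = {"q": query, "t": content_type, "n": max_results}
    # urlencode escapes spaces and special characters in the query
    return f"{base_url}/api/search?{urlencode(params)}"


print(build_search_url("Setup editor"))
```

Fetching the resulting URL, for example with `urllib.request.urlopen`, requires the Khoj app to be running locally.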
pip install --upgrade khoj-assistant
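To confirm that an upgrade took effect, the installed version can be read from package metadata. A small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="khoj-assistant"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```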
- Symptom: Errors out complaining about Tensors mismatch, null etc
- Mitigation: Disable image search using the desktop GUI
- Symptom: Errors out with "Killed" in error message in Docker
- Fix: Increase RAM available to Docker Containers in Docker Settings
- Refer: StackOverflow Solution, Configure Resources on Docker for Mac
- The beta chat and search API endpoints use OpenAI API
- It is disabled by default
- To use it add your openai-api-key via the app configure screen
- Warning: If you use the above beta APIs, your query and top result(s) will be sent to OpenAI for processing
- Semantic search using the bi-encoder is fairly fast at <50 ms
- Reranking using the cross-encoder is slower at <2s on 15 results. Tweak top_k to trade off speed for accuracy of results
- Filters in query (e.g. by file, word or date) usually add <20ms to query latency
- Indexing is more strongly impacted by the size of the source data
- Indexing 100K+ line corpus of notes takes about 10 minutes
- Indexing 4000+ images takes about 15 minutes and more than 8 GB of RAM
- Once khoj-ai#36 is implemented, it should only take this long on first run
- Testing done on a Mac M1 and a >100K line corpus of notes
- Search, indexing on a GPU has not been tested yet
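The two-stage flow described above (fast bi-encoder search, then slower cross-encoder reranking of the top results) can be sketched as follows. This is an illustration only: `embed` and `rerank_score` are toy word-overlap stand-ins for the real models, and every function name here is hypothetical rather than part of Khoj. The point is the shape of the pipeline: only the top_k candidates from the cheap first stage reach the expensive second stage.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: fraction of query words the doc covers."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def search(query, docs, doc_vectors, top_k=15, rerank=False):
    q_vec = embed(query)
    # Stage 1: cheap similarity against vectors precomputed at indexing time
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vectors[i]), reverse=True
    )
    candidates = ranked[:top_k]
    if rerank:
        # Stage 2: costlier pairwise scoring, applied to the top_k candidates only
        candidates.sort(key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [docs[i] for i in candidates]


docs = [
    "emacs emacs setup",
    "install khoj on emacs today please",
    "beancount transactions for groceries",
]
doc_vectors = [embed(d) for d in docs]  # computed once, reused for every query

print(search("install emacs", docs, doc_vectors, top_k=2))
print(search("install emacs", docs, doc_vectors, top_k=2, rerank=True))
print(search("install emacs", docs, doc_vectors, top_k=1, rerank=True))
```

In this toy data the reranker promotes the second document when top_k is 2, but with top_k set to 1 that document never reaches the reranker. A smaller top_k is faster and can miss better matches; a larger one costs more reranking time.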
git clone https://github.com/debanjum/khoj && cd khoj
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
- Copy the config/khoj_sample.yml to ~/.khoj/khoj.yml
- Set input-files or input-filter in each relevant content-type section of ~/.khoj/khoj.yml
- Set input-directories field in image content-type section
- Delete content-type and processor sub-section(s) irrelevant for your use-case
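The copy step can also be scripted. The sketch below is one possible approach, with a hypothetical helper `install_sample_config` that refuses to overwrite an existing config. The demo runs in a throwaway directory; for real use you would pass `config/khoj_sample.yml` and `Path.home() / ".khoj" / "khoj.yml"`.

```python
import shutil
import tempfile
from pathlib import Path


def install_sample_config(sample, target):
    """Copy the sample config to target unless a config already exists there."""
    sample, target = Path(sample), Path(target)
    if target.exists():
        return False  # leave an existing config untouched
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(sample, target)
    return True


# Demo in a temporary directory so no real config is touched
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "khoj_sample.yml"
    sample.write_text("content-type: {}\n")
    target = Path(tmp) / ".khoj" / "khoj.yml"
    first = install_sample_config(sample, target)   # copies the file
    second = install_sample_config(sample, target)  # target exists, so skipped
    copied_text = target.read_text()
    print(first, second)
```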
khoj -vv
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
# To Upgrade To Latest Stable Release
# Maps to the latest tagged version of khoj on master branch
pip install --upgrade khoj-assistant
# To Upgrade To Latest Pre-Release
# Maps to the latest commit on the master branch
pip install --upgrade --pre khoj-assistant
# To Upgrade To Specific Development Release.
# Useful to test, review a PR.
# Note: khoj-assistant is published to Test PyPI on creating a PR
pip install -i https://test.pypi.org/simple/ khoj-assistant==0.1.5.dev57166025766
git clone https://github.com/debanjum/khoj && cd khoj
- Required: Update docker-compose.yml to mount your images, (org-mode or markdown) notes and beancount directories
- Optional: Edit application configuration in khoj_docker.yml
docker-compose up -d
Note: The first run will take time. Let it run; it is most likely not hung, just generating embeddings
docker-compose build --pull
- Install Conda [Required]
- Install Exiftool [Optional]
sudo apt -y install libimage-exiftool-perl
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
- Copy the config/khoj_sample.yml to ~/.khoj/khoj.yml
- Set input-files or input-filter in each relevant content-type section of ~/.khoj/khoj.yml
- Set input-directories field in image content-type section
- Delete content-type, processor sub-sections irrelevant for your use-case
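Deleting irrelevant content-type sub-sections amounts to filtering a nested mapping. The sketch below works on a plain dict standing in for the parsed YAML; the `org` sub-section name and the file paths are illustrative assumptions, so check your own khoj.yml for the actual keys. `prune_config` is a hypothetical helper, not part of Khoj.

```python
def prune_config(config, keep_content_types):
    """Return a copy of config keeping only the named content-type sub-sections."""
    pruned = dict(config)  # shallow copy; the original mapping is not modified
    pruned["content-type"] = {
        name: section
        for name, section in config.get("content-type", {}).items()
        if name in keep_content_types
    }
    return pruned


# Hypothetical parsed config; sub-section names and paths are illustrative only
config = {
    "content-type": {
        "org": {"input-files": ["~/notes/todo.org"], "input-filter": None},
        "image": {"input-directories": ["~/photos"]},
    },
}

print(prune_config(config, keep_content_types={"org"}))
```

In practice you would load ~/.khoj/khoj.yml with a YAML parser, prune it, and write it back, or simply delete the sections by hand in an editor.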
python3 -m src.main -vv
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
cd khoj
git pull origin master
conda deactivate
conda env update -f config/environment.yml
conda activate khoj
pytest
- Multi-QA MiniLM Model, All MiniLM Model for Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
- Org.js to render Org-mode results on the Web interface
- Markdown-it to render Markdown results on the Web interface
- Sven Marnach for PyExifTool