search
Semantic file search using FastEmbed and sqlite-vec
Installation
This section is for people who just want to use search. For developers, see the development section below.
uv
Run uv tool install git+https://git.unnamed.website/search/ to install. For GPU support, run uv tool install --overrides <(echo "onnxruntime; sys_platform == 'never'") git+https://git.unnamed.website/search/[GPU_VENDOR] with GPU_VENDOR set to one of intel, nvidia, or amd. You may need to install drivers for your OS, such as intel-compute-runtime. If dependency resolution fails, add the flag -p 3.12 or some other Python version. For AMD, also add the flag -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.1/ to the command. Python packaging is such a gross mess.
pipx
Run pipx install git+https://git.unnamed.website/search/ to install. This doesn't come with GPU support since that requires some uv black magic.
Usage
This program uses a client-server architecture to watch directories with inotify and keep the model in memory so the client doesn't have to wait several seconds to load the model. It uses file inodes and modification times to avoid unnecessary re-indexing.
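As a rough illustration of the inode and modification time check, here is a minimal sketch. The function name needs_reindex and the in-memory dict are hypothetical stand-ins; the server's actual storage and logic may differ.

```python
import os


def needs_reindex(path, seen):
    """Return True if `path` is new or has changed since it was last recorded.

    `seen` maps inode number -> modification time in nanoseconds. It is a
    hypothetical in-memory stand-in for whatever the server really stores.
    """
    st = os.stat(path)
    if seen.get(st.st_ino) == st.st_mtime_ns:
        return False  # same inode and same mtime: skip re-embedding
    seen[st.st_ino] = st.st_mtime_ns
    return True
```

Because the key is the inode rather than the path, a file that is only renamed or moved within the same filesystem keeps its inode and mtime, so a check like this would not trigger a second embedding pass.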
Run search-server DIRS_TO_INDEX to start the server. Make sure you don't include nested directories or weird stuff will happen. The server currently only indexes images, although more modalities may be supported in the future. There are probably some weird race condition bugs if you modify a lot of files at the same time.
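Since the server does not check for nested directories itself, you could sanity-check your directory list before starting it. The helper below is a hypothetical sketch and not part of this project.

```python
from pathlib import Path


def find_nested(dirs):
    """Return (parent, child) pairs where one given directory contains another.

    Paths are resolved first, so relative paths and symlinked directories are
    compared by their real location. Exact duplicates are not reported.
    """
    resolved = [Path(d).resolve() for d in dirs]
    return [
        (str(a), str(b))
        for a in resolved
        for b in resolved
        if a != b and a in b.parents
    ]
```

An empty list means no directory in the list sits inside another one.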
Then run search-client SEARCH_TEXT NUM_RESULTS to get a list of the most similar files, or pass a file path instead of text to search for files similar to that file. You can pass the list of results to an image viewer such as Gwenview to view image results, although note that Gwenview doesn't preserve the order of the images. Alternatively, add a third parameter to search-client and it will create a temporary directory containing symlinks to the search results and return the directory name.
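The symlink directory idea can be sketched roughly as follows. The function name link_results and the rank prefix on each link name are assumptions for illustration; search-client may name its symlinks differently.

```python
import os
import tempfile


def link_results(paths):
    """Create a temp directory with one symlink per result and return its name.

    Each link gets a zero-padded rank prefix (an assumption, not necessarily
    what search-client does) so that sorting by name matches result order.
    """
    out = tempfile.mkdtemp(prefix="search-")
    width = len(str(len(paths)))
    for rank, path in enumerate(paths, start=1):
        name = f"{rank:0{width}d}-{os.path.basename(path)}"
        os.symlink(os.path.abspath(path), os.path.join(out, name))
    return out
```

With names like these, a viewer that sorts by file name would show results in ranked order, which sidesteps the ordering problem when passing a plain list of paths.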
Check out TIDY (for Android) and rclip for similar projects, although this one is probably fastest!
Development
This project uses uv. Clone this repo and run uv sync --extra GPU_VENDOR. Use uv run search-server and uv run search-client to run your cloned versions of the code.
If you don't like uv, you can simply do pip install -e . and run the code using search-server and search-client, although this won't use the GPU.
TODO
- Come up with a less generic name
- Test portability
- GUI
- Investigate race condition bugs and stress test this more
- Detect if DIRS_TO_INDEX contains nested dirs
- Unload the image embedding model to save RAM unless doing a file to file search
- Write a systemd .service and .socket for running the server with socket activation
- Parallelize indexing? Not sure if this will help. I might need to batch embedding queries instead, but that sounds like a huge pain to implement.
- Fix the Python packaging to be less janky
- Weird bugs if an image file gets modified to no longer be an image and vice versa?
- Inodes are only unique in a single filesystem, so indexing dirs on multiple FSes won't work currently
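
One possible direction for the last item, sketched under assumptions: identify a file by the pair of device ID and inode number instead of the inode alone. The function name file_key is hypothetical, and this is not how the project currently works.

```python
import os


def file_key(path):
    """Identify a file by (device ID, inode number) instead of inode alone."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

Two files on different filesystems can share an inode number but would normally differ in st_dev. A caveat is that device IDs may not stay stable across remounts or reboots on some systems, so this would need testing before relying on it.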