This is how to do simple semantic search on low-end hardware using free and open-source software. We’ll be using clip.cpp’s image-search example together with a CLIP model.

Before we begin, a quick explanation:

  • CLIP is a neural network by OpenAI that maps text and images into a shared embedding space, so the two can be compared directly
  • ggml is a tensor library for machine learning; its model files use the GGUF format
  • clip.cpp is a dependency-free implementation of CLIP built on ggml
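
How these pieces fit together: CLIP turns a text query and an image into two vectors, and “how well they match” is just how close those vectors are. A common way to measure that (and my assumption about what the search below effectively ranks by) is cosine similarity:

sim(t, v) = (t · v) / (‖t‖ ‖v‖)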

Now that you know that, let’s go!

First, we’ll clone and build the clip.cpp repository

git clone https://github.com/monatis/clip.cpp.git
cd clip.cpp

# Build
mkdir build
cd build
cmake -DCLIP_BUILD_IMAGE_SEARCH=ON ..
make
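
If the build succeeded, the example binaries should now exist somewhere under build/. A quick way to check (exact output paths can vary with the CMake generator, so this just searches for them):

# Confirm the two example binaries we need were built
find . -maxdepth 2 -type f -name "image-search*"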

Next, we’ll download the clip-vit-base-patch32_ggml-model-f16.gguf model from https://huggingface.co/mys/ggml_clip-vit-base-patch32/tree/main into clip.cpp/build/models

mkdir models
cd models
wget https://huggingface.co/mys/ggml_clip-vit-base-patch32/resolve/main/clip-vit-base-patch32_ggml-model-f16.gguf
cd ..
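
It’s worth sanity-checking the download before moving on: the f16 weights are a few hundred megabytes, so a file of only a few kilobytes means you fetched an error page instead of the model:

# From build/: the .gguf file should be hundreds of MB, not KB
ls -lh models/clip-vit-base-patch32_ggml-model-f16.gguf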

Note that this model repository is for use with clip.cpp, as per the README

Then we’ll make a directory at clip.cpp/build/images to hold the images we want to search

mkdir images

You will then need to fill this directory with your images
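
Any ordinary JPEG or PNG photos will do. For example, to pull in everything from your Pictures folder (~/Pictures is just an assumption about where your photos live; adjust the path and glob to taste):

# Copy existing photos into the directory we'll index
cp ~/Pictures/*.jpg ./images/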

After that, we’ll build a database from those images

./image-search-build -m ./models/clip-vit-base-patch32_ggml-model-f16.gguf ./images/
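
As far as I can tell the database is a one-shot snapshot of the directory, so my assumption is that after adding or removing images you simply re-run the same command (~/Downloads/vacation-01.jpg is a made-up example path):

# New photos won't be searchable until the database is rebuilt
cp ~/Downloads/vacation-01.jpg ./images/
./image-search-build -m ./models/clip-vit-base-patch32_ggml-model-f16.gguf ./images/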

Finally, we’ll test it out by searching for an image

./image-search apple
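
Single words work, but CLIP was trained on natural-language captions, so short descriptive phrases often match better. Quote the query so the shell passes it as one argument (that the binary accepts a quoted phrase this way is an assumption on my part):

# A more descriptive, caption-like query
./image-search "a photo of a red apple on a table"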

Note that a lower score means a closer, and therefore better, match

References

  1. clip.cpp image search example: https://github.com/monatis/clip.cpp/tree/main/examples/image-search