I’ve been hearing more and more about running chatbots locally, but everyone seems to talk about how to tweak one. I have yet to see anyone explain how to actually install one, so this is it.
I’m going to use llama.cpp for the interface and open_llama_3b for the model. Also, this tutorial is limited to Linux.
Prerequisites
- Knowledge of basic Linux commands
- No need for a powerful GPU (I myself am using an onboard graphics chip for this)
- Python 3 installed. Check with `python -V`
- Git LFS installed. Check with `git lfs`. If it is not installed, install it through your distribution’s package manager (e.g. `sudo apt install git-lfs` on Debian/Ubuntu), then run `git lfs install` once to set up the Git hooks
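As a quick sanity check, both tools should report a version rather than an error:
# Both commands should print a version string if the prerequisites are met
python -V
git lfs version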
Install
First clone the llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
While that is running, also clone the open_llama_3b repository
# Clones the repository, skipping the large model files for now
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/openlm-research/open_llama_3b
# After that finishes, download the large files with this
cd open_llama_3b
git lfs pull
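The pull can take a while. Once it finishes, the checkpoint files should be actual multi-gigabyte files rather than the tiny LFS pointer stubs the initial clone left behind. A quick way to check (the exact file names can vary between model repos):
# Still inside open_llama_3b: the model weights should be several GB in size;
# files of only a few hundred bytes mean the LFS pull has not completed
ls -lh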
Building
Next you have to build the llama.cpp project (you can pass -j$(nproc) to make to parallelize the build and speed it up)
cd llama.cpp
make
This will generate executables in the project root directory (llama.cpp/), notably main and quantize.
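You can confirm the build worked by asking one of the binaries for its help text:
# Prints the usage/help text if the build succeeded
./main -h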
Converting the model
In order to use open_llama_3b with llama.cpp, you first have to convert it to the GGUF FP16 format that llama.cpp expects
First, create a Python virtual environment so the conversion script’s dependencies stay isolated
cd llama.cpp
python -m venv .venv
Then activate it
source .venv/bin/activate
Install dependencies
python -m pip install -r requirements.txt
Then convert the open_llama_3b model
python convert.py ../path/to/open_llama_3b
After that, move the converted file ggml-model-f16.gguf to
llama.cpp/models/3B
mkdir -p ./models/3B
mv ../path/to/open_llama_3b/ggml-model-f16.gguf ./models/3B/
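FP16 stores two bytes per weight, so for a 3B-parameter model the converted file should come out somewhere around 6 to 7 GB. Worth a quick look before quantizing:
# Expect a single file of roughly 6-7 GB (2 bytes per parameter x ~3B parameters)
ls -lh ./models/3B/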
Quantization
Before running the model, quantize it. Quantization compresses the weights (here to 4 bits each), which shrinks the file and cuts memory use enough to make CPU inference practical
./quantize ./models/3B/ggml-model-f16.gguf ./models/3B/ggml-model-q4_0.gguf q4_0
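q4_0 packs the weights into blocks of 4-bit values, so the quantized file should come out at roughly a quarter of the FP16 size. You can compare the two:
# The q4_0 file should be roughly 4x smaller than the FP16 original
du -h ./models/3B/ggml-model-f16.gguf ./models/3B/ggml-model-q4_0.gguf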
Run the model
Finally, run the model
# Running the quantized model with a chat prompt
./main -m ./models/3B/ggml-model-q4_0.gguf -n 128 --repeat-penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
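A quick rundown of the flags: -m picks the model file, -n 128 caps the number of tokens generated, --repeat-penalty 1.0 leaves repeated tokens unpenalized, --color highlights your input, -i starts interactive mode, -r "User:" hands control back to you whenever the model prints that string, and -f loads the initial prompt from a file. If you just want a one-off completion instead of a chat, you can pass a prompt directly:
# Non-interactive: generate 64 tokens from an inline prompt
./main -m ./models/3B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 64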
Notes
All credit goes to the authors of llama.cpp and open_llama_3b. Any mistakes or errors are solely mine.