llamacpp is not that hard. The most complicated part IMO is installing and configuring docker to use the GPU; the docker compose side of things especially can be confusing, and there are differences between using AMD and NVIDIA if you are running a linux box, but that part is the same for ollama.
Llamacpp does have its own webUI, although you can't use files in it for context yet; for that you need something like open-webui.
Windows IDK, they do have a Windows executable.
Rudi Servo
2025-02-13 16:05:17 +0000 UTC
I use llamacpp on a 7900XTX with a 7B coding model, a 14B chat model and nomic 1.5 embedding, usually Q4_L or Q4_M. It's pushing the 24GB VRAM to the max, although I do have 3 models running.
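For a rough feel of where the VRAM goes, here is a back-of-the-envelope sketch in Python. It only counts the quantized weights, and two numbers in it are assumptions rather than measurements: roughly 4.5 bits per weight for a Q4-class quant, and roughly 0.137B parameters for the embedding model. KV cache, context size and runtime buffers are not modelled at all.

```python
# Rough VRAM estimate for quantized model weights only.
# Not modelled: KV cache, context length, compute buffers, driver overhead.

# ASSUMPTION: average bits per weight for a Q4-class quant (not measured).
BITS_PER_WEIGHT_Q4 = 4.5


def weights_gb(params_billions: float, bits_per_weight: float = BITS_PER_WEIGHT_Q4) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Parameter counts in billions; the embedding model size is an ASSUMPTION.
models = {
    "7B coding model": 7.0,
    "14B chat model": 14.0,
    "embedding model": 0.137,
}

sizes = {name: weights_gb(params) for name, params in models.items()}
total_gb = sum(sizes.values())

for name, gb in sizes.items():
    print(f"{name}: ~{gb:.1f} GB")
print(f"weights total: ~{total_gb:.1f} GB of 24 GB")
```

Under those assumptions the weights alone come to about half of the 24GB, so the rest of the pressure would come from the context/KV cache and buffers of three models loaded at once, which depends on how each one is configured.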
Getting this to work with VsCode or neovim is the challenge.
TabbyML is a nice project, you can use it with llamacpp, ollama, lmstudio, or just let it run on the GPU (it has llamacpp built in), but a bug with embeddings and issues with ROCm pushed me to try continue.dev.
IMO Tabby is better to an extent: the integration with VsCode and neovim works out of the box, and indexing and context really work well.
Continue.dev does not have a neovim plugin and I am still searching for an alternative to tabby.
Rudi Servo
2025-02-13 15:58:05 +0000 UTC
I'm usually not here for videos this early so it may be normal, but I am not seeing the links mentioned in the video.