diff --git a/src/content/llama/a-history-of-llamas.md b/src/content/llama/a-history-of-llamas.md
index 380e1fd..ad0ba61 100644
--- a/src/content/llama/a-history-of-llamas.md
+++ b/src/content/llama/a-history-of-llamas.md
@@ -102,7 +102,7 @@ VRAM, which meant many home computers could now run 4-bit quantized 7B models!
 Previously, most enthusiasts would have to rent cloud GPUs to run their
 "local" llamas. Quantizing into GGUF is a very expensive process, so
 [TheBloke](https://huggingface.co/TheBloke) on Huggingface emerges the defacto
-source for pre-quantized llamas.
+source for [pre-quantized llamas](../quantization).
 
 Based on LLaMa, the open source
 [llama.cpp](https://github.com/ggerganov/llama.cpp) becomes the leader of local