Running LLMs locally

For those already in the loop, this might be old news, but it’s still worth covering for anyone climbing the learning curve.

Q: How can I run a large language model (LLM) on my local machine?

A fair question is why one would even bother. Sure, there are readily available sandbox environments online like Google Colab, but my focus is on running and fine-tuning models locally, so my data never travels over the wire.

Typically, I’d resort to writing Python code, but that becomes cumbersome when experimenting with new models: either I handle quantization myself or I hunt down a pre-quantized version. So, let’s sideline coding.
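For context, here is roughly what that coding route involves: a minimal sketch using Hugging Face Transformers with on-the-fly 4-bit quantization via bitsandbytes (the model id and settings are illustrative, not a recommendation):

# The manual route being sidelined: load a model with 4-bit quantization via
# Hugging Face Transformers + bitsandbytes
# (pip install transformers accelerate bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative; any causal LM id works

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits on load
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s) automatically
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))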

Several options exist; I’ve found two quite satisfactory thus far:

  1. LM Studio
  2. Ollama

Both integrate smoothly into other workflows via LlamaIndex or LangChain. Recently, I experimented with running the Llama 3 model locally.
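For reference, with Ollama you can also pull and query Llama 3 entirely from Python through its official client; a minimal sketch, assuming the ollama package is installed (pip install ollama) and the Ollama daemon is running locally:

# A sketch using Ollama's official Python client (assumes pip install ollama
# and the Ollama daemon running locally).
import ollama

ollama.pull("llama3")  # downloads Llama 3 if it isn't already present

reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(reply["message"]["content"])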

LM Studio struggled to load the model, whereas Ollama succeeded. Ollama’s simplicity is appealing, but LM Studio shines when you want customizable settings.

In practice, having both is prudent; each compensates where the other falls short. For quick runs, Ollama prevails, but for fine-grained control, LM Studio excels.

Why settle for one when both are freely available?

My experience with both on my NVIDIA RTX 3080 has been seamless, a gratifying shift from loading models manually in code to a single command that handles the heavy lifting.

Now, the real debate: LlamaIndex or LangChain? Let the flame wars commence.

You can use Ollama to run and query your own language model. The Hugging Face Transformers library is an option, but the easiest method by far is Ollama. To call it from Python, I pair it with either LlamaIndex or LangChain, so one of those libraries needs to be installed. Beyond that, the usage code couldn’t be more straightforward.

# Using LangChain (pip install langchain-community); assumes the Ollama
# server is running and the model has been pulled (ollama pull llama3).
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
print(llm.invoke("Why is the sky blue?"))  # invoke() returns the completion as a string

# Using LlamaIndex (pip install llama-index-llms-ollama); same server assumptions.
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3")
print(llm.complete("Why is the sky blue?"))  # complete() returns a CompletionResponse
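
Both wrappers also support streaming if you want tokens as they are generated; a quick sketch under the same assumptions as above:

# Streaming with LangChain: stream() yields string chunks as they arrive.
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
for chunk in llm.stream("Why is the sky blue?"):
    print(chunk, end="", flush=True)

# Streaming with LlamaIndex: stream_complete() yields partial responses whose
# .delta field holds the newly generated text.
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3")
for part in llm.stream_complete("Why is the sky blue?"):
    print(part.delta, end="", flush=True)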