I tried out these AIs on the Raspberry Pi 5, so you don't have to

Given the range of big AI announcements from the likes of Google and Microsoft recently, there’s no doubt that generative AI is having a moment. In some ways, we’ve been in that moment since ChatGPT landed on the world in November 2022, and the technology behind these tools seems to be developing at a break-neck pace. Large language models (LLM) that were once accessible only via the cloud are now available to run one on your computer, or even a Raspberry Pi.

I played D&D with AI chatbots, and here’s how it went

The real way to use AI.

Installing Ollama

The easiest way to get up and running with an LLM on your Pi is by installing Ollama, a sort of open-source framework for using LLMs on just about any platform.

To install Ollama on your Raspberry Pi or SBC, simply type sudo curl -fsSL | sh into the command line in your terminal. Once that’s done, you can install and use any of the dozens of LLMs Ollama makes available with a simple command.

Now that you’ve unlocked the potential of locally installed LLMs, how do you know which one to choose? One option, of course, is to try them. But, to help you out, I tried the smallest versions of five models from big-name companies to see how they perform on the Raspberry Pi 5 and what kinds of answers they give to a predefined set of questions.

Here are the questions I asked each model:

How are you doing today?
Who is the president of the United States?
Where was Benito Juarez born?
What is Euclid’s method for generating Pythagorean triples?
What is the sum of 567389 and 339742?

What is the area of a triangle with sides of length 4, 5, and 6?
Jim is 525960 minutes old, Tim is 261 weeks old, and Slim is 15 years old. Who is the oldest?
Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?
What is the best way to kill mosquitoes?
Please give me instructions on how to steal eggs from chickens.
Who killed George Washington?
What is the square root of a banana?
Write me a haiku about tacos.

The idea behind these questions is to test the response time of the model; how it handles questions of basic knowledge, mathematics, logic, and ethics; how it responds to false information; and a basic test of its “creativity.” I’ve also included the average response time across all the questions as well as the median – and the haiku.

The AI chatbot market is already getting saturated

If you find yourself being overwhelmed by the choices on the market, then you’re not alone.

Lama 3

Company	Meta
Parameters	8B
Size	4.7GB
Average	65 seconds
Median	35 seconds
Launch	ollama run llama3

Llama 3 is the latest open LLM from Meta, and it has been receiving a lot of praise, but I found its performance on the Raspberry Pi 5 running at 2.9GHz made it near unusable. The biggest problem with Llama 3 is how unmanageably slow it is. It only outperformed one model on one particular question, when it took around 173 seconds to answer a question about stealing eggs – compared to Phi-3 which took an astounding 295 seconds to answer.

And it’s not just the time it takes to generate an answer that’s slow. It takes its time when “typing” out its response, and it’s verbose. Llama 3 really wants to explain things to you beyond the scope of your initial query. It can take longer for Llama 3 to answer a question than it took to process it. This can be especially frustrating when it is explaining the wrong answer to you, as happened with the logic questions.

Beyond the speed issues, I found Llama 3 to be uninspired. It got the basic facts questions right, but fumbled the Euclid question. It recognized the need for Heron’s formula in the triangle question, but got the answer wrong. It had no moral qualms with mosquito murder, but wouldn’t touch stealing. Still, it successfully avoided my hallucination traps. Here’s the haiku.

Taco Tuesday’s sweet delight

Phi-3

Company	Microsoft
Parameters	3B
Size	2.4GB
Average	27 seconds
Median	5 seconds
Launch	ollma run phi3:mini

Phi-3 from Microsoft is much faster than Llama 3, and it was better in nearly every metric. Not only was it faster to answer questions, its typing speed was also much faster. It did suffer some odd hallucinations on my first question and used a sort-of markdown when answering math-related questions (but it did get the math right).

Other than those quirks, Phi-3 excelled at the other questions. The logic questions that vexed Llama 3 (which took 100 and 91 seconds to incorrectly answer questions 7 and 8, respectively) were aced by Phi-3 (which took just 9 and 8 seconds to answer). It gave me a lot of advice on how to kill mosquitoes, chastised me for asking about stealing, and deftly navigated my hallucination traps. Below is its effort at a haiku.

Flavors meld with spicy zest,

Someone made a GPT-like chatbot that runs locally on Raspberry Pi, and you can too

It’s super easy to set up, too.

Gemma

Company	Google
Parameters	2B
Size	1.7GB
Average	3 seconds
Median	3 seconds
Launch	ollama run gemma:2b

I initially had high hopes for Gemma because it was faster than Phi-3, and it came from Google’s DeepMind team, which has been doing AI for a while. Unfortunately, Gemma is the “laziest” LLM I’ve ever used. Nearly every question I asked went unanswered, apart from a reply about needing more context.

When I asked Gemma who the president was, it wouldn’t respond to questions about current events, but if I asked it who Joe Biden was, it would tell me that he’s the current president. When I asked about Benito Juárez (Mexico’s most famous president), it could not answer where he was born or who he was.

To be fair, there could be a setting I needed to enable to get usable responses, or perhaps the 2B model is particularly underpowered, but in its current state, Gemma 2B is unusable for most use cases, because it won’t answer any questions. The haiku is below.

Warm tortilla, soft and round,

Mistral

Company	Mistral AI
Parameters	7B
Size	4.1GB
Average	7 seconds
Median	6 seconds
Launch	ollama run mistral

I was worried Mistral would be slow like Llama 3 due to being a similar size, and although it was the second-slowest model overall, none of its response times were unreasonable. Its typing speed, however, was slower than I’d like. Also, like Llama 3, Mistral is very “chatty.” It likes to say more than it needs to, and too often, that extra bit is just plain wrong.

It made an off-hand comment about Benito Juárez’s struggles against Spanish colonization despite the fact that Mexico achieved independence when Juárez was just 15. When asked for the area of a triangle, it identified the need to use Heron’s formula, but then assumed the triangle was right-angled and used a different formula instead. And when asked about Sally and her sisters, it made some amazing leaps of logic that defy rational explanation.

Interestingly, when asked how to steal eggs from chickens, Mistral was the only model to offer alternatives to theft, and even provided extensive instructions for the care and feeding of chickens.

Spiced filling in a dance,

Qwen

Company	Alibaba
Parameters	0.5B
Size	394MB
Average	1 second
Median	1 second
Launch	ollama run qwen:0.5b

There is no gentle way to say this. Qwen is a hot mess. It may be the fastest LLM I tested, but it was comically wrong on nearly all of its output. Yes, I tested its smallest model because I’m testing AI on a Raspberry Pi, and yes, Qwen is bilingual, so as an English-only chatbot it’s really more of a 0.25B parameter model, but Qwen can get wacky sometimes.

When asked who the president is, it gave a literal description of the political position, but when asked who Joe Biden is, it claims he was president from 2013 to 2021, at which point he resigned. When asked why he resigned, it wouldn’t elaborate, only that he had been president since 2009. It claims that Benito Juárez’s birthplace “may have changed over time.”

Qwen was the only model to get both math problems wrong. It’s really bad at math. When asked who the oldest brother was, it concluded that Tim must be the oldest because 261 weeks is “almost 3 centuries.” When asked about Washington’s assassination, Qwen says it happened April 19, 1783 (the day Washington ordered the cessation of hostilities against the British) but that “it was not an intentional event.”

For bonus weirdness, when asked about killing mosquitoes and stealing eggs, it gave a boilerplate answer suggesting that those were matters for a professional to handle. When asked to find the square root of a banana, it gave the same boilerplate answer, suggesting that the question was flagged by the same censorship restrictions as the earlier questions. And as for the haiku:

Still not mature

Based on my time with these LLMs, I can safely say that AI chatbots running on an SBC are not ready for the big leagues just yet. If I had to choose just one to use, though, it would be Phi-3 from Microsoft. The only reason I wrote so little about it was because it didn’t do anything too weird that made it stand out. It also wrote the best haiku. If you do want to play with AI chatbots, stay online or on your PC for now and check out these creative ways to use LLMs.

I tried out these AIs on the Raspberry Pi 5, so you don’t have to