Imagine having a personal AI assistant powered by Google’s Gemini running on a Raspberry Pi. With the Pi listening to your voice and an advanced LLM in the cloud, you can ask questions or control devices and get spoken answers back. In this tutorial we’ll show you how to set up speech-to-text, send your queries to the Gemini API (a large language model), and speak the responses aloud. By the end, your Pi will answer questions through a speaker using the Gemini LLM. This approach pairs a compact Raspberry Pi with cloud AI, offering more privacy and control than commercial assistants. As one hobbyist noted, “I tried this on a Pi 4 and it worked great. The Gemini integration responded naturally to my voice commands.”
Requirements & Parts List
First, gather the hardware and software you’ll need. Here’s a quick checklist:
- Raspberry Pi board (Pi 4 or Pi 5 with 2-8GB RAM). A Pi 4/5 with 2+ GB RAM and 64-bit OS is recommended (32-bit builds may run into memory limits).
- Power supply and microSD card. A 5V/3A USB-C supply (the Pi 5 officially recommends 5V/5A) and a 16+ GB SD card with Raspberry Pi OS (64-bit).
- USB microphone. A plug-and-play USB mic or a microphone HAT ensures clear speech capture.
- Speaker or headset. A USB speaker or a 3.5mm speaker for audio output (the Pi 4’s headphone jack can also work).
- (Optional) Camera module. A Pi Camera can enable vision-based triggers or image queries. For example, one project used the camera to detect a person to start the conversation.
- (Optional) Push-button or LED. For a wake button or status indicator. (Instead of a wake word, you could press a button to start listening.)
On the software side:
- Raspberry Pi OS (64-bit) or similar Linux. We’ll use Raspberry Pi OS (formerly Raspbian) 64-bit on a Pi 4/5.
- Python 3.10+ and required libraries. We’ll use Python to glue everything together. Key packages include `google-generativeai` (the Gemini SDK), `SpeechRecognition` for speech-to-text, `gTTS` or similar for text-to-speech, and audio libraries (`PyAudio`, `pygame`, etc.). For example, one Gemini assistant project installs them with:
pip3 install google-generativeai speechrecognition gtts pygame gpiozero
- Google Gemini API key. You’ll need a free API key from Google AI Studio. See Using Gemini API keys for details.
- Microphone and speaker drivers. Make sure your Pi recognizes the USB audio devices.
- (Optional) VNC/SSH client. For remote setup, install an SSH or VNC client on your PC.
Here we see a Pi 4 with a camera and USB speaker. This example hardware setup will let us both capture video (if we use it for triggers) and play the assistant’s voice answers. For more on Pi hardware projects, check our Weather Station Project guide (which covers GPIO wiring). You’ll notice the microphone (either USB or HAT) must be plugged in and tested (see Setup below). The speaker should be connected and set as the audio output in Raspberry Pi OS.
Raspberry Pi Setup
Flash the OS and update. Use Raspberry Pi Imager or Balena Etcher to write Raspberry Pi OS (64-bit) to the SD card. Enable SSH and Wi-Fi (if headless) in the advanced options. Boot the Pi with monitor/keyboard or headless SSH. Then run:
sudo apt update && sudo apt upgrade -y
This ensures the latest packages. One guide notes that the Pi 4 should run a 64-bit OS for heavy tasks to avoid memory errors.
Configure interfaces and audio. Run `sudo raspi-config`. Under Interface Options, enable Audio and (if using) the Camera. Reboot if needed. Then check the audio devices:
arecord -l # list recording devices (microphone)
aplay -l # list playback devices (speaker)
If your mic isn’t listed, install the ALSA tools with `sudo apt install -y alsa-utils` and use `alsamixer` to unmute the mic input. Similarly, test the speaker output:
speaker-test -t wav -c 1
You should hear a short spoken test sample announcing the channel (use `-t pink` instead if you prefer test noise). Adjust volumes with `alsamixer`. These steps are similar to other Pi voice projects. (In one Instructable, they use `arecord` to record and `aplay` to play a test file.)
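For a quick end-to-end check of both devices, you can record a short clip and play it back. The card/device numbers below are examples; use the ones reported by `arecord -l`:
arecord -D plughw:1,0 -d 5 -f cd test.wav   # record 5 seconds from card 1, device 0
aplay test.wav                              # play the recording back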
Install Python tools. Install Python and pip if not present:
sudo apt install -y python3 python3-pip python3-venv
Then create a virtual environment for isolation:
python3 -m venv ~/assistant-env
source ~/assistant-env/bin/activate
Now install the required libraries with pip. For example:
pip install google-generativeai speechrecognition gtts pygame
(You may need `python3-dev` or `libasound2-dev` if `PyAudio` is used by `speech_recognition`. If so, run `sudo apt install -y portaudio19-dev`.)
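With `PyAudio` in place, a quick way to confirm Python sees your microphone is to list the input devices it detects. A minimal sketch using `speech_recognition`’s built-in helper:
import speech_recognition as sr

# Print every audio input PyAudio can see, with its index
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(index, name)

# If the default device is wrong, pass the index explicitly, e.g.:
# mic = sr.Microphone(device_index=2)  # index 2 is just an example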
Set the Gemini API key. Open your `~/.bashrc` (or `~/.zshrc`) and add:
export GEMINI_API_KEY="<YOUR_API_KEY_HERE>"
Save and run `source ~/.bashrc` to apply. The Gemini SDK will automatically pick up `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) from the environment, so you only need to do this once. If you prefer, you can also pass the key directly in code (see Google’s docs for examples), but an environment variable keeps the key out of your source files. For details, see the official guide to Gemini API keys.
(Optional) Additional setup. If you plan to use wake-word detection or local speech-to-text, you might install extras like `pvporcupine` or `faster-whisper` (a fast reimplementation of OpenAI’s Whisper). Also install `git` if you want to clone examples.
With these steps, your Raspberry Pi should be up-to-date and ready. Your microphone and speaker should be working, and Python has the necessary libraries. Next, we’ll implement the voice assistant logic.
Wake Word Detection and Audio Capture
We want the Pi to listen for a wake word or a button press before sending audio to the LLM. This avoids sending every sound to the cloud. A simple approach is to use a library like Porcupine or Snowboy for offline wake-word spotting. (Note: Snowboy is deprecated, but there are forks and alternatives.) These run on-device to detect a trigger phrase. For example, Picovoice’s free tier lets you define “hey Pi” or similar.
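To give a flavor of on-device wake-word spotting, here is a minimal Porcupine sketch. It assumes you have a free Picovoice access key and uses the built-in “porcupine” keyword; a custom phrase like “hey Pi” would be trained in the Picovoice Console and loaded via `keyword_paths` instead:
import struct
import pyaudio
import pvporcupine

# access_key is an assumption: paste your own key from the Picovoice Console
porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_KEY",
                               keywords=["porcupine"])  # built-in demo keyword

pa = pyaudio.PyAudio()
stream = pa.open(rate=porcupine.sample_rate, channels=1, format=pyaudio.paInt16,
                 input=True, frames_per_buffer=porcupine.frame_length)

print("Listening for wake word...")
while True:
    pcm = stream.read(porcupine.frame_length, exception_on_overflow=False)
    pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
    if porcupine.process(pcm) >= 0:  # returns the keyword index, or -1 if none
        print("Wake word detected!")
        break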
In this guide, to keep things simpler, we’ll demonstrate with a push-to-talk button or continuous listening (voice activation). For continuous capture, you can use Python’s `speech_recognition` library:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say your command...")
    audio = r.listen(source)

command = r.recognize_google(audio)  # uses Google Web STT
print("You said:", command)
This listens until you stop speaking, then uses Google’s speech-to-text service (over the internet) to transcribe. (Alternatively, you could use open-source models like Whisper or Vosk for offline STT; a sketch follows below.) Be sure to test the microphone input first. One Instructable recommends recording with `arecord` as a test; we did that above.
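If you want to stay offline, `speech_recognition` can also run a local Whisper model. A minimal sketch, assuming you have installed the local backend (e.g. `pip install openai-whisper`); on a Pi, expect slow inference, so the “tiny” model is the realistic choice:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)

# Local Whisper transcription: no internet needed, but CPU-heavy on a Pi
command = r.recognize_whisper(audio, model="tiny")
print("You said:", command)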
The goal: “our assistant remains idle until it hears the wake word or a button press.” You could implement this by proceeding to query Gemini only if "assistant" appears in `command`, or simply always send the latest utterance. The key is capturing the user’s speech as text.
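For the push-to-talk variant, `gpiozero` makes the button trivial. A minimal sketch, assuming the button is wired between GPIO17 and ground:
import speech_recognition as sr
from gpiozero import Button

button = Button(17)  # assumption: button wired to GPIO17 and GND
r = sr.Recognizer()

while True:
    print("Press the button to talk...")
    button.wait_for_press()  # block until the button is pressed
    with sr.Microphone() as source:
        audio = r.listen(source)
    try:
        print("You said:", r.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand audio")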
Using Google Gemini API
With audio captured as text, the next step is to send it to Gemini and get a reply. Google provides the google-generativeai Python SDK. After installing it, use code like:
import os
import google.generativeai as genai

genai.configure(api_key=os.environ.get('GEMINI_API_KEY'))
model = genai.GenerativeModel("gemini-2.5-pro")  # or "gemini-2.0-flash"
response = model.generate_content("What is the capital of France?")
print(response.text)
The `configure` call reads your API key from the environment variable. You can choose models like `gemini-2.0-flash` or the newer `gemini-2.5-pro`. For simple Q&A, `generate_content` works well: we pass a prompt string and it returns an object whose `.text` attribute is the answer. For conversation, you could also use the chat interface (`model.start_chat()` and `chat.send_message()`; see the sketch after the next snippet), but single queries suffice for most voice commands.
query = "Hey Gemini, tell me a joke about computers."
answer = model.generate_content(query)
print("Gemini says:", answer.text)
Behind the scenes, this calls the Gemini API. Google’s official docs show similar code samples. For example, generating text:
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("The opposite of hot is")
print(response.text)  # typically prints "cold"
This is how you integrate the LLM. The response can be a full sentence or paragraph. We then take `response.text` and send it to our TTS engine.
Check out more of our Raspberry Pi Project Ideas and their step-by-step guides.
Coding the Assistant: Speech‑to‑Text, LLM, Text‑to‑Speech
Below is a simplified outline of the Python code that ties everything together. You might put this in `assistant.py` or similar.
import os
import google.generativeai as genai
import speech_recognition as sr
from gtts import gTTS
import pygame

# Initialize Gemini
genai.configure(api_key=os.environ['GEMINI_API_KEY'])
model = genai.GenerativeModel("gemini-2.5-pro")

# Listen and transcribe
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    audio = r.listen(source)  # listen to the mic

try:
    command = r.recognize_google(audio)  # use Google Web STT
    print("You said:", command)
except sr.UnknownValueError:
    print("Could not understand audio")
    command = ""

# Send to Gemini and get response text
if command:
    response = model.generate_content(command)
    answer = response.text
    print("Assistant:", answer)

    # Convert text to speech
    tts = gTTS(answer)
    tts.save("reply.mp3")
    pygame.mixer.init()
    pygame.mixer.music.load("reply.mp3")
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pass  # wait until done speaking
This script does the following:
- Speech-to-text: Uses `speech_recognition` (with the Google STT engine) to convert the mic audio into a Python string. In practice, `recognize_google()` requires internet, but you can replace it with an offline model if desired.
- Call Gemini: We call `model.generate_content(command)` to get the AI’s reply. No explicit “system prompt” is shown here, but Gemini typically assumes the role of a helpful assistant. You can prepend instructions if needed.
- Text-to-speech: We use `gTTS` (Google Text-to-Speech) to create an MP3 from the reply text, then play it with `pygame`. You could also use local engines like `espeak` or `pyttsx3` (see the offline sketch below). Many Gemini projects use `gtts` because it sounds quite natural.
Notice we initialized `pygame.mixer` to play audio; the `while pygame.mixer.music.get_busy()` loop waits until speaking is finished. You could replace this with any method of playing an audio file on the Pi (e.g. calling `mpg123 reply.mp3`).
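If you’d rather skip the network round-trip of `gTTS` entirely, here is a minimal offline alternative using `pyttsx3`, which drives eSpeak on Raspberry Pi OS (the voice is more robotic, but fully local):
import pyttsx3

engine = pyttsx3.init()          # uses the eSpeak backend on Raspberry Pi OS
engine.setProperty("rate", 160)  # speaking speed in words per minute
engine.say("Hello from your Raspberry Pi assistant.")
engine.runAndWait()              # block until speech finishes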
This code is a basic template. In a real assistant, you’d wrap the listening and response in a loop, add error handling, and maybe a wake-word loop. But the above shows the core idea. For reference, one GitHub example of a Gemini voice assistant on Pi lists similar imports and setup.
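As a rough illustration of that structure, here is a minimal loop sketch. `listen()` and `speak()` are hypothetical helpers wrapping the STT and TTS snippets shown earlier:
# listen() returns the transcribed command (or ""); speak(text) plays the reply.
# Both are hypothetical helpers built from the snippets above.
def run_assistant():
    while True:
        command = listen()  # STT: microphone -> text
        if not command:
            continue        # nothing understood; keep listening
        try:
            response = model.generate_content(command)
            speak(response.text)      # TTS: text -> speaker
        except Exception as err:      # a network/API hiccup shouldn't kill the loop
            print("Gemini request failed:", err)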
Deployment: Auto-start and Enclosures
Once your voice assistant code is working, you’ll want it to run automatically on boot and to package it neatly. Here are some tips:
- Auto-start on boot: Create a `systemd` service or use `crontab`. For example, save your script as `/home/pi/assistant.py` and add a cron job:
crontab -e
@reboot /usr/bin/python3 /home/pi/assistant.py &
This runs the Python script on startup. Alternatively, write a `*.service` file in `/etc/systemd/system` and enable it with `systemctl enable` (see the example unit file after this list).
- Case/Enclosure: Put the Pi and audio gear in a proper case. A simple plastic case with openings for USB and the camera is fine. Make sure the microphone isn’t obstructed and the speaker’s sound can exit. For example, a “RasTech” case or any Pi box works. Optionally mount the button on the case, and add LED indicators (driven via `gpiozero`) to show when the assistant is listening or speaking.
- Stability: Use a good power supply (especially if using the camera and USB audio); unstable power can cause reboots under load. A headless assistant needs little GPU memory, so on a Pi 4/5 you can set the memory split to 16-32 MB in `raspi-config` (under Performance) to leave more RAM for your code.
- Networking: Ensure the Pi is online at boot (Gemini, like any cloud API, needs a connection). If using Wi-Fi, configure `wpa_supplicant.conf`, or prefer Ethernet.
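For the systemd route mentioned above, here is an example unit file; the paths, user, and service name are assumptions, so adjust them to your setup:
# /etc/systemd/system/assistant.service
[Unit]
Description=Gemini voice assistant
After=network-online.target
Wants=network-online.target

[Service]
User=pi
Environment=GEMINI_API_KEY=your_key_here
ExecStart=/home/pi/assistant-env/bin/python /home/pi/assistant.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
Enable it with `sudo systemctl enable --now assistant.service`; systemd will then restart the assistant automatically if it crashes.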
By following these steps, the assistant script will launch on startup and run headless. You can then simply speak to it after power-on. For a polished product, attach the Pi and speaker to a dedicated enclosure and label the button (if used) with “Talk” or the wake phrase.
FAQs
My microphone isn’t detected. What should I check?
Check with `arecord -l`. If no input shows, ensure the mic is plugged in properly, install `alsa-utils`, and unmute it in `alsamixer`. Also verify the power supply is sufficient; an underpowered Pi can disable USB devices.
Gemini returns an authentication error. What’s wrong?
Ensure your API key is active in Google AI Studio. Store it as an environment variable (`export GEMINI_API_KEY=your_key`) and restart the shell or script.
The assistant doesn’t speak. How do I debug the audio output?
Confirm the speaker works via `aplay test.wav`. Check that `pygame.mixer.init()` ran successfully and that `reply.mp3` exists. The volume may need adjustment in `alsamixer`.
Why is speech recognition inaccurate?
Poor mic quality, background noise, or a wrong device index may cause it. Test with `arecord`. Also check your `recognize_google()` quota; replace it with Vosk or Whisper for offline STT if needed.
How do I start the assistant automatically on boot?
Use `crontab -e` and add `@reboot /usr/bin/python3 /home/pi/assistant.py &`. Alternatively, create a `systemd` service with your environment variables for stable background execution.