
How to Use Ollama with Odysseus AI

Last updated: June 3, 2026

Odysseus is not an AI model.

It's a workspace and interface. Like a browser needs websites, Odysseus needs a model backend to do anything useful. This page explains your options.

Model Backends

Odysseus supports multiple inference backends. Pick one based on your hardware and priorities.

Ollama

Recommended for beginners

Easy local model serving. Install, pull a model, connect to Odysseus. Handles quantization and GPU offloading automatically.

vLLM

High-performance inference for NVIDIA GPUs. Best throughput for serving multiple users. Production-grade.

llama.cpp

CPU-optimized inference. Works without a GPU. Slower than GPU inference, but runs on almost anything, including a Raspberry Pi.

OpenRouter

No local hardware needed

Cloud API aggregator. Access 100+ models (Claude, GPT-4, Gemini, open-source) without local hardware. Pay per token.

OpenAI API

Use GPT-4o and other OpenAI models directly. Requires an API key.
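When you add a provider in Odysseus, you will usually need a base URL. The sketch below prints the usual defaults for each backend above as a rough reference. It assumes each local server runs on the same machine with its default port (Ollama 11434, vLLM's OpenAI-compatible server 8000, llama.cpp's `llama-server` 8080). The `backend_base_url` helper is hypothetical, and whether Odysseus expects the `/v1` suffix for the OpenAI-compatible servers is an assumption to check in its settings.

```shell
# Hypothetical helper: print the default base URL for each backend listed above.
# Local ports are each server's default and change if you start it on another port.
backend_base_url() {
  case "$1" in
    ollama)     echo "http://localhost:11434" ;;        # Ollama native API (as in Step 4 below)
    vllm)       echo "http://localhost:8000/v1" ;;      # vllm serve, OpenAI-compatible
    llama.cpp)  echo "http://localhost:8080/v1" ;;      # llama-server, OpenAI-compatible
    openrouter) echo "https://openrouter.ai/api/v1" ;;  # cloud, needs an API key
    openai)     echo "https://api.openai.com/v1" ;;     # cloud, needs an API key
    *)          echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

For example, `backend_base_url vllm` prints `http://localhost:8000/v1`.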

Model Cookbook

Odysseus includes a built-in Model Cookbook with 270+ models. The Hardware Scanner detects your GPU, RAM, and storage, then recommends models that will actually run on your system. One-click download and serve, no terminal commands needed.

Find it in the Odysseus UI under Settings or the model selector.

Connecting Ollama to Odysseus

Step 1. Install Ollama

Linux:

curl -fsSL https://ollama.com/install.sh | sh

macOS/Windows: Download the installer from ollama.com

Step 2. Pull a model

ollama pull llama3.2

Takes a few minutes depending on model size and your connection.
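If you script your setup, you can skip the download when the model is already installed. This sketch reads `ollama list` output and assumes its current layout: one header row, then one model per row with `name:tag` in the first column. `has_model` is a hypothetical helper, not an Ollama command.

```shell
# Hypothetical helper: succeed if `ollama list` output (read from stdin)
# already contains the given model, with or without an explicit tag.
has_model() {
  # Skip the header row and keep the first column (name:tag),
  # then match the whole name literally, or name plus ":latest".
  awk 'NR > 1 { print $1 }' | grep -qxF -e "$1" -e "$1:latest"
}

# Usage (requires Ollama to be installed and running):
#   ollama list | has_model llama3.2 || ollama pull llama3.2
```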

Step 3. Make Ollama accessible to Docker

If Odysseus runs in Docker, Ollama needs to listen on all interfaces:

OLLAMA_HOST=0.0.0.0 ollama serve

Skip this if both Odysseus and Ollama run natively (not in Docker).
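On Linux, the install script from Step 1 typically registers Ollama as a systemd service, and a variable set in your own shell does not reach that service. Ollama's FAQ suggests setting `OLLAMA_HOST` in a service override instead. The sketch below prints such an override. `ollama_host_override` is a hypothetical helper, and the commented commands assume the service is named `ollama` and use the standard systemd drop-in directory.

```shell
# Hypothetical helper: print a systemd drop-in that sets OLLAMA_HOST for the
# Ollama service. Defaults to 0.0.0.0 (all interfaces), as in Step 3.
ollama_host_override() {
  bind_addr="${1:-0.0.0.0}"
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$bind_addr"
}

# Usage (Linux with systemd; needs root):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   ollama_host_override | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart ollama
```

The bind address is a parameter, so you can pass a narrower address than `0.0.0.0` if your network setup allows it.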

Step 4. Add Ollama in Odysseus settings

Open Odysseus, go to Settings, and add a new model provider with the Ollama endpoint:

Docker Desktop (macOS/Windows): http://host.docker.internal:11434
Docker on Linux: http://<host-ip>:11434
Native (no Docker): http://localhost:11434
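The three setups above map directly onto a small lookup. This is a sketch: `ollama_endpoint` and its mode names are hypothetical, and the optional check uses Ollama's `GET /api/version` endpoint, which returns a short JSON body when the server is reachable.

```shell
# Hypothetical helper: print the Ollama endpoint to enter in Odysseus
# for each of the three setups listed in Step 4.
ollama_endpoint() {
  case "$1" in
    docker-desktop) echo "http://host.docker.internal:11434" ;;
    docker-linux)
      # Docker on Linux needs the host's IP address as the second argument.
      if [ -z "$2" ]; then
        echo "usage: ollama_endpoint docker-linux <host-ip>" >&2
        return 1
      fi
      echo "http://$2:11434" ;;
    native) echo "http://localhost:11434" ;;
    *) echo "unknown setup: $1" >&2; return 1 ;;
  esac
}

# Optional reachability check before saving the provider:
#   curl -fsS "$(ollama_endpoint native)/api/version"
```

For the Docker setups, run the check from inside the Odysseus container (for example via `docker exec`, if the image includes curl), since what matters is whether the container can reach Ollama.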

Recommended Models by Hardware

8GB VRAM

Llama 3.2 3B, Phi-3 Mini, Gemma 2 2B

Good for basic chat. Expect 10-20 tokens/sec.

16GB VRAM

Llama 3.1 8B, Mistral 7B, CodeLlama 13B (Q4)

Comfortable for daily use and code assistance.

24GB+ VRAM

Llama 3.1 70B (Q4), Mixtral 8x7B, DeepSeek V2

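Large quantized models. Llama 3.1 70B (Q4) needs roughly 40GB and Mixtral 8x7B (Q4) a little over 24GB, so on a single 24GB card Ollama keeps part of the model in system RAM and generation slows down.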

No GPU

llama.cpp (small models), OpenRouter (cloud)

CPU inference is slow. Cloud APIs are the practical option.

See hardware requirements for a full breakdown with GPU model examples.
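The tiers above reduce to a few thresholds. The sketch below turns a VRAM figure into Ollama model tags. The 8, 16 and 24GB cut-offs come from the list above; treating anything under 8GB as the No GPU row and the exact tag names (`llama3.2:3b`, `phi3:mini` and so on) are assumptions, so check the Ollama library before pulling. `recommend_models` is a hypothetical helper.

```shell
# Hypothetical helper: map GPU memory in whole GB to the tiers listed above,
# printed as Ollama model tags. Tag names are assumptions; verify before pulling.
recommend_models() {
  vram_gb="$1"
  if [ "$vram_gb" -ge 24 ]; then
    echo "llama3.1:70b mixtral:8x7b deepseek-v2"
  elif [ "$vram_gb" -ge 16 ]; then
    echo "llama3.1:8b mistral:7b codellama:13b"
  elif [ "$vram_gb" -ge 8 ]; then
    echo "llama3.2:3b phi3:mini gemma2:2b"
  else
    # Anything under 8GB is treated like the No GPU row.
    echo "none: use llama.cpp with a small model, or OpenRouter"
  fi
}

# Usage on an NVIDIA machine (first GPU only). nvidia-smi reports MiB, often
# slightly under the card's nominal size, so round to the nearest GB:
#   vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   recommend_models $(( (vram_mib + 512) / 1024 ))
```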

Using OpenRouter (Cloud Models)

OpenRouter is the easiest way to use Odysseus without local hardware. It aggregates 100+ models from multiple providers behind one API key.

Step 1. Create an OpenRouter account

Sign up at openrouter.ai and generate an API key.

Step 2. Add to Odysseus

In Odysseus Settings, add OpenRouter as a provider. Paste your API key. You'll get access to Claude, GPT-4, Gemini, Llama, Mistral, and many more.

Step 3. Pick a model and chat

Select any model from the model picker. Pricing is per-token and varies by model. Many open-source models have free tiers.
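Before pasting the key into Odysseus, you can confirm it works with one request to OpenRouter's OpenAI-compatible chat completions endpoint. This sketch assumes `curl` and `sed` are available and that the key is in `OPENROUTER_API_KEY`. `json_escape` and `openrouter_chat` are hypothetical helpers, and the model ID in the usage comment is only an example, since model IDs and free tiers change. A successful call is billed like any other request.

```shell
# Hypothetical helpers for a one-off OpenRouter request from the terminal.
# Minimal JSON escaping: backslashes and double quotes only,
# so keep the prompt on a single line.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

openrouter_chat() {
  model="$1"
  prompt="$2"
  # Fail early, without a network call, if the key is missing.
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set" >&2
    return 1
  fi
  curl -fsS https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$(json_escape "$model")\", \"messages\": [{\"role\": \"user\", \"content\": \"$(json_escape "$prompt")\"}]}"
}

# Usage (example model ID; copy a current one from the OpenRouter model list):
#   export OPENROUTER_API_KEY=...   # key from Step 1
#   openrouter_chat "openai/gpt-4o" "Say hello in one sentence."
```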

Model support changes frequently. Check the official GitHub repository for the latest supported backends and Cookbook updates.
