---
title: "OpenAI-compatible & local models"
description: "Use OpenAI, Ollama, OpenRouter, Together, Groq, Fireworks, vLLM, LM Studio, or anything that speaks the OpenAI Chat Completions API."
slug: guides/providers/openai-compatible-and-local
section: guides
subsection: providers
canonical_url: https://cerevisor.com/docs/guides/providers/openai-compatible-and-local
last_verified: 2026-05-18
last_verified_version: "1.2.0"
updated_at: 2026-05-18T15:08:18.053416+00:00
---

Cerevisor's `openai-compatible` provider type talks to **any** endpoint that implements OpenAI's Chat Completions API (`/v1/chat/completions`, `/v1/models`). That's a long list.

## Common targets

| Service | Base URL | Auth |
|---|---|---|
| **OpenAI** | `https://api.openai.com/v1` | API key from platform.openai.com |
| **Ollama** (local) | `http://localhost:11434/v1` | None (any string accepted) |
| **OpenRouter** | `https://openrouter.ai/api/v1` | API key from openrouter.ai |
| **Together** | `https://api.together.xyz/v1` | API key from together.ai |
| **Groq** | `https://api.groq.com/openai/v1` | API key from console.groq.com |
| **Fireworks** | `https://api.fireworks.ai/inference/v1` | API key from fireworks.ai |
| **vLLM** (self-hosted) | `http://<your-vllm-host>:8000/v1` | None or custom |
| **LM Studio** (local) | `http://localhost:1234/v1` | None |

You can have **multiple** OpenAI-compatible entries simultaneously: one for OpenAI, one for Ollama, one for OpenRouter, etc. Each lives as its own library entry.

## Setup

1. **Settings → Providers → + Add provider → OpenAI-compatible.**
2. **Name**: friendly label (e.g. "Ollama (local)", "OpenRouter").
3. **Base URL**: the endpoint above.
4. **API key**: paste it, or leave blank if the service doesn't require one.
5. **Test connection.** Cerevisor calls `/v1/models` to confirm.
6. **Default model**: pick from the dropdown populated by the `/v1/models` response.
7. **Save.**

## Ollama specifics

Ollama is the smoothest local path. Steps:

1. Install [Ollama](https://ollama.com).
2. Pull a model: `ollama pull llama3.1` (or `qwen2.5`, `mistral`, etc.).
3. Make sure Ollama is running (`ollama serve` if not already started).
4. Add the provider in Cerevisor as above, with `http://localhost:11434/v1`.

Cerevisor picks up every model you've pulled. The model dropdown updates each time you open the picker (with a 5-minute cache).

### Performance with Ollama

Local model latency varies enormously by hardware. Cerevisor doesn't try to optimize for local; it sends a standard chat completion request and waits. For multi-agent workflows on a single machine, expect serial behavior even when waves "run in parallel" (the model is the bottleneck, not the orchestration).

For best local performance:

- Pull a model sized for your GPU/RAM.
- Use Haiku-class roles (researcher, reviewer) when possible; they're more tolerant of small models.
- Prefer **Sequential** execution mode in Settings → Workflow (you're not actually getting parallelism benefit locally).

## OpenRouter specifics

OpenRouter proxies dozens of providers behind one API. Great for trying many models without separate accounts. The model list returned from `/v1/models` is long, use the picker's search to filter.

Cost reporting works (OpenRouter returns token usage and a cost estimate). Cerevisor uses their cost when available.

## OpenAI itself

Pretty obvious, but: yes, you can plug OpenAI's hosted API in as an openai-compatible entry. The endpoint is `https://api.openai.com/v1` and the auth is your platform.openai.com API key.

GPT-4, GPT-4o, o1, o3, GPT-5, and all their variants work. Cerevisor doesn't do anything OpenAI-specific; it treats them as generic Chat Completions endpoints.

## Cost on local providers

Local providers (Ollama, vLLM, LM Studio) report token counts but no cost, Cerevisor reports `$0.00` for these runs. The status bar shows `(local)` instead of a dollar amount.

## Common errors

| Error | What it means | Fix |
|---|---|---|
| **ECONNREFUSED localhost:11434** | Ollama isn't running. | Start Ollama (`ollama serve` or open the desktop app). |
| **404 /v1/models** | Endpoint doesn't implement the OpenAI API. | Double-check the base URL — some providers use `/v1/openai` or another prefix. |
| **400 Bad request: stream not supported** | Endpoint doesn't support streaming for this model. | Rare. Switch to a different model on the same endpoint. |
| **Model not found** | The model name in the request doesn't match what the endpoint serves. | Refresh the model list and pick from the dropdown. |
