OpenAI-compatible & local models

Use OpenAI, Ollama, OpenRouter, Together, Groq, Fireworks, vLLM, LM Studio, or anything that speaks the OpenAI Chat Completions API.

Cerevisor's openai-compatible provider type talks to any endpoint that implements OpenAI's Chat Completions API (/v1/chat/completions, /v1/models). That's a long list.

Common targets

Service	Base URL	Auth
OpenAI	`https://api.openai.com/v1`	API key from platform.openai.com
Ollama (local)	`http://localhost:11434/v1`	None (any string accepted)
OpenRouter	`https://openrouter.ai/api/v1`	API key from openrouter.ai
Together	`https://api.together.xyz/v1`	API key from together.ai
Groq	`https://api.groq.com/openai/v1`	API key from console.groq.com
Fireworks	`https://api.fireworks.ai/inference/v1`	API key from fireworks.ai
vLLM (self-hosted)	`http://<your-vllm-host>:8000/v1`	None or custom
LM Studio (local)	`http://localhost:1234/v1`	None

You can have multiple OpenAI-compatible entries simultaneously: one for OpenAI, one for Ollama, one for OpenRouter, etc. Each lives as its own library entry.

Setup

Settings → Providers → + Add provider → OpenAI-compatible.
Name: friendly label (e.g. "Ollama (local)", "OpenRouter").
Base URL: the endpoint above.
API key: paste it, or leave blank if the service doesn't require one.
Test connection. Cerevisor calls /v1/models to confirm.
Default model: pick from the dropdown populated by the /v1/models response.
Save.

Ollama specifics

Ollama is the smoothest local path. Steps:

Install Ollama.
Pull a model: ollama pull llama3.1 (or qwen2.5, mistral, etc.).
Make sure Ollama is running (ollama serve if not already started).
Add the provider in Cerevisor as above, with http://localhost:11434/v1.

Cerevisor picks up every model you've pulled. The model dropdown updates each time you open the picker (with a 5-minute cache).

Performance with Ollama

Local model latency varies enormously by hardware. Cerevisor doesn't try to optimize for local; it sends a standard chat completion request and waits. For multi-agent workflows on a single machine, expect serial behavior even when waves "run in parallel" (the model is the bottleneck, not the orchestration).

For best local performance:

Pull a model sized for your GPU/RAM.
Use Haiku-class roles (researcher, reviewer) when possible; they're more tolerant of small models.
Prefer Sequential execution mode in Settings → Workflow (you're not actually getting parallelism benefit locally).

OpenRouter specifics

OpenRouter proxies dozens of providers behind one API. Great for trying many models without separate accounts. The model list returned from /v1/models is long, use the picker's search to filter.

Cost reporting works (OpenRouter returns token usage and a cost estimate). Cerevisor uses their cost when available.

OpenAI itself

Pretty obvious, but: yes, you can plug OpenAI's hosted API in as an openai-compatible entry. The endpoint is https://api.openai.com/v1 and the auth is your platform.openai.com API key.

GPT-4, GPT-4o, o1, o3, GPT-5, and all their variants work. Cerevisor doesn't do anything OpenAI-specific; it treats them as generic Chat Completions endpoints.

Cost on local providers

Local providers (Ollama, vLLM, LM Studio) report token counts but no cost, Cerevisor reports $0.00 for these runs. The status bar shows (local) instead of a dollar amount.

Common errors

Error	What it means	Fix
ECONNREFUSED localhost:11434	Ollama isn't running.	Start Ollama (`ollama serve` or open the desktop app).
404 /v1/models	Endpoint doesn't implement the OpenAI API.	Double-check the base URL — some providers use `/v1/openai` or another prefix.
400 Bad request: stream not supported	Endpoint doesn't support streaming for this model.	Rare. Switch to a different model on the same endpoint.
Model not found	The model name in the request doesn't match what the endpoint serves.	Refresh the model list and pick from the dropdown.

Back to docs