OpenAI-compatible & local models
Use OpenAI, Ollama, OpenRouter, Together, Groq, Fireworks, vLLM, LM Studio, or anything that speaks the OpenAI Chat Completions API.
Cerevisor's openai-compatible provider type talks to any endpoint that implements OpenAI's Chat Completions API ( /v1/chat/completions , /v1/models ). That's a long list. Common targets Service Base URL Auth OpenAI https://api.openai.com/v1 API key from platform.openai.com Ollama (local) http://localhost:11434/v1 None (any string accepted) OpenRouter https://openrouter.ai/api/v1 API key from openrouter.ai Together https://api.together.xyz/v1 API key from together.ai Groq https://api.groq.com/openai/v1 API key from console.groq.com Fireworks https://api.fireworks.ai/inference/v1 API key from fireworks.ai vLLM (self-hosted) http://<your-vllm-host>:8000/v1 None or custom LM Studio (local) http://localhost:1234/v1 None You can have multiple OpenAI-compatible entries simultaneously: one for OpenAI, one for Ollama, one for OpenRouter, etc. Each lives as its own library entry. Setup Settings → Providers → + Add provider → OpenAI-compatible. Name : friendly label (e.g. "Ollama (local)", "OpenRouter"). Base URL : the endpoint above. API key : paste it, or leave blank if the service doesn't require one. Test connection. Cerevisor calls /v1/models to confirm. Default model : pick from the dropdown populated by the /v1/models response. Save. Ollama specifics Ollama is the smoothest local path. Steps: Install Ollama . Pull a model: ollama pull llama3.1 (or qwen2.5 , mistral , etc.). Make sure Ollama is running ( ollama serve if not already started). Add the provider in Cerevisor as above, with http://localhost:11434/v1 . Cerevisor picks up every model you've pulled. The model dropdown updates each time you open the picker (with a 5-minute cache). Performance with Ollama Local model latency varies enormously by hardware. Cerevisor doesn't try to optimize for local; it sends a standard chat completion request and waits. For multi-agent workflows on a single machine, expect serial behavior even when waves "run in parallel" (the model is the bottleneck, not the orchestration). For best local performance: Pull a model sized for your GPU/RAM. Use Haiku-class roles (researcher, reviewer) when possible; they're more tolerant of small models. Prefer Sequential execution mode in Settings → Workflow (you're not actually getting parallelism benefit locally). OpenRouter specifics OpenRouter proxies dozens of providers behind one API. Great for trying many models without separate accounts. The model list returned from /v1/models is long, use the picker's search to filter. Cost reporting works (OpenRouter returns token usage and a cost estimate). Cerevisor uses their cost when available. OpenAI itself Pretty obvious, but: yes, you can plug OpenAI's hosted API in as an openai-compatible entry. The endpoint is https://api.openai.com/v1 and the auth is your platform.openai.com API key. GPT-4, GPT-4o, o1, o3, GPT-5, and all their variants work. Cerevisor doesn't do anything OpenAI-specific; it treats them as generic Chat Completions endpoints. Cost on local providers Loca