Cerebras MCP
MCP server for the Cerebras Inference API, billed as the world's fastest AI inference engine. Run Llama and other open models at 1,800+ tokens/second for latency-sensitive agentic workloads.
MCP unverified
Integration
| Field | Value |
| --- | --- |
| Transport | stdio |
| Auth | api-key |
| Endpoint | `npx -y @cerebras/mcp-server` |
| Install | `npx -y @cerebras/mcp-server` |
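With stdio transport, an MCP client launches the server as a local subprocess rather than connecting over the network. A minimal client configuration sketch, assuming the client uses the common `mcpServers` JSON layout and that the server reads its key from a `CEREBRAS_API_KEY` environment variable (both are assumptions; check the server's README for the exact names):

```json
{
  "mcpServers": {
    "cerebras": {
      "command": "npx",
      "args": ["-y", "@cerebras/mcp-server"],
      "env": {
        "CEREBRAS_API_KEY": "<your-api-key>"
      }
    }
  }
}
```

Because the transport is stdio, the "Endpoint" above is simply the launch command itself; there is no URL to configure.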
Use Cases
| # | Use case |
| --- | --- |
| 01 | Run Llama 3 and other open models at ultra-low latency |
| 02 | Power latency-sensitive agent loops at 1,800+ tokens/second |
| 03 | Route cost-sensitive workloads to fast open-model inference |
Tags
inference llm llama fast-inference open-models cerebras