MCP LLM Eval
Local MCP server that packages LLM evaluation gates as reusable CI/CD primitives. Run evaluation datasets against models, score the outputs with an LLM-as-judge, and enforce quality thresholds before a change ships.
MCP unverified
Integration
| Field | Value |
|---|---|
| Transport | stdio |
| Auth | api-key |
| Endpoint | `uvx mcp-llm-eval` |
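Since the server speaks stdio and is launched via `uvx`, registering it follows the standard MCP client configuration pattern. A minimal sketch, assuming the common `mcpServers` config format; the server alias `llm-eval` and the `LLM_EVAL_API_KEY` variable name are illustrative assumptions, not documented by this listing:

```json
{
  "mcpServers": {
    "llm-eval": {
      "command": "uvx",
      "args": ["mcp-llm-eval"],
      "env": {
        "LLM_EVAL_API_KEY": "<your-api-key>"
      }
    }
  }
}
```

The client spawns `uvx mcp-llm-eval` as a subprocess and exchanges MCP messages over stdin/stdout; check the server's own documentation for the exact environment variable it expects for api-key auth.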
Use Cases
1. Enforce LLM output quality thresholds in CI/CD pipelines
2. Run eval datasets against multiple models and compare the results
3. Use LLM-as-judge scoring for automated quality checks
Tags
evaluation llm ci-cd quality-gates testing