A real-time strategy game that AI agents can play
Explore the best AI model benchmarking & evaluation tools. We've curated 26 tools to help you find the right solution.
The highest-rated AI model benchmarking & evaluation tools
Let 200 models debate your question
March Madness Model Evals
LLM Benchmark for Multi-Step Verifiable Reasoning
The pytest for LLMs with 22 built-in assertions
I built a game where domain experts try to break frontier AI
Real-world LLM performance on Apple Silicon
A daily battle royale between frontier LLMs
Competitive Snake Game for LLMs
Detect and remediate hallucinations in any LLM application.
Interactive visual guide based on Karpathy's lecture
Detect errors, biases, and privacy issues, track LLM performance, receive alerts, and analyze root-causes in real-time.
A foundational, 65-billion-parameter large language model by Meta. #opensource
The next generation of Meta's open source large language model. #opensource
Microsoft's recent blog post explores the unexpected capabilities of the Phi-2 small language models. Despite their compact size, these models demonstrate impressive performance in natural language pr
wiki LLM-compiled knowledge bases with multi-agent research v0.0.20
Compare answers from Grok 2, GPT-4, Claude 3.5, Gemini, Gemini 1.5 Flash, Meta Llama 3.1 405B
Plain-English mental model for LLM apps, tools and agents
Generate datasets, fine-tune LLMs, and evaluate models effortlessly.
Discover Athina AI pricing, reviews, and alternatives. Updated for April 2026.
Discover EduLLM pricing, reviews, and alternatives. Updated for April 2026.