A real-time strategy game that AI agents can play
Explore the best AI model benchmarking & evaluation tools. We've curated 28 tools to help you find the right solution.
171 LLMs from Transformer (2017) to GPT-5.3 (2026)
Let 200 models debate your question
March Madness Model Evals
LLM Benchmark for Multi-Step Verifiable Reasoning
The pytest for LLMs with 22 built-in assertions
I built a game where domain experts try to break frontier AI
Real-world LLM performance on Apple Silicon
A daily battle royale between frontier LLMs
Competitive Snake Game for LLMs
Compare answers from Grok 2, GPT-4, Claude 3.5, Gemini, Gemini 1.5 Flash, and Meta Llama 3.1 405B
Plain-English mental model for LLM apps, tools and agents
Detect and remediate hallucinations in any LLM application.
Generate datasets, fine-tune LLMs, and evaluate models effortlessly.
Interactive visual guide based on Karpathy's lecture
10× LLM offers a cutting-edge approach to processing natural language tasks at an unprecedented speed. With the promise of "LLM at the speed of thought," ...
Detect errors, biases, and privacy issues, track LLM performance, receive alerts, and analyze root-causes in real-time.
LLMs battle it out trading futures
Microsoft's recent blog post explores the unexpected capabilities of the Phi-2 small language models. Despite their compact size, these models demonstrate impressive performance in natural language pr
wiki LLM-compiled knowledge bases with multi-agent research v0.0.20
A foundational, 65-billion-parameter large language model by Meta. #opensource
The next generation of Meta's open source large language model. #opensource
Discover Athina AI pricing, reviews, and alternatives. Updated for April 2026.
Discover EduLLM pricing, reviews, and alternatives. Updated for April 2026.