ZeroGPU
The compute efficient layer for AI inference, routing high-volume tasks to small and nano models to cut cost and latency.
The compute efficient layer for AI inference, routing high-volume tasks to small and nano models to cut cost and latency.

OpenAI-compatible API
Drop-in chat and responses endpoints that work with existing OpenAI client code.
Small and nano model routing
Routes routine tasks to specialised small language models instead of frontier models.
Edge-powered inference
Distributed inference across an edge network with cloud fallback for scale and lower latency.
Cost calculator
An interactive tool to estimate spend and compare ZeroGPU model rates against providers such as GPT-5.4.
Project-level API keys
Per-project keys with usage analytics for tracking consumption.
Task-specific workloads
Built for classification, extraction, summarisation, PII detection, fraud detection, content moderation and query routing.
Monetize Your App SDK
An SDK that lets app developers earn revenue by having their users' idle devices fulfil network inference jobs.
AdTech teams classifying user intent and mapping IAB categories in real time for ad targeting.
Trust and safety teams moderating millions of posts with toxicity detection, NSFW filtering and policy checks.
Compliance and security teams redacting PII and detecting prompt injection or jailbreak attempts at scale.
Fintech teams running fraud detection, transaction risk scoring and trading sentiment analysis.
Document-heavy businesses extracting structured data from invoices, contracts and forms via OCR.
AI agent developers using small models for tool selection, query routing and multi-step reasoning cheaply.