I built a game where domain experts try to break frontier AI

What is it?

This is a competitive platform where domain experts submit questions designed to expose weaknesses in frontier AI models like GPT-4 and Claude. You create questions within your field of expertise, test them against current AI systems, and get paid when your question successfully stumps the model in a way that's verified by the platform. The goal is to systematically document where advanced AI fails in real-world professional contexts. It's useful for researchers, subject matter experts, and organisations wanting to understand AI limitations before deployment, whilst also creating a public record of frontier model weaknesses.

Key Features

  • Expert question submission: create and submit domain-specific questions designed to test AI model limits
  • Multi-model testing: questions are evaluated against several frontier AI models
  • Verification system: submitted failures are checked by the platform to confirm legitimacy
  • Payment for verified fails: receive compensation when your question successfully breaks a model
  • Failure documentation: contribute to a growing public record of where AI systems struggle
  • Freemium access: free tier for question submission with paid rewards for verified results
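The multi-model testing step can be pictured as a simple loop: one expert question is sent to each frontier model, and any answer that misses the verified expected answer is recorded as a failure. A minimal sketch follows; the model names, query functions, and matching rule are hypothetical stand-ins, since the platform's actual API and verification criteria are not public.

```python
# Hypothetical sketch of the multi-model testing loop.
# ask_model_a / ask_model_b are illustrative stubs, NOT the platform's real API.

def ask_model_a(question: str) -> str:
    # Stand-in for a call to one frontier model.
    return "Paris"

def ask_model_b(question: str) -> str:
    # Stand-in for a call to a second frontier model.
    return "Lyon"

FRONTIER_MODELS = {
    "model-a": ask_model_a,
    "model-b": ask_model_b,
}

def evaluate_question(question: str, expected: str) -> dict:
    """Run one expert question against every model and flag failures.

    Here 'failure' is a naive case-insensitive string mismatch; the real
    platform presumably uses human or rubric-based verification instead.
    """
    results = {}
    for name, ask in FRONTIER_MODELS.items():
        answer = ask(question)
        results[name] = {
            "answer": answer,
            "failed": answer.strip().lower() != expected.strip().lower(),
        }
    return results

results = evaluate_question("What is the capital of France?", "Paris")
stumped = [name for name, r in results.items() if r["failed"]]
```

In this sketch, a question "stumps" a model when its entry is marked `failed`; a verified stump on any model is what would trigger a payout.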

Pros & Cons

Advantages

  • Direct way to contribute to AI safety and transparency research
  • Earn money by finding genuine model failures in your area of expertise
  • Helps identify real-world gaps before AI is widely deployed in your field
  • Simple submission process without needing technical knowledge
  • Creates publicly documented evidence of AI limitations

Limitations

  • Payment amounts and verification criteria are not clearly specified upfront
  • Limited to written questions; may not capture all types of failures (visual, audio, etc.)
  • Success depends on finding genuinely novel failures that haven't been discovered before

Use Cases

  • Medical professionals identifying diagnostic scenarios where AI gives incorrect advice
  • Lawyers finding legal interpretation errors in AI responses
  • Academics testing AI reasoning in specialist research areas
  • Engineers discovering safety oversights in technical explanations
  • Content creators finding factual errors in their industry domains