What is jj?
Key Features
Success rate measurement
tracks whether AI agents correctly complete Jujutsu version control tasks
Execution time tracking
records, with high precision, how long each task takes to complete
Multi-model comparison
evaluates performance across different AI coding models
Standardised benchmark tasks
consistent test scenarios based on real Jujutsu workflows
Public results repository
provides online access to performance data and historical comparisons
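The two headline metrics reduce to simple arithmetic. As a rough sketch, assume each benchmark run yields a record with a model name, a pass/fail flag and a wall-clock duration. Per-model success rate and mean execution time could then be computed as shown. The `TaskResult` fields, the `run_timed` and `summarise` helpers, and the model names, task names and numbers are all illustrative placeholders, not the benchmark's actual schema or results.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskResult:
    model: str       # hypothetical model identifier
    task: str        # hypothetical benchmark task name
    succeeded: bool  # did the agent complete the task correctly?
    seconds: float   # wall-clock duration of the attempt


def run_timed(task_fn):
    """Run a callable and return (succeeded, elapsed_seconds).

    time.perf_counter() is a monotonic, high-resolution clock,
    which suits measuring short durations.
    """
    start = time.perf_counter()
    succeeded = bool(task_fn())
    return succeeded, time.perf_counter() - start


def summarise(results):
    """Return {model: (success_rate, mean_seconds)} for a list of TaskResult."""
    by_model = defaultdict(list)
    for r in results:
        by_model[r.model].append(r)
    return {
        model: (
            sum(r.succeeded for r in runs) / len(runs),  # fraction of tasks passed
            mean(r.seconds for r in runs),               # average duration
        )
        for model, runs in by_model.items()
    }


# Made-up sample data, for illustration only.
results = [
    TaskResult("model-a", "squash-commits", True, 12.5),
    TaskResult("model-a", "rebase-branch", False, 30.0),
    TaskResult("model-b", "squash-commits", True, 8.0),
    TaskResult("model-b", "rebase-branch", True, 10.0),
]
summary = summarise(results)
for model, (rate, secs) in sorted(summary.items()):
    print(f"{model}: success rate {rate:.0%}, mean time {secs:.2f}s")
```

Keeping the two numbers side by side per model is what makes direct comparison straightforward: a model that is faster but fails more often shows up immediately.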
Pros & Cons
Advantages
- Provides objective, measurable data on AI model performance rather than anecdotal evidence
- Designed specifically for Jujutsu, giving an accurate assessment of how models handle its version control tasks
- Free access to benchmark results allows researchers and developers to make informed decisions
- Clear metrics (success rate and execution time) make it easy to compare models directly
Limitations
- Limited to Jujutsu version control tasks; results may not transfer to other version control systems
- Requires understanding of Jujutsu to interpret results meaningfully
- Benchmark scope may not cover all real-world version control scenarios your team encounters
Use Cases
- Evaluating which AI coding assistant works best with Jujutsu-based development workflows
- Comparing performance across different versions of an AI model to track improvements
- Assessing whether a newly trained AI agent meets performance thresholds for production use
- Research into how AI models handle version control operations and workflow automation
- Making purchasing or adoption decisions between competing AI coding tools
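For the threshold and version-comparison use cases, a team might wrap the published metrics in a small acceptance check. The sketch below is an assumption about how that could look, not part of the benchmark itself. The function names and the default limits (90% success rate, 60 seconds mean time) are hypothetical and should be replaced with whatever your team decides.

```python
def meets_thresholds(
    success_rate: float,
    mean_seconds: float,
    min_success_rate: float = 0.9,   # hypothetical acceptance limit
    max_mean_seconds: float = 60.0,  # hypothetical acceptance limit
) -> bool:
    """Return True if a model clears both acceptance limits."""
    return success_rate >= min_success_rate and mean_seconds <= max_mean_seconds


def success_rate_change(old_rate: float, new_rate: float) -> float:
    """Change in success rate between two model versions, in percentage points."""
    return (new_rate - old_rate) * 100.0


# Illustrative values only, not real benchmark results.
print(meets_thresholds(success_rate=0.95, mean_seconds=42.0))
print(success_rate_change(old_rate=0.5, new_rate=0.75))
```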