What is jj?
jj is a benchmark for AI coding agents that measures how well they handle Jujutsu version-control tasks, tracking both whether each task is completed correctly and how long it takes.
Key Features
- Success rate measurement: tracks whether AI agents complete Jujutsu tasks correctly
- Execution time tracking: records how long tasks take to complete, enabling performance comparison
- Standardised test suite: provides consistent benchmarks for evaluating different AI models
- Jujutsu-specific evaluation: focuses on version control operations rather than general coding
- Performance comparison tools: allow side-by-side analysis of different models and their results
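The two headline metrics above reduce to simple aggregation over per-task results. A minimal sketch of that reduction, assuming a hypothetical result record (the field names, model names, and numbers below are illustrative, not the benchmark's actual schema or data):

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-task result; fields are assumptions for illustration.
@dataclass
class TaskResult:
    task: str       # e.g. "split a commit with `jj split`"
    success: bool   # did the agent complete the task correctly?
    seconds: float  # wall-clock time the agent took

def summarize(results: list[TaskResult]) -> dict:
    """Reduce per-task results to success rate and mean execution time."""
    return {
        "success_rate": sum(r.success for r in results) / len(results),
        "mean_seconds": mean(r.seconds for r in results),
    }

# Side-by-side comparison of two made-up models:
model_a = [TaskResult("new change", True, 4.2), TaskResult("rebase", False, 9.1)]
model_b = [TaskResult("new change", True, 3.0), TaskResult("rebase", True, 12.5)]
for name, results in [("model-a", model_a), ("model-b", model_b)]:
    s = summarize(results)
    print(f"{name}: {s['success_rate']:.0%} success, {s['mean_seconds']:.1f}s avg")
```

Reporting speed alongside accuracy matters because two models with the same success rate can differ widely in how long they take per task.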
Pros & Cons
Advantages
- Fills a gap by benchmarking AI performance on Jujutsu specifically, which has limited evaluation data compared to Git
- Provides both accuracy and speed metrics, giving a fuller picture than success rate alone
- Free access to benchmark results makes it useful for researchers and developers evaluating models
- Helps identify which AI models work best with modern version control systems
Limitations
- Only benchmarks Jujutsu tasks, so results don't apply to Git workflows or other version control systems
- Limited to evaluating AI agents on version control; doesn't cover broader coding tasks
- Benchmark scope may be too narrow for teams primarily using other version control systems
Use Cases
- Selecting an AI coding assistant for a team using Jujutsu version control
- Evaluating whether a new AI model is ready for production use with Jujutsu workflows
- Comparing performance improvements across different versions of an AI coding model
- Researching how well current AI agents handle modern version control systems
- Testing custom or fine-tuned AI models on Jujutsu-specific tasks before deployment
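For a concrete sense of what grading a "Jujutsu-specific task" can involve, here is a hypothetical task definition and checker in the style such a suite might use. The task format, field names, and transcript-matching logic are assumptions for illustration, not the benchmark's real interface; only the `jj describe -m` command itself is genuine Jujutsu usage:

```python
import re

# Illustrative task: the agent must set the working-copy change description.
# We grade the transcript of commands the agent ran, accepting either the
# short (-m) or long (--message) flag form of the real `jj describe` command.
TASK = {
    "id": "describe-change",
    "prompt": "Set the working-copy change description to 'fix parser bug'.",
    "check": re.compile(r"^jj describe (-m|--message) ['\"]fix parser bug['\"]$"),
}

def grade(transcript: list[str]) -> bool:
    """The task passes if any command in the agent's transcript matches."""
    return any(TASK["check"].match(cmd) for cmd in transcript)

print(grade(["jj status", 'jj describe -m "fix parser bug"']))  # True
```

A real harness would more likely run the agent in a sandboxed repository and inspect the resulting repo state rather than the command transcript, but the pass/fail structure is the same.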