What is jj?
jj is a benchmark that measures how well AI coding agents perform version control tasks with the Jujutsu version control system, reporting success rates and execution times for each model.
Key Features
- Performance metrics: measures success rates and execution times for AI agents on version control tasks (see the sketch after this list)
- Jujutsu compatibility: benchmarks tasks against the Jujutsu version control system rather than traditional Git
- Multi-model evaluation: compares performance across different AI coding models and agents
- High-precision measurement: provides detailed accuracy and timing data for task completion
- Standardized testing: offers consistent, reproducible benchmark scenarios for fair model comparison
- Public results dashboard: displays comparative performance data for transparency and accessibility
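The listing does not document the harness itself, but the headline metrics (success rate plus execution time) are easy to illustrate. Below is a minimal Python sketch under stated assumptions: `TaskResult`, `run_jj`, `run_agent_task`, and the verification step are hypothetical names for illustration, not the benchmark's actual API. It runs an agent-produced command inside a Jujutsu repository, times it with a high-resolution clock, and records pass/fail.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str
    succeeded: bool
    seconds: float


def run_jj(args, cwd):
    """Run a jj subcommand in the given repository directory."""
    return subprocess.run(["jj", *args], cwd=cwd, capture_output=True, text=True)


def run_agent_task(task_id, agent_cmd, verify_args, repo_dir):
    """Time one agent-produced command inside a jj repo.

    Success here is a stand-in check: the agent exits cleanly and a
    follow-up jj command (e.g. ["log"]) also succeeds. A real harness
    would compare repository state against the scenario's expectation.
    """
    start = time.perf_counter()  # high-resolution monotonic clock
    proc = subprocess.run(agent_cmd, cwd=repo_dir, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    verify = run_jj(verify_args, cwd=repo_dir)
    succeeded = proc.returncode == 0 and verify.returncode == 0
    return TaskResult(task_id, succeeded, elapsed)
```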
Pros & Cons
Advantages
- Fills a gap by benchmarking AI performance specifically on Jujutsu, an otherwise underserved area
- Provides objective, quantifiable metrics for comparing AI coding agents rather than subjective assessments
- Free access to benchmark results enables researchers and developers to make informed tool selection decisions
- High-precision timing and success tracking allow for detailed performance analysis
Limitations
- Limited to Jujutsu version control tasks; results may not generalize to other VCS or coding domains
- Benchmark scope may be narrow compared to broader AI coding evaluation platforms
- Dependent on the breadth and representativeness of included test scenarios
Use Cases
- AI model developers evaluating their agents' version control capabilities on Jujutsu workflows
- Teams comparing different AI coding assistants to select the best fit for Jujutsu-based repositories (see the aggregation sketch after this list)
- Researchers studying AI agent performance on version control and software engineering tasks
- Organizations migrating to Jujutsu seeking data on which AI tools work best with the system
- Academic studies on AI coding capabilities in modern version control environments
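For the comparison use cases above, per-task records presumably roll up into per-model aggregates like those shown on the results dashboard. Here is a minimal sketch of that reduction, reusing the hypothetical TaskResult records from the earlier example; the names and the leaderboard shape are assumptions, not the benchmark's published schema.

```python
from statistics import mean


def summarize(results):
    """Reduce one model's TaskResult list to the two headline metrics."""
    return {
        "success_rate": sum(r.succeeded for r in results) / len(results),
        "mean_seconds": mean(r.seconds for r in results),
    }


# Hypothetical usage, assuming results_by_model maps model name -> [TaskResult]:
# leaderboard = {name: summarize(rs) for name, rs in results_by_model.items()}
```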
Pricing
Freemium: benchmark results, per-model performance metrics, and success rate and execution time data are freely accessible.
Quick Info
- Website: tabbyml.github.io
- Pricing: Freemium
- Platforms: Web
- Categories: Other
- Launched: Mar 2026