Oxen.ai

Open-source data version control for multi-modal AI datasets.

What is Oxen.ai?

Oxen.ai is an open-source platform for version controlling AI datasets, similar to how Git manages code. It tracks changes across multi-modal data (images, audio, video, text, and tabular formats) and lets teams collaborate on creating and iterating on datasets. It is built for large-scale datasets, from millions of images to billions of rows in CSV files, which makes it useful for machine learning teams that need to manage, reproduce, and share their data reliably. Oxen.ai also shows what changed in a dataset and when, which helps when reproducing experiments or tracking down model performance issues.
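
To make "what changed and when" concrete for tabular data, here is a minimal sketch that compares two versions of a small CSV keyed on an `id` column and reports added, removed and changed rows. It is a hypothetical, standard-library-only illustration: the column names and sample rows are made up, and this is not how Oxen.ai itself computes diffs.

```python
import csv
import io


def load_rows(text: str, key: str) -> dict[str, dict[str, str]]:
    """Parse CSV text into {key value: row} so two versions can be matched row by row."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_rows(old_text: str, new_text: str, key: str = "id") -> dict[str, list[str]]:
    """Report which key values were added, removed or changed between two CSV versions."""
    old, new = load_rows(old_text, key), load_rows(new_text, key)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # A row counts as changed if any column differs for the same key.
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical example: two versions of an image-label file.
v1 = "id,path,label\n1,img/1.jpg,cat\n2,img/2.jpg,dog\n3,img/3.jpg,cat\n"
v2 = "id,path,label\n1,img/1.jpg,cat\n2,img/2.jpg,fox\n4,img/4.jpg,dog\n"
print(diff_rows(v1, v2))  # {'added': ['4'], 'removed': ['3'], 'changed': ['2']}
```

Keys are compared and sorted as strings here; a real tool would also need to handle schema changes and files too large to load into memory.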

Key features

Data version control

Track and manage changes to datasets over time with commit history, branching, and rollback capabilities
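
Commit history and rollback can be illustrated with content addressing: each version records a hash of every file, and file contents are stored once under their hash. The Python sketch below is a hypothetical toy of that general technique using only the standard library. The function names (`snapshot`, `commit`, `diff`, `restore`) and the on-disk layout are invented for illustration and are not Oxen.ai's internals, API or CLI.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files are never read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map every file under `root` (as a relative POSIX path) to its SHA-256 content hash."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def commit(root: Path, objects: Path) -> dict[str, str]:
    """Record a snapshot and copy file contents into a content-addressed object store.

    `objects` must live outside `root`, otherwise the store would be snapshotted too.
    """
    objects.mkdir(parents=True, exist_ok=True)
    snap = snapshot(root)
    for rel, digest in snap.items():
        stored = objects / digest
        if not stored.exists():  # identical content is stored only once
            shutil.copyfile(root / rel, stored)
    return snap


def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def restore(snap: dict[str, str], root: Path, objects: Path) -> None:
    """Roll `root` back to `snap`: delete files it does not track, rewrite those it does."""
    for p in list(root.rglob("*")):
        if p.is_file() and p.relative_to(root).as_posix() not in snap:
            p.unlink()
    for rel, digest in snap.items():
        dest = root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(objects / digest, dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data, objects = Path(tmp, "data"), Path(tmp, "objects")
        data.mkdir()
        (data / "labels.csv").write_text("id,label\n1,cat\n")
        v1 = commit(data, objects)
        (data / "labels.csv").write_text("id,label\n1,dog\n")
        (data / "img_001.txt").write_text("placeholder")
        v2 = commit(data, objects)
        print(diff(v1, v2))  # {'added': ['img_001.txt'], 'removed': [], 'modified': ['labels.csv']}
        restore(v1, data, objects)  # labels.csv is back to "cat", img_001.txt is gone
```

Storing each distinct file body once is what keeps repeated versions cheap; branching, commit metadata and syncing with a remote are deliberately left out of this sketch.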

Multi-modal support

Handle images, audio, video, text, and tabular data in a single version control system

Scalability

Designed for large datasets, including millions of images and billions of rows

Command-line tools

Git-like command-line interface for managing datasets from the terminal

Collaboration features

Share datasets publicly or privately and work with multiple team members simultaneously

Integration with ML workflows

Connect with existing machine learning pipelines and training infrastructure

Pros & cons

Advantages

  • Open-source with an active community, reducing vendor lock-in concerns
  • Handles genuinely large datasets that traditional version control systems struggle with
  • Familiar Git-like workflow makes adoption easier for engineers already using version control
  • Supports both public sharing and private project management within teams

Limitations

  • Requires learning a new tool and workflow, even though it borrows from Git principles
  • Community and ecosystem may be smaller compared to established data management solutions
  • The hosted and commercial plans are paid, so costs could accumulate for large-scale enterprise usage

Use cases

Computer vision teams tracking iterations of image datasets across model experiments

Audio and speech processing projects managing thousands of hours of recordings with annotations

Collaborative machine learning projects where multiple teams need to work on the same datasets

Research organisations sharing datasets publicly whilst maintaining version history

Data quality improvement workflows where you need to track how datasets evolve over time

Pricing

Open Source

Free

Self-hosted version control with command-line tools for local and collaborative dataset management

Hosted/Commercial

Contact for pricing

Managed hosting, advanced collaboration features, and additional infrastructure support
