Clips AI

The resizing feature in Clips AI dynamically adjusts a video to focus on the current speaker. It uses speaker diarization with Pyannote, scene change detection with PySceneDetect, and face detection with MTCNN and MediaPipe.

What is Clips AI?

Clips AI is a video editing tool that automatically resizes and reframes videos to keep the current speaker in focus. It combines speaker diarization, scene change detection, and face detection to crop each frame intelligently, which makes it useful for converting standard footage into formats suited to different platforms. The tool is designed for content creators, video editors, and producers who need to repurpose video across multiple social media channels or formats without manual frame-by-frame editing. It handles both the detection work and the video file processing: users specify a target aspect ratio and the system handles the rest.

Key Features

  • Speaker diarization: identifies and tracks who is speaking throughout the video using Pyannote
  • Scene change detection: recognises when scenes shift to avoid awkward crops during transitions
  • Face detection: locates faces in frames using MTCNN and MediaPipe to keep subjects centred
  • Aspect ratio customisation: resizes videos to fit different platform requirements, from vertical mobile formats to horizontal widescreen
  • Batch processing capability: processes video files through an API, supporting both video-only and audio-video files
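
To make the aspect ratio and face-centring features more concrete, here is a minimal Python sketch of the geometry behind a speaker-focused crop. It is an illustration under stated assumptions, not Clips AI's actual implementation: the `Crop` class and the `crop_for_aspect_ratio` function are hypothetical names, and the face centre is assumed to come from a face detector such as MTCNN or MediaPipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crop:
    """Hypothetical crop rectangle in pixels (top-left corner plus size)."""
    x: int
    y: int
    width: int
    height: int


def crop_for_aspect_ratio(frame_w, frame_h, face_cx, face_cy, aspect_ratio):
    """Return the largest crop with the target ratio, centred on a face.

    aspect_ratio is a (width, height) pair such as (9, 16).
    face_cx and face_cy are the face centre in pixels (assumed to be
    supplied by a separate face detection step).
    """
    ratio_w, ratio_h = aspect_ratio

    if frame_w * ratio_h > frame_h * ratio_w:
        # Frame is wider than the target: keep full height, narrow the width.
        crop_h = frame_h
        crop_w = frame_h * ratio_w // ratio_h
    else:
        # Frame is taller than (or equal to) the target: keep full width.
        crop_w = frame_w
        crop_h = frame_w * ratio_h // ratio_w

    # Centre the crop on the face, then clamp it inside the frame.
    x = min(max(face_cx - crop_w // 2, 0), frame_w - crop_w)
    y = min(max(face_cy - crop_h // 2, 0), frame_h - crop_h)
    return Crop(x, y, crop_w, crop_h)


if __name__ == "__main__":
    # 1080p widescreen frame, speaker on the right, vertical 9:16 target.
    print(crop_for_aspect_ratio(1920, 1080, 1500, 400, (9, 16)))
```

For a 1920x1080 frame with a face centred at (1500, 400), this yields a 607x1080 crop starting at x = 1197. A real pipeline would additionally decide which face to follow from the diarization output and hold the crop steady between scene changes, which this sketch does not attempt.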

Pros & Cons

Advantages

  • Saves significant time on manual video editing and reframing tasks
  • Intelligent cropping that responds to speaker position rather than applying fixed crops
  • Works with multiple video formats and file types through flexible API integration
  • Freemium model lets you test functionality without upfront cost

Limitations

  • Requires some technical knowledge to set up, particularly for API integration and authentication tokens
  • Quality of automatic cropping depends on video lighting, camera angles, and audio clarity for speaker detection
  • Limited information available about processing speed and performance on very long videos

Use Cases

Converting long-form podcast or interview footage into short vertical clips for TikTok, Instagram Reels, or YouTube Shorts

Automatically adapting webinar or conference recordings for multiple social platforms

Creating focus-adjusted versions of multi-speaker videos for accessibility or emphasis

Batch processing recorded content to maintain consistent framing across a series of videos

Repurposing widescreen content for mobile-first distribution without manual editing