
What is Modal?

Modal is a serverless cloud platform built for engineers and researchers running compute-intensive workloads. It specialises in AI, machine learning, and data processing applications where you need flexible computing resources without managing servers yourself. You write your code, and Modal handles deployment, scaling, and infrastructure; you only pay for what you use. The platform gives you direct access to GPU acceleration, custom runtime environments, and automatic scaling, making it practical for everything from training models to running batch jobs and APIs that need to grow and shrink with demand.
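
To make that concrete, here is a minimal sketch of the workflow, assuming recent versions of Modal's Python SDK (modal.App, @app.function); the app and function names are placeholders:

    import modal

    app = modal.App("hello-modal")

    # This ordinary Python function becomes a serverless function:
    # Modal packages it, ships it to the cloud, and runs it on demand.
    @app.function()
    def square(x: int) -> int:
        return x * x

    @app.local_entrypoint()
    def main():
        # .remote() executes the function in Modal's cloud, not locally.
        print(square.remote(7))

Running "modal run hello.py" executes main() on your machine while square() runs in a cloud container; nothing is provisioned or torn down by hand.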

Key Features

GPU acceleration

Direct access to GPUs for machine learning and AI workloads without the hassle of hardware procurement
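
As a sketch (again assuming a recent Modal SDK; the function body is a placeholder), attaching a GPU is a single decorator argument:

    import modal

    app = modal.App("gpu-demo")

    # gpu="A100" asks Modal to run this function on an A100;
    # other types such as "T4", "L4", or "H100" are named the same way.
    @app.function(gpu="A100")
    def gpu_info() -> str:
        import subprocess
        # Modal's GPU containers expose the NVIDIA driver tools.
        return subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout

    @app.local_entrypoint()
    def main():
        print(gpu_info.remote())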

Custom container images

Define your exact runtime environment, dependencies, and system libraries for reproducible deployments
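
A sketch of how an image is declared (the specific version pins are illustrative, not recommendations):

    import modal

    # Build a reproducible container image: pin the Python version,
    # add system packages, and install pinned Python dependencies.
    image = (
        modal.Image.debian_slim(python_version="3.11")
        .apt_install("ffmpeg")
        .pip_install("torch==2.3.0", "numpy<2")
    )

    app = modal.App("custom-image-demo")

    @app.function(image=image)
    def check() -> str:
        import torch  # available because the image installed it
        return torch.__version__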

Dynamic auto-scaling

Automatically scale computing resources up or down based on actual demand
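
The simplest expression of this is the SDK's .map() fan-out; in this sketch (function and app names are placeholders) a thousand inputs spread across many containers:

    import modal

    app = modal.App("fanout-demo")

    @app.function()
    def square(x: int) -> int:
        return x * x

    @app.local_entrypoint()
    def main():
        # .map() fans inputs out across parallel containers; Modal
        # scales the pool up for the burst and back to zero afterwards.
        results = list(square.map(range(1000)))
        print(sum(results))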

Serverless functions

Deploy Python functions that run on demand without managing underlying infrastructure
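
A deployed function is just a decorated Python function; in this hypothetical example, word counting stays callable on demand with no server running between calls:

    import modal

    app = modal.App("wordcount")

    # After "modal deploy wordcount.py", this function lives in Modal's
    # cloud and spins up a container only when invoked.
    @app.function()
    def count_words(text: str) -> int:
        return len(text.split())

Other Python processes can then invoke it with modal.Function.from_name("wordcount", "count_words").remote(...); from_name is the lookup call in recent SDK versions (older releases used Function.lookup).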

Third-party integrations

Connect to AWS S3, Google Cloud Storage, Datadog, and OpenTelemetry for monitoring and data access
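
For example, an S3 bucket can be mounted as a directory; in this sketch, "aws-creds" is a hypothetical Secret created in the Modal dashboard holding AWS credentials, and the bucket name is likewise a placeholder:

    import modal

    app = modal.App("s3-demo")

    # Credentials are stored in Modal as a named Secret, never in code.
    s3_access = modal.Secret.from_name("aws-creds")

    @app.function(
        volumes={"/bucket": modal.CloudBucketMount("my-data-bucket", secret=s3_access)}
    )
    def list_bucket() -> list[str]:
        import os
        # The S3 bucket appears as an ordinary directory in the container.
        return os.listdir("/bucket")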

Secure execution

Isolated execution environments for running sensitive code and handling protected data
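
Modal's Sandbox primitive is one way to do this; the sketch below follows the pattern in recent SDK documentation (app name and command are placeholders):

    import modal

    app = modal.App.lookup("sandbox-demo", create_if_missing=True)

    # Each Sandbox is an isolated container; code inside it cannot
    # touch the caller's filesystem or environment.
    sb = modal.Sandbox.create(app=app)
    proc = sb.exec("python", "-c", "print('running in isolation')")
    print(proc.stdout.read())
    sb.terminate()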

Pros & Cons

Advantages

  • Removes infrastructure management burden; you focus on code rather than deployment complexity
  • Pay-as-you-go pricing means no waste on idle resources or long-term commitments
  • Built specifically for AI and ML workloads with native GPU support
  • Good integration options with existing cloud storage and monitoring tools

Limitations

  • Vendor lock-in; migrating workloads to another platform requires rewriting deployment logic
  • Less suitable for workloads needing persistent, always-on infrastructure or real-time responsiveness at millisecond scale
  • Learning curve for teams new to serverless architectures and function-based design patterns

Use Cases

Training machine learning models on schedule or on-demand without maintaining persistent GPU instances
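
A hypothetical sketch of a scheduled training job (the training body, GPU type, and timeout are placeholders):

    import modal

    app = modal.App("nightly-training")

    # Runs once a day on a GPU; no instance exists between runs.
    @app.function(gpu="A100", schedule=modal.Period(days=1), timeout=60 * 60)
    def train():
        print("training one epoch...")  # real training code goes here

Once deployed with "modal deploy", the schedule fires automatically with no always-on machine to maintain.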

Running batch data processing jobs that vary unpredictably in size and duration

Deploying API endpoints that serve AI model inference with automatic scaling
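
A minimal sketch of an inference endpoint, assuming the web_endpoint decorator from recent Modal SDK versions (newer releases rename it fastapi_endpoint) and a placeholder model:

    import modal

    # Web endpoints are served through FastAPI, which the image must include.
    image = modal.Image.debian_slim().pip_install("fastapi[standard]")

    app = modal.App("inference-api")

    @app.function(image=image)
    @modal.web_endpoint(method="POST")
    def predict(payload: dict) -> dict:
        # Placeholder model: echo the input length as a "score".
        return {"score": len(str(payload))}

"modal deploy" gives the endpoint a persistent URL; each request gets a container on demand, and idle periods cost nothing.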

Building data pipelines that transform and process large datasets periodically
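
Periodic pipelines can use cron expressions instead of fixed intervals; in this sketch, "0 3 * * *" means every day at 03:00 UTC and the pipeline body is a placeholder:

    import modal

    app = modal.App("etl-pipeline")

    @app.function(schedule=modal.Cron("0 3 * * *"))
    def transform():
        print("extract, transform, load...")  # real pipeline steps go here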

Running parameter sweeps and hyperparameter tuning experiments for research
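
A sweep can reuse the same fan-out machinery via starmap, which unpacks argument tuples; this sketch uses a placeholder objective in place of real training:

    import itertools

    import modal

    app = modal.App("sweep")

    @app.function()
    def train_once(lr: float, batch_size: int) -> float:
        # Placeholder objective; a real experiment would train a model
        # and return a validation metric.
        return lr * batch_size

    @app.local_entrypoint()
    def main():
        grid = list(itertools.product([1e-2, 1e-3, 1e-4], [32, 64, 128]))
        # starmap unpacks each (lr, batch_size) tuple into its own container,
        # so the whole grid runs in parallel.
        for params, score in zip(grid, train_once.starmap(grid)):
            print(params, score)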