James Ding
Jun 11, 2025 19:34
Together AI introduces a Batch API that cuts costs by 50% for processing large language model requests. The service offers scalable, asynchronous processing for non-urgent workloads.
Together AI has unveiled its new Batch API, a service designed to process large volumes of large language model (LLM) requests at significantly reduced cost. According to Together AI, the Batch API promises to deliver enterprise-grade performance at half the cost of real-time inference, making it an attractive option for businesses and developers.
Why Batch Processing?
Batch processing handles AI workloads that don't require immediate responses, such as synthetic data generation and offline summarization. By processing these requests asynchronously during off-peak times, users benefit from reduced costs while maintaining reliable output. Most batches complete within a few hours, with a maximum processing window of 24 hours.
Key Benefits
50% Cost Savings
The Batch API offers a 50% cost reduction on non-urgent workloads compared to real-time API calls, enabling users to scale AI inference without increasing their budgets.
Large-Scale Processing
Users can submit up to 50,000 requests in a single batch file, and batch operations have their own rate limits separate from real-time usage. The service includes real-time progress monitoring through each stage, from validation to completion.
Simple Integration
Requests are uploaded as JSONL files, progress is monitored through the Batch API, and results can be downloaded once processing is complete.
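For illustration, each line of a batch input file is one self-contained request carrying its own identifier. The exact schema below (the custom_id and body fields and the model ID) is an assumption based on common batch-file conventions, so check it against the Batch API documentation:

    {"custom_id": "request-1", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Summarize this support ticket."}], "max_tokens": 256}}
    {"custom_id": "request-2", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Generate one synthetic FAQ entry."}], "max_tokens": 256}}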
Supported Models
The Batch API supports 15 advanced models, including the deepseek-ai and meta-llama series, tailored to handle a variety of complex tasks.
How It Works
Prepare Your Requests: Format requests in a JSONL file, each with a unique identifier.
Upload & Submit: Use the Files API to upload the batch and create the job.
Monitor Progress: Track the job through its processing stages.
Download Results: Retrieve structured results, with any errors documented separately. (A minimal code sketch of these steps follows below.)
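These four steps map to a handful of calls in the together Python client. The sketch below is illustrative rather than definitive: the method names (files.upload, batches.create_batch, batches.get_batch, files.retrieve_content), the "batch-api" purpose value, and the status strings are assumptions based on the client's conventions and should be verified against the Batch API documentation.

    import time
    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    # 1. Upload the JSONL file of requests via the Files API
    #    (the "batch-api" purpose value is an assumption; see the docs)
    input_file = client.files.upload(file="batch_requests.jsonl", purpose="batch-api")

    # 2. Create the batch job from the uploaded file
    batch = client.batches.create_batch(input_file.id, endpoint="/v1/chat/completions")

    # 3. Poll the job until it reaches a terminal stage
    #    (status names here are illustrative)
    while True:
        job = client.batches.get_batch(batch.id)
        if job.status in ("COMPLETED", "FAILED", "EXPIRED"):
            break
        time.sleep(60)

    # 4. Download structured results; errors are documented in a separate file
    if job.status == "COMPLETED":
        client.files.retrieve_content(job.output_file_id, output="batch_results.jsonl")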
Rate Limits & Scale
The Batch API operates under dedicated rate limits, allowing up to 10 million tokens per model and 50,000 requests per batch file, with a maximum size of 100MB per input file.
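Because these limits apply per input file, a batch can be checked locally before upload. A minimal sketch against the stated caps (50,000 requests and 100MB per file):

    import os

    MAX_REQUESTS = 50_000
    MAX_BYTES = 100 * 1024 * 1024  # 100MB per input file

    def validate_batch(path: str) -> None:
        # One JSONL line per request, so count non-empty lines
        with open(path, "r", encoding="utf-8") as f:
            n_requests = sum(1 for line in f if line.strip())
        size = os.path.getsize(path)
        if n_requests > MAX_REQUESTS:
            raise ValueError(f"{n_requests} requests exceeds the 50,000-request cap")
        if size > MAX_BYTES:
            raise ValueError(f"{size} bytes exceeds the 100MB file cap")

    validate_batch("batch_requests.jsonl")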
Pricing and Best Practices
Users benefit from an introductory 50% discount, with no upfront commitments. Optimal batch sizes range from 1,000 to 10,000 requests, and model selection should be based on task complexity. Polling for status updates every 30-60 seconds is advised.
Getting Started
To begin using the Batch API, users should upgrade to the latest together Python client, review the Batch API documentation, and explore the example cookbooks available online. The service is now available to all users, offering significant cost savings for bulk processing of LLM requests.
Image source: Shutterstock