James Ding
Jun 11, 2025 19:34
Together AI introduces a Batch API that reduces costs by 50% for processing large language model requests. The service offers scalable, asynchronous processing for non-urgent workloads.
Together AI has unveiled its new Batch API, a service designed to process large volumes of large language model (LLM) requests at significantly reduced cost. The company says the Batch API delivers enterprise-grade performance at half the price of real-time inference, making it an attractive option for businesses and developers.
Batch processing allows for the handling of AI workloads that do not require immediate responses, such as synthetic data generation and offline summarization. By processing these requests asynchronously during off-peak times, users can benefit from reduced costs while maintaining reliable output. Most batches are completed within a few hours, with a maximum processing window of 24 hours.
The Batch API offers a 50% cost reduction on non-urgent workloads compared to real-time API calls, enabling users to scale AI inference without increasing their budgets.
Users can submit up to 50,000 requests in a single batch file, with batch operations having their own rate limits separate from real-time usage. The service includes real-time progress tracking through various stages, from validation to completion.
Requests are uploaded as JSONL files, with progress monitored through the Batch API. Results can be downloaded once processing is complete.
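The JSONL input format described above can be sketched as follows. This is a minimal illustration, assuming an OpenAI-style batch request schema (one JSON object per line with a `custom_id` and a request `body`); the exact field names Together AI expects should be confirmed against the Batch API documentation. The limit constants come from the figures cited in this article.

```python
import json
import os

# Limits cited for the Batch API: 50,000 requests per batch file, 100MB per input file.
MAX_REQUESTS = 50_000
MAX_FILE_BYTES = 100 * 1024 * 1024

def write_batch_file(prompts, model, path="batch_input.jsonl"):
    """Write one JSONL line per request, enforcing the documented limits.

    The request schema here (custom_id / body / messages) is an assumption
    based on the common OpenAI-style batch format; verify against Together
    AI's docs before submitting.
    """
    if len(prompts) > MAX_REQUESTS:
        raise ValueError(f"batch exceeds {MAX_REQUESTS} requests")
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"request-{i}",  # lets you match results back to inputs
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"input file is {size} bytes, over the 100MB limit")
    return path
```

Once written, the file would be uploaded via the together client, and the resulting batch tracked until results are ready for download.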
The Batch API supports 15 advanced models, including models from the deepseek-ai and meta-llama families, covering a variety of complex tasks.
The Batch API operates under dedicated rate limits, allowing up to 10 million tokens per model and 50,000 requests per batch file, with a maximum size of 100MB per input file.
Users benefit from an introductory 50% discount, with no upfront commitments. Optimal batch sizes range from 1,000 to 10,000 requests, and model selection should be based on task complexity. Polling for status updates every 30-60 seconds is recommended.
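The recommended polling cadence can be wrapped in a small helper. This is a generic sketch: `get_status` stands in for whatever call retrieves the batch's current state (for example, fetching the batch by ID with the together client), and the status strings are assumptions, since the article only names the validation-to-completion stages in general terms.

```python
import time

def wait_for_batch(get_status, interval=45, timeout=24 * 3600):
    """Poll every `interval` seconds (30-60s is the recommended range) until
    the batch reaches a terminal state or the 24-hour processing window
    elapses. `get_status` is a zero-argument callable returning the current
    status string; the terminal-state names below are illustrative.
    """
    terminal = {"COMPLETED", "FAILED", "EXPIRED", "CANCELLED"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError("batch did not finish within the processing window")
```

Since most batches finish within a few hours, a moderate interval keeps API traffic low without meaningfully delaying result pickup.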
To begin using the Batch API, users should upgrade to the latest version of the together Python client, review the Batch API documentation, and explore the example cookbooks available online. The service is now available to all users, offering significant cost savings for bulk processing of LLM requests.
Image source: Shutterstock