Lawrence Jengar
Mar 13, 2026 01:57
Together AI debuts unified voice agent infrastructure with Deepgram and Cartesia integrations, targeting enterprise deployments with end-to-end latency below 700ms.
Together AI has rolled out a unified voice agent platform that keeps speech-to-text, language models, and text-to-speech processing on the same infrastructure cluster. The $3.3 billion AI cloud startup claims the setup delivers end-to-end latency below 700 milliseconds, fast enough for natural conversational flow.
The platform integrates natively with Deepgram for transcription and Cartesia for voice synthesis, both running on Together's co-located servers rather than bouncing audio across multiple cloud providers.
Why Co-Location Matters for Voice
Most production voice systems stitch together separate vendors for each pipeline stage. Audio hits one provider for transcription, routes to another for the LLM response, then bounces to a third for speech synthesis. Every handoff adds network latency and failure points.
Together's pitch: keep everything in the same datacenter. The company reports sub-500ms latency under optimal conditions, with the 700ms figure representing its stated ceiling for end-to-end processing.
"Voice agents live or die by latency, and every network hop between providers is a place where the experience breaks down," said Abe Pursell, Deepgram's VP of Partnerships.
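The latency math behind the co-location argument can be sketched with a toy model. The per-stage compute times and per-hop network costs below are illustrative assumptions for demonstration, not Together's or Deepgram's measurements; the point is only that a fixed hop cost is paid once per stage, so it compounds across a three-stage pipeline.

```python
# Simulated per-stage processing times in seconds (illustrative only).
STAGE_COMPUTE = {"stt": 0.12, "llm": 0.25, "tts": 0.10}

def run_pipeline(network_hop_s: float) -> float:
    """Return end-to-end latency for one conversational turn.

    A network hop is paid before every stage: large when each stage
    lives with a different provider, near-zero when all three stages
    share one datacenter.
    """
    total = 0.0
    for _stage, compute in STAGE_COMPUTE.items():
        total += network_hop_s + compute  # hop to the stage, then compute
    return total

# Assumed hop costs: ~60 ms per cross-provider hop vs ~2 ms in-cluster.
cross_provider = run_pipeline(network_hop_s=0.060)
co_located = run_pipeline(network_hop_s=0.002)

print(f"cross-provider: {cross_provider * 1000:.0f} ms")
print(f"co-located:     {co_located * 1000:.0f} ms")
```

With these made-up numbers, the same compute budget lands well under the 500ms mark when co-located but approaches the ceiling once three inter-provider hops are added.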
Model Flexibility Without the Patchwork
The platform supports Whisper Large v3, Minimax Speech 2.6 Turbo, Rime Arcana, and Kokoro alongside Together's full LLM catalog. Developers can swap components without rebuilding integrations, useful for teams testing different voice characteristics or transcription accuracy for specific use cases.
Cartesia brings its Sonic-3 and Sonic-2 TTS models to the platform. Deepgram contributes Nova-3 and Nova-3 Multilingual for transcription, Flux for conversational STT, and Aura-2 for synthesis.
Unlike opaque speech-to-speech systems, Together's modular approach preserves access to intermediate transcripts and response text. Teams can inspect, modify, and route data mid-stream, a requirement for many enterprise compliance workflows.
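The swappable-components and mid-stream-inspection ideas can be illustrated with a minimal sketch. Everything here is hypothetical: `fake_stt`, `fake_llm`, and `fake_tts` are stand-in callables, not real Deepgram, Together, or Cartesia SDK calls; a real deployment would plug provider clients into the same slots.

```python
from typing import Callable

# Hypothetical stand-ins for pluggable pipeline stages.
def fake_stt(audio: bytes) -> str:
    return "what is my current balance"          # simulated transcript

def fake_llm(transcript: str) -> str:
    return f"Looking up the balance for: {transcript!r}"  # simulated reply

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")                  # simulated audio payload

def voice_turn(audio: bytes,
               stt: Callable[[bytes], str],
               llm: Callable[[str], str],
               tts: Callable[[str], bytes],
               inspect: Callable[[str, str], None]) -> bytes:
    """One conversational turn with inspectable intermediate text.

    Because each stage is a separate callable, the transcript and the
    response text are exposed between stages, where a compliance hook
    can log, redact, or reroute them before the next stage runs.
    """
    transcript = stt(audio)
    inspect("transcript", transcript)   # e.g. PII redaction / audit log
    response = llm(transcript)
    inspect("response", response)
    return tts(response)

audit_log: list[tuple[str, str]] = []
audio_out = voice_turn(b"<caller audio>", fake_stt, fake_llm, fake_tts,
                       lambda stage, text: audit_log.append((stage, text)))
print(audit_log)
```

Swapping transcription or synthesis models then means passing a different callable, while the `inspect` hook is what an opaque speech-to-speech system cannot offer.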
Enterprise Requirements and Production Use
The platform targets regulated industries with zero data retention options, SOC 2 Type II certification, HIPAA compliance, and dedicated data residency. Decagon, which runs customer support voice agents handling billing inquiries and technical troubleshooting, already operates on the stack.
Together AI raised $305 million in February 2025 at a $3.3 billion valuation, with reports suggesting the company is now in talks to raise at a $7.5 billion valuation. The company has surpassed 450,000 developers and crossed $100 million in annualized revenue.
The voice platform launch represents Together's expansion beyond its core LLM inference business into the growing voice AI market, where latency and reliability remain persistent pain points for production deployments.
Image source: Shutterstock