Fallback-LLMs

Enterprise-Grade AI Continuity: Fallback LLM Endpoints for Enhancing Reliability

Published on March 12th, 2025

3 Minute Read



As generative AI applications move from proof of concept to production, they become integral to business operations. If you have used ChatGPT or Claude.ai, how often have you hit capacity-exceeded warnings advising you to try again later? Consumer-grade applications may tolerate such interruptions, but for enterprise applications an outage or capacity constraint carries real consequences. Consider a customer service expert assistant designed to provide real-time support: guiding agents in understanding customer sentiment, formulating follow-up questions, or presenting promotional offers. If this system becomes temporarily unavailable due to capacity limits, business opportunities may be lost. Likewise, if generative AI agents drive intelligent automation in order-to-cash systems, capacity issues can severely hinder customer order fulfillment, potentially breaching service level agreements (SLAs) or costing revenue.

Karini AI is a foundational generative AI platform that enterprises use for applications such as intelligent automation and conversational agents, all of which require enterprise-grade availability. Model providers such as Amazon Bedrock, Azure OpenAI, and Google Vertex offer shared capacity through their on-demand endpoints and let customers purchase provisioned throughput for latency-sensitive workloads. However, provisioned throughput may be financially prohibitive during initial production rollouts when demand is still low, or it may simply be unnecessary. On-demand LLM endpoints run on shared capacity and queue incoming requests, so during peak hours users may encounter timeout errors or prolonged response times, causing generative AI applications to fail. Even workloads using provisioned throughput may see delayed responses during traffic spikes that exceed the provisioned capacity.

To address this challenge, we are excited to introduce Fallback LLM endpoints. A fallback endpoint serves requests whenever the primary LLM endpoint is unresponsive or returns an error, mitigating or entirely resolving endpoint capacity issues. The feature is versatile and flexible, supporting a range of endpoint combinations.

Supported combinations include: primary and fallback endpoints for the same LLM (e.g., Anthropic Claude 3.5 Sonnet) distributed across different AWS regions; primary and fallback endpoints for the same LLM spanning cloud providers, such as AWS and Google; and a primary endpoint with provisioned throughput paired with an on-demand fallback endpoint in the same region.
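The routing logic behind such a configuration can be sketched in a few lines. The example below is an illustrative simplification, not Karini AI's actual implementation: the endpoint names and invoke functions are hypothetical stand-ins for, say, the same model served from two AWS regions.

```python
class EndpointError(Exception):
    """Raised when an LLM endpoint is unavailable or returns an error."""

def invoke_with_fallback(prompt, endpoints, timeout_s=10.0):
    """Try each endpoint in order; return the first successful response."""
    last_error = None
    for call in endpoints:
        try:
            return call(prompt, timeout_s)
        except EndpointError as exc:
            # Endpoint failed (timeout, throttling, outage); try the next one.
            last_error = exc
    raise EndpointError(f"all endpoints failed; last error: {last_error}")

# Hypothetical endpoint callables simulating a throttled primary region
# and a healthy fallback region.
def bedrock_us_east_1(prompt, timeout_s):
    raise EndpointError("ThrottlingException: capacity exceeded")

def bedrock_us_west_2(prompt, timeout_s):
    return f"response to: {prompt}"

endpoints = [bedrock_us_east_1, bedrock_us_west_2]
print(invoke_with_fallback("Summarize the order status.", endpoints))
```

Because the endpoint list is ordered, the same pattern covers all three combinations above: simply place the provisioned-throughput or preferred-region endpoint first and the on-demand or cross-cloud endpoint after it.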

While robust, the feature is easy to test in Karini's prompt playground and to deploy to production through a Generative AI recipe. This video demonstrates the streamlined testing and deployment process.

We are also excited to announce a system alerting feature that sends email alerts when workflow errors are detected. Users can customize email templates within the recipe to route these reports to their enterprise monitoring tools, such as GuardDuty.

With these enterprise-grade features, Karini AI solidifies generative AI's position as a critical component of enterprise systems, propelling business transformation. GenAIFoundation is purpose-built for generative AI in the enterprise. Contact us directly for more information on how Karini can help your organization.

Karini AI: Building Better AI, Faster.
Orchestrating GenAI Apps for Enterprises: GenAIOps at scale.