Open Source LLMOps Stack

Introduction

Choosing the right technology stack is one of the most critical decisions teams must make when building LLM-powered applications. The wrong choices can lead to vendor lock-in, limited flexibility, and high switching costs, all of which can stifle innovation and slow down iteration cycles.

To address this, we introduce the “Open Source LLMOps Stack”: an open, well-integrated, scalable, and established setup backed by large OSS communities. It includes LiteLLM (GitHub) as the LLM Proxy/Gateway and Langfuse (GitHub) for Observability, Evaluation, and Prompt Management.

A selection of users and contributors:

Lemonade, Rocket Money, Samsara, Twilio, Adobe, Fletch, Cornelsen, The Weather Company

“Our experience with LiteLLM and Langfuse at Lemonade has been outstanding. LiteLLM streamlines the complexities of managing multiple LLM models, while Langfuse provides clear, actionable insights for monitoring and optimizing our AI applications. Being able to host this stack within our infrastructure has been a game-changer, and we’re particularly impressed with the rapid development pace of both projects.” — Mark Koltnuk, Principal Architect (GenAI Platform) at Lemonade

“Generative AI technology evolves at an unprecedented rate. LiteLLM is an amazing product that helps our Engineering teams quickly prototype, launch, and deliver new AI features at scale. Langfuse instrumentation gives our teams the visibility to quickly pinpoint issues across our LLMops stack and resolve them in minutes. Together, these products are essential at Fletch and consistently deliver best-in-class experiences for our customers.” — Darien Kindlund, VP of Technology at Fletch

“LiteLLM x Langfuse has improved our iteration speed and monitoring capabilities by 10x. Langfuse’s crisp documentation allowed Decisional to deploy detailed LLM observability and evaluation locally in less than half a day. This instrumentation caught multiple bugs within a week, saving us hundreds of hours (not exaggerated) of dev time. LiteLLM’s model picker made it really easy to switch between model providers and add failovers.” — Adit Sanghvi, CTO at Decisional

You can deploy this stack in your own environment in a matter of minutes via the provided templates (Helm or Docker Compose). Read on to learn more.

LiteLLM

Unified LLM API (OpenAI Format), Cost Allocation, Model Access Management

LiteLLM is an open-source Python SDK and Proxy Server (LLM Gateway) for calling 100+ LLM APIs in the OpenAI format, including Bedrock, Azure, OpenAI, Vertex AI, Cohere, Anthropic, SageMaker, Hugging Face, Replicate, and Groq. It offers teams:

Unified LLM API (OpenAI Format)

  • OpenAI-Compatible API - Access 100+ LLMs in the OpenAI format: Bedrock, Vertex AI, Anthropic, Hugging Face, vLLM, and more (see the example after this list).
  • Built-in failover mechanisms - Automatically switch to alternative models if a provider is down.
  • Load balancing - Distribute traffic efficiently across multiple LLMs for scalability and reliability.
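For example, an application can talk to the gateway with the standard OpenAI SDK. This is a minimal sketch, assuming a proxy running at http://localhost:4000 with a model alias named gpt-4o and a virtual key issued by the gateway (both placeholders):

```python
# Minimal sketch: calling models through a LiteLLM gateway with the OpenAI SDK.
# Assumptions: the proxy listens on http://localhost:4000, a model alias "gpt-4o"
# is configured there, and "sk-litellm-..." is a virtual key issued by the proxy.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # LiteLLM gateway instead of api.openai.com
    api_key="sk-litellm-...",          # placeholder virtual key
)

response = client.chat.completions.create(
    model="gpt-4o",  # alias resolved by the gateway; failover and load balancing happen server-side
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence: app crashes on login."}],
)
print(response.choices[0].message.content)
```

Because the request stays in the OpenAI format, switching the underlying provider is a gateway configuration change rather than an application change.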

Observability & Cost Management

  • Spend Tracking - Attribute LLM spend to virtual keys, teams, and models for cost allocation and budgeting
  • Logging Callbacks - Forward requests, responses, and usage data to observability backends such as Langfuse

Model Access Management

  • Virtual Keys - Control access to models by virtual keys, teams, and model access groups (see the example below)
  • Self-serve Portal (SSO) - Allow teams to log in via SSO and manage their own keys in production with a self-serve key management portal
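As a hedged sketch of how key issuance can be automated, the proxy exposes a /key/generate admin endpoint; the host, team name, and budget below are placeholders, and field names may vary by LiteLLM version:

```python
# Hedged sketch: issuing a virtual key via the LiteLLM proxy admin API.
# Assumptions: proxy at http://localhost:4000, LITELLM_MASTER_KEY set in the environment;
# the request fields mirror the /key/generate endpoint and may differ between versions.
import os
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": f"Bearer {os.environ['LITELLM_MASTER_KEY']}"},
    json={
        "models": ["gpt-4o"],      # restrict the key to specific model aliases
        "max_budget": 50,          # spend cap in USD for cost allocation
        "team_id": "search-team",  # hypothetical team identifier
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["key"])  # the generated virtual key ("sk-...") to hand to the team
```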

Langfuse

Observability, Evaluation, and Prompt Management

Langfuse complements LiteLLM by offering deep visibility and structured evaluations for LLM-powered applications. These are the core capabilities of Langfuse:

  • LLM Application Observability/Tracing: Instrument your app and start ingesting traces into Langfuse to track LLM calls and other relevant logic such as retrieval, embedding, or agent actions. Inspect and debug complex logs and user sessions. Try the interactive demo to see this in action, or see the sketch after this list.
  • Prompt Management helps you centrally manage, version control, and collaboratively iterate on your prompts. Thanks to strong caching on server and client side, you can iterate on prompts without adding latency to your application.
  • Evaluations are key to the LLM application development workflow, and Langfuse adapts to your needs. It supports LLM-as-a-judge, user feedback collection, manual labeling, and custom evaluation pipelines via APIs/SDKs.
  • Datasets let you build test sets and benchmarks for evaluating your LLM application. They support continuous improvement, pre-deployment testing, structured experiments, flexible evaluation, and seamless integration with frameworks like LangChain and LlamaIndex.
  • LLM Playground is a tool for testing and iterating on your prompts and model configurations, shortening the feedback loop and accelerating development. When you see a bad result in tracing, you can directly jump to the playground to iterate on it.
  • Comprehensive API: Langfuse is frequently used to power bespoke LLMOps workflows built on the building blocks it exposes via the API. An OpenAPI spec, a Postman collection, and typed SDKs for Python and JS/TS are available.
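As a minimal tracing sketch, the Langfuse OpenAI drop-in integration records a call as a trace. It assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST, and OPENAI_API_KEY are set in the environment:

```python
# Minimal sketch: tracing an OpenAI call with the Langfuse drop-in integration.
# Assumptions: LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST and
# OPENAI_API_KEY are set; the import swap records the call as a trace in Langfuse.
from langfuse.openai import openai  # drop-in replacement for the openai module

completion = openai.chat.completions.create(
    name="ticket-classifier",  # Langfuse-specific trace name forwarded by the drop-in
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: 'App crashes on login.'"}],
)
print(completion.choices[0].message.content)
```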

Integration

LiteLLM and Langfuse integrate directly, so you can use LiteLLM as a central model gateway and Langfuse for observability, evals, and prompt management. To learn more, see the integration docs.
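Here is a hedged sketch of the callback with the litellm Python SDK; the keys and host are placeholders, and on the proxy the equivalent is a success_callback entry in the gateway config:

```python
# Hedged sketch: logging LiteLLM completions to Langfuse via the success callback.
# Assumptions: placeholder Langfuse keys and a self-hosted Langfuse URL; on the
# LiteLLM proxy the same effect comes from a success_callback entry in its config.
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."               # placeholder
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."               # placeholder
os.environ["LANGFUSE_HOST"] = "https://langfuse.example.com"  # assumed self-hosted instance

litellm.success_callback = ["langfuse"]  # send every successful completion to Langfuse

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from the LLMOps stack!"}],
)
print(response.choices[0].message.content)
```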

Using Langfuse Prompts directly via LiteLLM is currently in beta (docs). With this approach, you only pass the prompt variables to the LLM request, and all prompt fetching/caching is handled by LiteLLM. Alternatively, you can rely on the Langfuse SDKs to use prompts in your application code (docs).
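For the SDK route, a minimal sketch with the Langfuse Python SDK, assuming a text prompt named "movie-critic" with a {{movie}} variable exists in your Langfuse project:

```python
# Minimal sketch: fetching and compiling a managed prompt with the Langfuse SDK.
# Assumption: a text prompt named "movie-critic" with a {{movie}} variable exists.
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST
prompt = langfuse.get_prompt("movie-critic")       # cached client-side after the first fetch
compiled = prompt.compile(movie="Dune: Part Two")  # fill in the template variables
print(compiled)
```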

Why This Stack?

Choosing the right LLMOps stack is critical for teams that require long-term flexibility, scalability, and control over their AI infrastructure. Many proprietary solutions impose vendor lock-in, limiting adaptability and making it difficult to iterate on workflows as technology evolves.

By adopting the Open Source LLMOps Stack, teams gain:

  • Self-Hosting & Open Source - Both LiteLLM and Langfuse are open-source, allowing teams to deploy them in their own infrastructure while ensuring complete control over their LLM operations.
  • Technology Independence - Avoid reliance on a single cloud provider or LLM vendor, ensuring the ability to switch models and providers with minimal friction.
  • Enterprise-Ready & Scalable - These tools are built to handle large-scale deployments with robust performance, failover mechanisms, and high-availability setups.
  • Battle-Tested & Well-Documented - Both projects (with a combined 25K GitHub stars) are widely used in production environments and offer extensive documentation to support engineering teams.
  • Large Community & Active Development - A vibrant open-source community actively contributes to both LiteLLM and Langfuse, ensuring continuous improvements, feature additions, and long-term viability.

Getting Started with the Open Source LLMOps Stack

Deploying the stack is straightforward:

  1. Deploy LiteLLM and Langfuse
  2. Set up the integration by enabling callbacks from LiteLLM to Langfuse
  3. Use the LiteLLM gateway endpoints across your application
  4. (optional) Set up evals and prompt management in Langfuse
  5. Done

👉 Follow the step-by-step tutorial

Questions/Feedback?

We are very much looking forward to your feedback!