LiteLLM - An open-source library to simplify LLM completion + embedding calls.
Posted in Recipe on March 21, 2024 by Venkatesh S ‐ 4 min read
I have been working with many local LLMs. When these LLMs are run using services like Ollama, LM Studio and others, they tend to expose their APIs through wrappers that are not OpenAI compatible, while many frameworks are by default built to work with the OpenAI APIs.
LiteLLM is a proxy server that provides a unified wrapper and lets us call 100+ LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, etc.]. This enables us to build applications and services on top of LLMs as if we were building them for OpenAI. It also lets us switch between models without getting tied to one vendor's or provider's specifications or APIs.
What does it do?
LiteLLM manages:
- Translate inputs to the provider's completion, embedding, and image_generation endpoints
- Consistent output: text responses will always be available at ['choices'][0]['message']['content'] (see the sketch after this list)
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
- Set budgets & rate limits per project, API key, and model - OpenAI Proxy Server
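To make the first two points concrete, here is a minimal sketch using the LiteLLM Python SDK (pip install litellm). It is my own illustration rather than code taken from the LiteLLM docs, and it assumes Ollama is serving Llama2 locally on its default port and that OPENAI_API_KEY is set in the environment for the OpenAI call; only the model string changes between the two providers.

from litellm import completion

messages = [{"role": "user", "content": "what llm are you"}]

# Same call shape for a local Ollama model...
local_response = completion(
    model="ollama/llama2",
    messages=messages,
    api_base="http://localhost:11434",
)

# ...and for a hosted OpenAI model; only the model string changes.
openai_response = completion(model="gpt-3.5-turbo", messages=messages)

# Consistent output: the text is always at ['choices'][0]['message']['content']
print(local_response["choices"][0]["message"]["content"])
print(openai_response["choices"][0]["message"]["content"])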
How to use LiteLLM?
You can use LiteLLM through either:
- OpenAI Proxy Server - Server to call 100+ LLMs, load balance, cost tracking across projects
- LiteLLM Python SDK - Python client to call 100+ LLMs, load balance, cost tracking
Today we will focus on the OpenAI Proxy Server option.
Deploying LiteLLM using Docker
While there are various options to deploy LiteLLM to production, we will focus on deploying it on a developer desktop. Please note that we will be using Ollama with Llama2 as the model for this example. If you want to know how to set up Ollama and bring it up locally, refer to this article on Ollama: Get up and running with Large Language Models locally
- Step 1: Create a new config file called litellm_config.yaml with the following contents.
model_list:
- model_name: llama2
litellm_params:
model: ollama/llama2
api_base: http://localhost:11434
- Step 2: Run the LiteLLM Docker image
docker run \
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-latest \
--config /app/config.yaml --detailed_debug
- Step 3: Send a test request to ensure your setup is working; a Python equivalent is sketched after the curl example.
Note that the model specified in this request ("llama2") matches the model_name defined in the config file in Step 1.
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama2",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'
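Because the proxy speaks the OpenAI format, any OpenAI-compatible client can talk to it. The following is a minimal sketch of the same request using the official OpenAI Python SDK (openai>=1.0) pointed at the local proxy; the api_key value is a placeholder, since this local config does not require keys.

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy.
# The api_key is a placeholder; this local config does not enforce keys.
client = OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="llama2",  # matches model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)

This is the same vendor-neutral code you would write against OpenAI itself; switching to a different backend only requires changing the model list in litellm_config.yaml.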
There are various other ways to deploy this, as described in this link; refer to it for more details.
What does the OpenAI Proxy provide?
The proxy provides:
- A unified, OpenAI-compatible interface to call 100+ LLMs
- Load balancing and retry/fallback logic across multiple deployments
- Cost tracking across projects
- Budgets & rate limits per project, API key, and model
The Swagger docs for all the APIs the proxy exposes are also listed.
Supported Providers (Docs)
As of this writing, the following providers are supported.
| Provider | Completion | Streaming | Async Completion | Async Streaming | Async Embedding | Async Image Generation |
|---|---|---|---|---|---|---|
| openai | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| azure | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| aws - sagemaker | ✅ | ✅ | ✅ | ✅ | ✅ | |
| aws - bedrock | ✅ | ✅ | ✅ | ✅ | ✅ | |
| google - vertex_ai [Gemini] | ✅ | ✅ | ✅ | ✅ | | |
| google - palm | ✅ | ✅ | ✅ | ✅ | | |
| google AI Studio - gemini | ✅ | ✅ | | | | |
| mistral ai api | ✅ | ✅ | ✅ | ✅ | ✅ | |
| cloudflare AI Workers | ✅ | ✅ | ✅ | ✅ | | |
| cohere | ✅ | ✅ | ✅ | ✅ | ✅ | |
| anthropic | ✅ | ✅ | ✅ | ✅ | | |
| huggingface | ✅ | ✅ | ✅ | ✅ | ✅ | |
| replicate | ✅ | ✅ | ✅ | ✅ | | |
| together_ai | ✅ | ✅ | ✅ | ✅ | | |
| openrouter | ✅ | ✅ | ✅ | ✅ | | |
| ai21 | ✅ | ✅ | ✅ | ✅ | | |
| baseten | ✅ | ✅ | ✅ | ✅ | | |
| vllm | ✅ | ✅ | ✅ | ✅ | | |
| nlp_cloud | ✅ | ✅ | ✅ | ✅ | | |
| aleph alpha | ✅ | ✅ | ✅ | ✅ | | |
| petals | ✅ | ✅ | ✅ | ✅ | | |
| ollama | ✅ | ✅ | ✅ | ✅ | | |
| deepinfra | ✅ | ✅ | ✅ | ✅ | | |
| perplexity-ai | ✅ | ✅ | ✅ | ✅ | | |
| Groq AI | ✅ | ✅ | ✅ | ✅ | | |
| anyscale | ✅ | ✅ | ✅ | ✅ | | |
| voyage ai | | | | | ✅ | |
| xinference [Xorbits Inference] | | | | | ✅ | |