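Prerequisites

- OS: openEuler 22.03 on ARM. x86 also works, but you have to change the image addresses yourself, because every image in the compose file below uses a `linuxarm64` tag.

```text
cat /etc/os-release
NAME="openEuler"
VERSION="22.03 (LTS-SP4)"
ID="openEuler"
VERSION_ID="22.03"
PRETTY_NAME="openEuler 22.03 (LTS-SP4)"
ANSI_COLOR="0;31"
```

- Docker and Docker Compose installed:

```text
Docker version 29.5.3, build d1c06ef
Docker Compose version v5.1.4
```

- LiteLLM version: v1.89.1
- vLLM is already deployed and answers curl requests. Example test requests: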
Chat:

```bash
curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen3","messages":[{"role":"user","content":"你好,请简单介绍下自己"}],"temperature":0.7,"max_tokens":1024}'
```

Embedding:

```bash
curl http://127.0.0.1:9010/v1/embeddings -H "Content-Type: application/json" -d '{"model":"/model","input":"测试文本向量化"}'
```

Rerank:

```bash
curl http://127.0.0.1:9011/v1/score -H "Content-Type: application/json" -d '{"model":"/model","pairs":[["人工智能是什么","人工智能是模拟人类智能的技术"],["人工智能是什么","今天下雨了"]]}'
```

I. Directory structure
```text
litellm/
├── .env
├── docker-compose.yml
└── prometheus.yml
```
II. Create the configuration files
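The `.env` file below hard-codes a sample master key. If you would rather not reuse it, a random key in the same `sk-` format can be generated first. This is a minimal sketch: `gen_master_key` is a hypothetical helper name, and it relies only on `/dev/urandom` and coreutils (`head`, `od`, `tr`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a random master key of the form sk-<32 hex chars>,
# matching the sk- format of the sample key used in this guide.
gen_master_key() {
  local hex
  # 16 random bytes -> 32 lowercase hex characters (spaces and newline removed)
  hex=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
  printf 'sk-%s\n' "$hex"
}
```

Running `gen_master_key` prints `sk-` followed by 32 hex characters; paste the result into `.env` in place of the sample value.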
```bash
mkdir litellm && cd litellm
```

In `.env`, `LITELLM_MASTER_KEY` is the default login password for the admin web UI; `SEARXNG_API_BASE` is optional.

```bash
cat > .env << 'EOF'
LITELLM_MASTER_KEY="sk-c0oE4EgitT4DmXyRj4Av"
SEARXNG_API_BASE="http://192.168.105.143:18080"
EOF
```

Create `docker-compose.yml`:

```bash
cat > docker-compose.yml << 'EOF'
services:
  litellm:
    # The upstream LiteLLM compose file also has "build: context: ." here.
    # This directory contains no Dockerfile, so only the prebuilt image is used.
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/docker.litellm.ai/berriai/litellm:main-stable-linuxarm64
    #########################################
    ## Uncomment these lines to start proxy with a config.yaml file ##
    # volumes:
    #   - ./config.yaml:/app/config.yaml
    # command:
    #   - "--config=/app/config.yaml"
    ##############################################
    ports:
      - "3000:4000" # Map the container port to the host, change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      # Optional: route read-only queries (find_*, count, group_by, query_raw/_first)
      # to a separate reader endpoint, e.g. an Aurora reader. Leave unset for
      # single-DB deployments. With IAM_TOKEN_DB_AUTH enabled, the reader URL
      # is auto-refreshed alongside the writer.
      # DATABASE_URL_READ_REPLICA: "postgresql://llmproxy:dbpassword9090@db-reader:5432/litellm"
      STORE_MODEL_IN_DB: "True" # allows adding models to proxy via UI
    env_file:
      - .env # Load local .env file
    depends_on:
      - db # Indicates that this service depends on the 'db' service, ensuring 'db' starts first
    healthcheck: # Defines the health check configuration for the container
      test:
        - CMD-SHELL
        - python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:4000/health/liveliness')" # Command to execute for health check
      interval: 30s # Perform health check every 30 seconds
      timeout: 10s # Health check command times out after 10 seconds
      retries: 3 # Retry up to 3 times if health check fails
      start_period: 40s # Wait 40 seconds after container start before beginning health checks
  db:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:16-linuxarm64
    restart: always
    container_name: litellm_db
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persists Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10
  prometheus:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:v3.5.4-linuxarm64
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
    restart: always
volumes:
  prometheus_data:
    driver: local
  postgres_data:
    name: litellm_postgres_data # Named volume for Postgres data persistence
EOF
```

Create `prometheus.yml`:

```bash
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'litellm'
    static_configs:
      - targets: ['litellm:4000'] # Assuming Litellm exposes metrics at port 4000
EOF
```

III. Start LiteLLM
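Before running `docker compose up -d`, a quick pre-flight check of `.env` can catch a missing or malformed master key. This is a minimal sketch assuming the `KEY="value"` layout used in this guide; `check_env_file` is a hypothetical helper, and its `sk-` prefix check follows the format of the sample key rather than anything verified against LiteLLM's source.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the .env file read by docker-compose.yml.
# Succeeds if LITELLM_MASTER_KEY is set and starts with "sk-" (the format of
# the sample key in this guide); prints a reason and fails otherwise.
check_env_file() {
  local file=${1:-.env} line value
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # If the variable appears more than once, inspect the last definition.
  line=$(grep -E '^LITELLM_MASTER_KEY=' "$file" | tail -n 1)
  if [ -z "$line" ]; then
    echo "LITELLM_MASTER_KEY is missing in $file" >&2
    return 1
  fi
  value=${line#LITELLM_MASTER_KEY=}
  # Strip optional surrounding double quotes: "sk-..." -> sk-...
  value=${value#\"}
  value=${value%\"}
  case $value in
    sk-?*)
      echo "OK: LITELLM_MASTER_KEY looks valid in $file"
      return 0
      ;;
    *)
      echo "LITELLM_MASTER_KEY in $file does not start with sk-" >&2
      return 1
      ;;
  esac
}
```

Typical use is `check_env_file .env && docker compose up -d`, so the stack only starts when the key looks sane.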
```bash
docker compose up -d
```

Once it has started normally, open http://localhost:3000/ui to reach the admin web console.
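The proxy may need some time after `docker compose up -d` before the UI answers (the compose healthcheck allows a 40 s start period). `docker compose ps` shows the container's health status; alternatively, a small polling loop against the same `/health/liveliness` endpoint the healthcheck uses can be scripted. This is a minimal sketch: `wait_for_url` is a hypothetical helper and requires `curl` on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until curl reports success (HTTP status
# below 400), or give up after MAX_TRIES attempts.
# Usage: wait_for_url URL [MAX_TRIES] [DELAY_SECONDS]
wait_for_url() {
  local url=$1 tries=${2:-30} delay=${3:-5} i
  for ((i = 1; i <= tries; i++)); do
    # -f: fail on HTTP >= 400; -s: silent; --max-time bounds each attempt
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    # Do not sleep after the final failed attempt
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  echo "gave up after $tries attempt(s): $url" >&2
  return 1
}
```

For this setup that would be `wait_for_url http://localhost:3000/health/liveliness 30 5`, i.e. up to about two and a half minutes of waiting before giving up.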
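IV. Configure via the web UI
1. Configure the model endpoint
Models + Endpoints ---> Add Model
Fill in the form as follows:
| Field | Value | Notes |
|---|---|---|
| Provider | vllm | Upstream model provider type |
| LiteLLM Model Name(s) | qwen3 | Upstream model name; must match the model name set in the vLLM Docker startup configuration |
| Model Mappings->Public Model Name | qwen3-public | Model name exposed to downstream platforms |
| Model Mappings->LiteLLM Model Name | (default) | Filled in automatically once the fields above are set; normally identical to the name configured above |
| Mode | Chat - /chat/completions | Choose Chat for chat models; for embedding and rerank models pick the matching type |
| Existing Credentials | (leave empty) | |
| API Base | http://127.0.0.1:8000/v1 | API address, with the path ending at /v1. Inside the LiteLLM container from this guide, 127.0.0.1 refers to the container itself, so if vLLM runs on the host or another machine, use an address the container can reach (e.g. the host's LAN IP) |
| API Key | (leave empty) | Only needed if vLLM was started with an API key |
When finished, click Test Connect to check that the connection works, and save once the test succeeds.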
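After saving, you can also confirm from the command line that the proxy now serves the public model name. LiteLLM exposes the OpenAI-compatible `GET /v1/models` route; the sketch below lists the model ids visible to a given key (the master key from `.env` works here). `model_ids` and `list_models` are hypothetical helper names, and the parsing uses `grep`/`sed` rather than a JSON parser, so it assumes ids contain no escaped quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "id" of every entry in an OpenAI-style
# /v1/models response read from stdin, one per line (grep/sed, no JSON parser).
model_ids() {
  grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/'
}

# Hypothetical wrapper: list the models a key can see through the LiteLLM proxy.
# Usage: list_models BASE_URL API_KEY
list_models() {
  local base=$1 key=$2
  if [ -z "$base" ] || [ -z "$key" ]; then
    echo "usage: list_models BASE_URL API_KEY" >&2
    return 1
  fi
  curl -fs "$base/v1/models" -H "Authorization: Bearer $key" | model_ids
}
```

With the values from this guide, `list_models http://localhost:3000 sk-c0oE4EgitT4DmXyRj4Av` should print `qwen3-public` among its results.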
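2. Configure a team
Teams ---> Create Team
Fill in the form as follows:
| Field | Value | Notes |
|---|---|---|
| Team Name | test-team | Any team name you like |
| Models | qwen3-public | Select the public model name configured above |
Leave all other fields empty or at their defaults.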
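The same team can presumably also be created without the UI through LiteLLM's management API. As far as I know, `POST /team/new` (authenticated with the master key) accepts `team_alias` and `models`; treat these field names as assumptions and check them against the API docs served by your proxy version. `team_payload` is a hypothetical helper that only builds the JSON body and does no JSON escaping, so keep names free of quotes and backslashes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an assumed JSON body for LiteLLM's /team/new.
# No JSON escaping is done: names must not contain quotes or backslashes.
# Usage: team_payload TEAM_ALIAS MODEL [MODEL...]
team_payload() {
  if [ "$#" -lt 2 ]; then
    echo "usage: team_payload TEAM_ALIAS MODEL [MODEL...]" >&2
    return 1
  fi
  local team_alias=$1 models="" m
  shift
  # Join the remaining arguments into a quoted, comma-separated JSON list
  for m in "$@"; do
    models="${models:+$models,}\"$m\""
  done
  printf '{"team_alias":"%s","models":[%s]}\n' "$team_alias" "$models"
}
```

It would be sent with something like `curl http://localhost:3000/team/new -H "Authorization: Bearer sk-c0oE4EgitT4DmXyRj4Av" -H "Content-Type: application/json" -d "$(team_payload test-team qwen3-public)"`. If my understanding is right, the response contains a generated `team_id`, which later API calls refer to instead of the alias.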
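3. Configure a user
Internal Users ---> Invite User
Fill in the form as follows:
| Field | Value | Notes |
|---|---|---|
| User Email | test-person | User name; it does not have to be an email address, and Chinese characters are supported |
| Global Proxy Role | Internal User (View Only) | Optional; this role can only use models and has no admin permissions |
| Team | test-team | Select the team created above |
Leave all other fields empty or at their defaults.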
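4. Configure an API key
Virtual Keys ---> Create New Key
Fill in the form as follows:
| Field | Value | Notes |
|---|---|---|
| Owned By | Another User | The key is issued to a user; other options include keys for a service or an agent |
| User ID | test-person | The user created above |
| Organization | (leave empty) | |
| Team | test-team | The team created above |
| Key Name | test-key | Key name shown in logs and usage reports |
| Models | All team Models | Models the key can use: all available models or a single model. With single models you can set a separate quota per user and per model |
| Key Type | AI APIs | Key permissions |
Leave all other fields empty or at their defaults.
After saving, a dialog displays the key. This is the only chance to copy it; afterwards it is shown only in masked form.
The key in my setup is sk-mHz0miMx_QyR6xAhxVmSgw.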
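A key like this can likely also be issued through `POST /key/generate`, authenticated with the master key. My understanding is that it accepts `key_alias`, `user_id`, `team_id` and `models`, where `team_id` is the id returned when the team was created (not its alias) and `user_id` is the id LiteLLM stored for the user, which may differ from the User Email entered in the UI. These field names are assumptions to verify against your proxy's API docs; `key_payload` is a hypothetical body builder with no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an assumed JSON body for LiteLLM's /key/generate.
# No JSON escaping is done: values must not contain quotes or backslashes.
# Usage: key_payload KEY_ALIAS USER_ID TEAM_ID MODEL [MODEL...]
key_payload() {
  if [ "$#" -lt 4 ]; then
    echo "usage: key_payload KEY_ALIAS USER_ID TEAM_ID MODEL [MODEL...]" >&2
    return 1
  fi
  local key_alias=$1 user_id=$2 team_id=$3 models="" m
  shift 3
  # Join the model names into a quoted, comma-separated JSON list
  for m in "$@"; do
    models="${models:+$models,}\"$m\""
  done
  printf '{"key_alias":"%s","user_id":"%s","team_id":"%s","models":[%s]}\n' \
    "$key_alias" "$user_id" "$team_id" "$models"
}
```

Usage would look like `curl http://localhost:3000/key/generate -H "Authorization: Bearer sk-c0oE4EgitT4DmXyRj4Av" -H "Content-Type: application/json" -d "$(key_payload test-key <user_id> <team_id> qwen3-public)"`. As with the UI, copy the returned key right away.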
V. Test connectivity
Test with the following curl, replacing the key and the model field with the values configured above:

```bash
curl http://localhost:3000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-mHz0miMx_QyR6xAhxVmSgw" -d '{"model": "qwen3-public", "messages": [{"role": "user", "content": "你好,请介绍一下你自己"}]}'
```

If everything works, the response looks like this:

```text
{"id":"chatcmpl-bfbabeb7f39ecddb","created":1782720274,"model":"qwen3-public","object":"chat.completion","choices":[{"finish_reason":"stop","index":0,"message":{"content":"你好!我是通义千问(Qwen),是阿里巴巴集团旗下的通义实验室研发的超大规模语言模型。我能够回答问题、创作文字,比如写故事、公文、邮件、剧本,进行逻辑推理、编程,甚至能玩游戏、表达观点和聊天。\n\n我的能力包括但不限于:\n\n- 多语言支持:我支持包括中文、英文、法语、西班牙语、葡萄牙语、俄语、阿拉伯语、日语、韩语、越南语、泰语、印尼语等在内的数十种语言。\n- 代码写作:我熟悉多种编程语言,可以帮你写代码、调试、解释算法。\n- 逻辑推理:我能处理复杂的逻辑问题,比如数学题、谜题、推理题等。\n- 长文本处理:我支持超长上下文(最高可达32768个token),适合处理长文档、书籍摘要、会议记录等。\n- 对话理解:我擅长多轮对话,能记住上下文,提供连贯、自然的交流体验。\n- 知识丰富:我训练数据截至2024年,涵盖广泛领域,能为你提供最新、最全面的信息(在训练数据范围内)。\n\n我叫“通义千问”,“通义”代表我具有广泛的知识和普适性,“千问”则代表我能够回答各种各样的问题,无论多么复杂或独特。\n\n如果你有任何问题或需要帮助,尽管告诉我,我会尽力为你提供支持!😊\n\n—— 你的AI助手 Qwen","role":"assistant","provider_specific_fields":{"refusal":null,"reasoning":null}},"provider_specific_fields":{"token_ids":null,"stop_reason":null}}],"usage":{"completion_tokens":318,"prompt_tokens":12,"total_tokens":330}}
```
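For scripted checks it helps to print only the assistant's reply instead of the whole JSON. This is a minimal sketch: `chat_content` is a hypothetical helper that reads an OpenAI-style chat completion on stdin and prints `choices[0].message.content`. It assumes `python3` is available on the host; if `jq` is installed, `jq -r '.choices[0].message.content'` does the same job.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print choices[0].message.content from an OpenAI-style
# chat completion JSON read on stdin, using only python3's standard json module.
chat_content() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

Appending `| chat_content` to the test command (with `curl -s`) prints just the model's answer, which makes it easy to compare the proxy's output with a direct vLLM call.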