Prerequisites

  • The OS is openEuler 22.03 on ARM (x86 also works, but you will need to change the image addresses yourself)

    cat /etc/os-release
    NAME="openEuler"
    VERSION="22.03 (LTS-SP4)"
    ID="openEuler"
    VERSION_ID="22.03"
    PRETTY_NAME="openEuler 22.03 (LTS-SP4)"
    ANSI_COLOR="0;31"
  • Docker and Docker Compose are installed

    Docker version 29.5.3, build d1c06ef 
    Docker Compose version v5.1.4
  • The LiteLLM version is v1.89.1
  • vLLM is already deployed and answers curl requests. Sample test requests:

chat

curl http://127.0.0.1:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen3","messages":[{"role":"user","content":"你好,请简单介绍下自己"}],"temperature":0.7,"max_tokens":1024}'

embedding

curl http://127.0.0.1:9010/v1/embeddings -H "Content-Type: application/json" -d '{"model":"/model","input":"测试文本向量化"}'

rerank

curl http://127.0.0.1:9011/v1/score -H "Content-Type: application/json" -d '{"model":"/model","pairs":[["人工智能是什么","人工智能是模拟人类智能的技术"],["人工智能是什么","今天下雨了"]]}'
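
The architecture note in the first prerequisite can be checked before any image is pulled. This is a minimal sketch, assuming `uname -m` reports `aarch64` on the ARM host; `arch_kind` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# arch_kind MACHINE: map a `uname -m` string to the image flavor this guide needs.
# (hypothetical helper, not part of Docker or LiteLLM)
arch_kind() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "x86_64" ;;
    *)             echo "unknown" ;;
  esac
}

kind=$(arch_kind "$(uname -m)")
if [ "$kind" = "arm64" ]; then
  echo "ARM host: the -linuxarm64 image tags used in this guide apply"
else
  echo "Host is $kind: change the image addresses in docker-compose.yml first"
fi
```

On any host other than ARM, the three `image:` lines in docker-compose.yml are the ones to adjust.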

I. Directory structure

litellm/
├── .env
├── docker-compose.yml
└── prometheus.yml

II. Create the configuration files

mkdir litellm && cd litellm

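In .env, LITELLM_MASTER_KEY also serves as the default login password for the admin UI (the UI user name defaults to admin). SEARXNG_API_BASE is optional.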

cat > .env << 'EOF'
LITELLM_MASTER_KEY="sk-c0oE4EgitT4DmXyRj4Av"
SEARXNG_API_BASE="http://192.168.105.143:18080"
EOF
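
The master key shown in the .env above is only a sample value, and each deployment should use its own. This is a minimal sketch for generating one, assuming the `sk-` prefix that LiteLLM expects for the master key; `gen_master_key` is a hypothetical helper name.

```shell
#!/bin/sh
# gen_master_key: print "sk-" followed by 32 random hex characters.
# (hypothetical helper; reads 16 random bytes from /dev/urandom)
gen_master_key() {
  printf 'sk-%s\n' "$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
}

key=$(gen_master_key)
# Print the line in the same form the .env file uses.
echo "LITELLM_MASTER_KEY=\"$key\""
```

Paste the printed line into .env in place of the sample value, and consider running `chmod 600 .env` so that other local users cannot read it.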
cat > docker-compose.yml << 'EOF'
services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/docker.litellm.ai/berriai/litellm:main-stable-linuxarm64
    #########################################
    ## Uncomment these lines to start proxy with a config.yaml file ##
    # volumes:
    #  - ./config.yaml:/app/config.yaml
    # command:
    #  - "--config=/app/config.yaml"
    ##############################################
    ports:
      - "3000:4000" # Map the container port to the host, change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      # Optional: route read-only queries (find_*, count, group_by, query_raw/_first)
      # to a separate reader endpoint, e.g. an Aurora reader. Leave unset for
      # single-DB deployments. With IAM_TOKEN_DB_AUTH enabled, the reader URL
      # is auto-refreshed alongside the writer.
      # DATABASE_URL_READ_REPLICA: "postgresql://llmproxy:dbpassword9090@db-reader:5432/litellm"
      STORE_MODEL_IN_DB: "True" # allows adding models to proxy via UI
    env_file:
      - .env # Load local .env file
    depends_on:
      - db  # Indicates that this service depends on the 'db' service, ensuring 'db' starts first
    healthcheck:  # Defines the health check configuration for the container
      test:
        - CMD-SHELL
        - python3 -c "import urllib.request; urllib.request.urlopen('http://localhost:4000/health/liveliness')"  # Command to execute for health check
      interval: 30s  # Perform health check every 30 seconds
      timeout: 10s   # Health check command times out after 10 seconds
      retries: 3     # Retry up to 3 times if health check fails
      start_period: 40s  # Wait 40 seconds after container start before beginning health checks

  db:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/postgres:16-linuxarm64
    restart: always
    container_name: litellm_db
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persists Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10

  prometheus:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/prom/prometheus:v3.5.4-linuxarm64
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=15d"
    restart: always

volumes:
  prometheus_data:
    driver: local
  postgres_data:
    name: litellm_postgres_data # Named volume for Postgres data persistence
EOF
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'litellm'
    static_configs:
      - targets: ['litellm:4000']  # Assuming Litellm exposes metrics at port 4000

EOF

III. Start LiteLLM

docker compose up -d

Once the stack has started, open http://localhost:3000/ui to reach the admin web UI.
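
The containers need some time before the UI responds (the healthcheck in docker-compose.yml allows a 40-second start period). This is a minimal sketch of a wait loop, assuming the `/health/liveliness` path from that healthcheck is reachable through host port 3000; `retry` is a hypothetical helper name.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, trying at most MAX times
# and sleeping DELAY seconds after each failed attempt.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_max" ]; do
    "$@" && return 0
    sleep "$retry_delay"
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Example (not run here): wait up to about 60 seconds for the proxy.
# retry 30 2 curl -fsS -o /dev/null http://localhost:3000/health/liveliness
```

If the loop gives up, `docker compose ps` and `docker compose logs litellm` are the usual places to look next.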

IV. Configure through the web UI

1. Configure the model endpoint

Models + Endpoints ---> Add Model

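Fill in the form as follows:

Field | Value | Description
Provider | vllm | Upstream model provider type
LiteLLM Model Name(s) | qwen3 | Upstream model name; must match the name set in the vLLM docker launch configuration
Model Mappings -> Public Model Name | qwen3-public | Model name exposed to downstream platforms
Model Mappings -> LiteLLM Model Name | (default) | Filled in automatically once the field above is set; normally identical to that string
Mode | Chat - /chat/completions | Choose chat for a chat model; for embedding and rerank models, choose the matching mode
Existing Credentials | (leave empty) |
API Base | http://127.0.0.1:8000/v1 | API address, with the path written up to v1. Inside the LiteLLM container, 127.0.0.1 refers to the container itself, so if vLLM runs on the host, use an address the container can reach, such as the host's IP
API Key | (leave empty) | Not needed if vLLM has no key configured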

When the form is complete, click Test Connect to check the connection, and save once the test succeeds.
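
The same model entry can also be written as a config.yaml, which the commented-out `volumes` and `command` lines in docker-compose.yml would load. This is a minimal sketch, assuming LiteLLM's `hosted_vllm/` provider prefix for OpenAI-compatible vLLM servers; `VLLM_API_BASE` is a hypothetical shell variable, and the other values mirror the Add Model form.

```shell
#!/bin/sh
# Address LiteLLM should use to reach vLLM (hypothetical variable; the default
# is the API Base value from the Add Model form).
VLLM_API_BASE=${VLLM_API_BASE:-http://127.0.0.1:8000/v1}

# Write a config.yaml that mirrors the Add Model form.
cat > config.yaml << EOF
model_list:
  - model_name: qwen3-public        # name exposed to downstream clients
    litellm_params:
      model: hosted_vllm/qwen3      # provider prefix + upstream vLLM model name
      api_base: ${VLLM_API_BASE}
EOF
```

The file only takes effect after the `volumes` and `command` lines are uncommented and the stack is restarted. If vLLM runs on the Docker host rather than inside the litellm container, set `VLLM_API_BASE` to an address the container can reach.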

2. Configure a team

Teams ---> Create Team

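Fill in the form as follows:

Field | Value | Description
Team Name | test-team | A team name of your choice
Models | qwen3-public | Select the public model name configured above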

Leave all other fields empty or at their defaults.

3. Configure a user

Internal Users ---> Invite User

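Fill in the form as follows:

Field | Value | Description
User Email | test-person | User name; it does not have to be an email address, and Chinese characters are supported
Global Proxy Role | Internal User (View Only) | Optional; this role can only use models and has no administrative permissions
Team | test-team | Select the team created above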

Leave all other fields empty or at their defaults.

4. Configure an API key

Virtual Keys ---> Create New Key

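Fill in the form as follows:

Field | Value | Description
Owned By | Another User | Issues the key to a user; other options issue it to, for example, a service or an agent
User ID | test-person | The user name created above
Organization | (leave empty) |
Team | test-team | The team created above
Key Name | test-key | Key name, as shown in logs and usage
Models | All team Models | Models the key may use: either all available models or a single model. With single-model keys, quotas can be set per user and per model
Key Type | AI APIs | Permission scope of the key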

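Leave all other fields empty or at their defaults.

After you save, a dialog displays the key. This is the only chance to copy it; afterwards it is shown only in masked form.

My key here is sk-mHz0miMx_QyR6xAhxVmSgw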

V. Test connectivity

Test with the following curl command, replacing the key and the model field with the values from above.

curl http://localhost:3000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-mHz0miMx_QyR6xAhxVmSgw" -d '{"model": "qwen3-public", "messages": [{"role": "user", "content": "你好,请介绍一下你自己"}]}'

If everything works, the result looks like this:

[root@localhost gongwen]# curl http://localhost:3000/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer sk-mHz0miMx_QyR6xAhxVmSgw" -d '{"model": "qwen3-public", "messages": [{"role": "user", "content": "你好,请介绍一下你自己"}]}'
{"id":"chatcmpl-bfbabeb7f39ecddb","created":1782720274,"model":"qwen3-public","object":"chat.completion","choices":[{"finish_reason":"stop","index":0,"message":{"content":"你好!我是通义千问(Qwen),是阿里巴巴集团旗下的通义实验室研发的超大规模语言模型。我能够回答问题、创作文字,比如写故事、公文、邮件、剧本,进行逻辑推理、编程,甚至能玩游戏、表达观点和聊天。\n\n我的能力包括但不限于:\n\n- 多语言支持:我支持包括中文、英文、法语、西班牙语、葡萄牙语、俄语、阿拉伯语、日语、韩语、越南语、泰语、印尼语等在内的数十种语言。\n- 代码写作:我熟悉多种编程语言,可以帮你写代码、调试、解释算法。\n- 逻辑推理:我能处理复杂的逻辑问题,比如数学题、谜题、推理题等。\n- 长文本处理:我支持超长上下文(最高可达32768个token),适合处理长文档、书籍摘要、会议记录等。\n- 对话理解:我擅长多轮对话,能记住上下文,提供连贯、自然的交流体验。\n- 知识丰富:我训练数据截至2024年,涵盖广泛领域,能为你提供最新、最全面的信息(在训练数据范围内)。\n\n我叫“通义千问”,“通义”代表我具有广泛的知识和普适性,“千问”则代表我能够回答各种各样的问题,无论多么复杂或独特。\n\n如果你有任何问题或需要帮助,尽管告诉我,我会尽力为你提供支持!😊\n\n—— 你的AI助手 Qwen","role":"assistant","provider_specific_fields":{"refusal":null,"reasoning":null}},"provider_specific_fields":{"token_ids":null,"stop_reason":null}}],"usage":{"completion_tokens":318,"prompt_tokens":12,"total_tokens":330}}[root@localhost gongwen]#
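
For scripted checks, the same request can be reduced to its HTTP status code. This is a minimal sketch that relies on curl's `-w '%{http_code}'` output; `chat_status` is a hypothetical helper name and `LITELLM_API_KEY` a hypothetical variable holding the virtual key.

```shell
#!/bin/sh
# chat_status BASE_URL API_KEY MODEL: send one short chat request and print
# only the HTTP status code (curl prints 000 when no response is received).
chat_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $2" \
    -d "{\"model\": \"$3\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}" \
    "$1/v1/chat/completions"
}

# Example (not run here); substitute your own key and public model name:
# chat_status http://localhost:3000 "$LITELLM_API_KEY" qwen3-public
```

A printed 200 means the key, the team and the model mapping all worked together; any other code is a cue to rerun the full curl command and read the error body.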
Last modified: July 3, 2026