<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>liteLLM Blog</title>
        <link>https://docs.litellm.ai/blog</link>
        <description>liteLLM Blog</description>
        <lastBuildDate>Mon, 16 Mar 2026 10:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[New Video Characters, Edit and Extension API support]]></title>
            <link>https://docs.litellm.ai/blog/video_characters_api</link>
            <guid>https://docs.litellm.ai/blog/video_characters_api</guid>
            <pubDate>Mon, 16 Mar 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[LiteLLM now supports creating, retrieving, and managing reusable video characters across multiple video generations.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports the video character, edit, and extension APIs.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-new">What's New<a href="https://docs.litellm.ai/blog/video_characters_api#whats-new" class="hash-link" aria-label="Direct link to What's New" title="Direct link to What's New">​</a></h2>
<p>Four new endpoints covering video characters, editing, and extension:</p>
<ul>
<li><strong>Create character</strong> - Upload a video to create a reusable asset</li>
<li><strong>Get character</strong> - Retrieve character metadata</li>
<li><strong>Edit video</strong> - Modify generated videos</li>
<li><strong>Extend video</strong> - Continue clips with character consistency</li>
</ul>
<p><strong>Available from:</strong> LiteLLM v1.83.0+</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="quick-example">Quick Example<a href="https://docs.litellm.ai/blog/video_characters_api#quick-example" class="hash-link" aria-label="Direct link to Quick Example" title="Direct link to Quick Example">​</a></h2>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Create character from video</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">character </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">avideo_create_character</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Luna"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    video</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"luna.mp4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span 
class="token string" style="color:#e3116c">"rb"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    custom_llm_provider</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"sora-2"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Character: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">character</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation builtin">id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Use in generation</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">video </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">avideo</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"sora-2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Luna dances through a magical forest."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    characters</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> character</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span 
class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    seconds</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"8"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Get character info</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">fetched </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">avideo_get_character</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    character_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">character</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    custom_llm_provider</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai"</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Edit with character preserved</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">edited </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">avideo_edit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    video_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">video</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Add warm golden lighting"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" 
style="color:#999988;font-style:italic"># Extend sequence</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">extended </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">avideo_extension</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    video_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">video</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Luna waves goodbye"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    seconds</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"5"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
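<p>The <code>a</code> prefix on these functions suggests they are coroutine variants, in which case each call has to be awaited inside an event loop. The sketch below assumes exactly that and wraps the first two steps of the example. The helper name <code>create_and_generate</code> and its <code>client</code> parameter are hypothetical; <code>client</code> stands in for the <code>litellm</code> module so the flow can be exercised without network access.</p>

```python
async def create_and_generate(client, video_path, name, prompt, model="sora-2"):
    """Create a character from a local video, then generate a clip that uses it.

    `client` is assumed to expose the coroutine functions shown in the post
    (avideo_create_character, avideo); this helper is illustrative only.
    """
    # Open the source clip in binary mode and close it once the upload returns.
    with open(video_path, "rb") as source:
        character = await client.avideo_create_character(
            name=name,
            video=source,
            custom_llm_provider="openai",
            model=model,
        )

    # Reference the new character by ID in a follow-up generation.
    video = await client.avideo(
        model=model,
        prompt=prompt,
        characters=[{"id": character.id}],
        seconds="8",
    )
    return character, video
```

<p>Under the same assumption, a script would run it with <code>asyncio.run(create_and_generate(litellm, "luna.mp4", "Luna", "Luna dances through a magical forest."))</code>.</p>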
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="via-proxy">Via Proxy<a href="https://docs.litellm.ai/blog/video_characters_api#via-proxy" class="hash-link" aria-label="Direct link to Via Proxy" title="Direct link to Via Proxy">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Create character</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "http://localhost:4000/v1/videos/characters" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer sk-litellm-key" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -F "video=@luna.mp4" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -F "name=Luna"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Get character</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X GET "http://localhost:4000/v1/videos/characters/char_abc123def456" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer sk-litellm-key"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Edit video</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "http://localhost:4000/v1/videos/edits" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer sk-litellm-key" \</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "video": {"id": "video_xyz789"},</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "prompt": "Add warm golden lighting and enhance colors"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Extend video</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "http://localhost:4000/v1/videos/extensions" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer sk-litellm-key" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "video": {"id": "video_xyz789"},</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "prompt": "Luna waves goodbye and walks into the sunset",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "seconds": "5"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div>
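<p>For callers that prefer Python over curl, the edit and extension requests above can be assembled with the standard library alone. This is a minimal sketch: the paths, headers, and JSON body shape are copied from the curl examples, while the helper name <code>build_video_request</code> and its parameters are made up for illustration. It only builds the request; sending it is left to the caller.</p>

```python
import json
import urllib.request

# Paths taken from the curl examples in the post.
VIDEO_PATHS = {"edit": "/v1/videos/edits", "extend": "/v1/videos/extensions"}


def build_video_request(base_url, api_key, action, video_id, prompt, seconds=None):
    """Build (but do not send) a POST request for the proxy's edit/extend endpoints."""
    body = {"video": {"id": video_id}, "prompt": prompt}
    if seconds is not None:
        # The post passes seconds as a string, so mirror that here.
        body["seconds"] = str(seconds)

    return urllib.request.Request(
        base_url.rstrip("/") + VIDEO_PATHS[action],
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

<p>Passing the result to <code>urllib.request.urlopen</code> would send it to a running proxy, for example one at <code>http://localhost:4000</code>.</p>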
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="managed-character-ids">Managed Character IDs<a href="https://docs.litellm.ai/blog/video_characters_api#managed-character-ids" class="hash-link" aria-label="Direct link to Managed Character IDs" title="Direct link to Managed Character IDs">​</a></h2>
<p>LiteLLM automatically encodes provider and model metadata into character IDs:</p>
<p><strong>What happens:</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Upload character "Luna" with model "sora-2" on OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">LiteLLM creates: char_abc123def456 (contains provider + model_id)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">When you reference it later, LiteLLM decodes automatically</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Router knows exactly which deployment to use</span><br></span></code></pre></div></div>
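<p>The encode-then-decode step in the flow above can be illustrated with a small round trip. This is a sketch under stated assumptions, not LiteLLM's actual implementation: it assumes the metadata is serialized as JSON and encoded with URL-safe base64 behind a <code>character_</code> prefix, and both function names are hypothetical.</p>

```python
import base64
import json

PREFIX = "character_"  # assumed prefix for managed IDs


def encode_character_id(provider, model_id, original_character_id):
    """Pack routing metadata into a single opaque ID (illustrative encoding)."""
    payload = json.dumps(
        {
            "provider": provider,
            "model_id": model_id,
            "original_character_id": original_character_id,
        },
        separators=(",", ":"),
    )
    token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
    # Strip '=' padding so the ID stays URL-friendly.
    return PREFIX + token.rstrip("=")


def decode_character_id(managed_id):
    """Recover the routing metadata from an ID produced by encode_character_id."""
    if not managed_id.startswith(PREFIX):
        raise ValueError("not a managed character ID")
    token = managed_id[len(PREFIX):]
    # Restore the padding that was stripped during encoding.
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

<p>Because everything needed for routing travels inside the ID, a router built this way would not need a lookup table to find the right deployment.</p>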
<p><strong>Behind the scenes:</strong></p>
<ul>
<li>Character ID format: <code>character_&lt;base64_encoded_metadata&gt;</code></li>
<li>Metadata includes: provider, model_id, original_character_id</li>
<li>Transparent to you: pass the ID as-is and LiteLLM handles the routing</li>
</ul>]]></content:encoded>
            <category>videos</category>
            <category>characters</category>
            <category>proxy</category>
            <category>routing</category>
        </item>
        <item>
            <title><![CDATA[Realtime WebRTC HTTP Endpoints]]></title>
            <link>https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints</link>
            <guid>https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints</guid>
            <pubDate>Thu, 12 Mar 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Use the LiteLLM proxy to route OpenAI-style WebRTC realtime via HTTP: client_secrets and SDP exchange.]]></description>
            <content:encoded><![CDATA[<p>Connect to the Realtime API via WebRTC from browser/mobile clients. LiteLLM handles auth and key management.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it works<a href="https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints#how-it-works" class="hash-link" aria-label="Direct link to How it works" title="Direct link to How it works">​</a></h2>
<p><img decoding="async" loading="lazy" alt="WebRTC flow: Browser, LiteLLM Proxy, and OpenAI/Azure" src="https://docs.litellm.ai/assets/images/webrtc_flow-fec21e7e5ee4dd2fefecd921464dac8d.png" width="1024" height="404" class="img_ev3q"></p>
<p><strong>Ephemeral token generation flow</strong></p>
<p><img decoding="async" loading="lazy" alt="Ephemeral token flow: Browser requests token, LiteLLM gets real token from OpenAI, returns encrypted token" src="https://docs.litellm.ai/assets/images/ephemeral_token-4944942cb47e00195c87b88d8cda4650.png" width="1600" height="591" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="proxy-setup">Proxy Setup<a href="https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints#proxy-setup" class="hash-link" aria-label="Direct link to Proxy Setup" title="Direct link to Proxy Setup">​</a></h2>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpt</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4o</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">realtime</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> openai/gpt</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4o</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">realtime</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><span class="token punctuation" style="color:#393A34">-</span><span 
class="token datetime number" style="color:#36acaa">2024-12-17</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/OPENAI_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model_info</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">mode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> realtime</span><br></span></code></pre></div></div>
<p><strong>Azure:</strong> set <code>model: azure/gpt-4o-realtime-preview</code> and provide <code>api_key</code> and <code>api_base</code>.</p>
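<p>As a rough illustration of the Azure variant, the snippet below builds the equivalent <code>model_list</code> entry as a Python dictionary and prints it as JSON, which YAML parsers also accept. The environment variable names <code>AZURE_API_KEY</code> and <code>AZURE_API_BASE</code> and the helper name are assumptions; only the <code>model</code> value and the <code>os.environ/</code> reference style come from the examples above.</p>

```python
import json


def azure_realtime_entry(model_name="gpt-4o-realtime"):
    """Return one model_list entry mirroring the OpenAI example, adapted for Azure."""
    return {
        "model_name": model_name,
        "litellm_params": {
            "model": "azure/gpt-4o-realtime-preview",
            # Assumed variable names; use whatever your deployment defines.
            "api_key": "os.environ/AZURE_API_KEY",
            "api_base": "os.environ/AZURE_API_BASE",
        },
        "model_info": {"mode": "realtime"},
    }


config = {"model_list": [azure_realtime_entry()]}
# JSON output can be saved directly as config.yaml.
print(json.dumps(config, indent=2))
```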
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">litellm --config /path/to/config.yaml</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="try-it-live">Try it live<a href="https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints#try-it-live" class="hash-link" aria-label="Direct link to Try it live" title="Direct link to Try it live">​</a></h2>
<style>
.wrt-wrap {
  background: #1f2937;
  border: 1px solid #334155;
}
.wrt-toggle,
.wrt-toggle:hover {
  background: #111827;
}
.wrt-toggle-title,
.we-msg {
  color: #e2e8f0;
}
.wrt-toggle-sub,
.wrt-label,
.wrt-field label,
.wrt-flow-box,
.wrt-flow-arrow,
.wrt-meta-row span:first-child,
.wrt-header-title,
.wrt-tab,
.we-time {
  color: #94a3b8;
}
.wrt-body,
.wrt-sidebar,
.wrt-main,
.wrt-header,
.wrt-tabs,
.wrt-sdp-box,
.wrt-sdp-hdr,
.wrt-divider {
  border-color: #334155;
}
.wrt-header {
  background: #111827;
}
.wrt-field input,
.wrt-mic-btn,
.wrt-status-pill {
  background: #0b1220;
  border-color: #334155;
  color: #e2e8f0;
}
.wrt-field input:focus,
.wrt-btn-ghost:hover {
  border-color: #60a5fa;
}
.wrt-btn-ghost {
  background: #0b1220;
  border-color: #334155;
  color: #e2e8f0;
}
.wrt-log::-webkit-scrollbar-thumb {
  background: #475569;
}
.wrt-tab.active {
  color: #93c5fd;
  border-bottom-color: #93c5fd;
}
.wrt-empty,
.wrt-audio-status,
.wrt-meta-row span:last-child {
  color: #cbd5e1;
}
.wrt-sdp-dot {
  background: #475569;
}
.wrt-sdp-pane textarea {
  color: #e2e8f0;
}
</style>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="client-usage">Client Usage<a href="https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints#client-usage" class="hash-link" aria-label="Direct link to Client Usage" title="Direct link to Client Usage">​</a></h2>
<p><strong>1. Get token</strong> - <code>POST /v1/realtime/client_secrets</code> with your LiteLLM API key in the <code>Authorization</code> header and a JSON body of <code>{ model }</code>. The encrypted token is returned in <code>client_secret.value</code>.</p>
<p><strong>2. WebRTC handshake</strong> - Create an <code>RTCPeerConnection</code>, add the mic track, create the <code>oai-events</code> data channel, then send the SDP offer to <code>POST /v1/realtime/calls</code> with <code>Authorization: Bearer &lt;encrypted_token&gt;</code> and <code>Content-Type: application/sdp</code>. Apply the response body as the SDP answer.</p>
<p><strong>3. Events</strong> - Once the data channel is open, use it for <code>session.update</code> and other events.</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Full code example</summary><div><div class="collapsibleContent_i85q"><div class="language-javascript codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-javascript codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// 1. Token</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> r </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">fetch</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"http://proxy:4000/v1/realtime/client_secrets"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token literal-property property" style="color:#36acaa">method</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"POST"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span 
class="token literal-property property" style="color:#36acaa">headers</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token string-property property" style="color:#36acaa">"Authorization"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Bearer sk-litellm-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string-property property" style="color:#36acaa">"Content-Type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"application/json"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token literal-property property" style="color:#36acaa">body</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token known-class-name class-name">JSON</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">stringify</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">model</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpt-4o-realtime"</span><span class="token plain"> </span><span 
class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> client_secret </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> r</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">json</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> token </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client_secret</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">value</span><span class="token punctuation" style="color:#393A34">;</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">// 2. WebRTC</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> pc </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">new</span><span class="token plain"> </span><span class="token class-name">RTCPeerConnection</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> audio </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token dom variable" style="color:#36acaa">document</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">createElement</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"audio"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">audio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token 
property-access">autoplay</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method-variable function-variable method function property-access" style="color:#d73a49">ontrack</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token parameter">e</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token arrow operator" style="color:#393A34">=&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">audio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">srcObject</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> e</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">streams</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> ms 
</span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> </span><span class="token dom variable" style="color:#36acaa">navigator</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">mediaDevices</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">getUserMedia</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">audio</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">addTrack</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">getTracks</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" 
style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> dc </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">createDataChannel</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"oai-events"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> offer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">createOffer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function 
property-access" style="color:#d73a49">setLocalDescription</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">offer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">const</span><span class="token plain"> sdpRes </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">fetch</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"http://proxy:4000/v1/realtime/calls"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token literal-property property" style="color:#36acaa">method</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"POST"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token literal-property property" style="color:#36acaa">headers</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token plain"> </span><span class="token string-property property" style="color:#36acaa">"Authorization"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token template-string template-punctuation string" style="color:#e3116c">`</span><span class="token template-string string" style="color:#e3116c">Bearer </span><span class="token template-string interpolation interpolation-punctuation punctuation" style="color:#393A34">${</span><span class="token template-string interpolation">token</span><span class="token template-string interpolation interpolation-punctuation punctuation" style="color:#393A34">}</span><span class="token template-string template-punctuation string" style="color:#e3116c">`</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string-property property" style="color:#36acaa">"Content-Type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"application/sdp"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token literal-property property" style="color:#36acaa">body</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> offer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token property-access">sdp</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token 
punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> pc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">setRemoteDescription</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">type</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"answer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">sdp</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token keyword control-flow" style="color:#00009f">await</span><span class="token plain"> sdpRes</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">text</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
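plain"></span><span class="token comment" style="color:#999988;font-style:italic">// 3. Events (send once the data channel is open)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">dc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">addEventListener</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"open"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token arrow operator" style="color:#393A34">=&gt;</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  dc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">send</span><span class="token punctuation" style="color:#393A34">(</span><span class="token known-class-name class-name">JSON</span><span class="token punctuation" style="color:#393A34">.</span><span class="token method function property-access" style="color:#d73a49">stringify</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">type</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"session.update"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">session</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">instructions</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"..."</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></span></code></pre></div></div></div></div></details>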
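<p>An <code>RTCDataChannel</code> only accepts <code>send()</code> once its <code>readyState</code> is <code>"open"</code>, and events travel over it as JSON strings. The sketch below is one possible way to wrap the <code>oai-events</code> channel so that outgoing events are queued until the channel opens and incoming messages are decoded. <code>createEventChannel</code> and <code>onEvent</code> are hypothetical names used for illustration; they are not part of LiteLLM or the browser API.</p>

```javascript
// Hypothetical helper (not a LiteLLM API): queues outgoing events until the
// RTCDataChannel is open and JSON-decodes incoming messages.
function createEventChannel(dc, onEvent) {
  const pending = [];

  // Flush anything queued while the channel was still connecting.
  dc.addEventListener("open", () => {
    while (pending.length > 0) dc.send(pending.shift());
  });

  // Events arrive as JSON strings; skip frames that fail to parse.
  dc.addEventListener("message", (e) => {
    let event;
    try {
      event = JSON.parse(e.data);
    } catch (err) {
      return;
    }
    onEvent(event);
  });

  return {
    send(event) {
      const payload = JSON.stringify(event);
      if (dc.readyState === "open") dc.send(payload);
      else pending.push(payload);
    },
  };
}
```

<p>With this wrapper, <code>createEventChannel(dc, console.log).send({ type: "session.update", session: { instructions: "..." } })</code> can be called right after the handshake without checking the channel state first.</p>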
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="faq">FAQ<a href="https://docs.litellm.ai/blog/realtime_webrtc_http_endpoints#faq" class="hash-link" aria-label="Direct link to FAQ" title="Direct link to FAQ">​</a></h2>
<p><strong>Q: What do I do if I get a 401 Token expired error?</strong><br>
<!-- -->A: Tokens are short-lived. Get a fresh token right before creating the WebRTC offer.</p>
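<p>One way to follow this advice is to request the token and post the offer in a single step, so the token is never held for long. The following is a minimal sketch under that assumption: <code>fetchRealtimeToken</code>, <code>exchangeOffer</code>, <code>proxyBase</code> and <code>fetchImpl</code> are hypothetical names, while the two endpoints, the headers and the <code>client_secret.value</code> field come from the example above.</p>

```javascript
// Hypothetical helper: requests a fresh short-lived token from the proxy.
// `fetchImpl` defaults to the global fetch and exists so a stub can be injected.
async function fetchRealtimeToken(proxyBase, apiKey, model, fetchImpl = fetch) {
  const res = await fetchImpl(`${proxyBase}/v1/realtime/client_secrets`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model }),
  });
  if (!res.ok) throw new Error(`client_secrets request failed: ${res.status}`);
  const { client_secret } = await res.json();
  return client_secret.value;
}

// Hypothetical helper: fetches the token immediately before posting the SDP
// offer and resolves to the SDP answer text.
async function exchangeOffer(proxyBase, apiKey, model, offerSdp, fetchImpl = fetch) {
  const token = await fetchRealtimeToken(proxyBase, apiKey, model, fetchImpl);
  const res = await fetchImpl(`${proxyBase}/v1/realtime/calls`, {
    method: "POST",
    headers: { "Authorization": `Bearer ${token}`, "Content-Type": "application/sdp" },
    body: offerSdp,
  });
  if (!res.ok) throw new Error(`calls request failed: ${res.status}`);
  return res.text();
}
```

<p>In the browser this would be called after <code>pc.setLocalDescription(offer)</code>, passing <code>offer.sdp</code>, and the resolved text would go to <code>pc.setRemoteDescription({ type: "answer", sdp })</code>.</p>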
<p><strong>Q: Which key should I use for <code>/v1/realtime/calls</code>?</strong><br>
<!-- -->A: Use the <strong>encrypted token</strong> from <code>client_secrets</code>, not your raw API key.</p>
<p><strong>Q: Should I pass the <code>model</code> parameter when making the call?</strong><br>
<!-- -->A: No. The encrypted token already encodes all routing information, including the model.</p>
<p><strong>Q: How do I resolve Azure <code>api-version</code> errors?</strong><br>
<!-- -->A: Set the correct <code>api_version</code> in <code>litellm_params</code> (or via the <code>AZURE_API_VERSION</code> environment variable), along with the right <code>api_base</code> and deployment values.</p>
<p><strong>Q: What if I get no audio?</strong><br>
<!-- -->A: Grant microphone permission, make sure <code>pc.ontrack</code> assigns the remote stream to an audio element with <code>autoplay</code> enabled, confirm that your network or firewall allows WebRTC traffic, and check the browser console for ICE or SDP errors.</p>]]></content:encoded>
            <category>realtime</category>
            <category>webrtc</category>
            <category>proxy</category>
            <category>openai</category>
        </item>
        <item>
            <title><![CDATA[Day 0 Support: GPT-5.4]]></title>
            <link>https://docs.litellm.ai/blog/gpt_5_4</link>
            <guid>https://docs.litellm.ai/blog/gpt_5_4</guid>
            <pubDate>Thu, 05 Mar 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[GPT-5.4 model support in LiteLLM]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now fully supports GPT-5.4!</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image">Docker Image<a href="https://docs.litellm.ai/blog/gpt_5_4#docker-image" class="hash-link" aria-label="Direct link to Docker Image" title="Direct link to Docker Image">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker pull ghcr.io/berriai/litellm:v1.81.14-stable.gpt-5.4_patch</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage">Usage<a href="https://docs.litellm.ai/blog/gpt_5_4#usage" class="hash-link" aria-label="Direct link to Usage" title="Direct link to Usage">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LiteLLM SDK</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpt</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">5.4</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"> openai/gpt</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">5.4</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/OPENAI_API_KEY</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e OPENAI_API_KEY=$OPENAI_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.14-stable.gpt-5.4_patch \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "http://0.0.0.0:4000/chat/completions" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gpt-5.4",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {"role": "user", "content": "Write a Python function to check if a number is prime."}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> 
litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai/gpt-5.4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Write a Python function to check if a number is prime."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="notes">Notes<a href="https://docs.litellm.ai/blog/gpt_5_4#notes" class="hash-link" aria-label="Direct link to Notes" title="Direct link to Notes">​</a></h2>
<ul>
<li>Restart your container to pick up cost tracking for this model.</li>
<li>Use the <code>/responses</code> endpoint for better model performance.</li>
<li>GPT-5.4 supports reasoning, function calling, vision, and tool use. See the <a href="https://docs.litellm.ai/docs/providers/openai">OpenAI provider docs</a> for advanced usage.</li>
</ul>]]></content:encoded>
            <category>openai</category>
            <category>gpt-5.4</category>
            <category>completion</category>
        </item>
        <item>
            <title><![CDATA[DAY 0 Support: Gemini 3.1 Flash Lite Preview on LiteLLM]]></title>
            <link>https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview</link>
            <guid>https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview</guid>
            <pubDate>Tue, 03 Mar 2026 08:00:00 GMT</pubDate>
            <description><![CDATA[Guide to using Gemini 3.1 Flash Lite Preview on LiteLLM Proxy and SDK with day 0 support.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports <code>gemini-3.1-flash-lite-preview</code> with full day 0 support!</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>If you only need cost tracking, no change to your current LiteLLM version is required. To use the new features introduced with this model, such as thinking levels, upgrade to v1.80.8-stable.1 or above.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.80.8-stable.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==v1.80.8-stable.1</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-new">What's New<a href="https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview#whats-new" class="hash-link" aria-label="Direct link to What's New" title="Direct link to What's New">​</a></h2>
<p>The model supports all four thinking levels:</p>
<ul>
<li><strong>MINIMAL</strong>: Ultra-fast responses with minimal reasoning</li>
<li><strong>LOW</strong>: Simple instruction following</li>
<li><strong>MEDIUM</strong>: Balanced reasoning for complex tasks</li>
<li><strong>HIGH</strong>: Maximum reasoning depth (dynamic)</li>
</ul>
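<p>The levels above are selected through the OpenAI-style <code>reasoning_effort</code> parameter. The following is a minimal sketch of that lookup as a plain dictionary, based on the mapping this post documents for Gemini 3.1. The names <code>REASONING_EFFORT_TO_THINKING_LEVEL</code> and <code>thinking_level_for</code> are hypothetical and are not part of LiteLLM's API; the sketch only illustrates how a caller might validate a value before sending a request.</p>

```python
# Illustrative lookup only; this is not LiteLLM's internal mapping code.
# Values follow the reasoning_effort table documented in this post.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",
    "none": "minimal",
}


def thinking_level_for(reasoning_effort: str) -> str:
    """Return the Gemini thinking level for an OpenAI-style reasoning_effort."""
    key = reasoning_effort.strip().lower()
    if key not in REASONING_EFFORT_TO_THINKING_LEVEL:
        allowed = ", ".join(sorted(REASONING_EFFORT_TO_THINKING_LEVEL))
        raise ValueError(
            f"unknown reasoning_effort {reasoning_effort!r}; expected one of: {allowed}"
        )
    return REASONING_EFFORT_TO_THINKING_LEVEL[key]


print(thinking_level_for("medium"))
print(thinking_level_for("disable"))
```

<p>Rejecting unknown values early, as this sketch does, is one possible design choice; a caller could equally pass the value through and let the proxy report the error.</p>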
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="quick-start">Quick Start<a href="https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview#quick-start" class="hash-link" aria-label="Direct link to Quick Start" title="Direct link to Quick Start">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">SDK</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">PROXY</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>Basic Usage</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3.1-flash-lite-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span 
class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Extract key entities from this text: ..."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" 
style="color:#393A34">)</span><br></span></code></pre></div></div><p><strong>With Thinking Levels</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Use MEDIUM thinking for complex reasoning tasks</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3.1-flash-lite-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Analyze this dataset and identify patterns"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"medium"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># minimal, low, medium, or high</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" 
style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">lite</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini/gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">lite</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/GEMINI_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Or use Vertex AI</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">lite</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex_ai/gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">lite</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_project</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> your</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">project</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_location</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">central1</span><br></span></code></pre></div></div><p><strong>2. 
Start proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">litellm --config /path/to/config.yaml</span><br></span></code></pre></div></div><p><strong>3. Make requests</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST http://localhost:4000/v1/chat/completions \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer &lt;YOUR-LITELLM-KEY&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gemini-3.1-flash-lite",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [{"role": "user", "content": "Extract structured data from this text"}],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "reasoning_effort": "low"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="supported-endpoints">Supported Endpoints<a href="https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview#supported-endpoints" class="hash-link" aria-label="Direct link to Supported Endpoints" title="Direct link to Supported Endpoints">​</a></h2>
<p>LiteLLM provides <strong>full end-to-end support</strong> for Gemini 3.1 Flash Lite Preview on:</p>
<ul>
<li>✅ <code>/v1/chat/completions</code> - OpenAI-compatible chat completions endpoint</li>
<li>✅ <code>/v1/responses</code> - OpenAI Responses API endpoint (streaming and non-streaming)</li>
<li>✅ <a href="https://docs.litellm.ai/docs/anthropic_unified"><code>/v1/messages</code></a> - Anthropic-compatible messages endpoint</li>
<li>✅ <code>/v1/generateContent</code> - <a href="https://docs.litellm.ai/docs/generateContent.md">Google Gemini API</a> compatible endpoint</li>
</ul>
<p>All endpoints support:</p>
<ul>
<li>Streaming and non-streaming responses</li>
<li>Function calling with thought signatures</li>
<li>Multi-turn conversations</li>
<li>All Gemini 3-specific features (thinking levels, thought signatures)</li>
<li>Full multimodal support (text, image, audio, video)</li>
</ul>
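<p>To make the endpoint list concrete, here is a small sketch that builds (but does not send) requests for two of the endpoints above using only the Python standard library. It assumes the proxy address <code>http://localhost:4000</code> and the <code>gemini-3.1-flash-lite</code> alias from the Quick Start config; the helper <code>build_request</code> and the placeholder key are hypothetical and exist only for illustration.</p>

```python
import json
import urllib.request

# Assumptions: proxy address and model alias match the Quick Start config above.
PROXY_BASE_URL = "http://localhost:4000"
MODEL_ALIAS = "gemini-3.1-flash-lite"


def build_request(path: str, body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the proxy."""
    return urllib.request.Request(
        url=PROXY_BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


prompt = "Extract structured data from this text"

# Chat Completions uses a "messages" list.
chat_request = build_request(
    "/v1/chat/completions",
    {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": "low",
    },
    api_key="sk-placeholder",  # hypothetical key, replace with your LiteLLM key
)

# The Responses API takes the prompt in an "input" field instead.
responses_request = build_request(
    "/v1/responses",
    {"model": MODEL_ALIAS, "input": prompt},
    api_key="sk-placeholder",
)

print(chat_request.get_method(), chat_request.full_url)
print(responses_request.get_method(), responses_request.full_url)
```

<p>With a proxy running, either request could be sent with <code>urllib.request.urlopen(chat_request)</code>; the sketch stops short of that so it runs without a live server.</p>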
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="reasoning_effort-mapping-for-gemini-31"><code>reasoning_effort</code> Mapping for Gemini 3.1<a href="https://docs.litellm.ai/blog/gemini_3_1_flash_lite_preview#reasoning_effort-mapping-for-gemini-31" class="hash-link" aria-label="Direct link to reasoning_effort-mapping-for-gemini-31" title="Direct link to reasoning_effort-mapping-for-gemini-31">​</a></h2>
<p>LiteLLM automatically maps OpenAI's <code>reasoning_effort</code> parameter to Gemini's <code>thinkingLevel</code>:</p>
<table><thead><tr><th>reasoning_effort</th><th>thinking_level</th><th>Use Case</th></tr></thead><tbody><tr><td><code>minimal</code></td><td><code>minimal</code></td><td>Ultra-fast responses, simple queries</td></tr><tr><td><code>low</code></td><td><code>low</code></td><td>Basic instruction following</td></tr><tr><td><code>medium</code></td><td><code>medium</code></td><td>Balanced reasoning for moderate complexity</td></tr><tr><td><code>high</code></td><td><code>high</code></td><td>Maximum reasoning depth, complex problems</td></tr><tr><td><code>disable</code></td><td><code>minimal</code></td><td>Disable extended reasoning</td></tr><tr><td><code>none</code></td><td><code>minimal</code></td><td>No extended reasoning</td></tr></tbody></table>]]></content:encoded>
            <category>gemini</category>
            <category>day 0 support</category>
            <category>llms</category>
            <category>supernova</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: Cache Eviction Closes In-Use httpx Clients]]></title>
            <link>https://docs.litellm.ai/blog/httpx-cache-eviction-incident</link>
            <guid>https://docs.litellm.ai/blog/httpx-cache-eviction-incident</guid>
            <pubDate>Fri, 27 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: February 27, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> February 27, 2026<br>
<strong>Duration:</strong> ~6 days (Feb 21 merge -&gt; Feb 27 fix)<br>
<strong>Severity:</strong> High<br>
<strong>Status:</strong> Resolved</p>
<blockquote>
<p><strong>Note:</strong> This fix is available in LiteLLM <code>v1.81.14.rc.2</code> and later.</p>
</blockquote>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/httpx-cache-eviction-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>A change intended to improve Redis connection pool cleanup introduced a regression that closed <strong>httpx clients</strong> the proxy was still actively using. The <code>LLMClientCache</code> (an in-memory TTL cache) stores both Redis clients <em>and</em> httpx clients under the same eviction policy. When a cache entry expired or was evicted, the new cleanup code called <code>aclose()</code>/<code>close()</code> on the evicted value. This worked correctly for Redis clients, but it destroyed httpx clients that other parts of the system still held references to and were actively using for LLM API calls.</p>
<p><strong>Impact:</strong> Any proxy instance that hit the cache TTL (default 10 minutes) or capacity limit (200 entries) would have its httpx clients closed out from under it, causing requests to LLM providers to fail with connection errors.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/httpx-cache-eviction-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p><code>LLMClientCache</code> extends <code>InMemoryCache</code> and is used to cache SDK clients (OpenAI, Anthropic, etc.) to avoid re-creating them on every request. These clients are keyed by configuration + event loop ID. The cache has:</p>
<ul>
<li><strong>Max size:</strong> 200 entries</li>
<li><strong>Default TTL:</strong> 10 minutes</li>
</ul>
<p>When the cache is full or entries expire, <code>InMemoryCache.evict_cache()</code> calls <code>_remove_key()</code> to drop entries.</p>
<p>The cached values are a mix of:</p>
<ul>
<li><strong>Redis/async Redis clients</strong> — owned exclusively by the cache, safe to close on eviction</li>
<li><strong>httpx-backed SDK clients</strong> (OpenAI, Anthropic, etc.) — shared references, still in use by router/model instances</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root Cause<a href="https://docs.litellm.ai/blog/httpx-cache-eviction-incident#root-cause" class="hash-link" aria-label="Direct link to Root Cause" title="Direct link to Root Cause">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/pull/21717" target="_blank" rel="noopener noreferrer">PR #21717</a> overrode <code>_remove_key()</code> in <code>LLMClientCache</code> to close async clients on eviction:</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Problematic code added in PR #21717</summary><div><div class="collapsibleContent_i85q"><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">LLMClientCache</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">InMemoryCache</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_remove_key</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">        value </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">cache_dict</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">key</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token builtin">super</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_remove_key</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">key</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> value </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            close_fn </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span 
class="token plain">value</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"aclose"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">value</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"close"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> close_fn </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> asyncio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">iscoroutinefunction</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">close_fn</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    asyncio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_running_loop</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create_task</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">close_fn</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> RuntimeError</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">pass</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> close_fn </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token builtin">callable</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">close_fn</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    close_fn</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">pass</span><br></span></code></pre></div></div></div></div></details>
<p>The intent was correct for Redis clients — prevent connection pool leaks when cached Redis clients expire. But <code>LLMClientCache</code> also stores httpx-backed SDK clients (e.g., <code>AsyncOpenAI</code>, <code>AsyncAnthropic</code>). These clients:</p>
<ol>
<li>Have an <code>aclose()</code> method (inherited from httpx)</li>
<li>Are still held by references elsewhere in the codebase (router, model instances)</li>
<li>Were being closed without any check on whether they were still in use</li>
</ol>
<p>So when the cache evicted an entry, it called <code>aclose()</code> on an httpx client that was still serving active LLM requests. That closed the client's underlying transport, and subsequent requests through it failed with connection errors.</p>
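<p>The failure mode can be reproduced in miniature without any real network client. The sketch below is a simplified illustration under stated assumptions: <code>FakeAsyncClient</code>, <code>ClosingCache</code>, and <code>router_reference</code> are hypothetical names, and the fake client only tracks an <code>is_closed</code> flag rather than a real transport. It uses <code>inspect.iscoroutinefunction</code> where the original override used the <code>asyncio</code> equivalent.</p>

```python
import asyncio
import inspect


class FakeAsyncClient:
    """Hypothetical stand-in for an httpx-backed SDK client."""

    def __init__(self):
        self.is_closed = False

    async def aclose(self):
        self.is_closed = True

    async def send(self, prompt):
        if self.is_closed:
            raise RuntimeError("client has been closed")
        return "ok: " + prompt


class ClosingCache:
    """Hypothetical cache that closes values on eviction."""

    def __init__(self):
        self.cache_dict = {}

    def _remove_key(self, key):
        value = self.cache_dict.pop(key, None)
        close_fn = getattr(value, "aclose", None)
        if close_fn is not None and inspect.iscoroutinefunction(close_fn):
            # Fire-and-forget close: nothing checks who else holds `value`.
            asyncio.get_running_loop().create_task(close_fn())


async def demo():
    cache = ClosingCache()
    client = FakeAsyncClient()
    cache.cache_dict["openai-client"] = client
    router_reference = client             # the router keeps the same object

    before = await router_reference.send("hello")
    cache._remove_key("openai-client")    # TTL or capacity eviction
    await asyncio.sleep(0)                # let the scheduled close task run
    try:
        after = await router_reference.send("hello again")
    except RuntimeError as exc:
        after = "failed: " + str(exc)
    return before, after


before, after = asyncio.run(demo())
print(before)
print(after)
```

<p>The first call succeeds and the second fails, even though the caller never closed its client. The cache closed it on the caller's behalf.</p>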
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix">The Fix<a href="https://docs.litellm.ai/blog/httpx-cache-eviction-incident#the-fix" class="hash-link" aria-label="Direct link to The Fix" title="Direct link to The Fix">​</a></h2>
<p><a href="https://github.com/BerriAI/litellm/pull/22247" target="_blank" rel="noopener noreferrer">PR #22247</a> removed the <code>_remove_key</code> override entirely:</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>The fix (PR #22247)</summary><div><div class="collapsibleContent_i85q"><div class="language-diff codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-diff codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"> class LLMClientCache(InMemoryCache):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-    def _remove_key(self, key: str) -&gt; None:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-        """Close async clients before evicting them to prevent connection pool leaks."""</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-        value = self.cache_dict.get(key)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-        super()._remove_key(key)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-        if value is not None:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-            close_fn = getattr(value, "aclose", None) or getattr(</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-                value, "close", None</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-            )</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-            ...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     def 
update_cache_key_with_event_loop(self, key):</span><br></span></code></pre></div></div></div></div></details>
<p>The eviction now simply drops the reference and lets Python's GC handle cleanup, which is safe because:</p>
<ul>
<li>httpx clients that are still referenced elsewhere stay alive</li>
<li>Unreferenced clients get cleaned up by GC naturally</li>
</ul>
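<p>The two cases above can be illustrated with plain Python object lifetimes. This is a sketch only: <code>FakeClient</code> and the bare <code>cache_dict</code> are hypothetical stand-ins, and the example shows when an object becomes collectable, not what a real client does with its open sockets at that point.</p>

```python
import gc
import weakref


class FakeClient:
    """Hypothetical stand-in for a cached SDK client."""

    def __init__(self):
        self.is_closed = False


cache_dict = {}

# Case 1: another component still holds the client when the cache evicts it.
shared = FakeClient()
cache_dict["shared"] = shared
cache_dict.pop("shared")           # eviction: only the cache's reference goes
shared_still_usable = not shared.is_closed

# Case 2: the cache held the only reference.
cache_dict["orphan"] = FakeClient()
orphan_ref = weakref.ref(cache_dict["orphan"])
cache_dict.pop("orphan")
gc.collect()                       # CPython usually frees it before this call
orphan_collected = orphan_ref() is None
```

<p>Here <code>shared_still_usable</code> and <code>orphan_collected</code> both end up <code>True</code>: eviction never touches an object someone else is using, and an object nobody references is reclaimed without the cache having to decide anything.</p>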
<p>The other improvements from PR #21717 were kept:</p>
<ul>
<li><strong><code>max_connections</code> respected for URL-based Redis configs</strong>: previously it was silently dropped</li>
<li><strong><code>disconnect()</code> now closes both sync and async Redis clients</strong>: the sync client was previously leaked</li>
<li><strong>Connection pool passthrough</strong>: when a pool is provided with a URL config, it is used directly instead of creating a duplicate</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/httpx-cache-eviction-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>Remove <code>_remove_key</code> override that closes shared clients on eviction</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/pull/22247" target="_blank" rel="noopener noreferrer">PR #22247</a></td></tr><tr><td>Add e2e test: evicted client still usable (capacity)</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/pull/22313" target="_blank" rel="noopener noreferrer">PR #22313</a></td></tr><tr><td>Add e2e test: expired client still usable (TTL)</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/pull/22313" target="_blank" rel="noopener noreferrer">PR #22313</a></td></tr></tbody></table>
<p>The e2e tests go through <code>get_async_httpx_client()</code>, the same code path the proxy uses in production, and assert that the client is still functional after eviction. These run in CI on every PR against <code>main</code>. If anyone modifies <code>LLMClientCache</code> eviction behavior, overrides <code>_remove_key</code>, or adds any form of client cleanup on eviction, these tests will fail regardless of the implementation approach.</p>]]></content:encoded>
            <category>incident-report</category>
            <category>caching</category>
            <category>stability</category>
        </item>
        <item>
            <title><![CDATA[Day 0 Support: GPT-5.3-Codex]]></title>
            <link>https://docs.litellm.ai/blog/gpt_5_3_codex</link>
            <guid>https://docs.litellm.ai/blog/gpt_5_3_codex</guid>
            <pubDate>Tue, 24 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Day 0 support for GPT-5.3-Codex on LiteLLM, including phase parameter handling for Responses API.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports GPT-5.3-Codex on Day 0, including support for the new assistant <code>phase</code> metadata on Responses API output items.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-phase-matters-for-gpt-53-codex">Why <code>phase</code> matters for GPT-5.3-Codex<a href="https://docs.litellm.ai/blog/gpt_5_3_codex#why-phase-matters-for-gpt-53-codex" class="hash-link" aria-label="Direct link to why-phase-matters-for-gpt-53-codex" title="Direct link to why-phase-matters-for-gpt-53-codex">​</a></h2>
<p><code>phase</code> appears on assistant output items and helps distinguish preamble/commentary turns from final closeout responses.</p>
<p>Reference: <a href="https://developers.openai.com/api/reference/overview" target="_blank" rel="noopener noreferrer">Phase parameter docs</a></p>
<p>Supported values:</p>
<ul>
<li><code>null</code></li>
<li><code>"commentary"</code></li>
<li><code>"final_answer"</code></li>
</ul>
<p>Important:</p>
<ul>
<li>Persist assistant output items with <code>phase</code> exactly as returned.</li>
<li>Send those assistant items back on the next turn.</li>
<li>Do <strong>not</strong> add <code>phase</code> to user messages.</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image">Docker Image<a href="https://docs.litellm.ai/blog/gpt_5_3_codex#docker-image" class="hash-link" aria-label="Direct link to Docker Image" title="Direct link to Docker Image">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker pull ghcr.io/berriai/litellm:v1.81.12-stable.gpt-5.3</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage">Usage<a href="https://docs.litellm.ai/blog/gpt_5_3_codex#usage" class="hash-link" aria-label="Direct link to Usage" title="Direct link to Usage">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gpt</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">5.3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">codex</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> openai/gpt</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">5.3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">codex</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e OPENAI_API_KEY=$OPENAI_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.12-stable.gpt-5.3 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "http://0.0.0.0:4000/v1/responses" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gpt-5.3-codex",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "input": "Write a Python script that checks if a number is prime."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="python-example-persist-phase-with-openai-client--litellm-base-url">Python Example: Persist <code>phase</code> with OpenAI Client + LiteLLM Base URL<a href="https://docs.litellm.ai/blog/gpt_5_3_codex#python-example-persist-phase-with-openai-client--litellm-base-url" class="hash-link" aria-label="Direct link to python-example-persist-phase-with-openai-client--litellm-base-url" title="Direct link to python-example-persist-phase-with-openai-client--litellm-base-url">​</a></h2>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    base_url</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"http://0.0.0.0:4000/v1"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># LiteLLM Proxy</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-litellm-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">items </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Persist this per conversation/thread</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_item_get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token builtin">isinstance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">dict</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">key</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" 
style="color:#d73a49">run_turn</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_text</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">global</span><span class="token plain"> items</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># User message: no phase field</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    items</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"message"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            
</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"input_text"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> user_text</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">    resp </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">responses</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gpt-5.3-codex"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token builtin">input</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Persist assistant output items verbatim, including phase</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> out_item </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">resp</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">output </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        items</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">out_item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Optional: inspect latest phase for UI/telemetry routing</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    latest_phase </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> out_item </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">reversed</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">resp</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">output </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> _item_get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">out_item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"output_item.done"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> _item_get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">out_item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"phase"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token boolean" 
style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            latest_phase </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> _item_get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">out_item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"phase"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">break</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> resp</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> latest_phase</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="notes">Notes<a href="https://docs.litellm.ai/blog/gpt_5_3_codex#notes" class="hash-link" aria-label="Direct link to Notes" title="Direct link to Notes">​</a></h2>
<ul>
<li>Use <code>/v1/responses</code> for GPT Codex models.</li>
<li>Preserve full assistant output history for best multi-turn behavior.</li>
<li>If <code>phase</code> metadata is dropped during history reconstruction, output quality can degrade on long-running tasks.</li>
</ul>]]></content:encoded>
            <category>openai</category>
            <category>gpt-5.3-codex</category>
            <category>codex</category>
            <category>day 0 support</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: Encrypted Content Failures in Multi-Region Responses API Load Balancing]]></title>
            <link>https://docs.litellm.ai/blog/responses-api-encrypted-content-incident</link>
            <guid>https://docs.litellm.ai/blog/responses-api-encrypted-content-incident</guid>
            <pubDate>Tue, 24 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: Feb 24, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> Feb 24, 2026<br>
<strong>Duration:</strong> Ongoing (until fix deployed)<br>
<strong>Severity:</strong> High (for users load balancing Responses API across different API keys)<br>
<strong>Status:</strong> Resolved</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>When load balancing OpenAI's Responses API across deployments with <strong>different API keys</strong> (e.g., different Azure regions or OpenAI organizations), follow-up requests containing encrypted content items (like <code>rs_...</code> reasoning items) would fail with:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"error"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"message"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"The encrypted content for item rs_0d09d6e56879e76500699d6feee41c8197bd268aae76141f87 could not be verified. 
Reason: Encrypted content organization_id did not match the target organization."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"invalid_request_error"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"code"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"invalid_encrypted_content"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>Encrypted content items are cryptographically tied to the organization of the API key that created them. When the router load balanced a follow-up request to a deployment with a different API key, decryption failed.</p>
<ul>
<li><strong>Responses API calls with encrypted content:</strong> Complete failure when routed to the wrong deployment</li>
<li><strong>Initial requests:</strong> Unaffected — only follow-up requests containing encrypted items failed</li>
<li><strong>Other API endpoints:</strong> No impact — chat completions, embeddings, etc. functioned normally</li>
</ul>
<!-- -->
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>OpenAI's Responses API can return encrypted "reasoning items" (with IDs like <code>rs_...</code>) that contain intermediate reasoning steps. These items are encrypted with the organization's key and can only be decrypted by the same organization's API key.</p>
<p>When load balancing across deployments with different API keys, the existing affinity mechanisms were insufficient:</p>
<ul>
<li><strong><code>responses_api_deployment_check</code></strong>: Requires <code>previous_response_id</code> which some clients (like Codex) don't provide</li>
<li><strong><code>deployment_affinity</code></strong>: Too broad. It pins <em>all</em> requests from a user to one deployment, so each user can draw on only a single deployment's quota</li>
<li><strong><code>session_affinity</code></strong>: Requires explicit session IDs and still reduces quota</li>
</ul>
<!-- -->
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root Cause<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#root-cause" class="hash-link" aria-label="Direct link to Root Cause" title="Direct link to Root Cause">​</a></h2>
<p>LiteLLM's router had no mechanism to track which deployment created specific encrypted content items and route follow-up requests accordingly. The router treated all deployments as interchangeable, leading to decryption failures when encrypted content crossed organizational boundaries.</p>
<p><strong>The Problem Flow:</strong></p>
<ol>
<li>User calls <code>router.aresponses()</code> with model <code>gpt-5.1-codex</code></li>
<li>Router load balances to Deployment A (Azure East US, API Key 1)</li>
<li>Response contains encrypted reasoning item <code>rs_abc123</code> (encrypted with Org 1's key)</li>
<li>User makes follow-up request with <code>rs_abc123</code> in the input</li>
<li>Router load balances to Deployment B (Azure West Europe, API Key 2)</li>
<li>Deployment B tries to decrypt <code>rs_abc123</code> with Org 2's key → <strong>fails</strong></li>
</ol>
<p><strong>Why Existing Solutions Didn't Work:</strong></p>
<ul>
<li><strong><code>previous_response_id</code></strong>: Not provided by all clients (e.g., Codex)</li>
<li><strong><code>deployment_affinity</code></strong>: Pins <em>all</em> user requests to one deployment → reduces quota to 1/N where N = number of deployments</li>
<li><strong><code>session_affinity</code></strong>: Requires explicit session management and still reduces quota</li>
</ul>
<p><strong>Timeline:</strong></p>
<ol>
<li>Users configured multi-region Responses API load balancing with different API keys</li>
<li>Initial requests succeeded, but follow-up requests with encrypted content failed intermittently</li>
<li>Error rate correlated with number of deployments (more deployments = higher chance of routing to wrong one)</li>
<li>Investigation revealed encrypted content was organization-bound</li>
<li>Existing affinity mechanisms deemed unsuitable (quota reduction, missing <code>previous_response_id</code>)</li>
<li>New solution designed and implemented: <code>encrypted_content_affinity</code></li>
</ol>
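<p>The correlation between error rate and deployment count in the timeline above can be illustrated with a little arithmetic. This is a rough sketch, not a measurement from the incident: it assumes uniform random routing across equally weighted deployments that each use a different organization's key, and the function name is hypothetical. Real routing strategies (weights, cooldowns, health checks) will shift the numbers.</p>

```python
from fractions import Fraction


def wrong_deployment_probability(num_deployments):
    """Chance that a follow-up request lands on a deployment other than the
    one that created the encrypted item.

    Assumption (not from the incident data): every deployment is equally
    likely to be picked and each one uses a different organization's key.
    """
    n = max(int(num_deployments), 1)
    # Exactly one of the n deployments can decrypt the item.
    return Fraction(n - 1, n)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        p = wrong_deployment_probability(n)
        print(f"{n} deployment(s): follow-up failure chance = {float(p):.2f}")
```

<p>Under these assumptions the failure chance is (N - 1) / N, so it climbs toward 1 as deployments are added, which matches the intermittent failures users observed.</p>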
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix">The Fix<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#the-fix" class="hash-link" aria-label="Direct link to The Fix" title="Direct link to The Fix">​</a></h2>
<p>Implemented a new <code>encrypted_content_affinity</code> pre-call check that detects encrypted content in the request input and pins the follow-up request to the originating deployment <strong>only when necessary</strong>.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="implementation">Implementation<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#implementation" class="hash-link" aria-label="Direct link to Implementation" title="Direct link to Implementation">​</a></h3>
<p><strong>1. Encoding <code>model_id</code> into output items</strong> (<a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/responses/utils.py" target="_blank" rel="noopener noreferrer"><code>responses/utils.py</code></a>)</p>
<p>This is the same approach used for <code>previous_response_id</code> affinity, so no cache is needed. When a response contains output items with <code>encrypted_content</code>, LiteLLM encodes the originating deployment's <code>model_id</code> in <strong>two places</strong> for redundancy:</p>
<ol>
<li><strong>Into the item ID</strong> (if present): <code>rs_abc123</code> → <code>encitem_{base64("litellm:model_id:{model_id};item_id:rs_abc123")}</code></li>
<li><strong>Into the encrypted_content itself</strong>: Wraps the content with <code>litellm_enc:{base64("model_id:{model_id}")};{original_encrypted_content}</code></li>
</ol>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Encoding item IDs (when present)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_build_encrypted_item_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> item_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    assembled </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"litellm:model_id:</span><span class="token string-interpolation interpolation 
punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">model_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">;item_id:</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">item_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    encoded </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> base64</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">b64encode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">assembled</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">encode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"utf-8"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">decode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"utf-8"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string-interpolation string" 
style="color:#e3116c">f"encitem_</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">encoded</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Wrapping encrypted_content (always, for redundancy)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_wrap_encrypted_content_with_model_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">encrypted_content</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> model_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">    metadata </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"model_id:</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">model_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    encoded_metadata </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> base64</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">b64encode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">metadata</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">encode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"utf-8"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">decode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"utf-8"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"litellm_enc:</span><span class="token 
string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">encoded_metadata</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">;</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">encrypted_content</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><br></span></code></pre></div></div>
<p><strong>Why wrap encrypted_content directly?</strong> Some clients (like Codex) don't consistently send item IDs in follow-up requests, but they always send the <code>encrypted_content</code> itself. By embedding <code>model_id</code> into the content, affinity works even when IDs are missing.</p>
<p><strong>Streaming responses:</strong> The wrapping logic is applied to both:</p>
<ul>
<li>Final response objects (non-streaming)</li>
<li>Individual streaming events (<code>response.output_item.added</code>, <code>response.output_item.done</code>)</li>
</ul>
<p>This ensures clients receiving streaming responses get wrapped content they can send back.</p>
<p>Before forwarding to the upstream provider, LiteLLM restores the original item IDs and unwraps encrypted_content so the provider never sees the encoded form:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># In responses/main.py — before calling the handler</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token builtin">input</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ResponsesAPIRequestUtils</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_restore_encrypted_content_item_ids_in_input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
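<p>The body of the restore step is not shown in this post. As a minimal sketch of what the inverse of the two encoders could look like, the helpers below decode the <code>encitem_</code> and <code>litellm_enc:</code> formats described earlier. All function names here are hypothetical and this is not LiteLLM's actual implementation; only the string layouts are taken from the text above.</p>

```python
import base64

ITEM_PREFIX = "encitem_"
CONTENT_PREFIX = "litellm_enc:"


def build_encrypted_item_id(model_id, item_id):
    """Encode model_id into an item ID (same layout as described above)."""
    assembled = f"litellm:model_id:{model_id};item_id:{item_id}"
    return ITEM_PREFIX + base64.b64encode(assembled.encode("utf-8")).decode("utf-8")


def wrap_encrypted_content(encrypted_content, model_id):
    """Prefix encrypted_content with base64 metadata carrying model_id."""
    meta = base64.b64encode(f"model_id:{model_id}".encode("utf-8")).decode("utf-8")
    return f"{CONTENT_PREFIX}{meta};{encrypted_content}"


def decode_item_id(encoded_id):
    """Return (model_id, original_item_id); (None, id) if the ID is not encoded."""
    if not encoded_id.startswith(ITEM_PREFIX):
        return None, encoded_id
    decoded = base64.b64decode(encoded_id[len(ITEM_PREFIX):]).decode("utf-8")
    model_part, _, item_id = decoded.partition(";item_id:")
    return model_part[len("litellm:model_id:"):], item_id


def unwrap_encrypted_content(wrapped):
    """Return (model_id, original_content); (None, content) if not wrapped."""
    if not wrapped.startswith(CONTENT_PREFIX):
        return None, wrapped
    # Standard base64 never contains ';', so the first ';' ends the metadata.
    encoded_meta, _, original = wrapped[len(CONTENT_PREFIX):].partition(";")
    meta = base64.b64decode(encoded_meta).decode("utf-8")
    return meta[len("model_id:"):], original


if __name__ == "__main__":
    enc_id = build_encrypted_item_id("deployment-a", "rs_abc123")
    print(decode_item_id(enc_id))
    print(unwrap_encrypted_content(wrap_encrypted_content("opaque-blob", "deployment-a")))
```

<p>Because both formats are self-describing, the same decode can serve two purposes: the router reads the <code>model_id</code> to choose a deployment, and the request path recovers the original values to send upstream.</p>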
<p><strong>2. <code>EncryptedContentAffinityCheck</code> — routing only</strong> (<a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/router_utils/pre_call_checks/encrypted_content_affinity_check.py" target="_blank" rel="noopener noreferrer"><code>encrypted_content_affinity_check.py</code></a>)</p>
<p>No <code>async_log_success_event</code> or cache lookups — the <code>model_id</code> is decoded directly from the item ID or encrypted_content:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">EncryptedContentAffinityCheck</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">CustomLogger</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">async_filter_deployments</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> healthy_deployments</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Extract model_id from input items (ID or encrypted_content) and pin to that deployment."""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> item </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> request_kwargs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># Try to extract model_id from two sources:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            model_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_extract_model_id_from_input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> model_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                deployment </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_find_deployment_by_model_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    healthy_deployments</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> model_id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> deployment</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    request_kwargs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"_encrypted_content_affinity_pinned"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" 
style="color:#36acaa">True</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">deployment</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> healthy_deployments</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_extract_model_id_from_input</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">dict</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> Optional</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       
 </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Extract model_id from either encoded ID or wrapped encrypted_content."""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># 1. Try decoding from item ID (if present)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        item_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> item_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            decoded </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ResponsesAPIRequestUtils</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_decode_encrypted_item_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> decoded</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> decoded</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"model_id"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># 2. Try unwrapping from encrypted_content (fallback for clients that omit IDs)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        encrypted_content </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"encrypted_content"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> encrypted_content </span><span 
class="token keyword" style="color:#00009f">and</span><span class="token plain"> encrypted_content</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">startswith</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"litellm_enc:"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            model_id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> _ </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ResponsesAPIRequestUtils</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_unwrap_encrypted_content_with_model_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                encrypted_content</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> model_id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><br></span></code></pre></div></div>
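<p>To make the two-source lookup concrete, here is a minimal, self-contained sketch of the round trip. Only the <code>litellm_enc:</code> prefix comes from the code above. The <code>encitem_</code> ID prefix, the base64 payload layout, and every helper name below are assumptions made for illustration; the real helpers live in <code>ResponsesAPIRequestUtils</code> and may encode things differently.</p>

```python
import base64
import json
from typing import Optional, Tuple

PREFIX = "litellm_enc:"  # prefix checked by the affinity check above
ID_PREFIX = "encitem_"   # hypothetical prefix for encoded item IDs


def wrap_encrypted_content(content: str, model_id: str) -> str:
    """Hypothetical format: prefix + base64(model_id) + ';' + original content."""
    encoded = base64.urlsafe_b64encode(model_id.encode()).decode()
    return f"{PREFIX}{encoded};{content}"


def unwrap_encrypted_content(wrapped: str) -> Tuple[Optional[str], str]:
    """Return (model_id, original content), or (None, input) if not wrapped."""
    if not wrapped.startswith(PREFIX):
        return None, wrapped
    encoded, sep, content = wrapped[len(PREFIX):].partition(";")
    if not sep:
        return None, wrapped
    return base64.urlsafe_b64decode(encoded.encode()).decode(), content


def encode_item_id(item_id: str, model_id: str) -> str:
    """Hypothetical format: prefix + base64(JSON with model_id and item_id)."""
    payload = json.dumps({"model_id": model_id, "item_id": item_id})
    return ID_PREFIX + base64.urlsafe_b64encode(payload.encode()).decode()


def decode_item_id(encoded_id: str) -> Optional[dict]:
    """Return the decoded payload, or None for IDs this sketch did not encode."""
    if not encoded_id.startswith(ID_PREFIX):
        return None
    try:
        raw = base64.urlsafe_b64decode(encoded_id[len(ID_PREFIX):].encode())
        return json.loads(raw)
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None


def extract_model_id(item: dict) -> Optional[str]:
    """Same order as the check above: item ID first, then encrypted_content."""
    decoded = decode_item_id(item.get("id") or "")
    if decoded:
        return decoded.get("model_id")
    model_id, _ = unwrap_encrypted_content(item.get("encrypted_content") or "")
    return model_id
```

<p>An item that carries neither an encoded ID nor the wrapped prefix yields <code>None</code>, in which case the request is not pinned and load balances normally.</p>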
<p><strong>3. Rate Limit Bypass</strong> (<a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/router.py" target="_blank" rel="noopener noreferrer"><code>router.py</code></a>)</p>
<p>When encrypted content requires a specific deployment, RPM/TPM limits are bypassed (the request would fail on any other deployment anyway):</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># In async_get_available_deployment, after filtering healthy deployments:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    request_kwargs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"_encrypted_content_affinity_pinned"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">healthy_deployments</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> healthy_deployments</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Bypass routing strategy (RPM/TPM checks)</span><br></span></code></pre></div></div>
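<p>The effect of the bypass can be seen in isolation with a toy router. In the sketch below, <code>Deployment</code>, <code>rpm_aware_strategy</code>, and <code>pick_deployment</code> are hypothetical stand-ins rather than LiteLLM APIs; only the <code>_encrypted_content_affinity_pinned</code> key is taken from the code above.</p>

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Deployment:
    model_id: str
    current_rpm: int
    rpm_limit: int


def rpm_aware_strategy(deployments: List[Deployment]) -> Deployment:
    """Hypothetical strategy: least-used deployment that still has RPM headroom."""
    available = [d for d in deployments if d.rpm_limit > d.current_rpm]
    if not available:
        raise RuntimeError("no deployment under its RPM limit")
    return min(available, key=lambda d: d.current_rpm)


def pick_deployment(
    request_kwargs: dict,
    healthy: List[Deployment],
    strategy: Callable[[List[Deployment]], Deployment] = rpm_aware_strategy,
) -> Deployment:
    """Return the pinned deployment directly; otherwise defer to the strategy."""
    if request_kwargs.get("_encrypted_content_affinity_pinned") and len(healthy) == 1:
        return healthy[0]  # skip the strategy, and with it the RPM check
    return strategy(healthy)


if __name__ == "__main__":
    saturated = Deployment("deploy-a", current_rpm=100, rpm_limit=100)
    pinned = pick_deployment({"_encrypted_content_affinity_pinned": True}, [saturated])
    print(pinned.model_id)
```

<p>Without the pin flag, the same saturated deployment would be rejected by the strategy. With it, the request still goes through, which matches the reasoning that no other deployment could serve it anyway.</p>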
<p><strong>4. Configuration</strong></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">router_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">routing_strategy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> usage</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">based</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">routing</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">v2</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">enable_pre_call_checks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">optional_pre_call_checks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> encrypted_content_affinity</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">deployment_affinity_ttl_seconds</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">86400</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 24 hours</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="key-benefits">Key Benefits<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#key-benefits" class="hash-link" aria-label="Direct link to Key Benefits" title="Direct link to Key Benefits">​</a></h3>
<p>✅ <strong>No quota reduction</strong>: Only pins requests containing encrypted items<br>
<!-- -->✅ <strong>Bypasses rate limits</strong>: When encrypted content requires a specific deployment, RPM/TPM limits don't block it<br>
<!-- -->✅ <strong>No <code>previous_response_id</code> required</strong>: Works by encoding <code>model_id</code> directly into the item ID<br>
<!-- -->✅ <strong>No cache required</strong>: <code>model_id</code> is decoded on the fly from the item ID, so no Redis lookup or cache TTL is involved<br>
<!-- -->✅ <strong>Globally safe</strong>: Can be enabled for all models; non-Responses-API calls are unaffected<br>
<!-- -->✅ <strong>Surgical precision</strong>: Normal requests continue to load balance freely</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>Encode <code>model_id</code> into encrypted-content item IDs on response</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/responses/utils.py" target="_blank" rel="noopener noreferrer"><code>responses/utils.py</code></a></td></tr><tr><td>2</td><td>Restore original item IDs before forwarding to upstream provider</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/responses/main.py" target="_blank" rel="noopener noreferrer"><code>responses/main.py</code></a></td></tr><tr><td>3</td><td><code>EncryptedContentAffinityCheck</code>: decode item IDs to route (no cache)</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/router_utils/pre_call_checks/encrypted_content_affinity_check.py" target="_blank" rel="noopener noreferrer"><code>encrypted_content_affinity_check.py</code></a></td></tr><tr><td>4</td><td>Add <code>encrypted_content_affinity</code> to <code>OptionalPreCallChecks</code> type</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/types/router.py" target="_blank" rel="noopener noreferrer"><code>types/router.py</code></a></td></tr><tr><td>5</td><td>Implement rate limit bypass for affinity-pinned requests</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/router.py" target="_blank" rel="noopener noreferrer"><code>router.py</code></a></td></tr><tr><td>6</td><td>Unit tests: encoding/decoding utilities, routing, RPM bypass</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/tests/test_litellm/router_utils/pre_call_checks/test_encrypted_content_affinity_check.py" target="_blank" rel="noopener noreferrer"><code>test_encrypted_content_affinity_check.py</code></a></td></tr><tr><td>7</td><td>Documentation: Responses API guide, 
load balancing guide, config reference</td><td>✅ Done</td><td><a href="https://docs.litellm.ai/docs/response_api#encrypted-content-affinity-multi-region-load-balancing" target="_blank" rel="noopener noreferrer">Docs</a></td></tr><tr><td>8</td><td><strong>[Mar 3]</strong> Fix streaming events to wrap encrypted_content</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm/responses/streaming_iterator.py" target="_blank" rel="noopener noreferrer"><code>responses/streaming_iterator.py</code></a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="follow-up-fix-streaming-responses-mar-3-2026">Follow-up Fix: Streaming Responses (Mar 3, 2026)<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#follow-up-fix-streaming-responses-mar-3-2026" class="hash-link" aria-label="Direct link to Follow-up Fix: Streaming Responses (Mar 3, 2026)" title="Direct link to Follow-up Fix: Streaming Responses (Mar 3, 2026)">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-issue">The Issue<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#the-issue" class="hash-link" aria-label="Direct link to The Issue" title="Direct link to The Issue">​</a></h3>
<p>After the initial fix was deployed, users reported that the <code>invalid_encrypted_content</code> error <strong>still occurred</strong> when using streaming responses with clients like Codex. Investigation revealed:</p>
<ul>
<li>✅ Non-streaming responses: <code>encrypted_content</code> was correctly wrapped with <code>litellm_enc:</code> prefix</li>
<li>❌ Streaming responses: Individual <code>response.output_item.added</code> and <code>response.output_item.done</code> events contained <strong>raw, unwrapped</strong> <code>encrypted_content</code></li>
</ul>
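<p>Since Codex and other clients consume responses as streams, they received unwrapped content in these events and sent it back in follow-up requests. Without the <code>litellm_enc:</code> prefix (and with no encoded item ID to fall back on), the affinity check could not recover a <code>model_id</code>, so the request was not pinned and could be routed to a different deployment.</p>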
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-root-cause">The Root Cause<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#the-root-cause" class="hash-link" aria-label="Direct link to The Root Cause" title="Direct link to The Root Cause">​</a></h3>
<p>The <code>_update_encrypted_content_item_ids_in_response</code> function only modified the <strong>final</strong> response object, which is used for non-streaming responses. For streaming responses, individual chunks are processed by <code>ResponsesAPIStreamingIterator._process_chunk</code>, which was <strong>not</strong> applying the wrapping logic to streaming events.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix-1">The Fix<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#the-fix-1" class="hash-link" aria-label="Direct link to The Fix" title="Direct link to The Fix">​</a></h3>
<p>Modified <code>litellm/litellm/responses/streaming_iterator.py</code> to wrap <code>encrypted_content</code> in streaming events:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># In ResponsesAPIStreamingIterator._process_chunk</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">litellm_metadata</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">litellm_metadata</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"encrypted_content_affinity_enabled"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">    event_type </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">openai_responses_api_chunk</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> event_type </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ResponsesAPIStreamEvents</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">OUTPUT_ITEM_ADDED</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ResponsesAPIStreamEvents</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">OUTPUT_ITEM_DONE</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        item </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">openai_responses_api_chunk</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"item"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> item</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            encrypted_content </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">getattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"encrypted_content"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" 
style="color:#00009f">if</span><span class="token plain"> encrypted_content </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token builtin">isinstance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">encrypted_content</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                model_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">litellm_metadata</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"model_info"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"id"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">litellm_metadata</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> model_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    wrapped_content </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ResponsesAPIRequestUtils</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_wrap_encrypted_content_with_model_id</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                        encrypted_content</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> model_id</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token builtin">setattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"encrypted_content"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> wrapped_content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>This ensures that <strong>all</strong> <code>encrypted_content</code> sent to clients (streaming or non-streaming) is wrapped with <code>model_id</code> metadata, enabling consistent affinity routing.</p>
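<p>To make the wrapping step concrete, here is a minimal sketch of one way a <code>model_id</code> could be attached to, and later recovered from, <code>encrypted_content</code>. The <code>litellm_enc:</code> prefix, the base64 JSON header, and both function names are assumptions for illustration; this post does not show the wire format LiteLLM actually uses.</p>

```python
import base64
import json
from typing import Optional, Tuple

# Hypothetical marker; the real LiteLLM format is not shown in this post.
PREFIX = "litellm_enc:"


def wrap_encrypted_content(encrypted_content: str, model_id: str) -> str:
    """Prepend a URL-safe header recording which deployment produced the content."""
    header = base64.urlsafe_b64encode(
        json.dumps({"model_id": model_id}).encode()
    ).decode()
    # The base64 alphabet never contains ";", so it is a safe separator.
    return f"{PREFIX}{header};{encrypted_content}"


def unwrap_encrypted_content(value: str) -> Tuple[Optional[str], str]:
    """Return (model_id, original_content); model_id is None for unwrapped input."""
    if not value.startswith(PREFIX):
        return None, value
    header, _, content = value[len(PREFIX):].partition(";")
    model_id = json.loads(base64.urlsafe_b64decode(header))["model_id"]
    return model_id, content
```

<p>With a scheme like this, a router can read the <code>model_id</code> from a follow-up request and send it back to the deployment that is able to decrypt the content.</p>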
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="migration-guide">Migration Guide<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#migration-guide" class="hash-link" aria-label="Direct link to Migration Guide" title="Direct link to Migration Guide">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="before-using-deployment_affinity">Before (Using <code>deployment_affinity</code>)<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#before-using-deployment_affinity" class="hash-link" aria-label="Direct link to before-using-deployment_affinity" title="Direct link to before-using-deployment_affinity">​</a></h3>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">router_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">optional_pre_call_checks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> deployment_affinity  </span><span class="token comment" style="color:#999988;font-style:italic"># ❌ Reduces quota by number of users</span><br></span></code></pre></div></div>
<p><strong>Problem:</strong> All requests from a user pin to one deployment, reducing that user's effective quota to 1/N of the total across N deployments.</p>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="after-using-encrypted_content_affinity">After (Using <code>encrypted_content_affinity</code>)<a href="https://docs.litellm.ai/blog/responses-api-encrypted-content-incident#after-using-encrypted_content_affinity" class="hash-link" aria-label="Direct link to after-using-encrypted_content_affinity" title="Direct link to after-using-encrypted_content_affinity">​</a></h3>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">router_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">optional_pre_call_checks</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> encrypted_content_affinity  </span><span class="token comment" style="color:#999988;font-style:italic"># ✅ Only pins requests with encrypted content</span><br></span></code></pre></div></div>
<p><strong>Benefit:</strong> Normal requests load balance freely; only requests that carry encrypted content are pinned to a deployment.</p>
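<p>The difference between the two checks can be sketched as a simplified routing function. This is an illustrative model only: <code>pick_deployment</code>, its parameters, and the random load balancing are assumptions, not LiteLLM's router code.</p>

```python
import random
from typing import Dict, List, Optional


def pick_deployment(
    deployments: List[str],
    user_id: str,
    pinned_model_id: Optional[str],
    check: str,
    user_pins: Dict[str, str],
) -> str:
    """Choose a deployment under one of the two pre-call checks (simplified)."""
    if check == "deployment_affinity":
        # Every request from the user sticks to the first deployment chosen for them.
        return user_pins.setdefault(user_id, random.choice(deployments))
    if check == "encrypted_content_affinity" and pinned_model_id in deployments:
        # Only requests that carry wrapped encrypted content are pinned.
        return pinned_model_id
    # Everything else is load balanced normally.
    return random.choice(deployments)
```

<p>Under <code>deployment_affinity</code> a user never leaves their first deployment, while under <code>encrypted_content_affinity</code> a request without a <code>model_id</code> can land on any deployment.</p>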
<hr>]]></content:encoded>
            <category>incident-report</category>
            <category>proxy</category>
            <category>responses-api</category>
            <category>load-balancing</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: Wildcard Blocking New Models After Cost Map Reload]]></title>
            <link>https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident</link>
            <guid>https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident</guid>
            <pubDate>Mon, 23 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: Feb 23, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> Feb 23, 2026<br>
<strong>Duration:</strong> ~3 hours<br>
<strong>Severity:</strong> High (for users with provider wildcard access rules)<br>
<strong>Status:</strong> Resolved</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>When a new Anthropic model (e.g. <code>claude-sonnet-4-6</code>) was added to the LiteLLM model cost map and a cost map reload was triggered, requests to the new model were rejected with:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">key not allowed to access model. This key can only access models=['anthropic/*']. Tried to access claude-sonnet-4-6.</span><br></span></code></pre></div></div>
<p>The reload updated <code>litellm.model_cost</code> correctly but never re-ran <code>add_known_models()</code>, so <code>litellm.anthropic_models</code> (the in-memory set used by the wildcard resolver) remained stale. The new model was invisible to the <code>anthropic/*</code> wildcard even though the cost map knew about it.</p>
<ul>
<li><strong>LLM calls:</strong> All requests to newly added Anthropic models were blocked with a 401.</li>
<li><strong>Existing models:</strong> Unaffected — only models missing from the stale provider set were impacted.</li>
<li><strong>Other providers:</strong> Same bug class existed for any provider wildcard (e.g. <code>openai/*</code>, <code>gemini/*</code>).</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>LiteLLM supports provider-level wildcard access rules. When an admin configures a key or team with <code>models=['anthropic/*']</code>, any model whose provider resolves to <code>anthropic</code> should be allowed. The resolution happens in <code>_model_custom_llm_provider_matches_wildcard_pattern</code>.</p>
<p><code>litellm.anthropic_models</code> is a Python <code>set</code> populated at import time by <code>add_known_models()</code>. It is the source <code>get_llm_provider()</code> consults to map a bare model name like <code>claude-sonnet-4-6</code> to the provider string <code>"anthropic"</code>.</p>
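<p>As a rough illustration of how a provider wildcard depends on those in-memory sets, consider the sketch below. <code>PROVIDER_MODELS</code>, <code>resolve_provider</code>, and <code>matches_wildcard</code> are hypothetical stand-ins written for this example; they are not the bodies of <code>get_llm_provider()</code> or the real wildcard matcher.</p>

```python
from typing import Dict, Set

# Hypothetical stand-ins for litellm's provider sets (e.g. litellm.anthropic_models).
PROVIDER_MODELS: Dict[str, Set[str]] = {
    "anthropic": {"claude-3-haiku-20240307"},
    "openai": {"gpt-4o"},
}


def resolve_provider(model: str) -> str:
    """Map a bare model name to a provider, raising if no provider set contains it."""
    for provider, models in PROVIDER_MODELS.items():
        if model in models:
            return provider
    raise ValueError(f"unknown provider for model {model!r}")


def matches_wildcard(model: str, pattern: str) -> bool:
    """Return True if `model` is covered by a provider wildcard such as 'anthropic/*'."""
    if not pattern.endswith("/*"):
        return False
    try:
        return resolve_provider(model) == pattern[:-2]
    except ValueError:
        # A model missing from every provider set can never match a wildcard.
        return False
```

<p>In this model, a name that is absent from every provider set fails the wildcard check no matter what the cost map contains.</p>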
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root Cause<a href="https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident#root-cause" class="hash-link" aria-label="Direct link to Root Cause" title="Direct link to Root Cause">​</a></h2>
<p><code>add_known_models()</code> is called <strong>once</strong> at module import time. Both reload paths in <code>proxy_server.py</code> updated <code>litellm.model_cost</code> with the fresh map but never called <code>add_known_models()</code> again:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Before the fix — both reload paths looked like this:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">new_model_cost_map </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> get_model_cost_map</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">url</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">model_cost_map_url</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">model_cost </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> new_model_cost_map          </span><span class="token comment" style="color:#999988;font-style:italic"># ✅ cost map updated</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">_invalidate_model_cost_lowercase_map</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">           </span><span class="token comment" style="color:#999988;font-style:italic"># ✅ cache cleared</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ❌ add_known_models() never called</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#    → litellm.anthropic_models still has the old set</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#    → new model not in the set</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#    → get_llm_provider() raises for the new model</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#    → wildcard match returns False</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic">#    → 401 for every request to the new model</span><br></span></code></pre></div></div>
<p>The gap existed in two places:</p>
<ol>
<li><code>_check_and_reload_model_cost_map</code> — the periodic automatic reload (every 10 s)</li>
<li>The <code>/reload/model_cost_map</code> admin endpoint — the manual reload</li>
</ol>
<p><strong>Timeline:</strong></p>
<ol>
<li>New model (<code>claude-sonnet-4-6</code>) added to <code>model_prices_and_context_window.json</code></li>
<li>Admin triggers cost map reload via UI → <code>litellm.model_cost</code> updated</li>
<li>Users with <code>anthropic/*</code> wildcard keys attempt requests to <code>claude-sonnet-4-6</code></li>
<li><code>get_llm_provider('claude-sonnet-4-6')</code> raises → wildcard match returns False → 401</li>
<li>Admin reloads cost map again — same result (root cause not addressed)</li>
<li>~3 hours of investigation → root cause identified → fix deployed</li>
</ol>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix">The Fix<a href="https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident#the-fix" class="hash-link" aria-label="Direct link to The Fix" title="Direct link to The Fix">​</a></h2>
<p>After each reload, <code>add_known_models()</code> is called with the freshly fetched map passed explicitly. Passing the map directly (rather than relying on the module-level reference) removes any ambiguity about which dict is iterated:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># After the fix — both reload paths now do:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">new_model_cost_map </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> get_model_cost_map</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">url</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">model_cost_map_url</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">model_cost </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> new_model_cost_map</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">_invalidate_model_cost_lowercase_map</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_known_models</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_cost_map</span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain">new_model_cost_map</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># ✅ sets repopulated</span><br></span></code></pre></div></div>
<p><code>add_known_models()</code> was also updated to accept an optional explicit map so callers cannot accidentally iterate a stale module-level reference:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Before</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">add_known_models</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> key</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> value </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> model_cost</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">   </span><span class="token comment" style="color:#999988;font-style:italic"># reads module global — ambiguous after reload</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" 
style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># After</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">add_known_models</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_cost_map</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Optional</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Dict</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    _map </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> model_cost_map </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> model_cost_map </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span 
class="token boolean" style="color:#36acaa">None</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> model_cost</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> key</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> value </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> _map</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">         </span><span class="token comment" style="color:#999988;font-style:italic"># always iterates the map you just fetched</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><br></span></code></pre></div></div>
<p>After the fix, the provider sets (<code>anthropic_models</code>, <code>open_ai_chat_completion_models</code>, etc.) are always consistent with <code>litellm.model_cost</code> immediately after every reload. New models become accessible via wildcard rules without any proxy restart.</p>
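<p>The behaviour can be tried in isolation with a toy reproduction. Everything below is a self-contained sketch: the module-level <code>model_cost</code> and <code>anthropic_models</code> mimic the LiteLLM globals, <code>reload_model_cost_map</code> is a hypothetical helper, and the <code>litellm_provider</code> key is assumed to be the field that identifies a model's provider in the cost map.</p>

```python
from typing import Dict, Optional, Set

# Toy stand-ins for the litellm module globals.
model_cost: Dict[str, dict] = {
    "claude-3-haiku-20240307": {"litellm_provider": "anthropic"},
}
anthropic_models: Set[str] = set()


def add_known_models(model_cost_map: Optional[Dict[str, dict]] = None) -> None:
    """Populate the provider set from an explicit map, or from the global by default."""
    _map = model_cost_map if model_cost_map is not None else model_cost
    for key, value in _map.items():
        if value.get("litellm_provider") == "anthropic":
            anthropic_models.add(key)


def reload_model_cost_map(new_map: Dict[str, dict]) -> None:
    """Swap in the fresh cost map and repopulate the provider set from that same map."""
    global model_cost
    model_cost = new_map
    add_known_models(model_cost_map=new_map)


# Import-time population, as in the original behaviour.
add_known_models()
```

<p>Removing the <code>add_known_models(...)</code> call from <code>reload_model_cost_map</code> reproduces the incident: the cost map gains the new model but <code>anthropic_models</code> does not.</p>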
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/anthropic-wildcard-model-access-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>Call <code>add_known_models(model_cost_map=...)</code> in the periodic reload path</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/proxy/proxy_server.py#L4393" target="_blank" rel="noopener noreferrer"><code>proxy_server.py#L4393</code></a></td></tr><tr><td>2</td><td>Call <code>add_known_models(model_cost_map=...)</code> in the <code>/reload/model_cost_map</code> endpoint</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/proxy/proxy_server.py#L11904" target="_blank" rel="noopener noreferrer"><code>proxy_server.py#L11904</code></a></td></tr><tr><td>3</td><td>Update <code>add_known_models()</code> to accept an explicit map parameter</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py#L617" target="_blank" rel="noopener noreferrer"><code>__init__.py#L617</code></a></td></tr><tr><td>4</td><td>Regression test: <code>add_known_models(model_cost_map=...)</code> populates provider sets</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/proxy_unit_tests/test_auth_checks.py" target="_blank" rel="noopener noreferrer"><code>test_auth_checks.py</code></a></td></tr><tr><td>5</td><td>Regression test: <code>anthropic/*</code> wildcard grants/denies access correctly after reload</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/proxy_unit_tests/test_auth_checks.py" target="_blank" rel="noopener noreferrer"><code>test_auth_checks.py</code></a></td></tr></tbody></table>
<hr>]]></content:encoded>
            <category>incident-report</category>
            <category>proxy</category>
            <category>auth</category>
            <category>model-access</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: SERVER_ROOT_PATH regression broke UI routing]]></title>
            <link>https://docs.litellm.ai/blog/server-root-path-incident</link>
            <guid>https://docs.litellm.ai/blog/server-root-path-incident</guid>
            <pubDate>Sat, 21 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: January 22, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> January 22, 2026<br>
<strong>Duration:</strong> ~4 days (until fix merged January 26, 2026)<br>
<strong>Severity:</strong> High<br>
<strong>Status:</strong> Resolved</p>
<blockquote>
<p><strong>Note:</strong> This fix is available in LiteLLM <code>v1.81.3.rc.6</code> and later.</p>
</blockquote>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/server-root-path-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>A PR (<a href="https://github.com/BerriAI/litellm/pull/19467" target="_blank" rel="noopener noreferrer"><code>#19467</code></a>) accidentally removed the <code>root_path=server_root_path</code> parameter from the FastAPI app initialization in <code>proxy_server.py</code>. This caused the proxy to ignore the <code>SERVER_ROOT_PATH</code> environment variable when serving the UI. Users who deploy LiteLLM behind a reverse proxy with a path prefix (e.g., <code>/api/v1</code> or <code>/llmproxy</code>) found that all UI pages returned 404 Not Found.</p>
<ul>
<li><strong>LLM API calls:</strong> No impact. API routing was unaffected.</li>
<li><strong>UI pages:</strong> All UI pages returned 404 for deployments using <code>SERVER_ROOT_PATH</code>.</li>
<li><strong>Swagger/OpenAPI docs:</strong> Broken when accessed through the configured root path.</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/server-root-path-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>Many LiteLLM deployments run behind a reverse proxy (e.g., Nginx, Traefik, AWS ALB) that routes traffic to LiteLLM under a path prefix. FastAPI's <code>root_path</code> parameter tells the application about this prefix so it can correctly serve static files, generate URLs, and handle routing.</p>
<p>The <code>root_path</code> parameter had been present in <code>proxy_server.py</code> since early versions of LiteLLM. It was removed as a side effect of PR <a href="https://github.com/BerriAI/litellm/pull/19467" target="_blank" rel="noopener noreferrer">#19467</a>, which was intended to fix a different UI 404 issue.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root cause<a href="https://docs.litellm.ai/blog/server-root-path-incident#root-cause" class="hash-link" aria-label="Direct link to Root cause" title="Direct link to Root cause">​</a></h2>
<p>PR <a href="https://github.com/BerriAI/litellm/pull/19467" target="_blank" rel="noopener noreferrer">#19467</a> (<code>73d49f8</code>) removed the <code>root_path=server_root_path</code> line from the <code>FastAPI()</code> constructor in <code>proxy_server.py</code>:</p>
<div class="language-diff codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-diff codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"> app = FastAPI(</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     docs_url=_get_docs_url(),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     redoc_url=_get_redoc_url(),</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     title=_title,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     description=_description,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     version=version,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-    root_path=server_root_path,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">     lifespan=proxy_startup_event,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> )</span><br></span></code></pre></div></div>
<p>Without <code>root_path</code>, FastAPI treated all requests as if the application were mounted at <code>/</code>, causing path mismatches for any deployment using <code>SERVER_ROOT_PATH</code>.</p>
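<p>The path mismatch can be illustrated with a deliberately simplified model of prefix-aware routing. This is not FastAPI's routing code: <code>route_path</code> and its fixed route table are hypothetical, and the sketch assumes the reverse proxy forwards the prefixed path unchanged.</p>

```python
from typing import Optional


def route_path(request_path: str, root_path: str = "") -> Optional[str]:
    """Return the app-internal path for a request, or None if it would 404."""
    known_routes = {"/ui/", "/docs"}
    prefix = root_path.rstrip("/")
    # Strip the configured prefix only when the request path really starts with it.
    if prefix and (request_path == prefix or request_path.startswith(prefix + "/")):
        request_path = request_path[len(prefix):] or "/"
    return request_path if request_path in known_routes else None
```

<p>With <code>root_path="/llmproxy"</code>, a request for <code>/llmproxy/ui/</code> resolves to <code>/ui/</code>; with the default empty prefix, the same request matches no route.</p>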
<p>The regression went undetected because:</p>
<ol>
<li><strong>No automated test</strong> verified that <code>root_path</code> was set on the FastAPI app.</li>
<li><strong>No manual test procedure</strong> existed for <code>SERVER_ROOT_PATH</code> functionality.</li>
<li><strong>Default deployments</strong> (without <code>SERVER_ROOT_PATH</code>) were unaffected, so most CI tests passed.</li>
</ol>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/server-root-path-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>Restore <code>root_path=server_root_path</code> in FastAPI app initialization</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/pull/19790" target="_blank" rel="noopener noreferrer"><code>#19790</code></a> (<code>5426b3c</code>)</td></tr><tr><td>2</td><td>Add unit tests for <code>get_server_root_path()</code> and FastAPI app initialization</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/proxy_unit_tests/test_server_root_path.py" target="_blank" rel="noopener noreferrer"><code>test_server_root_path.py</code></a></td></tr><tr><td>3</td><td>Add CI workflow that builds Docker image and tests UI routing with <code>SERVER_ROOT_PATH</code> on every PR</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/.github/workflows/test_server_root_path.yml" target="_blank" rel="noopener noreferrer"><code>test_server_root_path.yml</code></a></td></tr><tr><td>4</td><td>Document manual test procedure for <code>SERVER_ROOT_PATH</code></td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/discussions/8495" target="_blank" rel="noopener noreferrer">Discussion #8495</a></td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="ci-workflow-details">CI workflow details<a href="https://docs.litellm.ai/blog/server-root-path-incident#ci-workflow-details" class="hash-link" aria-label="Direct link to CI workflow details" title="Direct link to CI workflow details">​</a></h2>
<p>The new <a href="https://github.com/BerriAI/litellm/blob/main/.github/workflows/test_server_root_path.yml" target="_blank" rel="noopener noreferrer"><code>test_server_root_path.yml</code></a> workflow runs on every PR against <code>main</code>. It:</p>
<ol>
<li>Builds the LiteLLM Docker image</li>
<li>Starts a container with <code>SERVER_ROOT_PATH</code> set (tests both <code>/api/v1</code> and <code>/llmproxy</code>)</li>
<li>Verifies the UI returns valid HTML at <code>{ROOT_PATH}/ui/</code></li>
<li>Fails the workflow if the UI is unreachable</li>
</ol>
<p>This prevents future regressions where changes to <code>proxy_server.py</code> accidentally break <code>SERVER_ROOT_PATH</code> support.</p>
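<p>The verification step of such a workflow could look roughly like the following. This is a sketch under stated assumptions, not the contents of <code>test_server_root_path.yml</code>: <code>fetch</code> and <code>ui_is_reachable</code> are hypothetical helpers, and the HTML check is a loose heuristic.</p>

```python
import urllib.request
from typing import Callable


def fetch(url: str) -> str:
    """Fetch a URL and return the body as text (raises OSError subclasses on failure)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def ui_is_reachable(
    base_url: str, root_path: str, get: Callable[[str], str] = fetch
) -> bool:
    """Return True if the UI under the given root path responds with an HTML-like body."""
    prefix = root_path.strip("/")
    url = base_url.rstrip("/") + ("/" + prefix if prefix else "") + "/ui/"
    try:
        body = get(url)
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses from urllib.
        return False
    # Loose heuristic: look for "html" near the start of the response body.
    return "html" in body[:1000].lower()
```

<p>A CI job might call <code>ui_is_reachable("http://localhost:4000", "/api/v1")</code> and again with <code>"/llmproxy"</code>, failing the run if either returns <code>False</code>.</p>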
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="timeline">Timeline<a href="https://docs.litellm.ai/blog/server-root-path-incident#timeline" class="hash-link" aria-label="Direct link to Timeline" title="Direct link to Timeline">​</a></h2>
<table><thead><tr><th>Time (UTC)</th><th>Event</th></tr></thead><tbody><tr><td>Jan 22, 2026 04:20</td><td>PR <a href="https://github.com/BerriAI/litellm/pull/19467" target="_blank" rel="noopener noreferrer">#19467</a> merged, removing <code>root_path=server_root_path</code></td></tr><tr><td>Jan 22–26</td><td>Users on nightly builds report UI 404 errors when using <code>SERVER_ROOT_PATH</code></td></tr><tr><td>Jan 26, 2026 17:48</td><td>Fix PR <a href="https://github.com/BerriAI/litellm/pull/19790" target="_blank" rel="noopener noreferrer">#19790</a> merged, restoring <code>root_path=server_root_path</code></td></tr><tr><td>Feb 18, 2026</td><td>CI workflow <a href="https://github.com/BerriAI/litellm/blob/main/.github/workflows/test_server_root_path.yml" target="_blank" rel="noopener noreferrer"><code>test_server_root_path.yml</code></a> added to run on every PR</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resolution-steps-for-users">Resolution steps for users<a href="https://docs.litellm.ai/blog/server-root-path-incident#resolution-steps-for-users" class="hash-link" aria-label="Direct link to Resolution steps for users" title="Direct link to Resolution steps for users">​</a></h2>
<p>For users still experiencing issues, update to the latest LiteLLM version:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install --upgrade litellm</span><br></span></code></pre></div></div>
<p>Verify your <code>SERVER_ROOT_PATH</code> is correctly set:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># In your environment or docker-compose.yml</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SERVER_ROOT_PATH="/your-prefix"</span><br></span></code></pre></div></div>
<p>Then confirm the UI is accessible at <code>http://your-host:4000/your-prefix/ui/</code>.</p>]]></content:encoded>
            <category>incident-report</category>
            <category>ui</category>
            <category>stability</category>
        </item>
        <item>
            <title><![CDATA[DAY 0 Support: Gemini 3.1 Pro on LiteLLM]]></title>
            <link>https://docs.litellm.ai/blog/gemini_3_1_pro</link>
            <guid>https://docs.litellm.ai/blog/gemini_3_1_pro</guid>
            <pubDate>Thu, 19 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Guide to using Gemini 3.1 Pro on LiteLLM Proxy and SDK with day 0 support.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports <code>gemini-3.1-pro-preview</code> and the API changes introduced with it.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.81.9-stable.gemini.3.1-pro</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==v1.81.9-stable.gemini.3.1-pro</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-new">What's New<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#whats-new" class="hash-link" aria-label="Direct link to What's New" title="Direct link to What's New">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-new-thinking-levels-thinkinglevel-with-minimal--medium">1. New Thinking Levels: <code>thinkingLevel</code> with MINIMAL &amp; MEDIUM<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#1-new-thinking-levels-thinkinglevel-with-minimal--medium" class="hash-link" aria-label="Direct link to 1-new-thinking-levels-thinkinglevel-with-minimal--medium" title="Direct link to 1-new-thinking-levels-thinkinglevel-with-minimal--medium">​</a></h3>
<p>Gemini 3.1 Pro introduces support for the <strong>medium</strong> thinking level.</p>
<p>LiteLLM automatically maps the OpenAI <code>reasoning_effort</code> parameter to Gemini's <code>thinkingLevel</code>, so you can use familiar <code>reasoning_effort</code> values (<code>minimal</code>, <code>low</code>, <code>medium</code>, <code>high</code>) without changing your code!</p>
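<p>To make the mapping concrete, here is a small lookup that mirrors the mapping table later in this post. It is an illustrative sketch only: the names <code>REASONING_EFFORT_TO_THINKING_LEVEL</code> and <code>to_thinking_level</code> are hypothetical and are not LiteLLM's internal implementation, and rejecting unknown values with a <code>ValueError</code> is an assumption made for the example.</p>

```python
# Illustrative copy of the reasoning_effort mapping table in this post.
# Not LiteLLM's internal code; names here are hypothetical.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",
    "none": "minimal",
}


def to_thinking_level(reasoning_effort):
    """Look up the thinkingLevel for an OpenAI-style reasoning_effort value."""
    try:
        return REASONING_EFFORT_TO_THINKING_LEVEL[reasoning_effort]
    except KeyError:
        # Assumed behavior for this sketch: unknown values are rejected.
        raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}") from None


print(to_thinking_level("medium"))   # medium
print(to_thinking_level("disable"))  # minimal
```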
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="supported-endpoints">Supported Endpoints<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#supported-endpoints" class="hash-link" aria-label="Direct link to Supported Endpoints" title="Direct link to Supported Endpoints">​</a></h2>
<p>LiteLLM provides <strong>full end-to-end support</strong> for Gemini 3.1 Pro on:</p>
<ul>
<li>✅ <code>/v1/chat/completions</code> - OpenAI-compatible chat completions endpoint</li>
<li>✅ <code>/v1/responses</code> - OpenAI Responses API endpoint (streaming and non-streaming)</li>
<li>✅ <a href="https://docs.litellm.ai/docs/anthropic_unified"><code>/v1/messages</code></a> - Anthropic-compatible messages endpoint</li>
<li>✅ <code>/v1/generateContent</code> - <a href="https://docs.litellm.ai/docs/generateContent.md">Google Gemini API</a> compatible endpoint</li>
</ul>
<p>All endpoints support:</p>
<ul>
<li>Streaming and non-streaming responses</li>
<li>Function calling with thought signatures</li>
<li>Multi-turn conversations</li>
<li>All Gemini 3-specific features</li>
<li>Conversion of provider-specific thinking-related parameters to <code>thinkingLevel</code></li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="quick-start">Quick Start<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#quick-start" class="hash-link" aria-label="Direct link to Quick Start" title="Direct link to Quick Start">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">SDK</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">PROXY</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>Basic Usage with MEDIUM thinking (NEW)</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># No need to make any changes to your code as we map openai reasoning param to thinkingLevel</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3.1-pro-preview"</span><span 
class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Solve this complex math problem: 25 * 4 + 10"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"medium"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># NEW: MEDIUM thinking level</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">pro</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini/gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pro</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/GEMINI_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pro</span><span class="token punctuation" style="color:#393A34">-</span><span class="token 
plain">preview</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex_ai/gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3.1</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pro</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span></code></pre></div></div><p><strong>2. Start proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">litellm --config /path/to/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Call with MEDIUM thinking</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST http://localhost:4000/v1/chat/completions \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer &lt;YOUR-LITELLM-KEY&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gemini-3.1-pro-preview",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [{"role": "user", "content": "Complex reasoning task"}],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "reasoning_effort": "medium"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="reasoning_effort-mapping-for-gemini-3"><code>reasoning_effort</code> Mapping for Gemini 3+<a href="https://docs.litellm.ai/blog/gemini_3_1_pro#reasoning_effort-mapping-for-gemini-3" class="hash-link" aria-label="Direct link to reasoning_effort-mapping-for-gemini-3" title="Direct link to reasoning_effort-mapping-for-gemini-3">​</a></h2>
<table><thead><tr><th>reasoning_effort</th><th>thinking_level</th></tr></thead><tbody><tr><td><code>minimal</code></td><td><code>minimal</code></td></tr><tr><td><code>low</code></td><td><code>low</code></td></tr><tr><td><code>medium</code></td><td><code>medium</code></td></tr><tr><td><code>high</code></td><td><code>high</code></td></tr><tr><td><code>disable</code></td><td><code>minimal</code></td></tr><tr><td><code>none</code></td><td><code>minimal</code></td></tr></tbody></table>]]></content:encoded>
            <category>gemini</category>
            <category>day 0 support</category>
            <category>llms</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: vLLM Embeddings Broken by encoding_format Parameter]]></title>
            <link>https://docs.litellm.ai/blog/vllm-embeddings-incident</link>
            <guid>https://docs.litellm.ai/blog/vllm-embeddings-incident</guid>
            <pubDate>Wed, 18 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: Feb 16, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> Feb 16, 2026<br>
<strong>Duration:</strong> ~3 hours<br>
<strong>Severity:</strong> High (for vLLM embedding users)<br>
<strong>Status:</strong> Resolved</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/vllm-embeddings-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>A commit (<a href="https://github.com/BerriAI/litellm/commit/dbcae4aca5836770d0e9cd43abab0333c3d61ab2" target="_blank" rel="noopener noreferrer"><code>dbcae4a</code></a>) intended to fix OpenAI SDK behavior broke vLLM embeddings by explicitly passing <code>encoding_format=None</code> in API requests. vLLM rejects this with the error <code>"unknown variant ``, expected float or base64"</code>.</p>
<ul>
<li><strong>vLLM embedding calls:</strong> Complete failure - all requests rejected</li>
<li><strong>Other providers:</strong> No impact - OpenAI and other providers functioned normally</li>
<li><strong>Other vLLM functionality:</strong> No impact - only embeddings were affected</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/vllm-embeddings-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>The <code>encoding_format</code> parameter for embeddings specifies whether vectors should be returned as <code>float</code> arrays or <code>base64</code> encoded strings. Different providers have different expectations:</p>
<ul>
<li><strong>OpenAI SDK:</strong> If <code>encoding_format</code> is omitted, the SDK adds a default value of <code>"float"</code></li>
<li><strong>vLLM:</strong> Strictly validates <code>encoding_format</code> - only accepts <code>"float"</code>, <code>"base64"</code>, or complete omission. Rejects <code>None</code> or empty string values.</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root cause<a href="https://docs.litellm.ai/blog/vllm-embeddings-incident#root-cause" class="hash-link" aria-label="Direct link to Root cause" title="Direct link to Root cause">​</a></h2>
<p>A well-intentioned fix for OpenAI SDK behavior inadvertently broke vLLM embeddings:</p>
<p><strong>The Breaking Change (<a href="https://github.com/BerriAI/litellm/commit/dbcae4aca5836770d0e9cd43abab0333c3d61ab2" target="_blank" rel="noopener noreferrer"><code>dbcae4a</code></a>):</strong></p>
<p>In <code>litellm/main.py</code>, the code was changed to explicitly set <code>encoding_format=None</code> instead of omitting it:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Added in dbcae4a</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> encoding_format </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    optional_params</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"encoding_format"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> encoding_format</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Omitting causes openai sdk to add default value of "float"</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    optional_params</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"encoding_format"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><br></span></code></pre></div></div>
<p>This fix worked correctly for OpenAI - explicitly passing <code>None</code> prevented the SDK from adding its default value. However, vLLM's strict parameter validation rejected <code>None</code> values, causing all embedding requests to fail.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-fix">The Fix<a href="https://docs.litellm.ai/blog/vllm-embeddings-incident#the-fix" class="hash-link" aria-label="Direct link to The Fix" title="Direct link to The Fix">​</a></h2>
<p>The fix shipped in <a href="https://github.com/BerriAI/litellm/commit/55348dd9c51b5b028f676d25ad023b8f052fc071" target="_blank" rel="noopener noreferrer"><code>55348dd</code></a>. It filters out <code>None</code> and empty string values from <code>optional_params</code> before sending requests to OpenAI-like providers (including vLLM).</p>
<p><strong>In <code>litellm/llms/openai_like/embedding/handler.py</code>:</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Before (broken)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">data </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"model"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">**</span><span class="token plain">optional_params</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># After (fixed)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">filtered_optional_params </span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">k</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> k</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> optional_params</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">''</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">data </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"model"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> model</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">input</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">**</span><span class="token plain">filtered_optional_params</span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p>This ensures:</p>
<ul>
<li>Valid values (<code>"float"</code>, <code>"base64"</code>) are preserved and sent</li>
<li><code>None</code> and empty string values are filtered out (parameter omitted entirely)</li>
<li>The OpenAI SDK no longer adds a default, because LiteLLM handles the parameter upstream</li>
</ul>
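<p>The effect of the filter on the request body can be seen in a self-contained sketch. The helper names <code>drop_empty_params</code> and <code>build_payload</code> and the model name <code>my-embedder</code> are hypothetical; only the dict comprehension is taken from the fix above. The "before" payload shows how <code>None</code> serializes with a plain <code>json.dumps</code> call, which is an assumption about how the value reached vLLM.</p>

```python
import json


def drop_empty_params(optional_params):
    """Drop keys whose value is None or an empty string (same filter as the fix)."""
    return {k: v for k, v in optional_params.items() if v not in (None, "")}


def build_payload(model, input_texts, optional_params):
    """Hypothetical helper: assemble the embedding request body after filtering."""
    return {"model": model, "input": input_texts, **drop_empty_params(optional_params)}


# Before the fix: None is kept and serialized as JSON null.
before = json.dumps({"model": "my-embedder", "input": ["hello"], "encoding_format": None})

# After the fix: the key is omitted entirely.
after = json.dumps(build_payload("my-embedder", ["hello"], {"encoding_format": None}))

print(before)  # {"model": "my-embedder", "input": ["hello"], "encoding_format": null}
print(after)   # {"model": "my-embedder", "input": ["hello"]}
```

<p>Valid values pass through unchanged: <code>build_payload("my-embedder", ["hello"], {"encoding_format": "base64"})</code> keeps the <code>encoding_format</code> key.</p>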
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/vllm-embeddings-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>Filter <code>None</code> and empty string values in OpenAI-like embedding handler</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/llms/openai_like/embedding/handler.py#L108" target="_blank" rel="noopener noreferrer"><code>handler.py#L108</code></a></td></tr><tr><td>2</td><td>Unit tests for parameter filtering (None, empty string, valid values)</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/test_litellm/llms/openai_like/embedding/test_openai_like_embedding.py" target="_blank" rel="noopener noreferrer"><code>test_openai_like_embedding.py</code></a></td></tr><tr><td>3</td><td>Transformation tests for hosted_vllm embedding config</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/test_litellm/llms/hosted_vllm/embedding/test_hosted_vllm_embedding_transformation.py" target="_blank" rel="noopener noreferrer"><code>test_hosted_vllm_embedding_transformation.py</code></a></td></tr><tr><td>4</td><td>E2E tests with actual vLLM endpoint</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/test_litellm/llms/hosted_vllm/embedding/test_hosted_vllm_embedding_e2e.py" target="_blank" rel="noopener noreferrer"><code>test_hosted_vllm_embedding_e2e.py</code></a></td></tr><tr><td>5</td><td>Validate JSON payload structure matches vLLM expectations</td><td>✅ Done</td><td>Tests verify exact JSON sent to endpoint</td></tr></tbody></table>
<hr>]]></content:encoded>
            <category>incident-report</category>
            <category>embeddings</category>
            <category>vllm</category>
        </item>
        <item>
            <title><![CDATA[Day 0 Support: Claude Sonnet 4.6]]></title>
            <link>https://docs.litellm.ai/blog/claude_sonnet_4_6</link>
            <guid>https://docs.litellm.ai/blog/claude_sonnet_4_6</guid>
            <pubDate>Tue, 17 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Day 0 support for Claude Sonnet 4.6 on LiteLLM AI Gateway - use across Anthropic, Azure, Vertex AI, and Bedrock.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports Claude Sonnet 4.6 on Day 0. Use it across Anthropic, Azure, Vertex AI, and Bedrock through the LiteLLM AI Gateway.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image">Docker Image<a href="https://docs.litellm.ai/blog/claude_sonnet_4_6#docker-image" class="hash-link" aria-label="Direct link to Docker Image" title="Direct link to Docker Image">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker pull ghcr.io/berriai/litellm:v1.81.3-stable.sonnet-4-6</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---anthropic">Usage - Anthropic<a href="https://docs.litellm.ai/blog/claude_sonnet_4_6#usage---anthropic" class="hash-link" aria-label="Direct link to Usage - Anthropic" title="Direct link to Usage - Anthropic">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LiteLLM SDK</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> anthropic/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/ANTHROPIC_API_KEY</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable.sonnet-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config 
/app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-sonnet-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-python codeBlockContainer_Ckt0 theme-code-block" 
style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"anthropic/claude-sonnet-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"what llm are you"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---azure">Usage - Azure<a href="https://docs.litellm.ai/blog/claude_sonnet_4_6#usage---azure" class="hash-link" aria-label="Direct link to Usage - Azure" title="Direct link to Usage - Azure">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LiteLLM SDK</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> azure_ai/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AZURE_AI_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_base</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AZURE_AI_API_BASE  </span><span class="token comment" style="color:#999988;font-style:italic"># https://&lt;resource&gt;.services.ai.azure.com</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AZURE_AI_API_KEY=$AZURE_AI_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AZURE_AI_API_BASE=$AZURE_AI_API_BASE \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable.sonnet-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-sonnet-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code 
language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"azure_ai/claude-sonnet-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-azure-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://&lt;resource&gt;.services.ai.azure.com"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" 
style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"what llm are you"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---vertex-ai">Usage - Vertex AI<a href="https://docs.litellm.ai/blog/claude_sonnet_4_6#usage---vertex-ai" class="hash-link" aria-label="Direct link to Usage - Vertex AI" title="Direct link to Usage - Vertex AI">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LiteLLM SDK</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex_ai/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_project</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/VERTEX_PROJECT</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_location</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">east5</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e VERTEX_PROJECT=$VERTEX_PROJECT \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/credentials.json:/app/credentials.json \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable.sonnet-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-sonnet-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code 
language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"vertex_ai/claude-sonnet-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    vertex_project</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-project-id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    vertex_location</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"us-east5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token 
punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"what llm are you"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---bedrock">Usage - Bedrock<a href="https://docs.litellm.ai/blog/claude_sonnet_4_6#usage---bedrock" class="hash-link" aria-label="Direct link to Usage - Bedrock" title="Direct link to Usage - Bedrock">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LiteLLM SDK</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> bedrock/anthropic.claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sonnet</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">6</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_access_key_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AWS_ACCESS_KEY_ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_secret_access_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AWS_SECRET_ACCESS_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_region_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">east</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">1</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable.sonnet-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-sonnet-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code 
language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"bedrock/anthropic.claude-sonnet-4-6-v1"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    aws_access_key_id</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-access-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    aws_secret_access_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-secret-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    aws_region_name</span><span class="token operator" 
style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"us-east-1"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"what llm are you"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token 
plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>]]></content:encoded>
            <category>anthropic</category>
            <category>claude</category>
            <category>sonnet 4.6</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: Invalid beta headers with Claude Code]]></title>
            <link>https://docs.litellm.ai/blog/claude-code-beta-headers-incident</link>
            <guid>https://docs.litellm.ai/blog/claude-code-beta-headers-incident</guid>
            <pubDate>Mon, 16 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: February 13, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> February 13, 2026<br>
<strong>Duration:</strong> ~3 hours<br>
<strong>Severity:</strong> High<br>
<strong>Status:</strong> Resolved</p>
<blockquote>
<p><strong>Note:</strong> The fix is available in LiteLLM <code>v1.81.13-nightly</code> and later.</p>
</blockquote>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>Claude Code began sending unsupported Anthropic beta headers to non-Anthropic providers (Bedrock, Azure AI, Vertex AI), causing <code>invalid beta flag</code> errors. LiteLLM was forwarding all beta headers without provider-specific validation. Users experienced request failures when routing Claude Code requests through LiteLLM to these providers.</p>
<ul>
<li><strong>LLM calls to Anthropic:</strong> No impact.</li>
<li><strong>LLM calls to Bedrock/Azure/Vertex:</strong> Failed with <code>invalid beta flag</code> errors when unsupported headers were present.</li>
<li><strong>Cost tracking and routing:</strong> No impact.</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>Anthropic uses beta headers to enable experimental features in Claude. When Claude Code makes API requests, it includes headers like <code>anthropic-beta: prompt-caching-scope-2026-01-05,advanced-tool-use-2025-11-20</code>. However, not all providers support all Anthropic beta features.</p>
<p>Before this incident, LiteLLM forwarded all beta headers to all providers without validation.</p>
<p>Requests succeeded for Anthropic (native support) but failed for other providers when Claude Code sent headers those providers didn't support.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root cause<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#root-cause" class="hash-link" aria-label="Direct link to Root cause" title="Direct link to Root cause">​</a></h2>
<p>LiteLLM lacked provider-specific beta header validation. When Claude Code introduced new beta features or sent headers that specific providers didn't support, those headers were blindly forwarded, causing provider API errors.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>Create <code>anthropic_beta_headers_config.json</code> with provider-specific mappings</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/anthropic_beta_headers_config.json" target="_blank" rel="noopener noreferrer"><code>anthropic_beta_headers_config.json</code></a></td></tr><tr><td>2</td><td>Implement strict validation: headers must be explicitly mapped to be forwarded</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm_core_utils/litellm_logging.py" target="_blank" rel="noopener noreferrer"><code>litellm_logging.py</code></a></td></tr><tr><td>3</td><td>Add <code>/reload/anthropic_beta_headers</code> endpoint for dynamic config updates</td><td>✅ Done</td><td>Proxy management endpoints</td></tr><tr><td>4</td><td>Add <code>/schedule/anthropic_beta_headers_reload</code> for automatic periodic updates</td><td>✅ Done</td><td>Proxy management endpoints</td></tr><tr><td>5</td><td>Support <code>LITELLM_ANTHROPIC_BETA_HEADERS_URL</code> for custom config sources</td><td>✅ Done</td><td>Environment configuration</td></tr><tr><td>6</td><td>Support <code>LITELLM_LOCAL_ANTHROPIC_BETA_HEADERS</code> for air-gapped deployments</td><td>✅ Done</td><td>Environment configuration</td></tr></tbody></table>
<p>LiteLLM now validates and transforms beta headers per provider before forwarding them.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="dynamic-configuration-updates">Dynamic configuration updates<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#dynamic-configuration-updates" class="hash-link" aria-label="Direct link to Dynamic configuration updates" title="Direct link to Dynamic configuration updates">​</a></h2>
<p>A key improvement is zero-downtime configuration updates. When Anthropic releases new beta features, users can update their configuration without restarting:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Manually trigger reload (no restart needed)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "https://your-proxy-url/reload/anthropic_beta_headers" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer YOUR_ADMIN_TOKEN"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Or schedule automatic reloads every 24 hours</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "https://your-proxy-url/schedule/anthropic_beta_headers_reload?hours=24" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer YOUR_ADMIN_TOKEN"</span><br></span></code></pre></div></div>
<p>This prevents future incidents where Claude Code introduces new headers before LiteLLM configuration is updated.</p>
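<p>For teams that trigger the reload from a deploy script rather than a shell, the same call can be sketched in Python. This is a minimal sketch using only the standard library; the helper name <code>build_reload_request</code> is hypothetical, and the proxy URL and admin token are placeholders as in the curl example above. The response body format is not described here, so the sketch does not parse it.</p>

```python
import urllib.request


def build_reload_request(proxy_url, admin_token):
    """Build a POST request for the /reload/anthropic_beta_headers endpoint.

    Hypothetical helper: only the endpoint path and the Bearer header come
    from the curl example; everything else is illustrative.
    """
    url = proxy_url.rstrip("/") + "/reload/anthropic_beta_headers"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )


# Sending the request performs a network call, so it is left commented out:
# request = build_reload_request("https://your-proxy-url", "YOUR_ADMIN_TOKEN")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

<p>Building the request separately from sending it keeps the URL and header logic easy to check without a running proxy.</p>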
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="configuration-format">Configuration format<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#configuration-format" class="hash-link" aria-label="Direct link to Configuration format" title="Direct link to Configuration format">​</a></h2>
<p>The <code>anthropic_beta_headers_config.json</code> file maps input headers to provider-specific output headers:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"description"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Mapping of Anthropic beta headers for each provider."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"anthropic"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"advanced-tool-use-2025-11-20"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"advanced-tool-use-2025-11-20"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"computer-use-2025-01-24"</span><span class="token operator" style="color:#393A34">:</span><span class="token 
plain"> </span><span class="token string" style="color:#e3116c">"computer-use-2025-01-24"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"bedrock_converse"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"advanced-tool-use-2025-11-20"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token null keyword" style="color:#00009f">null</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"computer-use-2025-01-24"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"computer-use-2025-01-24"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"azure_ai"</span><span class="token operator" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"advanced-tool-use-2025-11-20"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"advanced-tool-use-2025-11-20"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"computer-use-2025-01-24"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"computer-use-2025-01-24"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p><strong>Validation rules:</strong></p>
<ol>
<li>Headers must exist in the mapping for the target provider</li>
<li>Headers with <code>null</code> values are filtered out (unsupported)</li>
<li>Header names can be transformed per-provider (e.g., Bedrock uses different names for some features)</li>
</ol>
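<p>The three rules above can be sketched in a few lines of Python. This is an illustration of the rules only, not LiteLLM's actual implementation: the function name <code>filter_beta_header</code> is hypothetical, and the mapping is an abbreviated copy of the example config shown earlier.</p>

```python
# Abbreviated mapping in the same shape as anthropic_beta_headers_config.json.
# None plays the role of JSON null (header unsupported by that provider).
BETA_HEADERS_CONFIG = {
    "anthropic": {
        "advanced-tool-use-2025-11-20": "advanced-tool-use-2025-11-20",
        "computer-use-2025-01-24": "computer-use-2025-01-24",
    },
    "bedrock_converse": {
        "advanced-tool-use-2025-11-20": None,
        "computer-use-2025-01-24": "computer-use-2025-01-24",
    },
}


def filter_beta_header(header_value, provider, config=BETA_HEADERS_CONFIG):
    """Return the anthropic-beta value to forward to `provider`, or None.

    Hypothetical helper that applies the three validation rules to a
    comma-separated anthropic-beta header value.
    """
    provider_map = config.get(provider, {})
    forwarded = []
    for name in header_value.split(","):
        name = name.strip()
        # Rule 1: names missing from the provider's mapping are dropped.
        # Rule 2: names mapped to None (null) are dropped as unsupported.
        mapped = provider_map.get(name)
        if mapped is not None and mapped not in forwarded:
            # Rule 3: forward the provider-specific name, which may differ.
            forwarded.append(mapped)
    return ",".join(forwarded) if forwarded else None
```

<p>With this mapping, <code>filter_beta_header("advanced-tool-use-2025-11-20,computer-use-2025-01-24", "bedrock_converse")</code> keeps only <code>computer-use-2025-01-24</code>, while the same value passes through unchanged for <code>anthropic</code>.</p>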
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="resolution-steps-for-users">Resolution steps for users<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#resolution-steps-for-users" class="hash-link" aria-label="Direct link to Resolution steps for users" title="Direct link to Resolution steps for users">​</a></h2>
<p>If you are still seeing these errors and are running a version older than v1.81.11-nightly, upgrade to the latest LiteLLM version:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install --upgrade litellm</span><br></span></code></pre></div></div>
<p>Or manually reload the configuration without restarting:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST "https://your-proxy-url/reload/anthropic_beta_headers" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer YOUR_ADMIN_TOKEN"</span><br></span></code></pre></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="related-documentation">Related documentation<a href="https://docs.litellm.ai/blog/claude-code-beta-headers-incident#related-documentation" class="hash-link" aria-label="Direct link to Related documentation" title="Direct link to Related documentation">​</a></h2>
<ul>
<li><a href="https://docs.litellm.ai/proxy/sync_anthropic_beta_headers.md">Managing Anthropic Beta Headers</a> - Complete configuration guide</li>
<li><a href="https://github.com/BerriAI/litellm/blob/main/litellm/anthropic_beta_headers_config.json" target="_blank" rel="noopener noreferrer"><code>anthropic_beta_headers_config.json</code></a> - Current configuration file</li>
</ul>]]></content:encoded>
            <category>incident-report</category>
            <category>anthropic</category>
            <category>stability</category>
        </item>
        <item>
            <title><![CDATA[Day 0 Support: MiniMax-M2.5]]></title>
            <link>https://docs.litellm.ai/blog/minimax_m2_5</link>
            <guid>https://docs.litellm.ai/blog/minimax_m2_5</guid>
            <pubDate>Thu, 12 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Day 0 support for MiniMax-M2.5 on LiteLLM]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports MiniMax-M2.5 on Day 0. Use it across OpenAI-compatible and Anthropic-compatible APIs through the LiteLLM AI Gateway.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="supported-models">Supported Models<a href="https://docs.litellm.ai/blog/minimax_m2_5#supported-models" class="hash-link" aria-label="Direct link to Supported Models" title="Direct link to Supported Models">​</a></h2>
<p>LiteLLM supports the following MiniMax models:</p>
<table><thead><tr><th>Model</th><th>Description</th><th>Input Cost</th><th>Output Cost</th><th>Context Window</th></tr></thead><tbody><tr><td><strong>MiniMax-M2.5</strong></td><td>Advanced reasoning, agentic capabilities</td><td>$0.3/M tokens</td><td>$1.2/M tokens</td><td>1M tokens</td></tr><tr><td><strong>MiniMax-M2.5-lightning</strong></td><td>Faster and more agile (~100 tokens/sec)</td><td>$0.3/M tokens</td><td>$2.4/M tokens</td><td>1M tokens</td></tr></tbody></table>
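<p>To get a feel for what these rates mean per request, the arithmetic can be sketched as follows. This is a rough estimate using only the input and output prices from the table; it is not LiteLLM's cost tracking, it ignores prompt-caching rates, and the names <code>PRICES_PER_MILLION</code> and <code>estimate_cost_usd</code> are hypothetical.</p>

```python
# Per-million-token prices in USD, copied from the table above.
PRICES_PER_MILLION = {
    "MiniMax-M2.5": {"input": 0.3, "output": 1.2},
    "MiniMax-M2.5-lightning": {"input": 0.3, "output": 2.4},
}


def estimate_cost_usd(model, input_tokens, output_tokens):
    """Estimate the cost of one request in USD (hypothetical helper)."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: 200,000 input tokens and 50,000 output tokens on each model.
for model_name in PRICES_PER_MILLION:
    cost = estimate_cost_usd(model_name, 200_000, 50_000)
    print(f"{model_name}: ${cost:.2f}")
```

<p>For that example request, the estimate comes to about $0.12 on MiniMax-M2.5 and $0.18 on MiniMax-M2.5-lightning; the difference comes entirely from the output price.</p>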
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="features-supported">Features Supported<a href="https://docs.litellm.ai/blog/minimax_m2_5#features-supported" class="hash-link" aria-label="Direct link to Features Supported" title="Direct link to Features Supported">​</a></h2>
<ul>
<li><strong>Prompt Caching</strong>: Reduce costs with cached prompts ($0.03/M tokens for cache read, $0.375/M tokens for cache write)</li>
<li><strong>Function Calling</strong>: Built-in tool calling support</li>
<li><strong>Reasoning</strong>: Advanced reasoning capabilities with thinking support</li>
<li><strong>System Messages</strong>: Full system message support</li>
<li><strong>Cost Tracking</strong>: Automatic cost calculation for all requests</li>
</ul>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image">Docker Image<a href="https://docs.litellm.ai/blog/minimax_m2_5#docker-image" class="hash-link" aria-label="Direct link to Docker Image" title="Direct link to Docker Image">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker pull litellm/litellm:v1.81.3-stable</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---openai-compatible-api-v1chatcompletions">Usage - OpenAI Compatible API (/v1/chat/completions)<a href="https://docs.litellm.ai/blog/minimax_m2_5#usage---openai-compatible-api-v1chatcompletions" class="hash-link" aria-label="Direct link to Usage - OpenAI Compatible API (/v1/chat/completions)" title="Direct link to Usage - OpenAI Compatible API (/v1/chat/completions)">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> minimax</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">m2</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">5</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"> minimax/MiniMax</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">M2.5</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/MINIMAX_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_base</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> https</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//api.minimax.io/v1</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e MINIMAX_API_KEY=$MINIMAX_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "minimax-m2-5",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="with-reasoning-split">With Reasoning Split<a href="https://docs.litellm.ai/blog/minimax_m2_5#with-reasoning-split" class="hash-link" aria-label="Direct link to With Reasoning Split" title="Direct link to With Reasoning Split">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "minimax-m2-5",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Solve: 2+2=?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "extra_body": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "reasoning_split": true</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---anthropic-compatible-api-v1messages">Usage - Anthropic Compatible API (/v1/messages)<a href="https://docs.litellm.ai/blog/minimax_m2_5#usage---anthropic-compatible-api-v1messages" class="hash-link" aria-label="Direct link to Usage - Anthropic Compatible API (/v1/messages)" title="Direct link to Usage - Anthropic Compatible API (/v1/messages)">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> minimax</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">m2</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">5</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"> minimax/MiniMax</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">M2.5</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/MINIMAX_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_base</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> https</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//api.minimax.io/anthropic/v1/messages</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e MINIMAX_API_KEY=$MINIMAX_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:v1.81.3-stable \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "minimax-m2-5",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "max_tokens": 1000,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="with-thinking">With Thinking<a href="https://docs.litellm.ai/blog/minimax_m2_5#with-thinking" class="hash-link" aria-label="Direct link to With Thinking" title="Direct link to With Thinking">​</a></h3>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "minimax-m2-5",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "max_tokens": 1000,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "thinking": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "type": "enabled",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "budget_tokens": 1000</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Solve: 2+2=?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---litellm-sdk">Usage - LiteLLM SDK<a href="https://docs.litellm.ai/blog/minimax_m2_5#usage---litellm-sdk" class="hash-link" aria-label="Direct link to Usage - LiteLLM SDK" title="Direct link to Usage - LiteLLM SDK">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="openai-compatible-api">OpenAI-compatible API<a href="https://docs.litellm.ai/blog/minimax_m2_5#openai-compatible-api" class="hash-link" aria-label="Direct link to OpenAI-compatible API" title="Direct link to OpenAI-compatible API">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello, how are you?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://api.minimax.io/v1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" 
style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="anthropic-compatible-api">Anthropic-compatible API<a href="https://docs.litellm.ai/blog/minimax_m2_5#anthropic-compatible-api" class="hash-link" aria-label="Direct link to Anthropic-compatible API" title="Direct link to Anthropic-compatible API">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> asyncio</span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function">main</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># acreate is a coroutine, so it must be awaited</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">anthropic</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">messages</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">acreate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello, how are you?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://api.minimax.io/anthropic/v1/messages"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        max_tokens</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># The response uses the Anthropic Messages format, so there is no choices list</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">asyncio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">main</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="with-thinking-1">With Thinking<a href="https://docs.litellm.ai/blog/minimax_m2_5#with-thinking-1" class="hash-link" aria-label="Direct link to With Thinking" title="Direct link to With Thinking">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> asyncio</span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function">main</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">anthropic</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">messages</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">acreate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Solve: 2+2=?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        thinking</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"enabled"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"budget_tokens"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        max_tokens</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Access thinking content: an Anthropic-format response keeps its blocks under "content"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> block </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> block</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"thinking"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Thinking: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">block</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'thinking'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">asyncio</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">main</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="with-reasoning-split-openai-api">With Reasoning Split (OpenAI API)<a href="https://docs.litellm.ai/blog/minimax_m2_5#with-reasoning-split-openai-api" class="hash-link" aria-label="Direct link to With Reasoning Split (OpenAI API)" title="Direct link to With Reasoning Split (OpenAI API)">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"Solve: 2+2=?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_body</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"reasoning_split"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://api.minimax.io/v1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Access thinking and response</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token builtin">hasattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'reasoning_details'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Thinking: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">choices</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation number" 
style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">message</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">reasoning_details</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Response: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">choices</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation number" style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">message</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">content</span><span 
class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="cost-tracking">Cost Tracking<a href="https://docs.litellm.ai/blog/minimax_m2_5#cost-tracking" class="hash-link" aria-label="Direct link to Cost Tracking" title="Direct link to Cost Tracking">​</a></h2>
<p>LiteLLM automatically tracks costs for MiniMax-M2.5 requests. The pricing is:</p>
<ul>
<li><strong>Input</strong>: $0.30 per 1M tokens</li>
<li><strong>Output</strong>: $1.20 per 1M tokens</li>
<li><strong>Cache Read</strong>: $0.03 per 1M tokens</li>
<li><strong>Cache Write</strong>: $0.375 per 1M tokens</li>
</ul>
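<p>As a rough illustration of how these rates turn into a per-request figure, the sketch below applies them to example token counts. The <code>estimate_cost</code> helper and the token counts are hypothetical and not part of LiteLLM, and the sketch assumes the four token categories are counted separately; the cost LiteLLM reports for a real request is the value to rely on.</p>

```python
# Hypothetical helper for illustration only; not a LiteLLM API.
# Rates are USD per 1M tokens, copied from the list above.
PRICE_PER_MILLION_USD = {
    "input": 0.30,
    "output": 1.20,
    "cache_read": 0.03,
    "cache_write": 0.375,
}


def estimate_cost(input_tokens=0, output_tokens=0,
                  cache_read_tokens=0, cache_write_tokens=0):
    """Return an estimated USD cost, assuming the categories do not overlap."""
    counts = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    total = sum(counts[kind] * PRICE_PER_MILLION_USD[kind] for kind in counts)
    return total / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens.
cost = estimate_cost(input_tokens=10_000, output_tokens=2_000)
print(f"Estimated cost: ${cost:.4f}")
```

<p>With these example counts, the input side contributes $0.0030 and the output side $0.0024, so the estimate is $0.0054.</p>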
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="accessing-cost-information">Accessing Cost Information<a href="https://docs.litellm.ai/blog/minimax_m2_5#accessing-cost-information" class="hash-link" aria-label="Direct link to Accessing Cost Information" title="Direct link to Accessing Cost Information">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hello!"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" 
style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Access cost information</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Cost: $</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">_hidden_params</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">get</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'response_cost'</span><span class="token string-interpolation interpolation punctuation" 
style="color:#393A34">,</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="streaming-support">Streaming Support<a href="https://docs.litellm.ai/blog/minimax_m2_5#streaming-support" class="hash-link" aria-label="Direct link to Streaming Support" title="Direct link to Streaming Support">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="openai-api">OpenAI API<a href="https://docs.litellm.ai/blog/minimax_m2_5#openai-api" class="hash-link" aria-label="Direct link to OpenAI API" title="Direct link to OpenAI API">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Tell me a story"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" 
style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    stream</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://api.minimax.io/v1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> chunk </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> chunk</span><span class="token punctuation" 
style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="streaming-with-reasoning-split">Streaming with Reasoning Split<a href="https://docs.litellm.ai/blog/minimax_m2_5#streaming-with-reasoning-split" class="hash-link" aria-label="Direct link to Streaming with Reasoning Split" title="Direct link to Streaming with Reasoning Split">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">stream </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax/MiniMax-M2.5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"Tell me a story"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_body</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"reasoning_split"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    stream</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-minimax-api-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_base</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://api.minimax.io/v1"</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">reasoning_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">text_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> chunk </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> stream</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token builtin">hasattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" 
style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"reasoning_details"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reasoning_details</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> detail </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reasoning_details</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"text"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> detail</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                reasoning_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> detail</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                new_reasoning </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> reasoning_text</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reasoning_buffer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> new_reasoning</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span 
class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">new_reasoning</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    reasoning_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> reasoning_text</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        content_text </span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain"> chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">delta</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        new_text </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> content_text</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text_buffer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> text_buffer </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> content_text</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> new_text</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">new_text</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> end</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> flush</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            text_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> content_text</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="using-with-native-sdks">Using with Native SDKs<a href="https://docs.litellm.ai/blog/minimax_m2_5#using-with-native-sdks" class="hash-link" aria-label="Direct link to Using with Native SDKs" title="Direct link to Using with Native SDKs">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="anthropic-sdk-via-litellm-proxy">Anthropic SDK via LiteLLM Proxy<a href="https://docs.litellm.ai/blog/minimax_m2_5#anthropic-sdk-via-litellm-proxy" class="hash-link" aria-label="Direct link to Anthropic SDK via LiteLLM Proxy" title="Direct link to Anthropic SDK via LiteLLM Proxy">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"ANTHROPIC_BASE_URL"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http://localhost:4000"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"ANTHROPIC_API_KEY"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"sk-1234"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Your LiteLLM proxy key</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> anthropic</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> anthropic</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">Anthropic</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">message </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">messages</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax-m2-5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    max_tokens</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1000</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    system</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"You are a helpful assistant."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" 
style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hi, how are you?"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> block </span><span class="token keyword" 
style="color:#00009f">in</span><span class="token plain"> message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> block</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">type</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"thinking"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Thinking:\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">block</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">thinking</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> block</span><span class="token punctuation" 
style="color:#393A34">.</span><span class="token builtin">type</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Text:\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">block</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">text</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="openai-sdk-via-litellm-proxy">OpenAI SDK via LiteLLM Proxy<a href="https://docs.litellm.ai/blog/minimax_m2_5#openai-sdk-via-litellm-proxy" class="hash-link" aria-label="Direct link to OpenAI SDK via LiteLLM Proxy" title="Direct link to OpenAI SDK via LiteLLM Proxy">​</a></h3>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> os</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"OPENAI_BASE_URL"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"http://localhost:4000"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">os</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">environ</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"OPENAI_API_KEY"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"sk-1234"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Your LiteLLM proxy key</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"minimax-m2-5"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" 
style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"system"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"You are a helpful assistant."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Hi, how are you?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_body</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"reasoning_split"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Access thinking and response</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token builtin">hasattr</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" 
style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'reasoning_details'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Thinking:\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">choices</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation number" style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">message</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">reasoning_details</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation number" style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation 
interpolation string" style="color:#e3116c">'text'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Text:\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">choices</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation number" style="color:#36acaa">0</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">message</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">content</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>]]></content:encoded>
            <category>minimax</category>
            <category>M2.5</category>
            <category>llm</category>
        </item>
        <item>
            <title><![CDATA[Incident Report: Invalid model cost map on main]]></title>
            <link>https://docs.litellm.ai/blog/model-cost-map-incident</link>
            <guid>https://docs.litellm.ai/blog/model-cost-map-incident</guid>
            <pubDate>Tue, 10 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Date: January 27, 2026]]></description>
            <content:encoded><![CDATA[<p><strong>Date:</strong> January 27, 2026<br>
<strong>Duration:</strong> ~20 minutes<br>
<strong>Severity:</strong> Low<br>
<strong>Status:</strong> Resolved</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="summary">Summary<a href="https://docs.litellm.ai/blog/model-cost-map-incident#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary">​</a></h2>
<p>A malformed JSON entry in <code>model_prices_and_context_window.json</code> was merged to <code>main</code> (<a href="https://github.com/BerriAI/litellm/commit/562f0a028251750e3d75386bee0e630d9796d0df" target="_blank" rel="noopener noreferrer"><code>562f0a0</code></a>). This caused LiteLLM to silently fall back to a stale local copy of the model cost map. Users on older package versions lost cost tracking for newer models only (e.g. <code>azure/gpt-5.2</code>). No LLM calls were blocked.</p>
<ul>
<li><strong>LLM calls and proxy routing:</strong> No impact.</li>
<li><strong>Cost tracking:</strong> Impacted for newer models not present in the local backup. Older models were unaffected. The incident lasted ~20 minutes until the commit was reverted.</li>
</ul>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="background">Background<a href="https://docs.litellm.ai/blog/model-cost-map-incident#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background">​</a></h2>
<p>The model cost map is not in the request path. It is used after the LLM response comes back, inside a <code>try</code>/<code>except</code> block, to calculate spend. A missing entry never blocks a call.</p>
<p>Whether or not the cost map lookup succeeds, the caller receives a response. When the lookup fails, the only difference is <code>cost=0</code> on that request.</p>
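<p>As a rough illustration of why a missing entry cannot block a call, the sketch below computes cost only after a response already exists and swallows lookup failures. The function names, the <code>example-model</code> entry, and its prices are hypothetical and are not LiteLLM's actual internals.</p>

```python
import logging

logger = logging.getLogger("cost_tracking_sketch")

# Hypothetical per-token prices in USD; not LiteLLM's real cost map.
COST_MAP = {"example-model": {"input": 2e-06, "output": 8e-06}}


def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Look up per-token prices; raises KeyError for an unmapped model."""
    prices = COST_MAP[model]
    return prompt_tokens * prices["input"] + completion_tokens * prices["output"]


def track_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Runs after the response exists, so a failure here only affects spend."""
    try:
        return calculate_cost(model, prompt_tokens, completion_tokens)
    except KeyError:
        logger.warning("This model isn't mapped yet: %s", model)
        return 0.0
```

<p>With this shape, an unmapped model such as <code>azure/gpt-5.2</code> yields a cost of <code>0.0</code> while the response itself is returned as usual.</p>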
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="root-cause">Root cause<a href="https://docs.litellm.ai/blog/model-cost-map-incident#root-cause" class="hash-link" aria-label="Direct link to Root cause" title="Direct link to Root cause">​</a></h2>
<p>LiteLLM fetches the model cost map from GitHub <code>main</code> at import time. If the fetch fails, it falls back to a local backup bundled with the package. Before this incident, the fallback was completely silent -- no warning was logged.</p>
<p>A contributor PR introduced an extra <code>{</code>, producing invalid JSON. The remote fetch failed with <code>JSONDecodeError</code>, triggering the silent fallback. Users on older package versions had backup files that lacked newer models.</p>
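<p>The failure mode can be reproduced in a few lines. This is a minimal sketch, assuming a loader that takes the remote fetch as a callable; <code>load_cost_map</code>, the model names, and the prices are made up for illustration, and the warning reflects the post-incident behaviour rather than the original silent fallback.</p>

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("cost_map_sketch")


def load_cost_map(fetch_remote: Callable[[], str], local_backup: dict) -> dict:
    """Parse the remote map; on a fetch or parse error, warn and use the backup."""
    try:
        return json.loads(fetch_remote())
    except (OSError, json.JSONDecodeError) as exc:
        logger.warning("Remote cost map unavailable (%s); using local backup", exc)
        return local_backup


def bad_remote() -> str:
    # An extra "{" makes the payload invalid JSON (hypothetical content).
    return '{"new-model": {{"input_cost_per_token": 1e-06}}'


backup = {"old-model": {"input_cost_per_token": 2e-06}}

cost_map = load_cost_map(bad_remote, backup)
print("new-model" in cost_map)  # False: the backup predates the new model
```

<p>The lookup for <code>new-model</code> then fails even though nothing is wrong with the installed package, which matches what users on older versions observed.</p>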
<p><strong>Timeline:</strong></p>
<ol>
<li>Malformed JSON merged to <code>main</code></li>
<li>LiteLLM installations fall back to local backup on next import</li>
<li>Users report <code>"This model isn't mapped yet"</code> for newer models</li>
<li>Bad commit identified and reverted (~20 minutes)</li>
</ol>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="remediation">Remediation<a href="https://docs.litellm.ai/blog/model-cost-map-incident#remediation" class="hash-link" aria-label="Direct link to Remediation" title="Direct link to Remediation">​</a></h2>
<table><thead><tr><th>#</th><th>Action</th><th>Status</th><th>Code</th></tr></thead><tbody><tr><td>1</td><td>CI validation on <code>model_prices_and_context_window.json</code></td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/.github/workflows/test-model-map.yaml" target="_blank" rel="noopener noreferrer"><code>test-model-map.yaml</code></a></td></tr><tr><td>2</td><td>Warning log on fallback to local backup</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm_core_utils/get_model_cost_map.py#L57-L68" target="_blank" rel="noopener noreferrer"><code>get_model_cost_map.py#L57-L68</code></a></td></tr><tr><td>3</td><td><code>GetModelCostMap</code> class with integrity validation helpers</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/litellm/litellm_core_utils/get_model_cost_map.py#L24-L149" target="_blank" rel="noopener noreferrer"><code>get_model_cost_map.py#L24-L149</code></a></td></tr><tr><td>4</td><td>Resilience test suite (bad hosted map, fallback, completion)</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/llm_translation/test_model_cost_map_resilience.py#L150-L291" target="_blank" rel="noopener noreferrer"><code>test_model_cost_map_resilience.py#L150-L291</code></a></td></tr><tr><td>5</td><td>Test that backup model cost map always exists and contains common models</td><td>✅ Done</td><td><a href="https://github.com/BerriAI/litellm/blob/main/tests/llm_translation/test_model_cost_map_resilience.py#L213-L228" target="_blank" rel="noopener noreferrer"><code>test_model_cost_map_resilience.py#L213-L228</code></a></td></tr></tbody></table>
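<p>A check in the spirit of item 1 could look like the sketch below. It is an assumption about what such a validation might cover, not a copy of the linked workflow; <code>validate_cost_map</code> is a hypothetical helper.</p>

```python
import json


def validate_cost_map(text: str) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    # Every model entry is expected to be an object of pricing fields.
    return [
        f"entry {name!r} is not an object"
        for name, entry in data.items()
        if not isinstance(entry, dict)
    ]
```

<p>Run against the file contents in CI, a non-empty result would fail the build before a malformed map can reach <code>main</code>.</p>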
<p>Enterprises that require zero external dependencies at import time can set <code>LITELLM_LOCAL_MODEL_COST_MAP=True</code> to skip the GitHub fetch entirely.</p>
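<p>Because the fetch happens at import time, the variable has to be set before LiteLLM is imported. A minimal sketch, assuming the setting is applied from Python rather than from the process environment:</p>

```python
import os

# Must be set before `import litellm`, because the cost map is loaded at import time.
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"

# import litellm  # uncomment in an environment where litellm is installed
```

<p>Setting the same variable in the container or service environment achieves the same effect without touching application code.</p>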
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="other-dependencies-on-external-resources">Other dependencies on external resources<a href="https://docs.litellm.ai/blog/model-cost-map-incident#other-dependencies-on-external-resources" class="hash-link" aria-label="Direct link to Other dependencies on external resources" title="Direct link to Other dependencies on external resources">​</a></h2>
<table><thead><tr><th>Dependency</th><th>Impact if unavailable</th><th>Fallback</th></tr></thead><tbody><tr><td>Model cost map (GitHub)</td><td>Cost tracking for newer models</td><td>Local backup (now with warning)</td></tr><tr><td>JWT public keys (IDP/SSO)</td><td>Auth fails</td><td>None</td></tr><tr><td>OIDC UserInfo (IDP/SSO)</td><td>Auth fails</td><td>None</td></tr><tr><td>HuggingFace model API</td><td>HF provider calls fail</td><td>None</td></tr><tr><td>Ollama tags (localhost)</td><td>Ollama model list stale</td><td>Static list</td></tr></tbody></table>]]></content:encoded>
            <category>incident-report</category>
            <category>stability</category>
        </item>
        <item>
            <title><![CDATA[Your Middleware Could Be a Bottleneck]]></title>
            <link>https://docs.litellm.ai/blog/fastapi-middleware-performance</link>
            <guid>https://docs.litellm.ai/blog/fastapi-middleware-performance</guid>
            <pubDate>Sat, 07 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[How we improved LiteLLM proxy latency and throughput by replacing a single middleware base class]]></description>
            <content:encoded><![CDATA[<blockquote>
<p>How we improved LiteLLM proxy latency and throughput by replacing a single, simple middleware base class</p>
</blockquote>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="our-setup">Our Setup<a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#our-setup" class="hash-link" aria-label="Direct link to Our Setup" title="Direct link to Our Setup">​</a></h2>
<p>The LiteLLM proxy server has two middleware layers. The first is Starlette's <code>CORSMiddleware</code> (re-exported by FastAPI), which is a pure ASGI middleware. The second is a simple <code>BaseHTTPMiddleware</code> subclass called <code>PrometheusAuthMiddleware</code>.</p>
<p>The job of <code>PrometheusAuthMiddleware</code> is to authenticate requests to the <code>/metrics</code> endpoint. It's not on by default; you enable it with a flag in your proxy config:</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Proxy config flag</summary><div><div class="collapsibleContent_i85q"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">litellm_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">require_auth_for_metrics_endpoint</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div></div></div></details>
<p>The middleware checks two things: is the request hitting <code>/metrics</code>, and is auth even enabled? If either check fails, which is the case for the vast majority of requests, it just passes the request through unchanged.</p>
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>PrometheusAuthMiddleware source</summary><div><div class="collapsibleContent_i85q"><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">PrometheusAuthMiddleware</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">BaseHTTPMiddleware</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">dispatch</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Request</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> call_next</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        
</span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_is_prometheus_metrics_endpoint</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">_should_run_auth_on_metrics_endpoint</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> user_api_key_auth</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">request</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">,</span><span class="token 
plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> e</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> JSONResponse</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">status_code</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">401</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> content</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> call_next</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> response</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token decorator annotation punctuation" style="color:#393A34">@staticmethod</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">_is_prometheus_metrics_endpoint</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Request</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"/metrics"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">url</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">path</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">False</span><br></span></code></pre></div></div></div></div></details>
<p>Looks harmless. Subclass <code>BaseHTTPMiddleware</code>, implement <code>dispatch()</code>, done. This is what you will see in Starlette's documentation<sup><a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#footnote-1">1</a></sup>.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-basehttpmiddleware-actually-does">What BaseHTTPMiddleware Actually Does<a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#what-basehttpmiddleware-actually-does" class="hash-link" aria-label="Direct link to What BaseHTTPMiddleware Actually Does" title="Direct link to What BaseHTTPMiddleware Actually Does">​</a></h2>
<p>When you write a <code>dispatch()</code> method, you'd expect the request to flow straight through your function and out the other side. What actually happens is much more involved.</p>
<p>On every request, even a pure passthrough where the middleware itself does nothing, <code>BaseHTTPMiddleware</code> creates <strong>7 intermediate objects and tasks</strong>:</p>
<div class="pipelineWrapper_CM5g"><div class="pipelineLabel_D7du">7 steps per request</div><div class="pipeline_TYHo"><div class="stageWrapper_uu9t"><div class="stage_ghjK stageActive_OZgp" role="button" tabindex="0"><div class="stageNumber_q67i">1</div><div class="stageLabel_eQoj">Request Wrapping</div><div class="stageSubtitle_GkOJ">_CachedRequest</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">2</div><div class="stageLabel_eQoj">Sync Event</div><div class="stageSubtitle_GkOJ">anyio.Event()</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">3</div><div class="stageLabel_eQoj">Memory Stream</div><div class="stageSubtitle_GkOJ">create_memory_object_stream()</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">4</div><div class="stageLabel_eQoj">Task Group</div><div class="stageSubtitle_GkOJ">create_task_group()</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">5</div><div class="stageLabel_eQoj">Background Task</div><div class="stageSubtitle_GkOJ">task_group.start_soon(coro)</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">6</div><div class="stageLabel_eQoj">Nested Task Group</div><div class="stageSubtitle_GkOJ">receive_or_disconnect()</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK" role="button" tabindex="0"><div class="stageNumber_q67i">7</div><div class="stageLabel_eQoj">Response Wrapping</div><div class="stageSubtitle_GkOJ">_StreamingResponse</div></div></div></div><div class="codePanel_YiAf"></div></div>
<p>It wraps the request in a new object to track body state, creates a synchronization event, allocates an in-memory channel to pass messages between your middleware and the inner app, sets up a task group to manage the lifecycle, and then runs your actual route handler in a <em>separate background task</em> when you call <code>call_next()</code>. The response body then flows back through that in-memory channel, gets re-wrapped in a streaming response object, and finally reaches the caller. That's a lot.</p>
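<p>To get a feel for the shape of that indirection, here is a rough analogy written with plain <code>asyncio</code>. It is not Starlette's implementation (which uses <code>anyio</code> task groups and memory object streams); <code>handler</code>, <code>direct_call</code> and <code>via_task_and_queue</code> are made-up names that only contrast a direct await with a handler run in a separate task whose result travels back through an in-memory channel.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv">import asyncio


async def handler():
    """Stand-in for the route handler."""
    return "ok"


async def direct_call():
    # Pure ASGI style: one awaited call, no extra task or channel.
    return await handler()


async def via_task_and_queue():
    # Analogy for the BaseHTTPMiddleware path: the handler runs in its own
    # task and hands its result back through an in-memory channel.
    channel = asyncio.Queue(maxsize=1)

    async def run_handler():
        await channel.put(await handler())

    task = asyncio.create_task(run_handler())
    result = await channel.get()
    await task  # make sure the background task has finished
    return result


if __name__ == "__main__":
    print(asyncio.run(direct_call()))
    print(asyncio.run(via_task_and_queue()))
</code></pre></div></div>
<p>Both paths return the same value; the second simply allocates a queue and schedules an extra task on the event loop to get there.</p>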
<p>For a middleware that, in our case, does nothing on 99.9% of requests, paying this cost doesn't make sense.</p>
<p>Compare that to a pure ASGI middleware, which can simply check the request path and pass the request along.</p>
<div class="pipelineWrapper_CM5g"><div class="pipelineLabel_D7du">2 steps per request</div><div class="pipeline_TYHo pipelineTwoCol_OfD8"><div class="stageWrapper_uu9t"><div class="stage_ghjK stageNoClick_nUno stageActiveGreen_rW5l"><div class="stageNumber_q67i">1</div><div class="stageLabel_eQoj">Scope Check</div><div class="stageSubtitle_GkOJ">scope["type"] != "http"</div></div></div><div class="stageWrapper_uu9t"><div class="stage_ghjK stageNoClick_nUno"><div class="stageNumber_q67i">2</div><div class="stageLabel_eQoj">Direct Call</div><div class="stageSubtitle_GkOJ">await self.app(scope, receive, send)</div></div></div></div></div>
<p>Our middleware is doing something really simple. For the vast majority of requests, it doesn't need to do anything except let the request pass through. It doesn't need task groups, memory streams, or cancel scopes. It needs a function call.</p>
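<p>As a minimal sketch of what such a pure ASGI version could look like (assuming the same two checks as above, and not necessarily matching the code that shipped in LiteLLM), the class below takes the inner app plus a hypothetical <code>authenticate</code> coroutine. The class name, the <code>auth_enabled</code> flag and the shape of the 401 body are illustrative assumptions.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv">import json


class MetricsAuthASGIMiddleware:
    """Sketch of a pure ASGI auth middleware; all names are illustrative."""

    def __init__(self, app, authenticate, auth_enabled=True):
        self.app = app
        self.authenticate = authenticate  # hypothetical async callable(scope)
        self.auth_enabled = auth_enabled

    async def __call__(self, scope, receive, send):
        # Fast path: anything that is not an auth-protected /metrics request
        # goes straight to the inner app with a single awaited call.
        if (
            scope["type"] != "http"
            or "/metrics" not in scope.get("path", "")
            or not self.auth_enabled
        ):
            await self.app(scope, receive, send)
            return
        try:
            await self.authenticate(scope)
        except Exception as exc:
            # Assumed error payload; the original post elides the real content.
            body = json.dumps({"error": str(exc)}).encode()
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [
                    (b"content-type", b"application/json"),
                    (b"content-length", str(len(body)).encode()),
                ],
            })
            await send({"type": "http.response.body", "body": body})
            return
        await self.app(scope, receive, send)
</code></pre></div></div>
<p>The passthrough branch allocates nothing per request: no wrapped request object, no task group, no stream. Only the rare authenticated <code>/metrics</code> call does any extra work.</p>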
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="comparing-both">Comparing Both<a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#comparing-both" class="hash-link" aria-label="Direct link to Comparing Both" title="Direct link to Comparing Both">​</a></h2>
<p>We replaced the <code>BaseHTTPMiddleware</code> subclass with a pure ASGI middleware. To benchmark the difference, we used Apache Bench<sup><a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#footnote-2">2</a></sup> to compare both configurations of LiteLLM's middleware stack: the old setup (1 pure ASGI + 1 <code>BaseHTTPMiddleware</code>) against the new setup (2 pure ASGI).</p>
<p>A minimal FastAPI app serves <code>GET /health</code> → <code>PlainTextResponse("ok")</code>. The endpoint does zero work to isolate the middleware overhead: any difference between configs is purely the cost of the middleware plumbing itself. Both middlewares are just calling the next layer. Same work, different base class.</p>
<p>Apache Bench (<code>ab</code>) fires requests at the server with 1,000 concurrent connections and a single uvicorn worker. One worker means one event loop, so the benchmark directly measures how each middleware design handles concurrent load on a single thread.</p>
<div class="benchmarkWrapper_SEev"><div class="benchmarkConfig_wDd7">50,000 requests · 1,000 concurrent · 1 worker</div><div class="benchmarkColumns_C5PV"><div class="benchmarkColumn_gN2m"><div class="columnTitle_Fzyb columnTitleBefore_RzCh">Before (1 ASGI + 1 BaseHTTP)</div><div class="flowStack_TMO_"><div class="flowLayer_XD53">ab client</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">uvicorn · 1 worker</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">ASGI Middleware</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53 flowLayerWarning_ayjQ">BaseHTTPMiddleware<span class="overheadTag_AZ5N">← overhead</span></div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">GET /health → "ok"</div></div><div class="statsRow_p9rK"><div class="stat_TuZf"><div class="statValue_iwYZ">21 ms</div><div class="statLabel_FWHG">P50</div></div></div></div><div class="benchmarkColumn_gN2m"><div class="columnTitle_Fzyb columnTitleAfter_j950">After (2x Pure ASGI)</div><div class="flowStack_TMO_"><div class="flowLayer_XD53">ab client</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">uvicorn · 1 worker</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">ASGI Middleware</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">ASGI Middleware</div><div class="flowArrow__5jf">↓</div><div class="flowLayer_XD53">GET /health → "ok"</div></div><div class="statsRow_p9rK"><div class="stat_TuZf"><div class="statValue_iwYZ">13 ms</div><div class="statLabel_FWHG">P50</div></div></div></div></div><div class="summaryStats_A1Vo"><div class="summaryItem_VKff"><div class="summaryValue_Qjnn">+74%</div><div class="summaryLabel_yLpM">Throughput (RPS)</div></div><div class="summaryItem_VKff"><div class="summaryValue_Qjnn">-38%</div><div class="summaryLabel_yLpM">Median Latency (P50)</div></div></div><div class="collapsible_WMul"><button class="collapsibleToggle_TQlk"><span class="collapsibleChevron_a6_J">▶</span>Per-run data (3 runs each)</button><div class="collapsibleContent_WC4_"><table class="dataTable_sb07"><thead><tr><th>Config</th><th>Run</th><th>RPS</th><th>P50 (ms)</th></tr></thead><tbody><tr><td>Before (1 ASGI + 1 BaseHTTP)</td><td>1</td><td>3,596</td><td>21</td></tr><tr><td>Before (1 ASGI + 1 BaseHTTP)</td><td>2</td><td>3,599</td><td>21</td></tr><tr><td>Before (1 ASGI + 1 BaseHTTP)</td><td>3</td><td>4,161</td><td>21</td></tr><tr><td>After (2x Pure ASGI)</td><td>1</td><td>6,504</td><td>13</td></tr><tr><td>After (2x Pure ASGI)</td><td>2</td><td>6,631</td><td>13</td></tr><tr><td>After (2x Pure ASGI)</td><td>3</td><td>6,595</td><td>13</td></tr></tbody></table></div></div></div>
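<p>The headline percentages can be reproduced from the per-run table. The snippet below assumes the summary compares the mean RPS of the three runs in each configuration, which is our reading of the data rather than something the benchmark widget states; <code>percent_change</code> is a helper defined here for illustration.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv">from statistics import mean

# Per-run figures copied from the table above.
before_rps = [3596, 3599, 4161]
after_rps = [6504, 6631, 6595]
before_p50_ms, after_p50_ms = 21, 13


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


throughput_gain = percent_change(mean(before_rps), mean(after_rps))
latency_change = percent_change(before_p50_ms, after_p50_ms)

print(f"Throughput: {throughput_gain:+.0f}%")      # +74%
print(f"Median latency: {latency_change:+.0f}%")   # -38%
</code></pre></div></div>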
<details class="details_lb9f alert alert--info details_b_Ee" data-collapsed="true"><summary>Try it yourself</summary><div><div class="collapsibleContent_i85q"><p>Save the script below as <code>benchmark_middleware.py</code>, then run:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain"># Terminal 1 — start the "before" server (1 ASGI + 1 BaseHTTPMiddleware)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">python benchmark_middleware.py --middleware mixed</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Terminal 2 — benchmark it</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ab -n 50000 -c 1000 http://localhost:8000/health</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Stop the server, then start the "after" server (2x pure ASGI)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">python benchmark_middleware.py --middleware asgi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"># Terminal 2 — benchmark again</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ab -n 50000 -c 1000 
http://localhost:8000/health</span><br></span></code></pre></div></div><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> argparse</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> uvicorn</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> fastapi </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> FastAPI</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> fastapi</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">responses </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> PlainTextResponse</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> starlette</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">middleware</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">base </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BaseHTTPMiddleware</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> starlette</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">requests </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Request</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> starlette</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">types </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> ASGIApp</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> Send</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">NoOpBaseHTTPMiddleware</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">BaseHTTPMiddleware</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" 
style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">dispatch</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Request</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> call_next</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> call_next</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">NoOpPureASGIMiddleware</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" 
style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ASGIApp</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">app </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> app</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__call__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> scope</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> 
receive</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> send</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Send</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">app</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> send</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">create_app</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
plain">middleware_type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">|</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> layers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token builtin">int</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> FastAPI</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> FastAPI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token decorator annotation punctuation" style="color:#393A34">@app</span><span class="token decorator 
annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"/health"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">health</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> PlainTextResponse</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"ok"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> middleware_type </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mixed"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">        app</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_middleware</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">NoOpBaseHTTPMiddleware</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        app</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_middleware</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">NoOpPureASGIMiddleware</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> middleware_type </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"asgi"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> _ </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">layers</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            app</span><span class="token punctuation" style="color:#393A34">.</span><span 
class="token plain">add_middleware</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">NoOpPureASGIMiddleware</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> app</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> __name__ </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"__main__"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parser </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> argparse</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ArgumentParser</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parser</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_argument</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"--middleware"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> choices</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"asgi"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mixed"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parser</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_argument</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"--layers"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">type</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">int</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">2</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parser</span><span class="token punctuation" style="color:#393A34">.</span><span class="token 
plain">add_argument</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"--port"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">type</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">int</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> default</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">8000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    args </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parser</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">parse_args</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    app </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_app</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">middleware_type</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">args</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">middleware</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> layers</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">args</span><span 
class="token punctuation" style="color:#393A34">.</span><span class="token plain">layers</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    uvicorn</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">app</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> host</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"0.0.0.0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> port</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">args</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">port</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> workers</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> log_level</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"warning"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></details>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="our-change">Our Change<a href="https://docs.litellm.ai/blog/fastapi-middleware-performance#our-change" class="hash-link" aria-label="Direct link to Our Change" title="Direct link to Our Change">​</a></h2>
<p>Here's what we replaced it with:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">PrometheusAuthMiddleware</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ASGIApp</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">app </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> app</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">async</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__call__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> scope</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> receive</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> send</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Send</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> scope</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">!=</span><span class="token 
plain"> </span><span class="token string" style="color:#e3116c">"http"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"/metrics"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> scope</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"path"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">app</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> send</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">require_auth_for_metrics_endpoint </span><span class="token keyword" style="color:#00009f">is</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            request </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Request</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> receive</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            api_key </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">headers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Authorization"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">""</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> user_api_key_auth</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">request</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">request</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">api_key</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> e</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token comment" style="color:#999988;font-style:italic"># send 401 directly via ASGI protocol</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">                </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">app</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scope</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> receive</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> send</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>For the 99.9% of requests that aren't hitting <code>/metrics</code>, the middleware is now one dict lookup, one string check, and one function call. No objects allocated, no tasks spawned.</p>
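<p>To make the fast path concrete, here is a minimal, standard-library-only sketch of the same pattern. It is not LiteLLM's <code>PrometheusAuthMiddleware</code>: the <code>MetricsGateMiddleware</code> class, the <code>is_authorized</code> callable, and the <code>Bearer secret</code> token are hypothetical stand-ins. The 401 branch shows one way a pure ASGI middleware can answer a request itself by sending the two standard ASGI response messages.</p>

```python
import asyncio


class MetricsGateMiddleware:
    """Hypothetical pure ASGI middleware; illustrative only."""

    def __init__(self, app, is_authorized):
        self.app = app
        self.is_authorized = is_authorized  # hypothetical: headers dict -> bool

    async def __call__(self, scope, receive, send):
        # Fast path: one dict lookup, one substring check, one downstream call.
        if scope["type"] != "http" or "/metrics" not in scope.get("path", ""):
            await self.app(scope, receive, send)
            return

        # ASGI delivers headers as a list of (bytes, bytes) pairs.
        headers = dict(scope.get("headers", []))
        if not self.is_authorized(headers):
            body = b"unauthorized"
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [
                    (b"content-type", b"text/plain"),
                    (b"content-length", str(len(body)).encode()),
                ],
            })
            await send({"type": "http.response.body", "body": body})
            return

        await self.app(scope, receive, send)


async def ok_app(scope, receive, send):
    """Downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call(app, path, headers=()):
    """Drive an ASGI app in-process and return the response status code."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "path": path, "headers": list(headers)}
    await app(scope, receive, send)
    return sent[0]["status"]


def demo():
    app = MetricsGateMiddleware(
        ok_app, lambda h: h.get(b"authorization") == b"Bearer secret"
    )
    return (
        asyncio.run(call(app, "/health")),
        asyncio.run(call(app, "/metrics")),
        asyncio.run(call(app, "/metrics", [(b"authorization", b"Bearer secret")])),
    )
```

<p>Calling <code>demo()</code> sends three requests through the middleware: one to <code>/health</code>, which takes the fast path, and two to <code>/metrics</code>, without and with the expected header.</p>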
<p>It's important to evaluate whether the tools you're using are still the right fit as your software grows and takes on more responsibility. We're adding a static analysis check to prevent this from happening again with any newly introduced middleware. If we find a use case that genuinely needs it, that's okay and we'll reevaluate, but nothing LiteLLM does at the moment requires it.</p>
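<p>As an illustration of what such a check could look like, the sketch below uses Python's <code>ast</code> module to flag classes that subclass <code>BaseHTTPMiddleware</code>. It is an assumption about the shape of the check, not the one LiteLLM ships; <code>find_banned_middleware</code> and the sample source are hypothetical.</p>

```python
import ast

BANNED_BASE = "BaseHTTPMiddleware"


def find_banned_middleware(source, filename="<string>"):
    """Return (class_name, line_number) for each class subclassing BANNED_BASE."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.ClassDef):
            continue
        for base in node.bases:
            # Handles both `BaseHTTPMiddleware` and `module.BaseHTTPMiddleware`.
            if isinstance(base, ast.Name):
                name = base.id
            else:
                name = getattr(base, "attr", None)
            if name == BANNED_BASE:
                hits.append((node.name, node.lineno))
    return hits


# Hypothetical source to scan; it is only parsed, never imported or executed.
SAMPLE = '''
from starlette.middleware.base import BaseHTTPMiddleware

class SlowMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        return await call_next(request)

class FastMiddleware:
    def __init__(self, app):
        self.app = app
'''

print(find_banned_middleware(SAMPLE))
```

<p>A check this simple only sees direct subclassing by name. It would miss aliased imports, and it would miss middleware registered through the <code>@app.middleware("http")</code> decorator, so a production check would likely need to cover those paths as well.</p>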
<p>This middleware change was one part of a broader optimization effort on the LiteLLM proxy. Across all optimizations combined, we've measured about a <strong>30% reduction in proxy overhead</strong> over the past two weeks.</p>
<hr>
<a id="footnote-1"></a>
<p><sup>1</sup> <a href="https://starlette.dev/middleware/#basehttpmiddleware" target="_blank" rel="noopener noreferrer">Starlette Middleware — BaseHTTPMiddleware</a></p>
<a id="footnote-2"></a>
<p><sup>2</sup> <a href="https://httpd.apache.org/docs/2.4/programs/ab.html" target="_blank" rel="noopener noreferrer">Apache HTTP server benchmarking tool (<code>ab</code>)</a></p>]]></content:encoded>
            <category>performance</category>
            <category>fastapi</category>
            <category>middleware</category>
        </item>
        <item>
            <title><![CDATA[Improve release stability with 24 hour load tests]]></title>
            <link>https://docs.litellm.ai/blog/litellm-observatory</link>
            <guid>https://docs.litellm.ai/blog/litellm-observatory</guid>
            <pubDate>Fri, 06 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[How we built a long-running, release-validation system to catch regressions before they reach users.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" src="https://raw.githubusercontent.com/AlexsanderHamir/assets/main/Screenshot%202026-01-31%20175355.png" alt="LiteLLM Observatory" class="img_ev3q"></p>
<p>As LiteLLM adoption has grown, so have expectations around reliability, performance, and operational safety. Meeting those expectations requires more than correctness-focused tests; it requires validating how the system behaves over time, under real-world conditions.</p>
<p>This post introduces <strong>LiteLLM Observatory</strong>, a long-running release-validation system we built to catch regressions before they reach users.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="why-we-built-the-observatory">Why We Built the Observatory<a href="https://docs.litellm.ai/blog/litellm-observatory#why-we-built-the-observatory" class="hash-link" aria-label="Direct link to Why We Built the Observatory" title="Direct link to Why We Built the Observatory">​</a></h2>
<p>LiteLLM operates at the intersection of external providers, long-lived network connections, and high-throughput workloads. While our unit and integration tests do an excellent job validating correctness, they are not designed to surface issues that only appear after extended operation.</p>
<p>A subtle lifecycle edge case discovered in v1.81.3 reinforced the need for stronger release validation in this area.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="a-real-world-lifecycle-edge-case">A Real-World Lifecycle Edge Case<a href="https://docs.litellm.ai/blog/litellm-observatory#a-real-world-lifecycle-edge-case" class="hash-link" aria-label="Direct link to A Real-World Lifecycle Edge Case" title="Direct link to A Real-World Lifecycle Edge Case">​</a></h2>
<p>In v1.81.3, we shipped a fix for an HTTP client memory leak. The change passed unit and integration tests and behaved correctly in short-lived runs.</p>
<p>The issue that surfaced was not caused by a single incorrect line of logic, but by how multiple components interacted over time:</p>
<ul>
<li>A cached <code>httpx</code> client was configured with a 1-hour TTL</li>
<li>When the cache expired, the underlying HTTP connection was closed as expected</li>
<li>A higher-level client continued to hold a reference to that connection</li>
<li>Subsequent requests failed with:</li>
</ul>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Cannot send a request, as the client has been closed</span><br></span></code></pre></div></div>
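<p>The interaction above can be reproduced in a few lines. The sketch below uses stand-in classes rather than <code>httpx</code> or LiteLLM's real client cache: <code>FakeClient</code>, <code>TTLClientCache</code>, <code>StaleWrapper</code>, and <code>FreshWrapper</code> are all hypothetical, and <code>FreshWrapper</code> shows only one possible way to avoid the stale reference, not necessarily the fix that shipped.</p>

```python
import time


class FakeClient:
    """Stand-in for an HTTP client; hypothetical, not httpx."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def send(self, payload):
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed")
        return f"sent {payload}"


class TTLClientCache:
    """Hands out one client and closes it once its TTL has elapsed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._client = None
        self._created_at = 0.0

    def get(self):
        now = self.clock()
        if self._client is None or now - self._created_at >= self.ttl_seconds:
            if self._client is not None:
                self._client.close()  # expiry closes the old connection
            self._client = FakeClient()
            self._created_at = now
        return self._client


class StaleWrapper:
    """Buggy pattern: resolves the client once and keeps the reference."""

    def __init__(self, cache):
        self._client = cache.get()

    def request(self, payload):
        return self._client.send(payload)


class FreshWrapper:
    """One possible fix: ask the cache for a client on every request."""

    def __init__(self, cache):
        self._cache = cache

    def request(self, payload):
        return self._cache.get().send(payload)


def demo():
    now = [0.0]  # fake clock so the example does not wait an hour
    cache = TTLClientCache(ttl_seconds=3600, clock=lambda: now[0])
    stale, fresh = StaleWrapper(cache), FreshWrapper(cache)

    before = stale.request("a")        # works while the client is alive
    now[0] = 3601.0                    # advance past the 1-hour TTL
    after_fresh = fresh.request("b")   # triggers expiry and gets a new client
    try:
        stale.request("c")
        after_stale = "ok"
    except RuntimeError as exc:
        after_stale = str(exc)
    return before, after_fresh, after_stale
```

<p>In <code>demo()</code>, everything works during the first hour, which is why a short-lived test run never sees the failure. Only after the TTL elapses does the wrapper holding the old reference start raising.</p>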
<p><strong>Before (with bug):</strong></p>
<table><thead><tr><th>Provider</th><th>Requests</th><th>Success</th><th>Failures</th><th>Fail %</th></tr></thead><tbody><tr><td>OpenAI</td><td>720,000</td><td>432,000</td><td>288,000</td><td>40%</td></tr><tr><td>Azure</td><td>692,000</td><td>415,200</td><td>276,800</td><td>40%</td></tr></tbody></table>
<p><strong>After (fixed):</strong></p>
<table><thead><tr><th>Provider</th><th>Requests</th><th>Success</th><th>Failures</th><th>Fail %</th></tr></thead><tbody><tr><td>OpenAI</td><td>1,200,000</td><td>1,199,988</td><td>12</td><td>0.001%</td></tr><tr><td>Azure</td><td>1,150,000</td><td>1,149,982</td><td>18</td><td>0.002%</td></tr></tbody></table>
<p>Our focus moving forward is on being the first to detect issues, even when they aren’t covered by unit tests. LiteLLM Observatory is designed to surface latency regressions, OOMs, and failure modes that only appear under real traffic patterns in <strong>our own production deployments</strong> during release validation.</p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-the-observatory-works">How the Observatory Works<a href="https://docs.litellm.ai/blog/litellm-observatory#how-the-observatory-works" class="hash-link" aria-label="Direct link to How the Observatory Works" title="Direct link to How the Observatory Works">​</a></h3>
<p><a href="https://github.com/BerriAI/litellm-observatory" target="_blank" rel="noopener noreferrer">LiteLLM Observatory</a> is a testing service that executes long-running tests against our LiteLLM deployments. We trigger a test by sending an API request, and the results are posted to Slack automatically when it completes.</p>
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="how-tests-run">How Tests Run<a href="https://docs.litellm.ai/blog/litellm-observatory#how-tests-run" class="hash-link" aria-label="Direct link to How Tests Run" title="Direct link to How Tests Run">​</a></h4>
<ol>
<li>
<p><strong>Start a Test</strong>: We send a request to the Observatory API with:</p>
<ul>
<li>Which LiteLLM deployment to test (URL and API key)</li>
<li>Which test to run (e.g., <code>TestOAIAzureRelease</code>)</li>
<li>Test settings (which models to test, how long to run, failure thresholds)</li>
</ul>
</li>
<li>
<p><strong>Smart Queueing</strong>:</p>
<ul>
<li>The system checks whether we are attempting to run the exact same test more than once</li>
<li>If a duplicate test is already running or queued, we receive an error to avoid wasting resources</li>
<li>Otherwise, the test is added to a queue and runs when capacity is available (up to 5 tests can run concurrently by default)</li>
</ul>
</li>
<li>
<p><strong>Instant Response</strong>: The API responds immediately—we do not wait for the test to finish. Tests may run for hours, but the request itself completes in milliseconds.</p>
</li>
<li>
<p><strong>Background Execution</strong>:</p>
<ul>
<li>The test runs in the background, issuing requests against our LiteLLM deployment</li>
<li>It tracks request success and failure rates over time</li>
<li>When the test completes, results are automatically posted to our Slack channel</li>
</ul>
</li>
</ol>
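<p>The queueing and background-execution steps above can be sketched with <code>asyncio</code>. This is a simplified illustration under assumed names (<code>RunQueue</code>, <code>DuplicateTestError</code>, <code>submit</code>, <code>drain</code>), not the Observatory's actual code: duplicates are rejected up front, <code>submit</code> returns immediately, and a semaphore caps how many tests run at once.</p>

```python
import asyncio


class DuplicateTestError(Exception):
    """Raised when the same test is already queued or running."""


class RunQueue:
    """Hypothetical in-process queue; must be created inside a running event loop."""

    def __init__(self, max_concurrent=5):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._active = set()  # keys of tests that are queued or running
        self._tasks = set()   # keeps references so tasks are not garbage collected
        self.results = {}

    def submit(self, key, run_test):
        """Return immediately; the test itself runs in the background."""
        if key in self._active:
            raise DuplicateTestError(f"{key} is already queued or running")
        self._active.add(key)
        task = asyncio.create_task(self._run(key, run_test))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return {"status": "queued", "test": key}

    async def _run(self, key, run_test):
        try:
            async with self._slots:  # wait for a free concurrency slot
                self.results[key] = await run_test()
        finally:
            self._active.discard(key)

    async def drain(self):
        """Wait for every submitted test to finish."""
        await asyncio.gather(*list(self._tasks))


async def demo():
    queue = RunQueue(max_concurrent=2)
    running = peak = 0

    async def fake_test():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stands in for hours of real traffic
        running -= 1
        return "passed"

    responses = [queue.submit(f"test-{i}", fake_test) for i in range(4)]
    try:
        queue.submit("test-0", fake_test)
        duplicate_rejected = False
    except DuplicateTestError:
        duplicate_rejected = True

    await queue.drain()
    return len(responses), duplicate_rejected, peak, sorted(queue.results)
```

<p>Running <code>asyncio.run(demo())</code> submits four tests with a limit of two. All four submissions return right away, the repeated <code>test-0</code> is rejected, and no more than two tests are in flight at any moment. A real service would also need to persist the queue and post results somewhere, which this sketch leaves out.</p>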
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="example-the-openai--azure-reliability-test">Example: The OpenAI / Azure Reliability Test<a href="https://docs.litellm.ai/blog/litellm-observatory#example-the-openai--azure-reliability-test" class="hash-link" aria-label="Direct link to Example: The OpenAI / Azure Reliability Test" title="Direct link to Example: The OpenAI / Azure Reliability Test">​</a></h4>
<p>The <code>TestOAIAzureRelease</code> test is designed to catch a class of bugs that only surface after sustained runtime:</p>
<ul>
<li><strong>Duration</strong>: Runs continuously for 3 hours</li>
<li><strong>Behavior</strong>: Cycles through specified models (such as <code>gpt-4</code> and <code>gpt-3.5-turbo</code>), issuing requests continuously</li>
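<li><strong>Why 3 Hours</strong>: This helps catch issues where HTTP clients degrade or fail after extended use. For example, the bug observed in LiteLLM v1.81.3 only appeared once a cached client's 1-hour TTL expired, so a 3-hour run spans several expiry cycles</li>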
<li><strong>Pass / Fail Criteria</strong>: The test passes if fewer than 1% of requests fail. If the failure rate exceeds 1%, the test fails and we are notified in Slack</li>
<li><strong>Key Detail</strong>: The same HTTP client is reused for the entire run, allowing us to detect lifecycle-related bugs that only appear under prolonged reuse</li>
</ul>
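<p>The pass/fail rule can be expressed in a few lines. The sketch below is an assumption about the shape of the check, not the real <code>TestOAIAzureRelease</code> implementation: <code>run_reliability_check</code> and <code>make_sender</code> are hypothetical, requests are simulated, and a failure rate of exactly 1% is treated as a failure because the criteria above only say "fewer than 1%".</p>

```python
from itertools import cycle


def run_reliability_check(send_request, models, total_requests, max_failure_rate=0.01):
    """Cycle through models, count failures, and compare against the threshold."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    failures = 0
    for _, model in zip(range(total_requests), cycle(models)):
        try:
            send_request(model)
        except Exception:
            failures += 1
    failure_rate = failures / total_requests
    return {
        "failures": failures,
        "failure_rate": failure_rate,
        # Assumption: exactly 1% counts as a failure.
        "passed": failure_rate < max_failure_rate,
    }


def make_sender(fail_every):
    """Build a fake request function that raises on every Nth call (0 = never)."""
    calls = 0

    def send(model):
        nonlocal calls
        calls += 1
        if fail_every and calls % fail_every == 0:
            raise RuntimeError("Cannot send a request, as the client has been closed")

    return send


MODELS = ["gpt-4", "gpt-3.5-turbo"]
healthy = run_reliability_check(make_sender(0), MODELS, total_requests=1000)
broken = run_reliability_check(make_sender(2), MODELS, total_requests=1000)
```

<p>Here <code>healthy</code> never fails and passes, while <code>broken</code> fails every second request and is rejected. In the real test the same HTTP client is reused for the whole run, which is what lets a time-dependent failure like the closed-client error show up in the counts.</p>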
<h4 class="anchor anchorWithStickyNavbar_LWe7" id="when-we-use-it">When We Use It<a href="https://docs.litellm.ai/blog/litellm-observatory#when-we-use-it" class="hash-link" aria-label="Direct link to When We Use It" title="Direct link to When We Use It">​</a></h4>
<ul>
<li><strong>Before Deployments</strong>: Run tests before promoting a new LiteLLM version to production</li>
<li><strong>Routine Validation</strong>: Schedule regular runs (daily or weekly) to catch regressions early</li>
<li><strong>Issue Investigation</strong>: Run tests on demand when we suspect a deployment issue</li>
<li><strong>Long-Running Failure Detection</strong>: Identify bugs that only appear under sustained load, beyond what short smoke tests can reveal</li>
</ul>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="complementing-unit-tests">Complementing Unit Tests<a href="https://docs.litellm.ai/blog/litellm-observatory#complementing-unit-tests" class="hash-link" aria-label="Direct link to Complementing Unit Tests" title="Direct link to Complementing Unit Tests">​</a></h3>
<p>Unit tests remain a foundational part of our development process. They are fast and precise, but they don’t cover:</p>
<ul>
<li>Real provider behavior</li>
<li>Long-lived network interactions</li>
<li>Resource lifecycle edge cases</li>
<li>Time-dependent regressions</li>
</ul>
<p>LiteLLM Observatory complements unit tests by validating the system as it actually runs in production-like environments.</p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="looking-ahead">Looking Ahead<a href="https://docs.litellm.ai/blog/litellm-observatory#looking-ahead" class="hash-link" aria-label="Direct link to Looking Ahead" title="Direct link to Looking Ahead">​</a></h3>
<p>Reliability is an ongoing investment.</p>
<p>LiteLLM Observatory is one of several systems we’re building to continuously raise the bar on release quality and operational safety. As LiteLLM evolves, so will our validation tooling, informed by real-world usage and lessons learned.</p>
<p>We’ll continue to share those improvements openly as we go.</p>]]></content:encoded>
            <category>testing</category>
            <category>observability</category>
            <category>reliability</category>
            <category>releases</category>
        </item>
        <item>
            <title><![CDATA[Day 0 Support: Claude Opus 4.6]]></title>
            <link>https://docs.litellm.ai/blog/claude_opus_4_6</link>
            <guid>https://docs.litellm.ai/blog/claude_opus_4_6</guid>
            <pubDate>Thu, 05 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Day 0 support for Claude Opus 4.6 on LiteLLM AI Gateway - use across Anthropic, Azure, Vertex AI, and Bedrock.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports Claude Opus 4.6 on Day 0. Use it across Anthropic, Azure, Vertex AI, and Bedrock through the LiteLLM AI Gateway.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="docker-image">Docker Image<a href="https://docs.litellm.ai/blog/claude_opus_4_6#docker-image" class="hash-link" aria-label="Direct link to Docker Image" title="Direct link to Docker Image">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker pull ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6</span><br></span></code></pre></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---anthropic">Usage - Anthropic<a href="https://docs.litellm.ai/blog/claude_opus_4_6#usage---anthropic" class="hash-link" aria-label="Direct link to Usage - Anthropic" title="Direct link to Usage - Anthropic">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> anthropic/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/ANTHROPIC_API_KEY</span><br></span></code></pre></div></div><p><strong>2. Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
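<p>The same test request can be sent from Python. The sketch below uses only the standard library and assumes the proxy from step 2 is listening on <code>http://0.0.0.0:4000</code> and that a proxy key is available in the <code>LITELLM_KEY</code> environment variable; <code>build_chat_request</code> and <code>send_chat_request</code> are hypothetical helper names, not part of LiteLLM.</p>

```python
import json
import os
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl example sends to the proxy."""
    payload = {
        "model": "claude-opus-4-6",  # the model_name from config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(prompt: str) -> dict:
    """Send the request to a locally running proxy and return the parsed JSON.

    Assumes LITELLM_KEY is set in the environment.
    """
    request = build_chat_request(
        "http://0.0.0.0:4000", os.environ["LITELLM_KEY"], prompt
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

<p>With the proxy running, <code>send_chat_request("what llm are you")</code> returns the parsed response body.</p>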
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---azure">Usage - Azure<a href="https://docs.litellm.ai/blog/claude_opus_4_6#usage---azure" class="hash-link" aria-label="Direct link to Usage - Azure" title="Direct link to Usage - Azure">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> azure_ai/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AZURE_AI_API_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_base</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AZURE_AI_API_BASE  </span><span class="token comment" style="color:#999988;font-style:italic"># https://&lt;resource&gt;.services.ai.azure.com</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AZURE_AI_API_KEY=$AZURE_AI_API_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AZURE_AI_API_BASE=$AZURE_AI_API_BASE \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---vertex-ai">Usage - Vertex AI<a href="https://docs.litellm.ai/blog/claude_opus_4_6#usage---vertex-ai" class="hash-link" aria-label="Direct link to Usage - Vertex AI" title="Direct link to Usage - Vertex AI">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> vertex_ai/claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_project</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/VERTEX_PROJECT</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">vertex_location</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">east5</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e VERTEX_PROJECT=$VERTEX_PROJECT \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/credentials.json:/app/credentials.json \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="usage---bedrock">Usage - Bedrock<a href="https://docs.litellm.ai/blog/claude_opus_4_6#usage---bedrock" class="hash-link" aria-label="Direct link to Usage - Bedrock" title="Direct link to Usage - Bedrock">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">LiteLLM Proxy</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>1. Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">6</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> bedrock/anthropic.claude</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">opus</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">4</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">6</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">v1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_access_key_id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AWS_ACCESS_KEY_ID</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_secret_access_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/AWS_SECRET_ACCESS_KEY</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">aws_region_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> us</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">east</span><span class="token punctuation" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">1</span><br></span></code></pre></div></div><p><strong>2. 
Start the proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run -d \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -v $(pwd)/config.yaml:/app/config.yaml \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  --config /app/config.yaml</span><br></span></code></pre></div></div><p><strong>3. 
Test it!</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "what llm are you"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="advanced-features">Advanced Features<a href="https://docs.litellm.ai/blog/claude_opus_4_6#advanced-features" class="hash-link" aria-label="Direct link to Advanced Features" title="Direct link to Advanced Features">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="compaction">Compaction<a href="https://docs.litellm.ai/blog/claude_opus_4_6#compaction" class="hash-link" aria-label="Direct link to Compaction" title="Direct link to Compaction">​</a></h3>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p>LiteLLM supports compaction for the new <code>claude-opus-4-6</code> model.</p><p><strong>Enabling Compaction</strong></p><p>To enable compaction, add the <code>context_management</code> parameter with the <code>compact_20260112</code> edit type:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   
   "content": "What is the weather in San Francisco?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "context_management": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "edits": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "type": "compact_20260112"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "max_tokens": 100</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p>All the parameters supported for context_management by anthropic are supported and can be directly added. Litellm automatically adds the <code>compact-2026-01-12</code> beta header in the request.</p></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>Enable compaction to reduce context size while preserving key information. 
LiteLLM automatically adds the <code>compact-2026-01-12</code> beta header when compaction is enabled.</p><div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p><strong>Provider Support:</strong> Compaction is supported on Anthropic, Azure AI, and Vertex AI. It is <strong>not supported</strong> on Bedrock (Invoke or Converse APIs).</p></div></div><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "max_tokens": 4096,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   
     {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Hi"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "context_management": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "edits": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                "type": "compact_20260112"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
<p><strong>Response with Compaction Block</strong></p>
<p>The response will include the compaction summary in <code>provider_specific_fields.compaction_blocks</code>:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"id"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"chatcmpl-a6c105a3-4b25-419e-9551-c800633b6cb2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"created"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1770357619</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"model"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"claude-opus-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"object"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> 
</span><span class="token string" style="color:#e3116c">"chat.completion"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"choices"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"finish_reason"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"length"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"index"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"message"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span 
class="token property" style="color:#36acaa">"content"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"I don't have access to real-time data, so I can't provide the current weather in San Francisco. To get up-to-date weather information, I'd recommend checking:\n\n- **Weather websites** like weather.com, accuweather.com, or wunderground.com\n- **Search engines** – just Google \"San Francisco weather\"\n- **Weather apps** on your phone (e.g., Apple Weather, Google Weather)\n- **National"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"role"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"assistant"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"provider_specific_fields"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token property" style="color:#36acaa">"compaction_blocks"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" 
style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"compaction"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token property" style="color:#36acaa">"content"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Summary of the conversation: The user requested help building a web scraper..."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span 
class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"usage"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"completion_tokens"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"prompt_tokens"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">86</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"total_tokens"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">186</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" 
style="color:#393A34">}</span><br></span></code></pre></div></div>
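<p>As a minimal sketch of reading that field on the client side, the helper below walks a response that has already been parsed into a plain dictionary and collects every compaction block. The function name <code>extract_compaction_blocks</code> and the trimmed <code>sample_response</code> are illustrative assumptions, not part of LiteLLM.</p>

```python
from typing import Any


def extract_compaction_blocks(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect compaction blocks from every choice of a chat-completion-style dict."""
    blocks: list[dict[str, Any]] = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        # The field is absent when no compaction happened, so default to empty.
        fields = message.get("provider_specific_fields") or {}
        blocks.extend(fields.get("compaction_blocks") or [])
    return blocks


# Hypothetical, trimmed-down copy of the response shape shown above.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "I don't have access to real-time data...",
                "provider_specific_fields": {
                    "compaction_blocks": [
                        {
                            "type": "compaction",
                            "content": "Summary of the conversation: ...",
                        }
                    ]
                },
            },
        }
    ]
}

for block in extract_compaction_blocks(sample_response):
    print(block["type"], "-", block["content"])
```

<p>Responses without a compaction block simply yield an empty list, so the same helper can run on every turn.</p>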
<p><strong>Using Compaction Blocks in Follow-up Requests</strong></p>
<p>To continue the conversation with compaction, include the compaction block in the assistant message's <code>provider_specific_fields</code>:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "How can I build a web scraper?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "assistant",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">          "type": "text",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          "text": "Certainly! To build a basic web scraper, you will typically use a programming language like Python along with libraries such as `requests` (for fetching web pages) and `BeautifulSoup` (for parsing HTML). Here is a basic example:\n\n```python\nimport requests\nfrom bs4 import BeautifulSoup\n\nurl = \"https://example.com\"\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, \"html.parser\")\n\n# Extract and print all text\ntext = soup.get_text()\nprint(text)\n```\n\nLet me know what you are interested in scraping or if you need help with a specific website!"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "provider_specific_fields": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "compaction_blocks": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "type": "compaction",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Summary of the conversation: The user asked how to build a web scraper, and the assistant gave an overview using Python with requests and BeautifulSoup."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "How do I use it to scrape product prices?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "context_management": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "edits": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "type": "compact_20260112"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "max_tokens": 100</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div>
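<p>The same follow-up request can be assembled programmatically. The sketch below is one possible way to do it, assuming the previous assistant message is available as a plain dictionary; <code>build_follow_up_payload</code> is a hypothetical helper, and the resulting dictionary would be sent as the JSON body of the <code>/chat/completions</code> call shown above.</p>

```python
import json
from typing import Any


def build_follow_up_payload(
    history: list[dict[str, Any]],
    assistant_message: dict[str, Any],
    next_user_content: str,
    model: str = "claude-opus-4-6",
    max_tokens: int = 100,
) -> dict[str, Any]:
    """Append the assistant turn (with any compaction blocks) and the next user turn."""
    assistant_turn: dict[str, Any] = {
        "role": "assistant",
        "content": assistant_message.get("content", ""),
    }
    fields = assistant_message.get("provider_specific_fields") or {}
    blocks = fields.get("compaction_blocks")
    if blocks:
        # Carry the summary forward exactly as it was returned.
        assistant_turn["provider_specific_fields"] = {"compaction_blocks": blocks}

    return {
        "model": model,
        "messages": [
            *history,
            assistant_turn,
            {"role": "user", "content": next_user_content},
        ],
        "context_management": {"edits": [{"type": "compact_20260112"}]},
        "max_tokens": max_tokens,
    }


payload = build_follow_up_payload(
    history=[{"role": "user", "content": "How can I build a web scraper?"}],
    assistant_message={
        "role": "assistant",
        "content": "Certainly! To build a basic web scraper...",
        "provider_specific_fields": {
            "compaction_blocks": [
                {"type": "compaction", "content": "Summary of the conversation: ..."}
            ]
        },
    },
    next_user_content="How do I use it to scrape product prices?",
)
print(json.dumps(payload, indent=2))
```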
<p><strong>Streaming Support</strong></p>
<p>Compaction blocks are also supported in streaming mode. You'll receive:</p>
<ul>
<li><code>compaction_start</code> event when a compaction block begins</li>
<li><code>compaction_delta</code> events with the compaction content</li>
<li>The accumulated <code>compaction_blocks</code> in <code>provider_specific_fields</code></li>
</ul>
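<p>This post does not show the exact shape of the streamed chunks, so the sketch below only illustrates the accumulation idea. It assumes each event has already been reduced to a dictionary with a <code>type</code> key and, for deltas, a <code>content</code> key; that shape and the helper name <code>accumulate_compaction</code> are hypothetical.</p>

```python
from typing import Any, Iterable


def accumulate_compaction(events: Iterable[dict[str, Any]]) -> list[dict[str, str]]:
    """Rebuild compaction blocks from start/delta events (hypothetical event shape)."""
    blocks: list[dict[str, str]] = []
    for event in events:
        kind = event.get("type")
        if kind == "compaction_start":
            # A new block begins; its content arrives in later deltas.
            blocks.append({"type": "compaction", "content": ""})
        elif kind == "compaction_delta" and blocks:
            blocks[-1]["content"] += event.get("content", "")
    return blocks


# Hypothetical event sequence, for illustration only.
sample_events = [
    {"type": "compaction_start"},
    {"type": "compaction_delta", "content": "Summary of the conversation: "},
    {"type": "compaction_delta", "content": "the user asked about web scrapers."},
]

blocks = accumulate_compaction(sample_events)
print(blocks)
```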
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="adaptive-thinking">Adaptive Thinking<a href="https://docs.litellm.ai/blog/claude_opus_4_6#adaptive-thinking" class="hash-link" aria-label="Direct link to Adaptive Thinking" title="Direct link to Adaptive Thinking">​</a></h3>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>When using <code>reasoning_effort</code> with Claude Opus 4.6, all values (<code>low</code>, <code>medium</code>, <code>high</code>) are mapped to <code>thinking: {type: "adaptive"}</code>. To use explicit thinking budgets with <code>type: "enabled"</code>, pass the native <code>thinking</code> parameter directly (see "Native thinking param" tab below).</p></div></div>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Native thinking param</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p>LiteLLM supports adaptive thinking through the <code>reasoning_effort</code> parameter:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Solve this complex 
problem: What is the optimal strategy for..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "reasoning_effort": "high"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>Use the <code>thinking</code> parameter with <code>type: "adaptive"</code> to enable adaptive thinking mode:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "max_tokens": 16000,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "thinking": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "type": "adaptive"</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    },</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Explain why the sum of two even numbers is always even."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>Use the <code>thinking</code> parameter directly for adaptive thinking via the SDK:</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> litellm</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completion</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"anthropic/claude-opus-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Solve this complex problem: What is the optimal strategy for..."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  thinking</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"adaptive"</span><span class="token punctuation" 
style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
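<p>The mapping described in the note above can be pictured with a small sketch. This is an illustration of the documented behavior, not LiteLLM's internal code; the function name <code>thinking_from_reasoning_effort</code> is hypothetical.</p>

```python
from typing import Any

# Values the note above lists for reasoning_effort on Claude Opus 4.6.
ADAPTIVE_EFFORTS = ("low", "medium", "high")


def thinking_from_reasoning_effort(reasoning_effort: str) -> dict[str, Any]:
    """Illustrate the note: every listed effort value maps to adaptive thinking."""
    if reasoning_effort not in ADAPTIVE_EFFORTS:
        raise ValueError(f"unexpected reasoning_effort: {reasoning_effort!r}")
    return {"type": "adaptive"}


for level in ADAPTIVE_EFFORTS:
    print(level, thinking_from_reasoning_effort(level))
```

<p>Because all three values collapse to the same setting, choosing between them does not set a thinking budget; an explicit budget requires the native <code>thinking</code> parameter.</p>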
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="effort-levels">Effort Levels<a href="https://docs.litellm.ai/blog/claude_opus_4_6#effort-levels" class="hash-link" aria-label="Direct link to Effort Levels" title="Direct link to Effort Levels">​</a></h3>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p>Four effort levels are available: <code>low</code>, <code>medium</code>, <code>high</code> (default), and <code>max</code>. Pass the level directly via the <code>output_config</code> parameter:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header "Authorization: Bearer $LITELLM_KEY" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Explain quantum 
computing"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "output_config": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "effort": "medium"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p>You can combine reasoning effort with <code>output_config</code> for finer control over the model.</p></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>Four effort levels available: <code>low</code>, <code>medium</code>, <code>high</code> (default), and <code>max</code>. Pass directly via the <code>output_config</code> parameter:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span 
class="token plain">    "max_tokens": 4096,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Explain quantum computing"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "output_config": {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        "effort": "medium"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div></div></div>
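<p>As a rough sketch of the request shape above, the helper below builds the same JSON body in Python and rejects effort values outside the four documented levels. The function name <code>build_effort_request</code> is hypothetical and not part of LiteLLM; it only assembles a dictionary and sends nothing.</p>

```python
import json

# The four effort levels named in this post; "high" is the documented default.
EFFORT_LEVELS = ("low", "medium", "high", "max")


def build_effort_request(prompt, effort="high", model="claude-opus-4-6"):
    """Return a request body with output_config.effort set (hypothetical helper)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }


if __name__ == "__main__":
    # Prints the same body used in the /chat/completions curl example.
    body = build_effort_request("Explain quantum computing", effort="medium")
    print(json.dumps(body, indent=2))
```

<p>Validating the level on the client side is a design choice for this sketch: it surfaces a typo before the request reaches the proxy.</p>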
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1m-token-context-beta">1M Token Context (Beta)<a href="https://docs.litellm.ai/blog/claude_opus_4_6#1m-token-context-beta" class="hash-link" aria-label="Direct link to 1M Token Context (Beta)" title="Direct link to 1M Token Context (Beta)">​</a></h3>
<p>Opus 4.6 supports 1M token context. Premium pricing applies for prompts exceeding 200k tokens ($10/$37.50 per million input/output tokens). LiteLLM supports cost calculations for 1M token contexts.</p>
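<p>To make the pricing arithmetic concrete, here is a minimal sketch of a cost estimate. It is not LiteLLM's cost calculator. It assumes the standard rates of $5/$25 per million tokens listed in the Fast Mode section of this post, and it assumes the premium rates apply to every token in a request once the prompt exceeds 200k tokens. The function name <code>estimate_cost</code> is hypothetical.</p>

```python
# USD per million (input, output) tokens, taken from this post.
STANDARD_RATES = (5.00, 25.00)
LONG_CONTEXT_RATES = (10.00, 37.50)
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens


def estimate_cost(input_tokens, output_tokens):
    """Estimate request cost in USD (hypothetical helper).

    Assumption: once the prompt exceeds the threshold, the premium
    rates apply to all input and output tokens in that request.
    """
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    in_rate, out_rate = LONG_CONTEXT_RATES if long_context else STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    print(estimate_cost(100_000, 10_000))  # prompt below the threshold
    print(estimate_cost(500_000, 10_000))  # prompt above the threshold
```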
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p>To use the 1M token context window, you need to forward the <code>anthropic-beta</code> header from your client to the LLM provider.</p><p><strong>Step 1: Enable header forwarding in your config</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">general_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">forward_client_headers_to_llm_api</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div><p><strong>Step 2: Send requests with the beta header</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span 
class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Authorization: Bearer $LITELLM_KEY' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'anthropic-beta: context-1m-2025-08-07' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Analyze this large document..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>To use the 1M token context window, you need to forward the <code>anthropic-beta</code> header from your client to the LLM provider.</p><p><strong>Step 1: Enable header forwarding in your config</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V 
thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">general_settings</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">forward_client_headers_to_llm_api</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><br></span></code></pre></div></div><p><strong>Step 2: Send requests with the beta header</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'anthropic-beta: context-1m-2025-08-07' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "max_tokens": 16000,</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Analyze this large document..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>tip</div><div class="admonitionContent_BuS1"><p>You can combine multiple beta headers by separating them with commas:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">--header 'anthropic-beta: 
context-1m-2025-08-07,compact-2026-01-12'</span><br></span></code></pre></div></div></div></div></div></div></div>
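<p>If you build request headers in code, the comma-separated value from the tip can be assembled with a small helper. This is only a sketch: <code>merge_beta_headers</code> is a hypothetical name, and dropping duplicate flags is a convenience added here rather than documented behavior.</p>

```python
def merge_beta_headers(*flags):
    """Join beta flags into one comma-separated anthropic-beta header value.

    Hypothetical helper: each argument may itself contain commas;
    duplicates are dropped and first-seen order is kept.
    """
    seen = []
    for flag in flags:
        for part in flag.split(","):
            part = part.strip()
            if part and part not in seen:
                seen.append(part)
    return {"anthropic-beta": ",".join(seen)}


if __name__ == "__main__":
    print(merge_beta_headers("context-1m-2025-08-07", "compact-2026-01-12"))
```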
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="us-only-inference">US-Only Inference<a href="https://docs.litellm.ai/blog/claude_opus_4_6#us-only-inference" class="hash-link" aria-label="Direct link to US-Only Inference" title="Direct link to US-Only Inference">​</a></h3>
<p>US-only inference is available at 1.1× token pricing. LiteLLM automatically tracks costs for US-only inference.</p>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p>Use the <code>inference_geo</code> parameter to specify US-only inference:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Authorization: Bearer $LITELLM_KEY' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "What is the capital of France?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    
}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "inference_geo": "us"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p>LiteLLM will automatically apply the 1.1× pricing multiplier for US-only inference in cost tracking.</p></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p>Use the <code>inference_geo</code> parameter to specify US-only inference:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "max_tokens": 4096,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">            "content": "What is the capital of France?"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "inference_geo": "us"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p>LiteLLM will automatically apply the 1.1× pricing multiplier for US-only inference in cost tracking.</p></div></div></div>
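<p>For illustration only, the 1.1× multiplier works out as follows. The helper <code>us_only_cost</code> is hypothetical and assumes the multiplier applies uniformly to the whole request cost; LiteLLM's own cost tracking does this for you.</p>

```python
US_ONLY_MULTIPLIER = 1.1  # from this post: US-only inference is billed at 1.1x


def us_only_cost(base_cost_usd, inference_geo=None):
    """Scale a base cost when inference_geo is "us" (hypothetical helper)."""
    if inference_geo == "us":
        return base_cost_usd * US_ONLY_MULTIPLIER
    return base_cost_usd


if __name__ == "__main__":
    # A request that would cost $10.00 at standard pricing.
    print(round(us_only_cost(10.0, inference_geo="us"), 2))
```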
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="fast-mode">Fast Mode<a href="https://docs.litellm.ai/blog/claude_opus_4_6#fast-mode" class="hash-link" aria-label="Direct link to Fast Mode" title="Direct link to Fast Mode">​</a></h3>
<div class="theme-admonition theme-admonition-info admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>info</div><div class="admonitionContent_BuS1"><p>Fast mode is <strong>only supported on the Anthropic provider</strong> (<code>anthropic/claude-opus-4-6</code>). It is not available on Azure AI, Vertex AI, or Bedrock.</p></div></div>
<p><strong>Pricing:</strong></p>
<ul>
<li>Standard: $5 input / $25 output per MTok</li>
<li>Fast: $30 input / $150 output per MTok (6× premium)</li>
</ul>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">/chat/completions</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">/v1/messages</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/chat/completions' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Content-Type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'Authorization: Bearer $LITELLM_KEY' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      "content": "Refactor this module..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  
],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "max_tokens": 4096,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "speed": "fast"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p><strong>Using OpenAI SDK:</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> openai</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> openai</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">OpenAI</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    api_key</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"your-litellm-key"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    base_url</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"http://0.0.0.0:4000"</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"claude-opus-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"> </span><span class="token string" style="color:#e3116c">"Refactor this module..."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    max_tokens</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">4096</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_body</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"speed"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"fast"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div><p><strong>Using LiteLLM SDK:</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> 
completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"anthropic/claude-opus-4-6"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Refactor this module..."</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    max_tokens</span><span class="token operator" 
style="color:#393A34">=</span><span class="token number" style="color:#36acaa">4096</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    speed</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"fast"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div><p>LiteLLM automatically tracks the higher costs for fast mode in usage and cost calculations.</p></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl --location 'http://0.0.0.0:4000/v1/messages' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'x-api-key: sk-12345' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--header 'content-type: application/json' \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">--data '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "claude-opus-4-6",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "max_tokens": 4096,</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "speed": "fast",</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    "messages": [</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "role": "user",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            "content": "Refactor this module..."</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}'</span><br></span></code></pre></div></div><p>LiteLLM automatically:</p><ul>
<li>Adds the <code>fast-mode-2026-02-01</code> beta header</li>
<li>Tracks the 6× premium pricing in cost calculations</li>
</ul></div></div></div>]]></content:encoded>
            <category>anthropic</category>
            <category>claude</category>
            <category>opus 4.6</category>
        </item>
        <item>
            <title><![CDATA[Achieving Sub-Millisecond Proxy Overhead]]></title>
            <link>https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead</link>
            <guid>https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead</guid>
            <pubDate>Mon, 02 Feb 2026 10:00:00 GMT</pubDate>
            <description><![CDATA[Our Q1 performance target and architectural direction for achieving sub-millisecond proxy overhead on modest hardware.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" src="https://raw.githubusercontent.com/AlexsanderHamir/assets/main/Screenshot%202026-02-02%20172554.png" alt="Sidecar architecture: Python control plane vs. sidecar hot path" class="img_ev3q"></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="introduction">Introduction<a href="https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction">​</a></h2>
<p>Our Q1 performance target is to aggressively move toward sub-millisecond proxy overhead on a single instance with 4 CPUs and 8 GB of RAM, and to continue pushing that boundary over time. Our broader goal is to make LiteLLM inexpensive to deploy, lightweight, and fast. This post outlines the architectural direction behind that effort.</p>
<p>Proxy overhead refers to the latency introduced by LiteLLM itself, independent of the upstream provider.</p>
<p>To measure it, we run the same workload directly against the provider and through LiteLLM at identical QPS (for example, 1,000 QPS) and compare the latency delta. To reduce noise, the load generator, LiteLLM, and a mock LLM endpoint all run on the same machine, ensuring the difference reflects proxy overhead rather than network latency.</p>
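<p>The latency-delta comparison described above can be sketched as follows. This is a minimal illustration rather than LiteLLM's benchmark harness: the helper name <code>proxy_overhead_ms</code>, the choice of percentiles, and the sample latencies are all assumptions made for the example.</p>

```python
from statistics import quantiles


def proxy_overhead_ms(direct_ms, proxied_ms, percentiles=(50, 95, 99)):
    """Return the latency delta (proxied minus direct) at each percentile.

    Hypothetical helper: both inputs are per-request latencies in
    milliseconds, collected at the same QPS against the same mock endpoint.
    """
    # quantiles(..., n=100) returns the 99 cut points p1..p99.
    direct_cuts = quantiles(direct_ms, n=100, method="inclusive")
    proxied_cuts = quantiles(proxied_ms, n=100, method="inclusive")
    return {p: proxied_cuts[p - 1] - direct_cuts[p - 1] for p in percentiles}


# Made-up sample latencies, for illustration only.
direct = [10.0, 10.2, 10.4, 10.6, 10.8]
proxied = [10.9, 11.1, 11.3, 11.5, 11.7]
overhead = proxy_overhead_ms(direct, proxied)
for p, delta in overhead.items():
    print(f"p{p} overhead: {delta:.2f} ms")
```

<p>Comparing percentiles rather than a single mean keeps tail latency visible, which is usually where proxy overhead shows up first under load.</p>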
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="where-were-coming-from">Where We're Coming From<a href="https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead#where-were-coming-from" class="hash-link" aria-label="Direct link to Where We're Coming From" title="Direct link to Where We're Coming From">​</a></h2>
<p>Under the same benchmark originally conducted by <a href="https://www.tensorzero.com/docs/gateway/benchmarks" target="_blank" rel="noopener noreferrer">TensorZero</a>, LiteLLM previously failed at around 1,000 QPS.</p>
<p>That is no longer the case. Today, LiteLLM can be stress-tested at 1,000 QPS with no failures and can scale up to 5,000 QPS without failures on a 4-CPU, 8-GB RAM single instance setup.</p>
<p>This establishes a more up-to-date baseline and provides useful context as we continue working on proxy overhead and overall performance.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="design-choice">Design Choice<a href="https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead#design-choice" class="hash-link" aria-label="Direct link to Design Choice" title="Direct link to Design Choice">​</a></h2>
<p>Achieving sub-millisecond proxy overhead with a Python-based system requires being deliberate about where work happens.</p>
<p>Python is a strong fit for flexibility and extensibility: provider abstraction, configuration-driven routing, and a rich callback ecosystem. These are areas where development velocity and correctness matter more than raw throughput.</p>
<p>At higher request rates, however, certain classes of work become expensive when executed inside the Python process on every request. Rather than rewriting LiteLLM or introducing complex deployment requirements, we adopt an optional <strong>sidecar architecture</strong>.</p>
<p>This architectural change is how we intend to make LiteLLM <strong>permanently fast</strong>. While it supports our near-term performance targets, it is a long-term investment.</p>
<p>Python continues to own:</p>
<ul>
<li>Request validation and normalization</li>
<li>Model and provider selection</li>
<li>Callbacks and integrations</li>
</ul>
<p>The sidecar owns <strong>performance-critical execution</strong>, such as:</p>
<ul>
<li>Efficient request forwarding</li>
<li>Connection reuse and pooling</li>
<li>Enforcing timeouts and limits</li>
<li>Aggregating high-frequency metrics</li>
</ul>
<p>This separation allows each component to focus on what it does best: Python acts as the control plane, while the sidecar handles the hot path.</p>
<hr>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="why-the-sidecar-is-optional">Why the Sidecar Is Optional<a href="https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead#why-the-sidecar-is-optional" class="hash-link" aria-label="Direct link to Why the Sidecar Is Optional" title="Direct link to Why the Sidecar Is Optional">​</a></h3>
<p>The sidecar is intentionally <strong>optional</strong>.</p>
<p>This allows us to ship it incrementally, validate it under real-world workloads, and avoid making it a hard dependency before it is fully battle-tested across all LiteLLM features.</p>
<p>Just as importantly, this ensures that self-hosting LiteLLM remains simple. The sidecar is bundled and started automatically, requires no additional infrastructure, and can be disabled entirely. From a user's perspective, LiteLLM continues to behave like a single service.</p>
<p>As of today, the sidecar is an optimization, not a requirement.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="conclusion">Conclusion<a href="https://docs.litellm.ai/blog/sub-millisecond-proxy-overhead#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion">​</a></h2>
<p>Sub-millisecond proxy overhead is not achieved through a single optimization, but through architectural changes.</p>
<p>By keeping Python focused on orchestration and extensibility, and offloading performance-critical execution to a sidecar, we establish a foundation for making LiteLLM <strong>permanently fast over time</strong>—even on modest hardware such as a 1-CPU, 2-GB RAM instance, while keeping deployment and self-hosting simple.</p>
<p>This work extends beyond Q1, and we will continue sharing benchmarks and updates as the architecture evolves.</p>]]></content:encoded>
            <category>performance</category>
            <category>architecture</category>
        </item>
        <item>
            <title><![CDATA[DAY 0 Support: Gemini 3 Flash on LiteLLM]]></title>
            <link>https://docs.litellm.ai/blog/gemini_3_flash</link>
            <guid>https://docs.litellm.ai/blog/gemini_3_flash</guid>
            <pubDate>Wed, 17 Dec 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[Guide to using Gemini 3 Flash on LiteLLM Proxy and SDK with day 0 support.]]></description>
            <content:encoded><![CDATA[<p>LiteLLM now supports <code>gemini-3-flash-preview</code> and all the new API changes along with it.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>If you only need cost tracking, no change to your current LiteLLM version is required. To use the new features introduced with this model, such as thinking levels, you need v1.80.8-stable.1 or later.</p></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="deploy-this-version">Deploy this version<a href="https://docs.litellm.ai/blog/gemini_3_flash#deploy-this-version" class="hash-link" aria-label="Direct link to Deploy this version" title="Direct link to Deploy this version">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">Docker</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Pip</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">docker run litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">docker run \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-e STORE_MODEL_IN_DB=True \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">-p 4000:4000 \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ghcr.io/berriai/litellm:main-v1.80.8-stable.1</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-showLineNumbers language-showlinenumbers codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockTitle_OeMC">pip install litellm</div><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-showlinenumbers codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm==1.80.8.post1</span><br></span></code></pre></div></div></div></div></div>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-new">What's New<a href="https://docs.litellm.ai/blog/gemini_3_flash#whats-new" class="hash-link" aria-label="Direct link to What's New" title="Direct link to What's New">​</a></h2>
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-new-thinking-levels-thinkinglevel-with-minimal--medium">1. New Thinking Levels: <code>thinkingLevel</code> with MINIMAL &amp; MEDIUM<a href="https://docs.litellm.ai/blog/gemini_3_flash#1-new-thinking-levels-thinkinglevel-with-minimal--medium" class="hash-link" aria-label="Direct link to 1-new-thinking-levels-thinkinglevel-with-minimal--medium" title="Direct link to 1-new-thinking-levels-thinkinglevel-with-minimal--medium">​</a></h3>
<p>Gemini 3 Flash introduces granular thinking control with <code>thinkingLevel</code> instead of <code>thinkingBudget</code>.</p>
<ul>
<li><strong>MINIMAL</strong>: Ultra-lightweight thinking for fast responses</li>
<li><strong>MEDIUM</strong>: Balanced thinking for complex reasoning</li>
<li><strong>HIGH</strong>: Maximum reasoning depth</li>
</ul>
<p>LiteLLM automatically maps the OpenAI <code>reasoning_effort</code> parameter to Gemini's <code>thinkingLevel</code>, so you can use familiar <code>reasoning_effort</code> values (<code>minimal</code>, <code>low</code>, <code>medium</code>, <code>high</code>) without changing your code!</p>
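<p>Conceptually, the mapping behaves like a lookup table. The sketch below is a hypothetical illustration of that idea, not LiteLLM's internal code: the names <code>REASONING_EFFORT_TO_THINKING_LEVEL</code> and <code>to_thinking_level</code> are made up for this example, and the real translation may handle additional values or cases.</p>

```python
# Hypothetical illustration of the mapping described above; this is not
# LiteLLM's internal code, and the real translation may handle more cases.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "MINIMAL",
    "low": "LOW",
    "medium": "MEDIUM",
    "high": "HIGH",
}


def to_thinking_level(reasoning_effort: str) -> str:
    """Translate an OpenAI-style reasoning_effort into a thinkingLevel name."""
    try:
        return REASONING_EFFORT_TO_THINKING_LEVEL[reasoning_effort.lower()]
    except KeyError:
        raise ValueError(
            f"unsupported reasoning_effort: {reasoning_effort!r}"
        ) from None


print(to_thinking_level("medium"))
```

<p>Rejecting unknown values explicitly, as this sketch does, is one possible design choice; silently falling back to a default level would be another.</p>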
<h3 class="anchor anchorWithStickyNavbar_LWe7" id="2-thought-signatures">2. Thought Signatures<a href="https://docs.litellm.ai/blog/gemini_3_flash#2-thought-signatures" class="hash-link" aria-label="Direct link to 2. Thought Signatures" title="Direct link to 2. Thought Signatures">​</a></h3>
<p>Like <code>gemini-3-pro</code>, this model also includes thought signatures for tool calls. LiteLLM handles signature extraction and embedding internally. <a href="https://docs.litellm.ai/blog/gemini_3#thought-signatures">Learn more about thought signatures</a>.</p>
<p><strong>Edge Case Handling</strong>: If thought signatures are missing from the request, LiteLLM adds a dummy signature so that the API call doesn't break.</p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="supported-endpoints">Supported Endpoints<a href="https://docs.litellm.ai/blog/gemini_3_flash#supported-endpoints" class="hash-link" aria-label="Direct link to Supported Endpoints" title="Direct link to Supported Endpoints">​</a></h2>
<p>LiteLLM provides <strong>full end-to-end support</strong> for Gemini 3 Flash on:</p>
<ul>
<li>✅ <code>/v1/chat/completions</code> - OpenAI-compatible chat completions endpoint</li>
<li>✅ <code>/v1/responses</code> - OpenAI Responses API endpoint (streaming and non-streaming)</li>
<li>✅ <a href="https://docs.litellm.ai/docs/anthropic_unified"><code>/v1/messages</code></a> - Anthropic-compatible messages endpoint</li>
<li>✅ <code>/v1/generateContent</code> – <a href="https://docs.litellm.ai/docs/generateContent.md">Google Gemini API</a> compatible endpoint</li>
</ul>
<p>All endpoints support:</p>
<ul>
<li>Streaming and non-streaming responses</li>
<li>Function calling with thought signatures</li>
<li>Multi-turn conversations</li>
<li>All Gemini 3-specific features</li>
<li>Conversion of provider-specific thinking-related parameters to <code>thinkingLevel</code></li>
</ul>
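<p>As a rough sketch of what a call to the <code>/v1/responses</code> endpoint might look like, the snippet below builds (but does not send) a request using only the Python standard library. It assumes a proxy listening on <code>localhost:4000</code>, a model alias of <code>gemini-3-flash</code> as in the Quick Start config in this post, a placeholder API key, and the OpenAI Responses API convention of passing reasoning effort as <code>reasoning.effort</code>. The helper name <code>build_responses_request</code> is hypothetical.</p>

```python
import json
import urllib.request

# Assumptions: the proxy is listening on localhost:4000, "gemini-3-flash" is
# the model alias defined in config.yaml, and the key below is a placeholder.
PROXY_URL = "http://localhost:4000/v1/responses"


def build_responses_request(prompt, effort="medium", api_key="YOUR-LITELLM-KEY"):
    """Build (but do not send) a Responses API request for the proxy."""
    payload = {
        "model": "gemini-3-flash",
        "input": prompt,
        # OpenAI Responses API style reasoning control.
        "reasoning": {"effort": effort},
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_responses_request("Summarize the trade-offs of caching.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

<p>To send it against a running proxy, pass the request to <code>urllib.request.urlopen</code> and parse the JSON body of the response.</p>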
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="quick-start">Quick Start<a href="https://docs.litellm.ai/blog/gemini_3_flash#quick-start" class="hash-link" aria-label="Direct link to Quick Start" title="Direct link to Quick Start">​</a></h2>
<div class="tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">SDK</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">PROXY</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">LOW</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">MEDIUM (NEW)</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">HIGH</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><p><strong>Basic Usage with MEDIUM thinking (NEW)</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># No need to make any changes to your code as we map openai reasoning param to thinkingLevel</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3-flash-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Solve this complex math problem: 25 * 4 + 10"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"medium"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># NEW: MEDIUM thinking level</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choices</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">content</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>1. 
Setup config.yaml</strong></p><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">model_list</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">model_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">litellm_params</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gemini/gemini</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">flash</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">preview</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">api_key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> os.environ/GEMINI_API_KEY</span><br></span></code></pre></div></div><p><strong>2. Start proxy</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">litellm --config /path/to/config.yaml</span><br></span></code></pre></div></div><p><strong>3. Call with MEDIUM thinking</strong></p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -X POST http://localhost:4000/v1/chat/completions \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Content-Type: application/json" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -H "Authorization: Bearer &lt;YOUR-LITELLM-KEY&gt;" \</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  -d '{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "model": "gemini-3-flash",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    "messages": [{"role": "user", "content": "Complex reasoning task"}],</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    "reasoning_effort": "medium"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }'</span><br></span></code></pre></div></div>
<p><strong>All <code>reasoning_effort</code> Levels</strong></p><p><strong>MINIMAL: Ultra-fast, minimal reasoning</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">from litellm import completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response = completion(</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model="gemini/gemini-3-flash-preview",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages=[{"role": "user", "content": "What's 2+2?"}],</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort="minimal",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>Simple instruction following</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3-flash-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" 
style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Write a haiku about coding"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"low"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>Balanced reasoning for complex tasks</strong> ✨</p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span 
class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3-flash-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Analyze this dataset and find patterns"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"medium"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" 
style="color:#999988;font-style:italic"># NEW!</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><p><strong>Maximum reasoning depth</strong></p><div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3-flash-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" 
style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Prove this mathematical theorem"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"high"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div></div></div></div>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-features">Key Features<a href="https://docs.litellm.ai/blog/gemini_3_flash#key-features" class="hash-link" aria-label="Direct link to Key Features" title="Direct link to Key Features">​</a></h2>
<p>✅ <strong>Thinking Levels</strong>: MINIMAL, LOW, MEDIUM, HIGH<br>
<!-- -->✅ <strong>Thought Signatures</strong>: Track reasoning with unique identifiers<br>
<!-- -->✅ <strong>Seamless Integration</strong>: Works with existing OpenAI-compatible clients<br>
<!-- -->✅ <strong>Backward Compatible</strong>: Gemini 2.5 models continue using <code>thinkingBudget</code></p>
<hr>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="installation">Installation<a href="https://docs.litellm.ai/blog/gemini_3_flash#installation" class="hash-link" aria-label="Direct link to Installation" title="Direct link to Installation">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">pip install litellm --upgrade</span><br></span></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> litellm</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> litellm </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> completion</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> completion</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"gemini/gemini-3-flash-preview"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    messages</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span 
class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Your question here"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    reasoning_effort</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"medium"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Use MEDIUM thinking</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">response</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>If you use this model via <code>vertex_ai</code>, set the location to <code>global</code>; it is currently the only supported location.</p></div></div>
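<p>For the Vertex AI case in the note, a minimal sketch of the same call might look like the following. It assumes the model is addressed as <code>vertex_ai/gemini-3-flash-preview</code> and uses LiteLLM's <code>vertex_project</code> and <code>vertex_location</code> arguments; the helper name <code>vertex_gemini_3_kwargs</code> and the <code>MY_GCP_PROJECT</code> environment variable are hypothetical and exist only for this illustration.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">import os


def vertex_gemini_3_kwargs(prompt, project):
    """Build keyword arguments for litellm.completion() on Vertex AI.

    Hypothetical helper for illustration; the model name is an assumption.
    """
    return {
        "model": "vertex_ai/gemini-3-flash-preview",  # assumed Vertex AI model name
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": "medium",
        "vertex_project": project,
        "vertex_location": "global",  # the only supported location per the note
    }


# MY_GCP_PROJECT is a hypothetical variable name; the call only runs if it is set.
if __name__ == "__main__" and os.environ.get("MY_GCP_PROJECT"):
    from litellm import completion

    kwargs = vertex_gemini_3_kwargs("Your question here", os.environ["MY_GCP_PROJECT"])
    response = completion(**kwargs)
    print(response)
</code></pre></div></div>
<p>Keeping the arguments in one helper makes it harder to drift away from the <code>global</code> location by accident when the call is reused elsewhere.</p>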
<h2 class="anchor anchorWithStickyNavbar_LWe7" id="reasoning_effort-mapping-for-gemini-3"><code>reasoning_effort</code> Mapping for Gemini 3+<a href="https://docs.litellm.ai/blog/gemini_3_flash#reasoning_effort-mapping-for-gemini-3" class="hash-link" aria-label="Direct link to reasoning_effort-mapping-for-gemini-3" title="Direct link to reasoning_effort-mapping-for-gemini-3">​</a></h2>
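<p>The table below can be read as a plain lookup. As a rough sketch (this is not LiteLLM's internal code; the dictionary and function names are hypothetical), the mapping behaves like this:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"># Illustrative copy of the reasoning_effort to thinking_level table for Gemini 3+.
REASONING_EFFORT_TO_THINKING_LEVEL = {
    "minimal": "minimal",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "disable": "minimal",  # no "off" level in the table, so it maps to minimal
    "none": "minimal",
}


def thinking_level_for(reasoning_effort):
    """Return the thinking level listed in the table for a reasoning_effort value."""
    try:
        return REASONING_EFFORT_TO_THINKING_LEVEL[reasoning_effort]
    except KeyError:
        raise ValueError("unsupported reasoning_effort: " + repr(reasoning_effort))


if __name__ == "__main__":
    for effort in REASONING_EFFORT_TO_THINKING_LEVEL:
        print(effort, thinking_level_for(effort))
</code></pre></div></div>
<p>Note that <code>disable</code> and <code>none</code> both resolve to <code>minimal</code> rather than switching thinking off.</p>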
<table><thead><tr><th>reasoning_effort</th><th>thinking_level</th></tr></thead><tbody><tr><td><code>minimal</code></td><td><code>minimal</code></td></tr><tr><td><code>low</code></td><td><code>low</code></td></tr><tr><td><code>medium</code></td><td><code>medium</code></td></tr><tr><td><code>high</code></td><td><code>high</code></td></tr><tr><td><code>disable</code></td><td><code>minimal</code></td></tr><tr><td><code>none</code></td><td><code>minimal</code></td></tr></tbody></table>]]></content:encoded>
            <category>gemini</category>
            <category>day 0 support</category>
            <category>llms</category>
        </item>
    </channel>
</rss>