Run Claude Code on Windows with Ollama | No API Key Required

AI Chat Article Summarypowered by Claude

"Claude Code is useful, but can I run it on my own PC without worrying about API usage costs?"——Windows users arrive at this question for a variety of reasons: cost, confidential data, offline use, and more. The short answer is that you can run Claude Code with your own local model through Ollama. The key enabler was Ollama adding support for Anthropic's API format. This article explains how it works, the concrete setup steps for Windows, how to choose a model, and what to watch out for — all based on official information.

結論powered by Claude

"I want to connect Claude Code to Ollama on Windows and use it without Anthropic's pay-per-use billing"——that goal became realistic when Ollama v0.14.0 and later added compatibility with the Anthropic Messages API. Simply run Ollama as a local API server and point Claude Code's destination (ANTHROPIC_BASE_URL) to http://localhost:11434, and an open-weight model running on your machine handles all the coding.

The key is setting three environment variables (such as ANTHROPIC_AUTH_TOKEN=ollama) and launching with claude --model <model-name>. The auth token can be any dummy string — no Anthropic API key is needed.

That said, because Claude Code requires a wide context window, the official recommendation is a model with at least 32K–64K tokens. Keep in mind that what is actually running is a local model like qwen3-coder, not Claude itself.

目次 (11)

How Running Claude Code with Ollama Works

Claude Code is normally a command-line tool that connects to Anthropic's endpoint and sends requests to Claude. However, it is structured so that the destination URL can be overridden via an environment variable. This is where Ollama's compatibility matters: according to the Ollama official blog, "Ollama v0.14.0 and later is compatible with the Anthropic Messages API."

In other words, run Ollama as an API server at http://localhost:11434 and point Claude Code there — Claude Code thinks it is sending requests to Anthropic, but in reality the local Ollama instance is responding. The crucial point is that what is running at that moment is not Claude itself, but an open-weight local model such as qwen3-coder or gpt-oss:20b. Because Claude's weights are not public and cannot be run locally (see: Can Claude Run Locally?), the Ollama setup is best understood as "running Claude Code's UI and agent capabilities on a different model."

The compatible API supports messages, streaming, system prompts, tool calls, extended thinking, and vision, and can also be used from the Anthropic SDK for Python and JavaScript.

What You Need — Windows Hardware Requirements

Running local models comfortably requires adequate machine performance. Ollama's official guidance recommends enabling WSL2 and GPU passthrough for best performance. Without a GPU, models can still run on CPU, but response speed drops significantly and more RAM is required.

Another easy-to-overlook factor is context length. The Ollama official documentation explicitly states: "Claude Code requires a large context window. We recommend at least 64K tokens" (the blog lists 32K as the lower bound). Local models sometimes have a narrow default context, so manually increasing the context length is generally a prerequisite. Cloud models automatically run at full capacity.

Setup Steps on Windows

Below is the standard flow for manually configuring environment variables. Open PowerShell as an administrator and follow along.

Step 1: Install Ollama

Download the Windows installer (.exe) from ollama.com, run it, and complete the setup. After installation, Ollama starts as a resident service and listens at http://localhost:11434.

Step 2: Download the Model You Want to Use

Fetch a local model suited for coding. The official blog highlights these examples:

ollama pull qwen3-coder
# or
ollama pull gpt-oss:20b

Step 3: Install Claude Code

On Windows PowerShell, you can install Claude Code with the following command. It runs the official Claude Code Windows installer. Node.js is required as a prerequisite — install it first if needed. For details, see the Claude Code official documentation.

irm https://claude.ai/install.ps1 | iex

Step 4: Set Environment Variables

Point Claude Code's destination to Ollama. Set the following three variables in PowerShell:

$env:ANTHROPIC_AUTH_TOKEN="ollama"
$env:ANTHROPIC_API_KEY=""
$env:ANTHROPIC_BASE_URL="http://localhost:11434"

ANTHROPIC_AUTH_TOKEN can be the dummy value "ollama", and ANTHROPIC_API_KEY should be left empty. If an existing ANTHROPIC_API_KEY is set, Claude Code will try to connect to Anthropic's own endpoint, so you need to clear this variable when using Ollama. Do not use a real Anthropic API key.

Step 5: Launch with the Model Specified

Start Claude Code by specifying the name of the model you downloaded.

claude --model qwen3-coder

You can also pass the variables inline without setting them beforehand:

$env:ANTHROPIC_AUTH_TOKEN="ollama"; $env:ANTHROPIC_BASE_URL="http://localhost:11434"; $env:ANTHROPIC_API_KEY=""; claude --model qwen3-coder

The official documentation also describes a shorthand launch command, ollama launch claude (with ollama launch claude --model <model-name> to specify a model). If manually setting environment variables feels tedious, try this first.

Choosing Between Local and Cloud Models

Through Ollama, you can choose between local models that run entirely on your machine and cloud models that run on Ollama's cloud infrastructure.

  • Local models: qwen3-coder, gpt-oss:20b, etc. Data never leaves your machine and everything works offline, but performance depends on your hardware and context length must be adjusted manually.
  • Cloud models: glm-4.7:cloud, minimax-m2.1:cloud, etc. (ending in :cloud). These run fast even on underpowered local GPUs, and context automatically operates at full capacity.

If data confidentiality and offline access are your top priorities, go local. If your hardware is not powerful enough and you just want smooth performance, go cloud. The idea of using a third-party model through Claude Code also shares common ground with using GPT with Claude Code via a proxy.

Caveats — Feature Limitations and Context Length

The setup is useful, but a few things are important to understand upfront.

First, because what is running is a different model and not Claude, behaviors optimized specifically for Claude will not be reproduced as-is. Agentic tool execution and code quality depend entirely on the capability of the local model you choose. Second, insufficient context length makes it easy for Claude Code to lose track mid-conversation. As noted above, ensure at least 32K–64K tokens and explicitly extend the setting for local models. Third, CPU-only environments are unlikely to deliver practical response speeds. For regular use, consider a GPU or using a :cloud model.

Summary

Here are the key points for connecting Claude Code to Ollama on Windows:

  1. Install Ollama v0.14.0 or later and have it listen at http://localhost:11434.
  2. Fetch a model such as qwen3-coder with ollama pull.
  3. Install Claude Code and set ANTHROPIC_AUTH_TOKEN=ollama, an empty ANTHROPIC_API_KEY, and ANTHROPIC_BASE_URL=http://localhost:11434.
  4. Launch with claude --model <model-name> (or ollama launch claude).

The value of this setup is the ability to plug an open-weight model running on your own machine into the Claude Code workflow without an Anthropic API key. Choose between local and cloud models based on your priorities around cost, confidentiality, and offline access.

Sources

参考になったら ♡
Clauder Navi 編集部
@clauder_navi

Anthropic の Claude / Claude Code を中心に、日本のエンジニア向けに最新動向と実務 を毎日発信。 運営方針 は メディアについて をご覧ください。