Streaming Responses
Summary — Key Takeaways for This Lesson
- Streaming is a method that receives Claude's response incrementally as it is being generated. Users can start reading without waiting for the full response.
- Use client.messages.stream() in the SDK. Internally, it uses Server-Sent Events (SSE).
- In Python, you can iterate over text chunks via stream.text_stream. In TypeScript, you can use the .on("text", ...) event handler.
- Streaming itself incurs no additional cost. The number of tokens consumed is identical to non-streaming.
- The code examples follow the official documentation, which remains the primary reference.
Why Use Streaming
The standard messages.create() returns a response only after the entire output has been generated.
For long outputs, users may wait anywhere from several seconds to tens of seconds.
With streaming, text arrives as it is generated,
which greatly improves perceived speed in chat UIs and similar interfaces.
Streaming is implemented using SSE, Server-Sent Events (source).
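To make the SSE framing concrete, here is a minimal sketch of how a stream of events could be turned into text chunks. The sample payload, the event names content_block_delta and message_stop, and the helpers parse_sse and extract_text are assumptions for illustration. The SDK performs this parsing for you, and the official documentation is the reference for the real event format.

```python
import json

# Hypothetical sample of an SSE response body. Event names and payload
# shapes are assumptions for illustration, not captured API output.
SAMPLE_SSE = (
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}}\n'
    "\n"
    "event: content_block_delta\n"
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}}\n'
    "\n"
    "event: message_stop\n"
    'data: {"type": "message_stop"}\n'
    "\n"
)


def parse_sse(body):
    """Yield (event_name, data_dict) pairs from an SSE-formatted string."""
    event_name, data_lines = None, []
    for line in body.splitlines():
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


def extract_text(body):
    """Concatenate the text carried by text-delta events."""
    parts = []
    for name, data in parse_sse(body):
        delta = data.get("delta", {})
        if name == "content_block_delta" and delta.get("type") == "text_delta":
            parts.append(delta["text"])
    return "".join(parts)


print(extract_text(SAMPLE_SSE))
```

Each event is a small group of lines ended by a blank line, which is why the client can act on a chunk as soon as that blank line arrives instead of waiting for the whole response.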
Python — Streaming Code Example
The following is based on the code example in the official documentation (source).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me about the history of Japan"}],
    model="claude-opus-4-7",
) as stream:
    # Print each text chunk as soon as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)
text_stream is a generator that yields text chunks (strings) in sequence.
Specifying flush=True ensures immediate output without buffering.
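If you also want the full text once streaming ends, one option is to collect the chunks while printing them. The sketch below runs without an API key: fake_text_stream is a hypothetical stand-in for stream.text_stream, and print_and_collect is an illustrative helper, not part of the SDK.

```python
import time


def fake_text_stream(chunks, delay=0.0):
    """Hypothetical stand-in for stream.text_stream: yields strings one by one."""
    for chunk in chunks:
        time.sleep(delay)  # simulate the gap between generated chunks
        yield chunk


def print_and_collect(text_stream):
    """Print each chunk immediately and return the concatenated text."""
    parts = []
    for text in text_stream:
        # end="" keeps chunks on one line; flush=True bypasses output buffering
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = print_and_collect(fake_text_stream(["Stream", "ing ", "works."]))
```

With a real stream, the same loop body would apply to stream.text_stream inside the with block. Raising delay above zero makes the effect of flush=True easier to see in a terminal.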
TypeScript — Streaming Code Example
In TypeScript, chunks are received via the .on("text", ...) event handler (source).
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

await client.messages
  .stream({
    messages: [{ role: "user", content: "Tell me about the history of Japan" }],
    model: "claude-opus-4-7",
    max_tokens: 1024,
  })
  .on("text", (text) => {
    // Write each chunk as it arrives, without a trailing newline
    process.stdout.write(text);
  });
Retrieving the Final Message
If you need the complete Message object (including usage) after the stream finishes,
you can retrieve it with stream.get_final_message() (Python) or
stream.finalMessage() (TypeScript).
# Python — Retrieving the complete message after stream completion
with client.messages.stream(...) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()
    print(final.usage)  # Check token usage
When to Use Each Approach
| Method | Best Use Cases |
|---|---|
| Non-streaming | Batch processing, heavy post-processing, when the full response is needed |
| Streaming | Chat UIs, terminal output, real-time display of long generated text |
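
The choice in the table can be wrapped in a single helper so callers pick a mode with one flag. This is a sketch only: ask is a hypothetical function, it assumes client behaves like anthropic.Anthropic(), and it assumes the first content block of a non-streaming response is a text block.

```python
def ask(client, prompt, *, stream=False, model="claude-opus-4-7", max_tokens=1024):
    """Return the response text; with stream=True, also print it as it arrives.

    Hypothetical helper. `client` is assumed to behave like anthropic.Anthropic().
    """
    params = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if stream:
        # Interactive path: show chunks immediately, then return the full text.
        parts = []
        with client.messages.stream(**params) as s:
            for text in s.text_stream:
                print(text, end="", flush=True)
                parts.append(text)
        return "".join(parts)
    # Batch path: block until the whole message is available.
    message = client.messages.create(**params)
    # Assumption: the first content block is a text block.
    return message.content[0].text
```

A chat UI would call ask(client, prompt, stream=True), while a batch job that only post-processes the result would leave stream at its default.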