Claude Prompt Engineering Basics | 15 Template Patterns That Improve Accuracy
For developers and editors who want to noticeably improve Claude's output quality, this guide covers 15 prompt design patterns recommended by Anthropic's official documentation. From the 3-block structure and XML tags to Few-shot, Chain of Thought, prefill, role assignment, and output format enforcement — each template's use case and common pitfalls are explained in a practical, hands-on order you can apply starting tomorrow.
The foundation of every prompt is the "Role → Context → Instructions" 3-block structure. Simply separating who Claude should act as, what it should assume, and what it should do from the very first input dramatically reduces variability in output. This is where the presence or absence of a pattern makes the biggest difference in accuracy.
For structuring long content, XML tags work best. For consistent output formatting, Few-shot examples are ideal. For complex reasoning, Chain of Thought is most effective on its own. Standard practice is to keep tag names consistent, use 3–5 examples for Few-shot, and separate thinking from conclusions using thinking and answer tags for CoT.
When you need to enforce JSON output, prefilling the assistant turn is the key technique. When you need to raise the expertise level of responses, explicit role assignment makes the difference. Since example bias directly leads to output bias, always mix typical, boundary, and edge cases into your examples — this is a prerequisite for maintaining quality.
目次 (30)
- Pattern 1: The 3-Block Structure — Role + Context + Instructions for Professional Quality from Day One
- Why the 3-Block Structure Works
- Pattern 2: XML Tags for Structure — <context> <code> <instructions> Boost Accuracy for Long Inputs
- Standard XML Tags to Know
- Pattern 3: Few-shot — Align Output Format with 2–3 Correct Examples
- Balancing the Number and Quality of Examples
- Pattern 4: Chain of Thought — "Think Step by Step" Raises Accuracy on Hard Problems
- Separating Thinking from Conclusions
- Pattern 5: Prefill — Plant the Start of the Assistant Turn to Force Output Format
- Pattern 6: Explicit Output Format Specification — "In JSON," "As a Table," "Within 100 Characters"
- Choosing the Right Format for Each Use Case
- Pattern 7: Persisting Roles via System Prompts
- Distinguishing Between system and user
- Pattern 8: Role Assignment — "Senior Engineer" or "Proofreader" Aligns Style and Granularity
- Double Accuracy with Role + Audience
- Pattern 9: Explicit Constraints — "Do not use X," "Maximum N characters," "In plain English"
- Write Constraints Along 4 Axes
- Pattern 10: Step-back Prompting — Ask Claude to Think from Abstract Principles First
- When to Use Step-back vs. CoT
- Pattern 11: Self-critique — "Critically Review Your Own Answer" Raises Quality by One Level
- Where Self-critique Works and Where It Doesn't
- Pattern 12: Temperature Control — 0 for Determinism, 0.7–1.0 for Creativity
- Practical Temperature Patterns for Business Use
- Pattern 13: max_tokens / Output Length Control — The Last Line of Defense Against Runaway Output
- Double Defense with In-prompt Instructions and API Parameters
- Pattern 14: Write in Positive Form, Not Negative — "Do This" Instead of "Don't Do That"
- Examples of Rewriting Negatives as Positives
- Pattern 15: Allow "I Don't Know" — The Single Most Powerful Line for Suppressing Hallucination
- Additional Instructions for Required Citations and Sources
- Sources (Primary Information)
Pattern 1: The 3-Block Structure — Role + Context + Instructions for Professional Quality from Day One
The fundamental form to master first is the "Role → Context → Instructions" 3-block structure. Claude synthesizes conversation history and system messages to determine how to respond, so when the first input clearly separates "who Claude is," "what to assume," and "what to do," off-target responses drop dramatically. Conversely, when these three elements are bundled together, Claude bears the extra load of inferring preconditions, which increases variability in output.
[Role] You are a senior Python engineer.
[Context] I am a machine learning beginner building a
classifier with scikit-learn, but accuracy has plateaued at 60%.
[Instructions] Improve the following code and explain
the rationale behind each change in English.
(code here)
Why the 3-Block Structure Works
Just as humans need clarity when taking on a new task, LLMs also produce higher accuracy when their "role," "context (current situation)," and "task" are clearly defined. In particular, omitting the context causes Claude to default to generic, safe responses — with excessive preambles for beginners or overly verbose explanations for advanced users. Adding just 1–2 lines of "reader's skill level, environment, and constraints" to the context stabilizes the granularity of responses considerably.
Pattern 2: XML Tags for Structure — <context> <code> <instructions> Boost Accuracy for Long Inputs
Claude has a strong ability to interpret XML tags. This is because Anthropic's training data contained many XML-style delimiters, which means tags communicate "this is a reference," "this is a command," "this is an output example" more reliably than Markdown headings. When passing multiple long documents with instructions, or mixing references with instructions, using XML tags is officially recommended.
<context>
I am a scikit-learn beginner.
</context>
<code>
from sklearn.ensemble import RandomForestClassifier
...
</code>
<instructions>
1. Point out issues with the current code
2. Provide improved code
3. Explain the reasons for changes in English
</instructions>
Standard XML Tags to Know
Mastering these five — <context>, <instructions>, <example>, <document>, and <code> — covers the vast majority of use cases. When passing multiple documents, nesting <document index="1"> inside <documents> is standard practice, allowing Claude to reference source numbers in its responses. Note that tag names don't follow a strict specification — custom names work fine — but the key is to keep naming consistent and always match opening and closing tags source.
Pattern 3: Few-shot — Align Output Format with 2–3 Correct Examples
Showing 2–3 correct examples is far more effective than explaining in prose how you want the output written. Known as few-shot prompting, this powerful technique communicates output format, tone, vocabulary level, and granularity all at once. Even in domains where zero-shot leaves Claude uncertain, showing 3 examples lets it follow the format without hesitation.
Please classify the following in this format.
<example>
Input: "Today was the best day ever"
Output: {"sentiment": "positive", "score": 0.9}
</example>
<example>
Input: "I failed again..."
Output: {"sentiment": "negative", "score": 0.2}
</example>
Input: "I finally finished work"
Output:
Balancing the Number and Quality of Examples
Anthropic's official documentation recommends 3–5 examples. A single example provides weak pattern inference, while more than 10 increases input tokens and cost. Including one "typical case," one "boundary case," and one "ambiguous case" in your examples significantly improves judgment on edge cases. Conversely, using only biased examples causes Claude to learn those biases, so example selection directly affects output quality.
Pattern 4: Chain of Thought — "Think Step by Step" Raises Accuracy on Hard Problems
For complex reasoning tasks, simply adding "think through this step by step, then give your conclusion at the end" improves accuracy source. This is called Chain of Thought (CoT), and it is especially effective for tasks requiring multi-step reasoning such as math, logic, multi-step planning, and debugging code. When asked for an immediate answer, Claude tends to be pulled toward the most frequent pattern, but when it writes out its reasoning process, it can catch errors midway and self-correct.
Separating Thinking from Conclusions
In practice, the standard approach is to instruct Claude to "think inside <thinking> tags, then provide the final answer inside an <answer> tag." This makes it easy to extract only the final answer even when Claude reasons at length. In the Claude 4.X series, an Extended Thinking feature is available via the API that enables deep internal reasoning without explicit CoT instructions, but for lightweight models or everyday use, manually written CoT instructions deliver sufficient accuracy.
Pattern 5: Prefill — Plant the Start of the Assistant Turn to Force Output Format
In the API, you can plant the beginning of an assistant role response at the end of messages. This is the standard technique for forcing JSON output starting with { — by pre-supplying the first characters of the response, you compel Claude to "continue from here." It is also extremely effective for eliminating responses that include preambles or code fences.
# Python SDK example
messages=[
{"role": "user", "content": "Return the above as JSON"},
{"role": "assistant", "content": "{"}
]
This causes Claude to start writing from after {, omitting Markdown code fences and preambles source (note: prefill is deprecated in Opus 4.7; use with Sonnet / Haiku). Planting an opening XML tag (<analysis>) or a Markdown heading (# Conclusion) at the start is also standard, and prevents output format failures in scenarios that require downstream processing. However, over-planting can over-constrain Claude, so the rule of thumb is to keep the seed minimal — around 1–10 characters.
Pattern 6: Explicit Output Format Specification — "In JSON," "As a Table," "Within 100 Characters"
State the expected format explicitly at the end of the instructions. "Return it in the following JSON schema" is more reliable than a vague "organize this." Instructions placed at the end of a prompt receive Claude's final attention and are most faithfully followed — so "giving a final nudge on format" is a practical technique.
Please follow this JSON schema for output:
{
"title": "string",
"tags": ["string"],
"summary": "string (within 100 characters)"
}
Choosing the Right Format for Each Use Case
For downstream programmatic processing, JSON is standard. For human-facing documents, Markdown works best. For comparing multiple options, a Markdown table (with explicit column names) is ideal. Providing a schema significantly reduces missing fields and type violations, but more fields mean longer outputs and higher costs — so the rule is to include only what you truly need. If you plan to parse the output downstream, combining this with the Tool Use (Function Calling) feature is an even more robust option.
Pattern 7: Persisting Roles via System Prompts
Adding "You are an editor-in-chief" (or similar) to the API's system parameter eliminates the need to re-declare the role with each user message. The same effect is available in Claude.ai Projects, removing the burden of copy-pasting the role at the start of every prompt.
Distinguishing Between system and user
The system parameter should contain fixed conventions that apply throughout the conversation — such as role, rules, prohibitions, and output format assumptions. The user parameter should contain the current task and target data. Mixing the two causes Claude to become uncertain about which to prioritize as the conversation grows longer, leading to increased rule violations. Additionally, system prompts are easier to reuse with Anthropic API's prompt caching, so placing a lengthy ruleset in the system prompt offers benefits in both latency and cost.
Pattern 8: Role Assignment — "Senior Engineer" or "Proofreader" Aligns Style and Granularity
The more specifically you define the role, the higher the output quality. "A senior ML engineer with 5 years of scikit-learn experience" is more precise than "an expert" — it instantly aligns writing style, assumed background knowledge, and the granularity of technical terms. Leaving the role vague forces Claude to infer who it is writing for and what to write, leading it toward safe, generic explanations.
Double Accuracy with Role + Audience
Pairing "You are ○○" with "The reader is △△" stabilizes the depth of explanation. For example, writing "You are a senior ML engineer. The reader is a backend engineer who can write Python but has no machine learning experience" uniquely determines the assumed vocabulary, whether diagrams are needed, and how much code to include. It is the same instinct as briefing a colleague at work — putting both the speaker's position and the audience's profile into words is the key.
Pattern 9: Explicit Constraints — "Do not use X," "Maximum N characters," "In plain English"
Pairing positive instructions with prohibitions, limits, and target audience stops runaway output. Claude tends toward helpfulness and, left unchecked, will add preambles, caveats, and supplementary examples. Conversely, specifying what must not be written, what information to exclude, and what length to respect nearly eliminates unnecessary additions.
Write Constraints Along 4 Axes
In practice, keeping these four axes in mind covers constraints completely: "length (max N characters / N paragraphs)," "vocabulary (use words a middle schooler can understand / no technical jargon)," "exclusions (write code only / no disclaimers)," and "audience (for IT department managers)." In particular, length should always be quantified — "within 800 characters," "within 5 lines," "exactly 3 items" — so that Claude can self-regulate length as it writes.
Pattern 10: Step-back Prompting — Ask Claude to Think from Abstract Principles First
For complex problems, instructing Claude to "first write the high-level principle or concept in one line, then provide concrete recommendations" organizes its line of reasoning source. Called step-back prompting, this technique has Claude verbalize the underlying principles, categories, or decision criteria before jumping to specifics — helping it escape shallow, template-like responses.
When to Use Step-back vs. CoT
While Chain of Thought solves problems "step by step," step-back takes the approach of "raising the abstraction level to gain perspective." For design decisions, strategic planning, and requirements clarification — domains where multiple valid answers exist — step-back leads to deeper analysis. Conversely, for arithmetic or fixed-procedure logic verification, too much step-back tends to be roundabout. Switching between depth of reasoning and level of abstraction based on task characteristics is the mark of an advanced user.
Pattern 11: Self-critique — "Critically Review Your Own Answer" Raises Quality by One Level
Following up an initial response with "point out 3 weaknesses in this answer, then provide an improved version" significantly raises quality. Embedding self-review within a single turn is particularly effective for coding tasks, as Claude checks and fixes the typical gaps, redundancy, and incorrect assumptions common in first drafts.
Where Self-critique Works and Where It Doesn't
For tasks with clear evaluation criteria and room for improvement on review — such as proofreading, code refactoring, and design review — the effect is outstanding. However, for factual research (e.g., "what year did X happen?"), Claude may perform "confident" self-evaluation based on incorrect information, making it dangerous to rely on self-critique alone for reliability. The sound approach is to combine self-critique with separate verification against primary sources for factual matters.
Pattern 12: Temperature Control — 0 for Determinism, 0.7–1.0 for Creativity
Control temperature via the API. Use 0–0.3 for factual answers and code generation; use 0.7–1.0 for brainstorming and copywriting. In Claude.ai, there is no temperature setting, so explicitly prompt "provide multiple alternatives." The closer to 0, the more Claude favors high-probability tokens, making responses more reproducible for the same input. The closer to 1, the broader the probability distribution it samples from, creating more variation in expression.
Practical Temperature Patterns for Business Use
A two-tier setup works well professionally: 0–0.2 for "classification, extraction, summarization, code generation," and 0.7–0.9 for "planning, copywriting, naming." Keep top_p at the default (1.0) and adjust only temperature first. When gathering multiple ideas in brainstorming, hitting the same prompt 5 times at temperature 0.9, then using one more request at 0.0 to consolidate and organize the results — a "diverge → converge 2-stage prompt" — offers a good balance of quality and reproducibility.
Pattern 13: max_tokens / Output Length Control — The Last Line of Defense Against Runaway Output
For long-text tasks, always set max_tokens. Also state explicitly in the body: "Cut verbose preambles and keep only the main point within 800 characters." The max_tokens parameter serves as a hard limit on the API side — a last line of defense that prevents costs from spiraling due to infinite loops or unexpectedly long outputs.
Double Defense with In-prompt Instructions and API Parameters
Standard practice is to combine both: "state the target length in the prompt" (e.g., within 800 characters) and "set a physical upper limit with the API's max_tokens" (e.g., 2000 tokens). The former alone leaves room for Claude to loosely interpret the instruction; the latter alone risks cutting output mid-sentence. Together, output naturally concludes near the target length, with a physical cutoff as a fallback — a true double defense. Since the maximum output length varies by Claude model, always check the official specs for Sonnet, Haiku, and Opus respectively.
Pattern 14: Write in Positive Form, Not Negative — "Do This" Instead of "Don't Do That"
"Return it in plain text" is more reliable than "don't use code fences." LLMs including Claude respond more consistently to positive instructions source. Negative instructions require Claude to mentally picture the prohibited action first, which can actually cause the prohibited element to appear in the output.
Examples of Rewriting Negatives as Positives
"Don't answer in Japanese → Always answer in English," "Don't use code fences → Return as plain text on a single line," "Don't write too long → Summarize only the key points within 200 characters" — prohibitions can almost always be translated into positive form. When you truly need to convey a prohibition, phrase it as a positive alternative instruction followed by a brief note: "If [condition], please respond with [alternative] (do not use any other format)" — this reduces confusion for Claude.
Pattern 15: Allow "I Don't Know" — The Single Most Powerful Line for Suppressing Hallucination
Adding "If you don't have the information, please say 'I don't know'" dramatically reduces unfounded speculation. This one line is essential for factual research and summarization tasks — a single sentence can significantly suppress hallucination. Since LLMs are fundamentally designed to generate "the most plausible continuation," without explicitly permitting "I don't know" as an answer, Claude tends to write confidently even about things it doesn't know.
Additional Instructions for Required Citations and Sources
To further improve factual accuracy, explicitly restrict Claude's sources: "Answer only from the content in the provided <document>, and respond with 'Not mentioned in the materials' for anything not covered there." By preventing Claude from supplementing with external knowledge, reliability dramatically improves for use cases that require restricted sources — such as internal documents, contracts, and product specifications. For maximum response reliability, the canonical approach is to combine Pattern 15 with Pattern 6 (output format specification), having Claude output citation IDs alongside response text in JSON format.
Sources (Primary Information)
The following are the primary sources directly referenced in creating this article. Since Anthropic's official documentation is continuously updated, always verify the latest accurate information at each link. Best practices in prompt design can shift subtly with each model generation, so consulting the official guide for the model you are using — Sonnet 4.6, Opus 4.7, Haiku 4.5, etc. — is the prudent approach.