Customer support chatbots have sucked for years because they forget context halfway through a conversation. Claude 3.5 Sonnet changes that with a 200,000-token context window — enough to hold an entire support session, product documentation, and previous chat history without breaking a sweat. The upgraded version released in October 2024 is fast enough for live chat, and it actually remembers what your customer complained about five minutes ago.
This tutorial walks through setting up Claude 3.5 Sonnet for customer support via the Anthropic API. You’ll learn how to manage conversation memory, structure prompts for consistent responses, and optimize latency. No fluff, just working code and concrete examples you can deploy today.
By the end of this tutorial, you’ll have a working customer support system that maintains context across multiple messages, references previous interactions, and responds in under two seconds. The setup handles common support scenarios — product questions, troubleshooting, order tracking — while staying on-brand and escalating complex issues to human agents when needed.
You’ll need an Anthropic API key, basic Python knowledge, and about 30 minutes. The system works with any customer support platform that accepts API integrations — Zendesk, Intercom, or your custom frontend.
First, grab an API key from console.anthropic.com. Anthropic charges $3 per million input tokens and $15 per million output tokens for Claude 3.5 Sonnet. A typical support conversation with 10 back-and-forth messages costs around $0.02 to $0.05 depending on how much context you feed in.
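To sanity-check that estimate, here's a back-of-the-envelope cost helper. The token counts in the example are illustrative assumptions, not measured values:

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost in USD at Claude 3.5 Sonnet's published rates:
    $3 per million input tokens, $15 per million output tokens."""
    return input_tokens * 3 / 1_000_000 + output_tokens * 15 / 1_000_000

# A 10-message conversation might accumulate ~5,000 input tokens
# (the history is re-sent on every turn) and ~1,500 output tokens:
print(f"${estimate_cost(5_000, 1_500):.4f}")  # prints $0.0375
```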
Install the Anthropic Python SDK and set your API key as an environment variable. The SDK handles rate limiting and retries automatically, which matters when you’re processing dozens of concurrent support chats.
```bash
pip install anthropic
export ANTHROPIC_API_KEY='your-api-key-here'
```
Import the library and initialize the client. Note that the key isn't validated at construction time — a bad key only surfaces as an authentication error on your first API request.
```python
import anthropic

client = anthropic.Anthropic(
    api_key="your-api-key-here"
)
```
Pro tip ✅
Store your API key in environment variables or a secrets manager, never hardcode it. One leaked key can burn through your monthly budget in hours if someone finds it.
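A minimal sketch of the environment-variable approach. The official SDK reads `ANTHROPIC_API_KEY` from the environment automatically, so the explicit loader here just lets you fail fast at startup:

```python
import os

def load_api_key() -> str:
    """Fail fast at startup if the key is missing,
    instead of failing on the first API call."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return key

# With the variable set, the SDK picks it up on its own:
# client = anthropic.Anthropic()  # no api_key argument needed
```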
The system prompt defines your AI agent’s personality, knowledge boundaries, and escalation rules. Claude 3.5 Sonnet performs best with clear, specific instructions — not vague corporate speak about being helpful and friendly.
Here’s a working system prompt for a fictional e-commerce store. Adapt the product details, policies, and tone to match your business.
```
You are a customer support agent for TechGear, an online electronics store. Your role is to help customers with orders, product questions, and basic troubleshooting.

KEY INFORMATION:
- Standard shipping takes 3-5 business days, express is 1-2 days
- Return window is 30 days from delivery
- Warranty covers manufacturing defects, not accidental damage
- Order tracking available at techgear.com/track with order number

TONE:
- Professional but conversational
- Empathetic when customers are frustrated
- Concise — get to the solution quickly

ESCALATION:
Transfer to human agent if:
- Customer is angry or threatening legal action
- Issue involves refunds over $500
- Technical problem you cannot solve with standard troubleshooting
- Customer explicitly requests human agent

When escalating, summarize the issue clearly for the human agent.

NEVER:
- Make up product specs or policies
- Promise what you cannot deliver
- Argue with customers
- Share other customers' information
```
This prompt gives Claude boundaries and specific behaviors. The escalation rules prevent the AI from handling situations it shouldn’t, while the tone guidance keeps responses consistent across conversations.
Warning ⚠️
Test your system prompt extensively before going live. AI models can interpret instructions in unexpected ways, especially edge cases like sarcasm or ambiguous requests.
Claude 3.5 Sonnet’s 200k token context window lets you include entire conversation histories without summarization. Structure your API calls with a system message and a list of user/assistant message pairs.
Here’s how to maintain context across multiple turns. Each message alternates between user and assistant roles, and Claude uses the full history to understand references like “the order I mentioned earlier.”
```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a customer support agent for TechGear...",
    messages=[
        {"role": "user", "content": "I ordered a laptop three days ago but haven't received tracking info."},
        {"role": "assistant", "content": "I understand your concern. Let me help you locate your order. Could you provide your order number? It's usually in your confirmation email and starts with TG-."},
        {"role": "user", "content": "It's TG-8392047"}
    ]
)

print(message.content[0].text)
```
The model references the previous exchange when responding to the order number. It knows the customer is asking about tracking for a laptop ordered three days ago without restating everything.
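In a live chat you rebuild this list on every turn. One way to manage that is a small session object (a sketch; `send` shows where the API call slots in, and isn't wired to a real client here):

```python
class SupportSession:
    """Accumulates alternating user/assistant turns for the Messages API."""

    def __init__(self, system_prompt: str):
        self.system = system_prompt
        self.messages = []

    def add_user(self, text: str):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str):
        self.messages.append({"role": "assistant", "content": text})

    def send(self, client, user_text: str) -> str:
        # Append the new user turn, call the API with the full history,
        # then store the reply so the next turn sees it too.
        self.add_user(user_text)
        reply = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=self.system,
            messages=self.messages,
        )
        text = reply.content[0].text
        self.add_assistant(text)
        return text
```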
Feed relevant product specs, FAQs, or troubleshooting guides directly into the system prompt or first user message. Claude 3.5 Sonnet can parse and reference lengthy documentation within its 200k token window.
For dynamic knowledge bases, inject relevant sections based on the customer’s question. If they ask about laptop battery life, include laptop specs. If they mention returns, include return policy text.
```python
product_docs = """
LAPTOP SPECIFICATIONS:
- Model: TechGear Pro 15
- Battery: Up to 12 hours mixed use
- Charging: USB-C 65W, 0-80% in 60 minutes
- Warranty: 1 year standard, 3 years available

COMMON ISSUES:
- Battery draining fast: Check background apps, disable unused features
- Won't charge: Try different USB-C cable, test wall outlet
- Overheating: Ensure vents are clear, use on hard surface
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=f"""You are a customer support agent for TechGear...

PRODUCT KNOWLEDGE:
{product_docs}""",
    messages=[
        {"role": "user", "content": "My new laptop battery only lasts 4 hours. Is that normal?"}
    ]
)
```
Claude references the documentation and knows normal battery life is 12 hours, so it can troubleshoot effectively instead of giving generic advice.
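One way to implement the dynamic injection described above is a simple keyword router. The topics and snippets here are illustrative placeholders for your own knowledge base:

```python
KNOWLEDGE_SECTIONS = {
    "battery": "LAPTOP BATTERY: Up to 12 hours mixed use. Drains faster with background apps.",
    "return": "RETURNS: 30 days from delivery, unused, original packaging.",
    "shipping": "SHIPPING: Standard 3-5 business days, express 1-2 days.",
}

def select_docs(question: str) -> str:
    """Return only the knowledge-base sections whose topic keyword
    appears in the customer's question."""
    q = question.lower()
    hits = [text for topic, text in KNOWLEDGE_SECTIONS.items() if topic in q]
    return "\n".join(hits)

# select_docs("How do I return my laptop if the battery is bad?")
# includes the RETURNS and LAPTOP BATTERY sections, skips SHIPPING
```

In production you'd likely replace substring matching with embedding-based retrieval, but the shape stays the same: select, then interpolate into the system prompt.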
Pro tip ✅
Keep documentation concise and structured. Claude handles long context well, but focused information produces faster, more accurate responses than dumping your entire knowledge base.
For returning customers, store previous conversation history in your database and append it to new messages. This lets Claude reference past interactions — “As we discussed last week” or “I see you previously had an issue with shipping.”
Structure stored conversations as JSON with timestamps and message pairs. When a customer returns, load their history and include relevant context.
```python
import json
from datetime import datetime

# Example stored conversation from database
previous_conversation = [
    {"role": "user", "content": "I'm having trouble with my wireless mouse.", "timestamp": "2026-03-24T14:32:00Z"},
    {"role": "assistant", "content": "I can help with that. What specific issue are you experiencing?", "timestamp": "2026-03-24T14:32:15Z"},
    {"role": "user", "content": "It disconnects every few minutes.", "timestamp": "2026-03-24T14:33:00Z"},
    {"role": "assistant", "content": "That's often a battery or interference issue. Try replacing batteries first.", "timestamp": "2026-03-24T14:33:20Z"}
]

# The Messages API only accepts "role" and "content" keys,
# so strip the timestamps before sending stored history.
api_history = [{"role": m["role"], "content": m["content"]} for m in previous_conversation]

# New conversation with context
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a customer support agent for TechGear. You have access to this customer's previous conversations.",
    messages=api_history + [
        {"role": "user", "content": "Hi, I'm back. I replaced the batteries but the mouse still disconnects."}
    ]
)

print(message.content[0].text)
```
Claude understands the customer is continuing a previous troubleshooting session and moves to the next diagnostic step instead of starting from scratch.
Note 💡
Limit loaded history to relevant conversations. Including 50 unrelated past chats wastes tokens and slows responses. Load the most recent session plus any specifically relevant interactions.
Claude 3.5 Sonnet typically responds in 500ms to 2 seconds depending on context length and server load. You can’t guarantee sub-second latency across all scenarios, but you can optimize.
Keep max_tokens reasonable. If you only need a paragraph, set max_tokens to 512 instead of 4096. Shorter generation limits mean faster responses.
```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,  # Faster for short responses
    system="You are a customer support agent...",
    messages=[{"role": "user", "content": "What's your return policy?"}]
)
```
For FAQ-style questions where speed matters more than nuance, cache your system prompt. Anthropic’s prompt caching feature (currently in beta) stores frequently used prompts and reduces latency for repeated requests.
```python
# Prompt caching example (beta feature)
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": "You are a customer support agent for TechGear...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "Do you ship internationally?"}]
)
```
Cached prompts can cut latency by 30-50% for subsequent requests with the same system prompt. This works best for high-volume support scenarios where you’re processing many similar queries.
Warning ⚠️
Don’t sacrifice response quality for speed. A 1.5-second accurate answer beats a 0.5-second useless one. Optimize after you’ve validated accuracy.
Claude should recognize when it’s out of its depth and hand off cleanly to human agents. Build escalation detection into your system prompt and parse responses for escalation flags.
Here’s a pattern that works. Add clear escalation markers to your system prompt, then check for those markers in Claude’s responses.
```python
system_prompt = """
You are a customer support agent for TechGear.

If you need to escalate to a human agent, start your response with [ESCALATE] followed by a brief summary of the issue.

Example: [ESCALATE] Customer requesting refund for $800 damaged laptop. Needs immediate human review.
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=system_prompt,
    messages=[
        {"role": "user", "content": "This is ridiculous! I want a full refund RIGHT NOW or I'm calling my lawyer!"}
    ]
)

response_text = message.content[0].text

if response_text.startswith("[ESCALATE]"):
    # Extract summary and route to human agent
    summary = response_text.replace("[ESCALATE]", "").strip()
    # route_to_human_agent / send_to_customer are your own integration hooks
    route_to_human_agent(summary)
else:
    # Send AI response to customer
    send_to_customer(response_text)
```
This gives human agents context when they take over. They see a clear summary instead of having to read through the entire conversation to understand what went wrong.
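The `startswith` check is fragile if the model adds leading whitespace or varies the marker's case. A slightly more defensive parser (a sketch; the `[ESCALATE]` convention comes from the system prompt above):

```python
def parse_escalation(response_text: str):
    """Return (should_escalate, text). When the [ESCALATE] marker is
    present, text is the summary for the human agent; otherwise it is
    the reply to send back to the customer."""
    stripped = response_text.strip()
    marker = "[ESCALATE]"
    if stripped.upper().startswith(marker):
        # Tolerate whitespace and case differences around the marker
        summary = stripped[len(marker):].strip()
        return True, summary
    return False, stripped
```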
Here are five working prompts for typical customer support situations. Copy-paste and adapt them to your business.
Order status inquiry:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system="You are a customer support agent for TechGear. Standard shipping is 3-5 business days. Express is 1-2 days. Tracking available at techgear.com/track.",
    messages=[
        {"role": "user", "content": "Where's my order? I placed it 4 days ago."}
    ]
)
```
Product recommendation:

```python
product_catalog = """
AVAILABLE LAPTOPS:
- TechGear Pro 15: $1299, 12hr battery, best for professionals
- TechGear Air 13: $899, 10hr battery, ultra-portable, best for students
- TechGear Max 17: $1899, 8hr battery, high-performance, best for gaming/video editing
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=f"""You are a customer support agent for TechGear. Help customers choose products based on their needs.
{product_catalog}""",
    messages=[
        {"role": "user", "content": "I need a laptop for college. Budget around $900, needs good battery life."}
    ]
)
```
Step-by-step troubleshooting:

```python
troubleshooting_guide = """
WIRELESS MOUSE TROUBLESHOOTING:
1. Check batteries - replace with fresh ones
2. Verify USB receiver is plugged in securely
3. Test on different surface (not glass or reflective)
4. Check for wireless interference (move away from routers, other wireless devices)
5. Try different USB port
6. Re-pair mouse with receiver using pairing button
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=f"""You are a customer support agent for TechGear. Guide customers through troubleshooting step by step.
{troubleshooting_guide}""",
    messages=[
        {"role": "user", "content": "My wireless mouse keeps freezing and skipping around the screen."}
    ]
)
```
Return request:

```python
return_policy = """
RETURN POLICY:
- 30 days from delivery for returns
- Item must be unused with original packaging
- Refund processed within 5-7 business days after receiving return
- Customer pays return shipping unless item is defective
- Initiate return at techgear.com/returns with order number
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=f"""You are a customer support agent for TechGear. Process return requests according to policy.
{return_policy}""",
    messages=[
        {"role": "user", "content": "I want to return the keyboard I bought. It's too loud for my office."}
    ]
)
```
Warranty claim:

```python
warranty_info = """
WARRANTY COVERAGE:
- Standard: 1 year from purchase date
- Covers manufacturing defects only
- Does NOT cover: accidental damage, liquid damage, normal wear and tear, cosmetic damage
- Requires proof of purchase (order number or receipt)
- Claim process: submit photos of issue + proof of purchase to support@techgear.com
- Turnaround: 3-5 business days for approval, then replacement or repair
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=f"""You are a customer support agent for TechGear. Help customers with warranty claims.
{warranty_info}""",
    messages=[
        {"role": "user", "content": "My laptop screen has lines through it and I only bought it 3 months ago."}
    ]
)
```
Pro tip ✅
Test every prompt with difficult edge cases — angry customers, ambiguous requests, attempts to get the AI to break policy. Claude is generally robust, but you need to know where it might slip up.
When conversations stretch past 50 messages, even 200k tokens can fill up. Implement smart context trimming that keeps recent messages and critical earlier context while dropping mundane middle exchanges.
Keep the most recent 10-15 message pairs plus any messages flagged as important (order numbers, specific product mentions, promises made). Drop generic pleasantries and confirmations from the middle of long conversations.
```python
def trim_conversation(messages, max_recent=15):
    """
    Keep most recent messages and any flagged as important.
    Drop generic middle messages to stay under token limit.
    """
    important_keywords = ['order number', 'TG-', 'promised', 'refund', 'manager']

    recent = messages[-max_recent:]
    older = messages[:-max_recent]

    important_older = [
        msg for msg in older
        if any(keyword.lower() in msg['content'].lower() for keyword in important_keywords)
    ]

    return important_older + recent
```
This preserves context that matters — the order number mentioned 30 messages ago — while cutting token usage by 40-60% in long conversations.
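To decide when to trim at all, you need a token count. The API's `usage` field gives exact numbers after each call; for a pre-flight check, a common rough heuristic is about 4 characters per token for English text (an approximation, not the model's real tokenizer):

```python
def rough_token_count(messages) -> int:
    """Very rough estimate: ~4 characters per token for English prose."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // 4

history = [
    {"role": "user", "content": "My order TG-8392047 never arrived."},
    {"role": "assistant", "content": "Let me look into that for you."},
]
# Trim only when the estimate approaches your budget, e.g.:
# if rough_token_count(history) > 150_000:
#     history = trim_conversation(history)
```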
Note 💡
Monitor token usage per conversation in your logs. If you’re regularly hitting context limits, your conversations might be too long or your documentation too verbose.
Track three metrics: resolution rate (issues solved without human escalation), average response latency, and customer satisfaction scores. Claude’s performance improves as you refine system prompts based on real interactions.
Log every conversation with metadata: resolved vs escalated, latency, tokens used, customer feedback if available. Review escalated conversations weekly to identify patterns — are certain types of questions consistently routed to humans? Update your system prompt or add knowledge base sections to handle them.
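A minimal record for that metadata might look like this (field names are illustrative; adapt them to your logging stack):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ChatLogRecord:
    conversation_id: str
    resolved: bool            # False means escalated to a human
    latency_ms: int
    input_tokens: int
    output_tokens: int
    csat: Optional[int] = None  # customer satisfaction score, if collected

record = ChatLogRecord("conv-001", resolved=True, latency_ms=850,
                       input_tokens=4200, output_tokens=310)
# asdict(record) serializes cleanly for JSON logging
```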
A/B test prompt variations with live traffic. Try different tones, different escalation thresholds, different levels of technical detail. Small changes in phrasing can significantly impact resolution rates.
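For the A/B split itself, deterministic hashing keeps each customer in the same variant across sessions (a sketch; the 50/50 split and variant names are placeholder assumptions):

```python
import hashlib

def ab_variant(customer_id: str, experiment: str = "prompt-v2") -> str:
    """Assign a stable variant per customer: same inputs, same bucket."""
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"
```

Keying the hash on the experiment name as well means a new experiment reshuffles customers instead of reusing the previous split.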
You now have a production-ready customer support system powered by Claude 3.5 Sonnet. It maintains context across conversations, references product knowledge, escalates appropriately, and responds fast enough for live chat. The system handles common support scenarios autonomously while routing complex issues to human agents with proper context.
This isn’t a replacement for human support — it’s a force multiplier. Your human agents focus on complicated edge cases while Claude handles the routine questions that eat up the bulk of support time. Deploy carefully, monitor closely, and iterate based on real customer interactions. The AI gets better as you tune it to your specific business.
