How to Build a Multi-Turn Customer Service Chatbot with Claude API in 30 Minutes

Build a multi-turn customer service chatbot with the Claude API in 30 minutes — memory, context switching, and graceful fallbacks included, no backend required.

Most customer service chatbot tutorials either assume you have a week to spare or a DevOps team standing by. This one assumes you have 30 minutes, a text editor, and a Claude API key. That’s genuinely enough to go from nothing to a fully functional multi-turn chatbot that remembers what users said, switches topics without melting down, and handles questions it can’t answer without making your users feel abandoned.

This tutorial uses the Claude API directly, specifically the Messages API, which supports multi-turn conversation by accepting the message history with each request. No vector databases, no Redis, no backend server. Just a clean Python script that you can prototype locally and wrap in a serverless function when you’re ready to go live (a static host alone can’t run Python or keep your API key secret). The model we’ll use throughout is Claude Sonnet 4.6, which hits the right balance between speed, cost, and quality for a customer service workload.

By the end of this tutorial, you’ll have a chatbot that maintains conversational context across turns, can switch between topics (say, from billing to returns) without losing thread, and responds gracefully when it genuinely doesn’t know the answer — instead of hallucinating something confidently wrong.

What You’ll Actually Build

Before diving into code, it’s worth being precise about what “multi-turn with memory” means in practice. The Claude Messages API works by sending the entire conversation history with every request. Claude doesn’t remember previous messages on its own — you’re responsible for keeping that history array intact and passing it back each time. That sounds like overhead, but it’s actually what gives you complete control over what the model “knows” at any given point in the conversation.
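To make the “you keep the history” idea concrete, here is a minimal sketch of what that list looks like after a few turns. The message contents are made up and `add_turn` is a hypothetical helper used only for illustration; the `role`/`content` dictionary shape is the one the Messages API expects.

```python
# Shape of the history list after two completed turns (contents are made up).
history = [
    {"role": "user", "content": "I want to return a lamp I bought last week."},
    {"role": "assistant", "content": "You can return it within 30 days of delivery."},
    {"role": "user", "content": "How long does the refund take?"},
    {"role": "assistant", "content": "5-7 business days after we receive it."},
]


def add_turn(history: list, user_text: str, assistant_text: str) -> None:
    """Record one completed exchange; the whole list is resent on the next request."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})


add_turn(history, "Can I get store credit instead?", "I don't have that information.")

print(len(history))                   # 6 messages, i.e. 3 turns
print([m["role"] for m in history])   # roles alternate user / assistant
```

Nothing here talks to the API. The point is only that “memory” is this list: whatever is in it when you make the next request is what the model can see.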

The chatbot you’ll build has three concrete capabilities. First, it maintains a messages list that grows with each exchange, so when a user says “what about the refund policy for that?” the model knows what “that” refers to. Second, it uses a system prompt to define the agent’s persona, knowledge boundaries, and fallback behavior. Third, it includes a lightweight keyword-based topic detector that tags each user message with a topic hint, which gives the model an extra signal when the user changes subject.

Requirements

You need Python 3.8 or higher, the anthropic Python SDK (install it with pip install anthropic), and an Anthropic API key from console.anthropic.com. That’s the complete list. No database, no framework, no cloud account required for the prototype stage. You can have this running in a terminal window faster than most SaaS tools take to load their onboarding flow.

Step 1 — Write the System Prompt

The system prompt is where your chatbot’s personality, scope, and fallback behavior live. This is the highest-leverage thing you’ll write in this entire tutorial, and most people under-invest in it. A weak system prompt produces a chatbot that hallucinates policies, goes off-topic, and confidently answers questions it should defer. A strong one keeps everything tight.

Here’s a production-ready system prompt for a hypothetical e-commerce customer service agent:

You are Aria, a customer service agent for ShopNova, an online retailer specializing in home goods and furniture.

YOUR RESPONSIBILITIES:
- Help customers with order status, returns, refunds, shipping, and product questions
- Escalate complex billing disputes to the human support team
- Keep responses concise — 2-4 sentences unless the question requires more detail

YOUR KNOWLEDGE:
- Return window: 30 days from delivery, items must be unused and in original packaging
- Refunds: processed within 5-7 business days after return is received
- Shipping: standard 5-7 days, express 2-3 days, free standard shipping on orders over $75
- Order tracking: customers can track at shopnova.com/track using their order number

WHEN YOU DON'T KNOW SOMETHING:
- Never guess or fabricate order details, policies, or product specifications
- Say clearly: "I don't have that information, but our support team at support@shopnova.com can help within 24 hours."
- Do NOT apologize excessively — one acknowledgment is enough, then redirect

TONE: Friendly but efficient. You respect the customer's time. No filler phrases like "Great question!" or "Absolutely!"

Notice what that system prompt does explicitly: it defines the scope of knowledge, sets the fallback behavior with exact language, and even tells the model which filler phrases to avoid. That last part sounds fussy until you realize how fast those phrases erode user trust in a real support interaction.

Pro tip ✅

The fallback instruction is the most important line in your system prompt. Without it, Claude will try to be helpful by guessing — and a confidently wrong answer about a refund policy is worse than a direct “I don’t know.” Always define exactly what the model should say and do when it hits the edge of its knowledge.
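One way to keep that fallback wording consistent is to define it once in code and build the relevant part of the prompt from it. The sketch below assumes that approach; `FALLBACK_SENTENCE`, `build_fallback_section`, and `used_fallback` are hypothetical names, not part of any SDK.

```python
SUPPORT_EMAIL = "support@shopnova.com"

# The exact sentence the model is told to use when it hits a knowledge gap.
FALLBACK_SENTENCE = (
    "I don't have that information, but our support team at "
    f"{SUPPORT_EMAIL} can help within 24 hours."
)


def build_fallback_section() -> str:
    """Return the 'when you don't know' part of the system prompt."""
    return (
        "WHEN YOU DON'T KNOW SOMETHING:\n"
        "- Never guess or fabricate order details, policies, or product specifications\n"
        f'- Say clearly: "{FALLBACK_SENTENCE}"\n'
        "- Do NOT apologize excessively\n"
    )


def used_fallback(reply: str) -> bool:
    """True if a reply contains the scripted fallback opening (plain substring check)."""
    return "I don't have that information" in reply


print(build_fallback_section())
```

Because the sentence lives in one constant, a helper like `used_fallback` can later count how often replies hit the fallback. The substring check is deliberately naive: it will miss a reply that rephrases the sentence or uses a typographic apostrophe.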

Step 2 — Build the Conversation Loop

Here’s the core of the chatbot. This script initializes the message history, runs a terminal-based conversation loop, and passes the full history to Claude on every turn:

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """[paste your system prompt from Step 1 here]"""

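def chat(message_history: list, user_input: str) -> str:
    user_message = {"role": "user", "content": user_input}

    # Send the stored history plus the new message. Nothing is stored yet,
    # so a failed API call leaves message_history unchanged and a retry
    # won't duplicate the user's message.
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=message_history + [user_message]
    )

    # Join the text blocks of the reply (usually there is exactly one)
    assistant_message = "".join(
        block.text for block in response.content if block.type == "text"
    )

    # Only now record both sides of the turn
    message_history.append(user_message)
    message_history.append({"role": "assistant", "content": assistant_message})

    return assistant_message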

def main():
    print("ShopNova Support — type 'quit' to exit\n")
    history = []
    
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Aria: Thanks for contacting ShopNova. Have a good one.")
            break
        if not user_input:
            continue
            
        response = chat(history, user_input)
        print(f"Aria: {response}\n")

if __name__ == "__main__":
    main()

The key move here is that message_history is a single list that gets mutated in place — both the user message and the assistant response get appended after every turn. When you call client.messages.create() again, you’re passing the full conversation to Claude, which is exactly how multi-turn memory works with this API.

Note 💡

Claude Sonnet 4.6 supports a 200,000 token context window, which means your conversation history can grow quite long before you hit any limits. For a typical customer service interaction — say, 20-30 exchanges — you’re nowhere near that ceiling. If you’re building for longer sessions, add a history truncation function that drops the oldest messages beyond a threshold.
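A truncation function along those lines might look like the sketch below. `trim_history` and the `max_messages` default are assumptions for illustration; pick a threshold that fits your sessions.

```python
def trim_history(history: list, max_messages: int = 30) -> list:
    """Return at most the last max_messages entries, starting on a user message."""
    if len(history) <= max_messages:
        return history

    trimmed = history[-max_messages:]

    # The Messages API expects the conversation to open with a user message,
    # so drop a leading assistant message left over from the cut.
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Ten alternating messages, trimmed to the most recent five.
demo = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
short = trim_history(demo, max_messages=5)
print([m["content"] for m in short])
```

The function returns a new list and leaves the original untouched, so you can pass `trim_history(message_history)` as the `messages` argument while still keeping the full transcript for logging.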

Step 3 — Add Context Switching Detection

Context switching is the moment a user pivots mid-conversation: they’ve been asking about a return, and suddenly they want to know about a new order. Without handling this explicitly, some chatbot implementations get confused and blend contexts. With Claude’s native message history approach, the model handles most of this naturally — but you can make it more robust by injecting a lightweight context signal when you detect a topic shift.

from typing import Optional  # `str | None` would require Python 3.10+

TOPIC_KEYWORDS = {
    "returns": ["return", "send back", "wrong item", "defective", "damaged"],
    "refunds": ["refund", "money back", "charge", "charged", "billing"],
    "shipping": ["shipping", "delivery", "tracking", "arrived", "package", "delayed"],
    "orders": ["order", "purchase", "bought", "ordered", "order number"]
}

# Return the first topic whose keywords appear in the text (substring match), or None
def detect_topic(text: str) -> Optional[str]:
    text_lower = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(kw in text_lower for kw in keywords):
            return topic
    return None

def chat_with_context(message_history: list, user_input: str) -> str:
    topic = detect_topic(user_input)
    
    # Tag the message whenever a known topic is detected (every match, not only a switch)
    if topic:
        enriched_input = f"[Topic: {topic}] {user_input}"
    else:
        enriched_input = user_input
    
    return chat(message_history, enriched_input)

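This adds a topic tag to the user message before it enters the history. Claude will usually read a bracketed prefix like this as metadata rather than as customer text, but that isn’t guaranteed. To make it explicit, add a line to the system prompt such as “Messages may start with a [Topic: …] tag added by our system; treat it as a hint and never mention it to the customer.” Two limits are worth knowing: the detector tags every message that matches a keyword, not only topic switches, and it matches substrings, so “charge” also fires on “charger.” Use chat_with_context() instead of chat() in your main loop for the full effect.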
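If you want the tag to fire only when the topic actually changes, one option is to remember the previous topic and match on whole words. The sketch below is an assumption-laden variant, not a replacement for the code above: `TOPIC_PATTERNS`, `detect_topic_strict`, and `TopicTracker` are hypothetical names, and the regular expressions are a rough starting point.

```python
import re
from typing import Optional

# Word-boundary patterns, so "charger" or "border" no longer match.
TOPIC_PATTERNS = {
    "returns": r"\b(return(s|ed|ing)?|send back|wrong item|defective|damaged)\b",
    "refunds": r"\b(refund(s|ed)?|money back|charged?|billing)\b",
    "shipping": r"\b(shipping|delivery|tracking|arrived?|package|delayed)\b",
    "orders": r"\b(order(s|ed)?|purchase|bought|order number)\b",
}


def detect_topic_strict(text: str) -> Optional[str]:
    """Return the first topic whose pattern matches a whole word, else None."""
    for topic, pattern in TOPIC_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            return topic
    return None


class TopicTracker:
    """Remembers the last detected topic and tags a message only when it changes."""

    def __init__(self) -> None:
        self.current: Optional[str] = None

    def tag(self, user_input: str) -> str:
        topic = detect_topic_strict(user_input)
        if topic is None or topic == self.current:
            return user_input  # no topic, or same topic as before: leave untouched
        previous, self.current = self.current, topic
        if previous is None:
            return f"[Topic: {topic}] {user_input}"
        return f"[Topic changed: {previous} -> {topic}] {user_input}"


tracker = TopicTracker()
for line in [
    "My package is delayed",
    "Where is the tracking link?",
    "Actually, can I get a refund?",
]:
    print(tracker.tag(line))
```

You would keep one `TopicTracker` per conversation and pass `tracker.tag(user_input)` to `chat()`. Dictionary order still decides ties, so a message that mentions both a return and a refund is tagged as a return.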

Step 4 — Build the Graceful Fallback Layer

Fallbacks in customer service chatbots have two failure modes: the bot says nothing useful, or the bot makes something up. The system prompt handles the second problem, but you also want a programmatic safety net for cases where the API call itself fails — network timeout, rate limit, unexpected error. Here’s how to wrap the chat function with a proper fallback:

import anthropic
from anthropic import APIStatusError, APIConnectionError, RateLimitError

FALLBACK_MESSAGE = (
    "I'm having trouble connecting right now. "
    "Please reach us at support@shopnova.com or try again in a moment."
)

def safe_chat(message_history: list, user_input: str) -> str:
    try:
        return chat_with_context(message_history, user_input)
    except RateLimitError:
        return "We're experiencing high volume right now. Please try again in 60 seconds."
    except APIConnectionError:
        return FALLBACK_MESSAGE
    except APIStatusError as e:
        if e.status_code == 529:  # Anthropic overload
            return "Our AI is temporarily overloaded. Try again shortly, or email support@shopnova.com."
        return FALLBACK_MESSAGE
    except Exception:
        return FALLBACK_MESSAGE

Replace the chat(history, user_input) call in your main loop with safe_chat(history, user_input) and you’re covered. The key principle: always give the user a concrete next step, not just “something went wrong.”

Warning ⚠️

Don’t catch all exceptions silently in production without logging them somewhere. The code above is clean for a tutorial, but in a real deployment you want at least a logging.error() call in each except block so you know what’s actually failing. Silent failures are how chatbots develop mysterious reputations for being unreliable.
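For transient failures such as rate limits, you may prefer to retry a couple of times before showing the fallback message. The sketch below is a generic helper under that assumption: `call_with_retry` is a hypothetical name, and the delays are illustrative. The SDK client also has its own `max_retries` setting, so check what it already does before layering more retries on top.

```python
import time
from typing import Callable, Tuple, Type


def call_with_retry(
    fn: Callable[[], str],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn(); on a listed exception wait base_delay * 2**attempt, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller's fallback handle it
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Offline demonstration: a function that fails twice, then succeeds.
calls = {"count": 0}
waits = []


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated transient failure")
    return "ok"


result = call_with_retry(flaky, retry_on=(TimeoutError,), sleep=waits.append)
print(result, waits)  # ok [1.0, 2.0]
```

In the chatbot you would wrap the API call, for example `call_with_retry(lambda: chat_with_context(history, text), retry_on=(RateLimitError, APIConnectionError))`, inside the `try` block of `safe_chat`. This is only safe if the wrapped function leaves the history unchanged when it raises; otherwise each retry appends the user’s message again.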

Step 5 — Test with Realistic Conversation Flows

Run your script and test it against these conversation scenarios, which cover the main edge cases for a customer service agent:

Test 1: Basic multi-turn memory

User: I want to return a lamp I bought last week
Aria: [explains return policy]
User: How long will the refund take after you get it back?
Aria: [should reference "it" = the lamp correctly, not ask for clarification]

Test 2: Context switch mid-conversation

User: My order still hasn't arrived — it's been 10 days
Aria: [asks for order number or explains shipping times]
User: Actually forget that, I want to order a new couch. Do you have free shipping?
Aria: [should pivot cleanly to shipping info without continuing to address the delayed order]

Test 3: Out-of-scope question triggering graceful fallback

User: Can you tell me the exact GPS coordinates of your warehouse?
Aria: [should trigger the "I don't have that information" fallback, not guess]

Test 4: Ambiguous pronoun resolution across turns

User: I bought a coffee table and a sofa. The table arrived damaged.
Aria: [acknowledges the damaged table]
User: Can I return it?
Aria: [should correctly identify "it" as the table, not the sofa]

Pro tip ✅

Test 4 is the one most chatbot tutorials skip, and it’s where multi-turn systems actually break down. If Claude gets the pronoun resolution wrong on your first test, it’s usually a system prompt problem — add a line like “When the customer uses pronouns like ‘it’ or ‘that,’ refer to the most recently mentioned item in the conversation history.”
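Typing these scenarios by hand gets tedious after the second prompt revision. A small harness can replay them for you; the one below is a sketch in which `SCENARIOS`, `run_scenario`, and the offline `stub_chat` are hypothetical names. Swap `stub_chat` for your real `safe_chat` to run against the API.

```python
from typing import Callable, List

# User turns from the four tests above; replies are judged by reading them.
SCENARIOS = {
    "multi-turn memory": [
        "I want to return a lamp I bought last week",
        "How long will the refund take after you get it back?",
    ],
    "context switch": [
        "My order still hasn't arrived — it's been 10 days",
        "Actually forget that, I want to order a new couch. Do you have free shipping?",
    ],
    "out of scope": [
        "Can you tell me the exact GPS coordinates of your warehouse?",
    ],
    "pronoun resolution": [
        "I bought a coffee table and a sofa. The table arrived damaged.",
        "Can I return it?",
    ],
}


def run_scenario(chat_fn: Callable[[list, str], str], turns: List[str]) -> List[str]:
    """Replay one scenario with a fresh history and return the replies in order."""
    history: list = []
    replies = []
    for text in turns:
        reply = chat_fn(history, text)
        print(f"You: {text}\nAria: {reply}\n")
        replies.append(reply)
    return replies


def stub_chat(history: list, user_input: str) -> str:
    """Offline stand-in with the same signature as the real chat function."""
    history.append({"role": "user", "content": user_input})
    reply = f"(stub reply #{len(history) // 2 + 1})"
    history.append({"role": "assistant", "content": reply})
    return reply


for name, turns in SCENARIOS.items():
    print(f"=== {name} ===")
    run_scenario(stub_chat, turns)
```

The harness only replays and prints. Deciding whether “it” was resolved to the table still takes a human reading the transcript.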

Complete Working Template

Here’s the full assembled script — everything from Steps 1-4 combined into a single file you can save as chatbot.py and run immediately:

from __future__ import annotations  # lets the `str | None` annotation below run on Python 3.8/3.9

import anthropic
from anthropic import APIStatusError, APIConnectionError, RateLimitError
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

SYSTEM_PROMPT = """You are Aria, a customer service agent for ShopNova, an online retailer specializing in home goods and furniture.

YOUR RESPONSIBILITIES:
- Help customers with order status, returns, refunds, shipping, and product questions
- Escalate complex billing disputes to the human support team
- Keep responses concise — 2-4 sentences unless the question requires more detail

YOUR KNOWLEDGE:
- Return window: 30 days from delivery, items must be unused and in original packaging
- Refunds: processed within 5-7 business days after return is received
- Shipping: standard 5-7 days, express 2-3 days, free standard shipping on orders over $75
- Order tracking: customers can track at shopnova.com/track using their order number

WHEN YOU DON'T KNOW SOMETHING:
- Never guess or fabricate order details, policies, or product specifications
- Say: "I don't have that information, but our support team at support@shopnova.com can help within 24 hours."
- Do NOT apologize excessively

TONE: Friendly but efficient. No filler phrases like "Great question!" or "Absolutely!" """

TOPIC_KEYWORDS = {
    "returns": ["return", "send back", "wrong item", "defective", "damaged"],
    "refunds": ["refund", "money back", "charge", "charged", "billing"],
    "shipping": ["shipping", "delivery", "tracking", "arrived", "package", "delayed"],
    "orders": ["order", "purchase", "bought", "ordered", "order number"]
}

FALLBACK_MESSAGE = "I'm having trouble connecting right now. Please reach us at support@shopnova.com or try again in a moment."

def detect_topic(text: str) -> str | None:
    text_lower = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(kw in text_lower for kw in keywords):
            return topic
    return None

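def chat(message_history: list, user_input: str) -> str:
    topic = detect_topic(user_input)
    enriched_input = f"[Topic: {topic}] {user_input}" if topic else user_input
    user_message = {"role": "user", "content": enriched_input}

    # Send history plus the new message without storing it yet, so a failed
    # call leaves message_history unchanged and a retry won't duplicate it.
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=message_history + [user_message]
    )

    # Join the text blocks of the reply (usually there is exactly one)
    assistant_message = "".join(
        block.text for block in response.content if block.type == "text"
    )

    # Record both sides of the turn only after the call succeeded
    message_history.append(user_message)
    message_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message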

def safe_chat(message_history: list, user_input: str) -> str:
    try:
        return chat(message_history, user_input)
    except RateLimitError:
        logger.error("Rate limit hit")
        return "We're experiencing high volume right now. Please try again in 60 seconds."
    except APIConnectionError as e:
        logger.error(f"Connection error: {e}")
        return FALLBACK_MESSAGE
    except APIStatusError as e:
        logger.error(f"API status error {e.status_code}: {e}")
        if e.status_code == 529:
            return "Our AI is temporarily overloaded. Try again shortly, or email support@shopnova.com."
        return FALLBACK_MESSAGE
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        return FALLBACK_MESSAGE

def main():
    print("ShopNova Support — type 'quit' to exit\n")
    history = []
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Aria: Thanks for contacting ShopNova. Have a good one.")
            break
        if not user_input:
            continue
        response = safe_chat(history, user_input)
        print(f"Aria: {response}\n")

if __name__ == "__main__":
    main()

Pro tip ✅

Set your API key as an environment variable (export ANTHROPIC_API_KEY=your_key_here) rather than hardcoding it in the script. The anthropic.Anthropic() constructor picks it up automatically. Hardcoding credentials is how keys end up in public GitHub repositories, where automated scanners find and abuse them.
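If the variable is missing, the failure otherwise surfaces later and less clearly. A small startup check, sketched below, fails fast with a readable message. `require_api_key` is a hypothetical helper, and the `env` parameter exists only so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or exit with a clear message."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise SystemExit("Set ANTHROPIC_API_KEY before starting the chatbot.")
    return key


# Exercised here with a fake mapping instead of the real environment.
print(require_api_key({"ANTHROPIC_API_KEY": "sk-test-placeholder"}))
```

Calling `require_api_key()` at the top of `main()` is enough. The client still reads the key from the environment on its own.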

Avoid 🚫

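Don’t let the message history grow indefinitely in a production deployment. A 200,000 token context window sounds generous until you’re paying per token across thousands of concurrent sessions, and because the full history is resent on every turn, input tokens add up faster than the conversation length suggests. Add a simple history trimmer that sends only the last N turns (10-15 is usually sufficient for customer service) once the conversation exceeds a threshold. The system prompt is passed separately through the system parameter, so trimming the history never removes it.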
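A rough calculation shows why this matters. The sketch below assumes every exchange adds the same number of tokens and that request n resends about n exchanges. The figures (150 tokens per exchange, a 300-token system prompt, 30 turns) are invented for illustration and are not measurements.

```python
from typing import Optional


def total_input_tokens(
    turns: int,
    tokens_per_exchange: int,
    system_tokens: int,
    window: Optional[int] = None,
) -> int:
    """Approximate input tokens billed over a conversation that resends its history."""
    total = 0
    for n in range(1, turns + 1):
        # Without a window, request n carries roughly n exchanges;
        # with one, it never carries more than `window`.
        exchanges_sent = n if window is None else min(n, window)
        total += system_tokens + exchanges_sent * tokens_per_exchange
    return total


print(total_input_tokens(30, 150, 300))             # 78750
print(total_input_tokens(30, 150, 300, window=10))  # 47250
```

Under these assumptions the untrimmed conversation grows roughly with the square of the number of turns, while a 10-exchange window makes each later request cost the same. For real numbers, read the `usage` field on each API response instead of estimating.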

Where to Take This Next

In 30 minutes, you’ve built something real: a stateful, context-aware customer service agent that handles the three hardest parts of conversational AI — memory, topic switching, and graceful failure — without any backend infrastructure. The script runs locally, costs nothing beyond API calls, and is ready to plug into a Flask endpoint, a Vercel serverless function, or a simple web interface with about 20 more lines of code.

The natural next steps depend on what you’re building toward. If this is a production deployment, the priorities are adding proper session management (each user gets their own history array), rate limiting per user, and logging conversation data to understand where the fallbacks are actually triggering. If this is a prototype you’re demoing, wrap it in Gradio (pip install gradio) and you’ll have a shareable web UI in under five minutes. The core architecture doesn’t change — just the transport layer around it. The chatbot you built here is genuinely the hard part, and you’re already done with it.
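Session management can start very small. The sketch below keeps one history list per session ID in memory; `SessionStore` and `handle_message` are hypothetical names, and `echo_chat` stands in for the real chat function so the example runs offline.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class SessionStore:
    """In-memory map from session ID to that user's message history."""

    def __init__(self) -> None:
        self._histories: Dict[str, List[dict]] = defaultdict(list)

    def history_for(self, session_id: str) -> List[dict]:
        # defaultdict creates an empty history the first time an ID is seen
        return self._histories[session_id]

    def end(self, session_id: str) -> None:
        self._histories.pop(session_id, None)


def handle_message(
    store: SessionStore,
    chat_fn: Callable[[list, str], str],
    session_id: str,
    text: str,
) -> str:
    """Route a message to the chat function with the right user's history."""
    return chat_fn(store.history_for(session_id), text)


def echo_chat(history: list, user_input: str) -> str:
    """Offline stand-in for safe_chat(); same signature."""
    history.append({"role": "user", "content": user_input})
    reply = f"You are on message {len(history) // 2 + 1} of this session."
    history.append({"role": "assistant", "content": reply})
    return reply


store = SessionStore()
print(handle_message(store, echo_chat, "alice", "Where is my order?"))
print(handle_message(store, echo_chat, "bob", "I want a refund."))
print(handle_message(store, echo_chat, "alice", "It was a lamp."))
```

This works for a single long-running process such as a Flask server. Serverless functions generally don’t keep memory between invocations, so there you would need external storage, or have the client send its history with each request.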

promptyze
