Multi-turn conversations
Everything we've built answers one question and then quits. But a real support bot is a conversation: the customer asks, ShopBot answers, they ask a follow-up, and it all hangs together. In this lesson we turn the one-shot script into an actual chat — and it's a smaller change than you'd think, because we already did the hard part.
We've been here before — at a smaller scale
Remember from The harness loop, step by step: the model has no memory, so the harness re-sends the whole conversation every round, and the harness is the memory. Inside a single turn, that conversation already grew — the question, the tool calls, the tool results — so the model could build on what it had just learned.
Multi-turn is the exact same idea, one level up: instead of starting a fresh conversation for each customer message, we keep the same conversation alive across turns. The customer's second question lands in a list that still contains everything from the first.
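To make that concrete, here is a sketch of what the list might hold once the second question arrives. The contents are illustrative only: the tool name lookup_order, the id toolu_01, and the tool result text are hypothetical, and the blocks are written as plain dicts for readability (the harness in this lesson appends the SDK's own response.content objects instead).

```python
# Hypothetical conversation contents, written as plain dicts for readability.
conversation = [
    # --- turn 1: question, tool call, tool result, final answer ---
    {"role": "user", "content": "Where's my order?"},
    {"role": "assistant", "content": [
        # "lookup_order" and "toolu_01" are made-up names for this sketch.
        {"type": "tool_use", "id": "toolu_01", "name": "lookup_order", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "order #4521: running shoes, out for delivery"},
    ]},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Your running shoes (order #4521) are out for delivery."},
    ]},
]
turn_one_length = len(conversation)

# --- turn 2: the follow-up is appended to the SAME list ---
conversation.append({"role": "user", "content": "what was the order number again?"})

roles = [message["role"] for message in conversation]
print(roles)
```

Everything from turn 1, including the tool result that mentions the order number, sits in front of the new question when the list is sent again.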
Two loops, not one
The harness now has two nested loops:
- the inner loop — the agentic loop we already have: ask the model, run any tools, repeat until a final answer. This handles one turn.
- the outer loop — read a message from the customer, run the inner loop, show the answer, and wait for the next message.
So we pull the inner loop out into its own function, answer_one_turn:
def answer_one_turn(client, conversation) -> str:
    """Run the loop until the model gives a final answer for this turn."""
    for round_number in range(1, MAX_ROUNDS + 1):
        response = client.messages.create(
            model=MODEL, max_tokens=1024, system=SYSTEM_PROMPT,
            tools=TOOL_SCHEMAS, messages=conversation,
        )
        conversation.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # ...run the requested tools and append their results... (unchanged)
        conversation.append({"role": "user", "content": tool_results})
It's the loop from before, with one tweak: instead of printing the final answer
and returning, it returns the text so the outer loop can decide what to do with
it. Notice it takes conversation as an argument and keeps appending to it.
The chat loop
The outer loop is the new part — and it's short:
def main():
    client = Anthropic()
    # ONE conversation for the whole chat — it survives across turns.
    conversation = []
    print("ShopBot — ask about your order. Type 'quit' to exit.\n")
    while True:
        user_message = input("You: ").strip()
        if user_message.lower() in {"quit", "exit"}:
            break
        conversation.append({"role": "user", "content": user_message})
        answer = answer_one_turn(client, conversation)
        print(f"\nShopBot: {answer}\n")
The crucial line is conversation = [] sitting outside the while loop. It's
created once and reused for every turn. Each customer message is appended to it,
and answer_one_turn keeps appending the model's replies and tool results — so
the list just keeps growing across the whole chat.
There's still no memory inside the model. The "memory" is entirely that
conversation list the harness holds and re-sends every single call. Multi-turn
chat is just that list outliving a single question.
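One way to see the re-sending for yourself, without calling a real model, is to swap in a stand-in client that only records how many messages it receives. This is a minimal sketch: FakeClient and FakeMessages are hypothetical test doubles, and answer_one_turn is cut down to a single tool-free call.

```python
from types import SimpleNamespace


class FakeMessages:
    """Stand-in for client.messages: records what it is sent, never calls an API."""

    def __init__(self):
        self.sizes_seen = []  # number of messages received on each call

    def create(self, **kwargs):
        messages = kwargs["messages"]
        self.sizes_seen.append(len(messages))
        reply = f"(reply after seeing {len(messages)} messages)"
        return SimpleNamespace(
            stop_reason="end_turn",
            content=[SimpleNamespace(type="text", text=reply)],
        )


class FakeClient:
    """Hypothetical test double exposing the one attribute the loop uses."""

    def __init__(self):
        self.messages = FakeMessages()


def answer_one_turn(client, conversation) -> str:
    """Simplified inner loop: no tools, so one call always ends the turn."""
    response = client.messages.create(messages=conversation)
    conversation.append({"role": "assistant", "content": response.content})
    return "".join(b.text for b in response.content if b.type == "text")


client = FakeClient()
conversation = []  # created once, shared by every turn
for user_message in ["Where's my order?", "what was the order number again?"]:
    conversation.append({"role": "user", "content": user_message})
    answer_one_turn(client, conversation)

print(client.messages.sizes_seen)  # [1, 3]
print(len(conversation))           # 4
```

The second call receives three messages: the first question, the first reply, and the follow-up. Nothing was stored anywhere except the list.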
Why it feels like it remembers
Because the earlier turns are still in the conversation, the model can use them. Watch a follow-up that needs no new tool call:
You: Where's my order?
ShopBot: Your running shoes (order #4521) are out for delivery — they arrive today!
You: what was the order number again?
ShopBot: It's order #4521.
ShopBot answered the second question without calling any tool — the order number was already in the conversation from the first turn. The model just read it back. That's the whole payoff of keeping the conversation alive.
Tidying up the chat
A couple of small things make it usable as a terminal program:
- An exit: typing quit or exit breaks the loop.
- Graceful end-of-input: catching EOFError / KeyboardInterrupt (Ctrl-D / Ctrl-C) so the program ends cleanly instead of throwing.
- Ignoring empty input: a stray Enter shouldn't call the model.
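These three behaviours could be gathered into one small helper, sketched below. The name read_user_message and its read_line parameter are assumptions made for this example, not part of the lesson's code; read_line defaults to the built-in input and exists only so the helper can be tried without a real terminal.

```python
def read_user_message(read_line=input):
    """Return the next non-empty customer message, or None when the chat should end."""
    while True:
        try:
            text = read_line("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            return None  # Ctrl-D / Ctrl-C: end cleanly instead of a traceback
        if not text:
            continue  # stray Enter: ask again without calling the model
        if text.lower() in {"quit", "exit"}:
            return None  # explicit exit command
        return text


# Possible use inside main(), replacing the input()/break lines:
#
#     while (user_message := read_user_message()) is not None:
#         conversation.append({"role": "user", "content": user_message})
#         answer = answer_one_turn(client, conversation)
#         print(f"\nShopBot: {answer}\n")
```

Returning None for every way of ending the chat keeps the outer loop down to a single condition.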
One thing to watch: the conversation only grows
That ever-growing list is also a cost. Every turn re-sends everything before it, so a long chat means bigger, slower, more expensive model calls — and eventually you hit the model's context limit.
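A rough back-of-the-envelope sketch shows how quickly this adds up. It counts messages rather than tokens and assumes the simplest case, where each turn adds exactly one customer message and one reply with no tool calls. The helper name messages_sent_per_call is made up for this illustration.

```python
def messages_sent_per_call(turns, messages_per_turn=2):
    """Messages re-sent on each turn's call, assuming a fixed number added per turn.

    On turn t (counting from 0) the list holds messages_per_turn * t earlier
    messages plus the new customer message.
    """
    return [messages_per_turn * t + 1 for t in range(turns)]


sizes = messages_sent_per_call(10)
print(sizes)       # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(sum(sizes))  # 100
```

Under these assumptions a 10-turn chat sends 100 messages in total, and in general n turns send n squared. The per-call size grows linearly while the cumulative cost grows quadratically. Tool calls add more messages per turn, so real chats grow faster.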
For a short support chat this is fine. For long-running conversations, real systems trim or summarise old turns — e.g. keep the last N messages, or replace old history with a short summary. The harness owns the conversation, so the harness owns that policy too.
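A minimal sketch of the "keep the last N messages" policy might look like the following. It assumes, as in this lesson's code, that customer messages are stored as plain strings while tool results are stored as lists. The helper name trim_conversation is hypothetical. The cut point is moved forward to the next customer message so the shortened history never begins with a tool result whose matching tool call was dropped.

```python
def trim_conversation(conversation, max_messages):
    """Return a shortened copy keeping at most the last `max_messages` messages.

    The history is only ever cut at a plain-text user message, so a trimmed
    conversation always starts at the beginning of a turn.
    """
    if len(conversation) <= max_messages:
        return list(conversation)
    for start in range(len(conversation) - max_messages, len(conversation)):
        message = conversation[start]
        # Assumption: customer messages are str, tool results are lists.
        if message["role"] == "user" and isinstance(message["content"], str):
            return conversation[start:]
    return list(conversation)  # no safe cut point found: leave history untouched
```

Because the cut only lands on turn boundaries, the result can be shorter than max_messages. A summarising policy would replace the dropped prefix with a short recap instead of discarding it; that is not shown here.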
Recap
- A real chatbot is multi-turn: many messages that build on each other.
- The trick is simply keeping one conversation list alive across turns, created outside the chat loop: the same "harness is the memory" idea, scaled up.
- Split the code into an inner loop (answer_one_turn, one question) and an outer chat loop (read message → answer → repeat).
- The model "remembers" only because earlier turns are still in the list it's re-sent every call.
- That list grows forever, so long conversations need a trim/summarise strategy.
You've gone from a single model call all the way to a stateful, multi-turn ShopBot with validated, permissioned tools, real data, and proper logging — a complete agent harness.