Skip to main content

The harness loop, step by step

In What is an agent harness? we said the harness runs a loop: the model asks for an action, the harness performs it, feeds the result back, and repeats until there's a final answer. That sentence hides a lot. In this lesson we slow it right down and look at every moving part — still in plain English, with a little pseudocode to make it concrete. No real SDK yet; that comes later.

The loop in one picture

Here's the whole cycle. Everything else in this lesson is just zooming into one of these arrows:

┌─────────────────────────────────────────────┐
│ │
▼ │
┌─────────┐ "call look_up_orders" ┌─────────┐ │
│ MODEL │ ────────────────────────► │ HARNESS │ │
│ (thinks)│ │ (acts) │ │
└─────────┘ ◄──────────────────────── └─────────┘ │
│ "here's the result" │ │
│ │ │
│ └────────┘
│ "final answer" (no tool needed)

USER

The model and the harness pass control back and forth. The loop keeps spinning as long as the model keeps asking for tools. The moment the model replies with a plain answer instead of a tool request, the loop ends and the user sees the reply.

The four moving parts

Before the pseudocode, meet the four things the loop juggles:

  1. The conversation — the running list of everything said so far: the user's question, the model's tool requests, and the tool results. This list grows every turn.
  2. The tools — the list of actions the harness is willing to perform, e.g. look_up_orders and get_tracking_status. (We'll define what a "tool" actually is in the next lesson — for now, think "a function the harness can run.")
  3. The model call — handing the conversation + tool list to the model and getting back its next move.
  4. The tool execution — the harness actually running the requested function and capturing what it returns.
The model has no memory

An LLM doesn't remember the previous turn. Each time you call it, you must hand it the whole conversation so far. That's why "the conversation" is a list the harness carries and re-sends every loop. The harness is the memory.

The loop, as pseudocode

This is not real code for any language — it's a sketch of the logic:

conversation = [ user_message ] # start with the question
tools = [ look_up_orders, get_tracking_status ]

repeat:
response = ask_model(conversation, tools) # the model's next move

if response is a final answer:
show_to_user(response.text) # done — leave the loop
stop

if response is a tool request:
result = run(response.tool, response.arguments) # harness does the work
conversation.add(response) # remember what the model asked
conversation.add(result) # remember what came back
# ...and loop again

Read it top to bottom: ask the model, check what it wants. If it's an answer, we're done. If it's a tool request, the harness runs the tool, records both the request and the result onto the conversation, and goes around again — now with more information than before.

Walking ShopBot through it

Let's run our "Where's my order? I bought running shoes last week." example through that pseudocode, one trip around the loop at a time.

Round 1

  • conversation = the customer's question.
  • ask_model(...) → the model replies: "call look_up_orders for this customer." That's a tool request, not an answer.
  • The harness runs look_up_orders → gets back: order #4521, running shoes, status: shipped, tracking 1Z999.
  • Both the request and the result get added to conversation. Loop again.

Round 2

  • conversation now includes the order data.
  • ask_model(...) → the model replies: "call get_tracking_status(1Z999)." Another tool request.
  • The harness calls the carrier → "Out for delivery, arrives today."
  • Added to conversation. Loop again.

Round 3

  • conversation now has the question, the order, and the tracking status.
  • ask_model(...) → the model has everything it needs, so it replies with a final answer: "Your running shoes are out for delivery — they arrive today!"
  • No tool request this time → the loop stops and the user sees the reply.

Notice the pattern: each round the conversation got richer, and the model used that to decide its next move. The model chose the steps; the harness carried them out and kept score.

The most important question: when does it stop?

A loop that never ends is a runaway bill (and a frozen chat). The loop stops when the model returns a final answer instead of a tool request — that's the natural exit, and it's how Round 3 ended above.

But in the real world you don't rely on that alone. A safe harness also adds guardrails, such as:

  • a maximum number of rounds (e.g. stop after 10 loops no matter what), and
  • a stop if a tool keeps failing, so it doesn't retry forever.
The model is in charge of steps, not of stopping

It's tempting to trust the model to always wrap up cleanly. Don't. The harness owns the loop, so the harness owns the limits. "Connects and controls," from the intro lesson — this is the controls part in action.

What we deliberately skipped

To keep the loop clear, we hand-waved a few things we'll come back to:

  • What a "tool" actually is — how the model even knows look_up_orders exists and what arguments it takes. (Next lesson.)
  • What ask_model and run look like in real code — with a real model SDK. (Later lesson.)

Recap

  • The harness runs a loop: ask the model → if it wants a tool, run it and add the result to the conversation → repeat → stop on a final answer.
  • The conversation is a growing list the harness re-sends every round, because the model has no memory of its own.
  • The model decides the steps; the harness executes them and owns when to stop (final answer, max rounds, repeated failures).
  • ShopBot took three rounds: look up the order → check tracking → answer.

Next up: what exactly is a tool? We'll define look_up_orders properly so the model knows it exists, what it does, and what to pass it.