Skip to main content

Handling tool errors

Our harness works — when everything goes right. But tools touch the messy real world: databases time out, APIs go down, the model asks for a tool that doesn't exist or passes a bad argument. In Running the loop for real, any of those would crash the whole program mid-conversation. This lesson fixes that.

It's the first step into making a harness production-ready.

What can go wrong

Think about the moment the harness runs a tool. Several things can fail:

  • The tool's own code throws — the database is unreachable, the carrier's API returns a 500, the network drops.
  • Bad arguments — the model fills in an input that doesn't make sense, and the function blows up.
  • An unknown tool — the model asks for a tool name that isn't in our table. (Rare, but possible.)

In the previous lesson's code, all of these raise a Python exception that nobody catches — so the harness dies and the customer is left hanging.

The key idea: a failed tool is information, not a crash

Here's the mindset shift. When a tool fails, that's not the end of the world — it's just a result the model should know about. A good harness catches the failure, turns it into a message, and hands it back to the model as the tool's result. The model can then apologise, try a different tool, or ask the customer for more detail.

In other words: the loop keeps going. The harness owns the loop (see The harness loop, step by step), so the harness owns failures too.

Tell the model it failed

The Anthropic API lets a tool result carry an is_error flag. Setting it tells the model "this didn't return data — it failed," so the model treats it as a problem to handle rather than as real information.

Guarding the unknown tool

First, a small safety check in run_tool — if the model names a tool we don't have, raise a clear error instead of a cryptic KeyError:

def run_tool(name: str, arguments: dict) -> dict:
"""The harness 'acts': run the function the model asked for."""
print(f" [harness] running {name}({arguments})")
if name not in TOOL_FUNCTIONS:
raise ValueError(f"Unknown tool: {name!r}")
function = TOOL_FUNCTIONS[name]
return function(**arguments)

Catching the failure in the loop

Now the important part. We wrap the tool call in try/except. On success, nothing changes from the previous lesson. On failure, we capture the error message, mark it as an error, and still add it to the results we send back:

for block in response.content:
if block.type == "tool_use":
# A tool can fail (bad arguments, an API down, an unknown name).
# We catch that, tell the model what went wrong, and let it
# decide what to do — instead of crashing the whole harness.
try:
result = run_tool(block.name, block.input)
content = str(result)
is_error = False
except Exception as exc:
content = f"Tool failed: {exc}"
is_error = True
print(f" [harness] {content}")

tool_results.append(
{
"type": "tool_result",
"tool_use_id": block.id,
"content": content,
"is_error": is_error,
}
)

Two things make this work:

  • The loop never breaks. Whether the tool succeeded or failed, we always append a tool_result and continue. The model always gets a turn to react.
  • is_error is honest. A successful result is marked False; a caught exception is marked True with the reason in content.

What the model does with it

Say get_tracking_status throws because the carrier API is down. Instead of crashing, the harness now sends back something like:

tool_result (is_error=True): "Tool failed: carrier API timed out"

The model reads that on the next round and can respond gracefully:

"I found your order — running shoes, order #4521 — but I'm having trouble reaching the carrier for live tracking right now. Want me to try again, or email you the update once it's back?"

That's a far better experience than a stack trace and a dead chatbot. The customer still gets the part that worked, plus an honest explanation of the part that didn't.

Don't leak internals to the customer

We hand the raw error to the model, which is fine — but the model writes the customer-facing reply, so it naturally translates "API timed out" into friendly language. In a real system you'd also be careful never to surface secrets or stack traces directly to end users.

What we deliberately skipped

  • Retries — automatically trying a failed tool again before giving up.
  • Validating arguments before running a tool, using the input schema.
  • Permissions — deciding whether a tool is even allowed to run.

These are the next steps in making the harness production-ready, and we'll get to them.

Recap

  • Real tools fail — APIs go down, arguments are bad, names are wrong. Unhandled, any of these crashes the harness.
  • Treat a failed tool as information for the model, not a fatal error: catch it, describe it, and feed it back as a tool_result.
  • Use the is_error flag so the model knows the result was a failure.
  • The loop keeps running, so the model can recover gracefully — try again, switch tools, or explain the problem to the customer.

Next up: stopping bad tool calls before they run — validating the model's arguments against the schema, and deciding which tools are even allowed.