Handling tool errors
Our harness works — when everything goes right. But tools touch the messy real world: databases time out, APIs go down, the model asks for a tool that doesn't exist or passes a bad argument. In Running the loop for real, any of those would crash the whole program mid-conversation. This lesson fixes that.
It's the first step toward making the harness production-ready.
What can go wrong
Think about the moment the harness runs a tool. Several things can fail:
- The tool's own code throws — the database is unreachable, the carrier's API returns a 500, the network drops.
- Bad arguments — the model fills in an input that doesn't make sense, and the function blows up.
- An unknown tool — the model asks for a tool name that isn't in our table. (Rare, but possible.)
In the previous lesson's code, all of these raise a Python exception that nobody catches — so the harness dies and the customer is left hanging.
The key idea: a failed tool is information, not a crash
Here's the mindset shift. When a tool fails, that's not the end of the world — it's just a result the model should know about. A good harness catches the failure, turns it into a message, and hands it back to the model as the tool's result. The model can then apologise, try a different tool, or ask the customer for more detail.
In other words: the loop keeps going. The harness owns the loop (see The harness loop, step by step), so the harness owns failures too.
The Anthropic API lets a tool result carry an is_error flag. Setting it tells
the model "this didn't return data — it failed," so the model treats it as a
problem to handle rather than as real information.
Guarding the unknown tool
First, a small safety check in run_tool — if the model names a tool we don't
have, raise a clear error instead of a cryptic KeyError:
    def run_tool(name: str, arguments: dict) -> dict:
        """The harness 'acts': run the function the model asked for."""
        print(f" [harness] running {name}({arguments})")
        if name not in TOOL_FUNCTIONS:
            raise ValueError(f"Unknown tool: {name!r}")
        function = TOOL_FUNCTIONS[name]
        return function(**arguments)
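To see the guard in action, here is a minimal, self-contained sketch. The TOOL_FUNCTIONS table and the get_order stub are stand-ins invented for illustration (the real table comes from the earlier lessons), and cancel_subscription is a made-up example of a tool name the model might invent.

```python
# Hypothetical stand-in for the real tool table from the earlier lessons.
def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "item": "running shoes"}


TOOL_FUNCTIONS = {"get_order": get_order}


def run_tool(name: str, arguments: dict) -> dict:
    """Same guard as in the lesson, minus the logging line."""
    if name not in TOOL_FUNCTIONS:
        raise ValueError(f"Unknown tool: {name!r}")
    return TOOL_FUNCTIONS[name](**arguments)


# A known tool runs normally.
known_result = run_tool("get_order", {"order_id": "4521"})

# An unknown tool raises a readable ValueError rather than a bare KeyError.
try:
    run_tool("cancel_subscription", {})
except ValueError as exc:
    unknown_message = str(exc)

print(known_result)
print(unknown_message)
```

The error text names the missing tool, which matters later: that same text is what the model will read when the harness reports the failure.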
Catching the failure in the loop
Now the important part. We wrap the tool call in try/except. On success,
nothing changes from the previous lesson. On failure, we capture the error message, mark it
as an error, and still add it to the results we send back:
    for block in response.content:
        if block.type == "tool_use":
            # A tool can fail (bad arguments, an API down, an unknown name).
            # We catch that, tell the model what went wrong, and let it
            # decide what to do, instead of crashing the whole harness.
            try:
                result = run_tool(block.name, block.input)
                content = str(result)
                is_error = False
            except Exception as exc:
                content = f"Tool failed: {exc}"
                is_error = True
                print(f" [harness] {content}")
            tool_results.append(
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": content,
                    "is_error": is_error,
                }
            )
Two things make this work:
- The loop never breaks. Whether the tool succeeded or failed, we always append a tool_result and continue. The model always gets a turn to react.
- is_error is honest. A successful result is marked False; a caught exception is marked True with the reason in content.
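The sketch below exercises that pattern without calling the API, so all three failure modes can be seen side by side. Everything in it is an assumption made for illustration: the two stub tools, the collect_tool_results helper name, the fake block IDs, and the SimpleNamespace objects standing in for the blocks in response.content.

```python
from types import SimpleNamespace


# Hypothetical tools: one works, one simulates a carrier outage.
def get_order(order_id: str) -> dict:
    return {"order_id": order_id, "item": "running shoes"}


def get_tracking_status(order_id: str) -> dict:
    raise TimeoutError("carrier API timed out")


TOOL_FUNCTIONS = {
    "get_order": get_order,
    "get_tracking_status": get_tracking_status,
}


def run_tool(name: str, arguments: dict) -> dict:
    if name not in TOOL_FUNCTIONS:
        raise ValueError(f"Unknown tool: {name!r}")
    return TOOL_FUNCTIONS[name](**arguments)


def collect_tool_results(content_blocks) -> list[dict]:
    """Run every tool_use block and always produce a tool_result for it."""
    tool_results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        try:
            content = str(run_tool(block.name, block.input))
            is_error = False
        except Exception as exc:
            content = f"Tool failed: {exc}"
            is_error = True
        tool_results.append(
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": content,
                "is_error": is_error,
            }
        )
    return tool_results


# Fake blocks standing in for response.content from the API.
fake_content = [
    SimpleNamespace(type="text", text="Let me check that for you."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_order",
                    input={"order_id": "4521"}),           # succeeds
    SimpleNamespace(type="tool_use", id="toolu_02", name="get_tracking_status",
                    input={"order_id": "4521"}),           # tool code throws
    SimpleNamespace(type="tool_use", id="toolu_03", name="refund_order",
                    input={}),                             # unknown tool
    SimpleNamespace(type="tool_use", id="toolu_04", name="get_order",
                    input={"order_number": "4521"}),       # bad argument name
]

results = collect_tool_results(fake_content)
for r in results:
    print(r["tool_use_id"], r["is_error"], r["content"])
```

Four tool_use blocks go in and four tool_result entries come out, one success and three failures, each carrying the tool_use_id of the request it answers.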
What the model does with it
Say get_tracking_status throws because the carrier API is down. Instead of
crashing, the harness now sends back something like:
tool_result (is_error=True): "Tool failed: carrier API timed out"
The model reads that on the next round and can respond gracefully:
"I found your order — running shoes, order #4521 — but I'm having trouble reaching the carrier for live tracking right now. Want me to try again, or email you the update once it's back?"
That's a far better experience than a stack trace and a dead chatbot. The customer still gets the part that worked, plus an honest explanation of the part that didn't.
We hand the raw error to the model, which is fine here because the model writes the customer-facing reply and naturally translates "API timed out" into friendly language. In a real system you'd also be careful never to surface secrets or stack traces directly to end users.
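One possible way to act on that caution is to keep the full traceback in the harness's own logs and pass the model only a short summary. The sketch below is an illustration under assumptions, not part of the lesson's harness: the describe_failure helper, the "harness" logger name, and the 200-character cap are all invented here. It also does not scrub secrets out of the exception message itself; that would need rules specific to your tools.

```python
import logging

logger = logging.getLogger("harness")


def describe_failure(tool_name: str, exc: Exception, max_len: int = 200) -> str:
    """Return a short message for the model; keep the traceback in the logs."""
    # Full detail, including the stack trace, goes to the log only.
    logger.error("Tool %s failed", tool_name, exc_info=exc)
    # Keep just the first line of the exception text, capped in length.
    text = str(exc) or "no details"
    first_line = text.splitlines()[0]
    return f"Tool failed: {type(exc).__name__}: {first_line[:max_len]}"


message = describe_failure(
    "get_tracking_status",
    TimeoutError("carrier API timed out\n(internal detail on a second line)"),
)
print(message)
```

In the loop's except branch, the returned string would take the place of the f"Tool failed: {exc}" content.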
What we deliberately skipped
- Retries — automatically trying a failed tool again before giving up.
- Validating arguments before running a tool, using the input schema.
- Permissions — deciding whether a tool is even allowed to run.
These are the next steps in making the harness production-ready, and we'll get to them.
Recap
- Real tools fail — APIs go down, arguments are bad, names are wrong. Unhandled, any of these crashes the harness.
- Treat a failed tool as information for the model, not a fatal error: catch it, describe it, and feed it back as a tool_result.
- Use the is_error flag so the model knows the result was a failure.
- The loop keeps running, so the model can recover gracefully: try again, switch tools, or explain the problem to the customer.
Next up: stopping bad tool calls before they run — validating the model's arguments against the schema, and deciding which tools are even allowed.