
Every tool so far has been faking it. look_up_orders didn't look anything up — it returned a hardcoded dict. That was the right call while we learned the loop, tools, errors, validation, and permissions: a fake tool kept the focus on the harness. But ShopBot can't help a real customer with invented data. In this lesson we give it a real backend and point the tools at it.

The plan: a separate store service

In the real world, an agent rarely owns the data it works with. The orders live in some existing system — a database behind an internal API — and the agent is just another client that calls it. We'll mirror that:

a small ecommerce store service: a SQLite database behind a FastAPI HTTP API, and
the harness's tools, which make HTTP calls to that service.

Keeping the store separate from the harness isn't an accident — it's the point. The agent and the data it uses are different systems. The harness doesn't reach into a database; it calls an API, exactly as it would against a real company backend.

Why a real HTTP API, not just a database call

We could have the tools open the SQLite file directly. Going through an HTTP API instead matches reality: the orders system is owned by someone else, exposed as endpoints, and our agent is one of many callers. It also means the store can enforce its own rules, independent of the agent.

The store, briefly

The database has the tables you'd expect — customers, orders, tracking, and refunds — seeded with the very order ShopBot has been talking about all series:

# orders seed row: order_number, customer_id, item, amount, status, tracking
("4521", 1, "running shoes", 59.99, "shipped", "1Z999")

The API exposes one endpoint per thing a tool needs to do, and they line up one-to-one with the tools from the "What is a tool?" lesson:

Tool	Endpoint
`look_up_orders`	`GET /orders?customer=...`
`get_tracking_status`	`GET /tracking/{tracking_number}`
`issue_refund`	`POST /refunds`
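One way to read the table is as a routing function from a tool call to an HTTP request. The following is a minimal sketch, not the harness's actual code: `build_request` is a hypothetical helper, the base URL is the address the store is served on in this lesson, and the refund arguments are passed through unchanged because the request body for `POST /refunds` isn't shown here.

```python
from urllib.parse import quote

STORE_URL = "http://127.0.0.1:8000"  # where the store is served in this lesson


def build_request(tool_name: str, args: dict) -> dict:
    """Translate a tool call into the pieces of an HTTP request (hypothetical helper)."""
    if tool_name == "look_up_orders":
        # The customer goes in the query string: GET /orders?customer=...
        return {
            "method": "GET",
            "url": f"{STORE_URL}/orders",
            "params": {"customer": args["customer"]},
        }
    if tool_name == "get_tracking_status":
        # The tracking number goes in the path: GET /tracking/{tracking_number}
        number = quote(args["tracking_number"], safe="")
        return {"method": "GET", "url": f"{STORE_URL}/tracking/{number}"}
    if tool_name == "issue_refund":
        # Arguments become the JSON body as-is; the field names are not assumed here.
        return {"method": "POST", "url": f"{STORE_URL}/refunds", "json": args}
    raise ValueError(f"unknown tool: {tool_name}")


tracking_request = build_request("get_tracking_status", {"tracking_number": "1Z999"})
orders_request = build_request("look_up_orders", {"customer": "1"})
```

The dictionary keys are chosen to match the keyword arguments of `httpx.request`, so a result could be passed on as `httpx.request(**request)`.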

Here's one endpoint, so you can see there's nothing magic — it's a normal query:

@app.get("/orders")
def list_orders(customer: str) -> list[dict]:
    """Look up a customer's orders by their email or numeric id."""
    conn = connect()
    rows = conn.execute(
        "SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.email = ? OR c.id = ?",
        (customer, customer),
    ).fetchall()
    return [dict(row) for row in rows]
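To try the query without starting the service, the same SQL can be run against an in-memory database. This is a sketch under assumptions: the table definitions are guessed from the seed row's column comment and from the columns the query touches, the email address is made up, and this `connect()` is only a stand-in for the store's own.

```python
import sqlite3


def connect() -> sqlite3.Connection:
    """Stand-in for the store's connect(): an in-memory DB with one seeded order."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # makes dict(row) work, as the endpoint expects
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE orders (
            order_number TEXT PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customers(id),
            item TEXT, amount REAL, status TEXT, tracking TEXT
        );
        -- Hypothetical customer; the order row is the lesson's seed row.
        INSERT INTO customers VALUES (1, 'customer@example.com');
        INSERT INTO orders VALUES ('4521', 1, 'running shoes', 59.99, 'shipped', '1Z999');
        """
    )
    return conn


def list_orders(conn: sqlite3.Connection, customer: str) -> list[dict]:
    """Same query as the endpoint: match on email or on numeric id."""
    rows = conn.execute(
        "SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id "
        "WHERE c.email = ? OR c.id = ?",
        (customer, customer),
    ).fetchall()
    return [dict(row) for row in rows]


conn = connect()
by_email = list_orders(conn, "customer@example.com")
by_id = list_orders(conn, "1")  # the id arrives as text, just as it does over HTTP
unknown = list_orders(conn, "nobody@example.com")
```

Passing the id as the string "1" still matches because SQLite converts the text to a number when comparing it with an INTEGER column. An unknown customer produces an empty list, not an error.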

The only change the tools needed

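This is the satisfying part. Remember a tool is two halves: the schema the model sees, and the function the harness runs. To go from mock to real, we changed only the function: its body and, since the store returns every matching order, its return value (a list of orders instead of a single dict). The schemas didn't move a comma.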

Before — fake data:

def look_up_orders(customer: str) -> dict:
    """Pretend to look up a customer's most recent order."""
    return {
        "order_number": "4521",
        "item": "running shoes",
        "status": "shipped",
        "tracking_number": "1Z999",
    }

After — a real HTTP call:

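import httpx

# STORE_URL and TIMEOUT are assumed to be module-level settings defined
# elsewhere in the harness (the store's base URL and a request timeout).

def look_up_orders(customer: str) -> list:
    """Look up a customer's orders from the store API."""
    response = httpx.get(
        f"{STORE_URL}/orders", params={"customer": customer}, timeout=TIMEOUT
    )
    response.raise_for_status()  # turn a 4xx/5xx response into an exception
    return response.json()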

The model never noticed

Because the schema is unchanged, the model sees the exact same tool it always did. It has no idea the data went from fake to real — and it shouldn't. How a tool gets its answer is the harness's business, not the model's. That clean line is what let us swap the implementation without touching anything else.
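The split can be made concrete with two separate structures: a list of schemas sent to the model and a table of functions the harness runs. The sketch below is illustrative only. The schema uses the Anthropic-style `name` / `description` / `input_schema` shape with made-up wording, `dispatch` and the registry names are hypothetical, and the "backend" is a local dictionary standing in for the HTTP call so the example runs offline.

```python
import json

# What the model sees. Assumed shape; the real schema from the series may differ.
TOOL_SCHEMAS = [
    {
        "name": "look_up_orders",
        "description": "Look up a customer's orders by email or customer id.",
        "input_schema": {
            "type": "object",
            "properties": {"customer": {"type": "string"}},
            "required": ["customer"],
        },
    }
]

SEED_ORDER = {"order_number": "4521", "item": "running shoes", "status": "shipped"}


def look_up_orders_mock(customer: str) -> list:
    """Old behaviour: the same canned order for every customer."""
    return [SEED_ORDER]


# Stand-in for the store so the sketch runs offline; the real tool calls the API.
FAKE_STORE = {"1": [SEED_ORDER]}


def look_up_orders_backend(customer: str) -> list:
    """New behaviour: the answer depends on what the backend holds."""
    return FAKE_STORE.get(customer, [])


# What the harness runs, keyed by tool name.
TOOL_FUNCTIONS = {"look_up_orders": look_up_orders_mock}


def dispatch(name: str, args: dict):
    """Run the function currently registered for a tool name."""
    return TOOL_FUNCTIONS[name](**args)


schemas_before = json.dumps(TOOL_SCHEMAS, sort_keys=True)
mock_answer = dispatch("look_up_orders", {"customer": "2"})

TOOL_FUNCTIONS["look_up_orders"] = look_up_orders_backend  # the only change

schemas_after = json.dumps(TOOL_SCHEMAS, sort_keys=True)
backend_answer = dispatch("look_up_orders", {"customer": "2"})
```

The serialized schemas are identical before and after the swap, while the answer for an unknown customer changes from the canned order to an empty list.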

Everything we built still applies — and now it's real

The earlier production-ready lessons were written against mock tools, but they were really preparing for this moment:

Handling tool errors: raise_for_status() raises if the store returns a 404 or any other error status, and httpx.get itself raises a connection error if the service is down. Both used to be hypothetical; now they genuinely happen, and the loop's try/except turns them into something the model can explain to the customer.
Validating arguments: still guards every call before it leaves for the network.
Permissions: issue_refund still asks a human first; only after approval does it POST /refunds and actually move money.
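The order in which those three safeguards wrap a tool call can be sketched in a few lines. This is an assumption-laden outline, not the loop from the earlier lessons: `run_tool`, its parameters, and the `is_error` / `content` result shape are hypothetical, and the failing tool simulates a stopped store with a built-in exception where httpx would raise its own.

```python
def run_tool(func, args, *, required, needs_approval=False, approve=None):
    """Validate, optionally ask a human, then run; no exception escapes to the loop."""
    # 1. Validation: reject the call before anything leaves for the network.
    missing = [key for key in required if key not in args]
    if missing:
        return {"is_error": True, "content": f"missing arguments: {missing}"}

    # 2. Permission: sensitive tools run only after explicit approval.
    if needs_approval and not (approve and approve(func.__name__, args)):
        return {"is_error": True, "content": "a human declined this action"}

    # 3. Execution: failures become a result the model can explain.
    try:
        return {"is_error": False, "content": func(**args)}
    except Exception as exc:  # e.g. an error status or a refused connection
        return {"is_error": True, "content": f"{type(exc).__name__}: {exc}"}


def look_up_orders(customer):
    # Simulates the store being stopped.
    raise ConnectionError("store is not reachable")


def issue_refund(order_number):
    return {"refunded": order_number}


store_down = run_tool(look_up_orders, {"customer": "1"}, required=["customer"])
bad_args = run_tool(look_up_orders, {}, required=["customer"])
declined = run_tool(
    issue_refund, {"order_number": "4521"}, required=["order_number"],
    needs_approval=True, approve=lambda name, args: False,
)
approved = run_tool(
    issue_refund, {"order_number": "4521"}, required=["order_number"],
    needs_approval=True, approve=lambda name, args: True,
)
```

Returning an error result keeps the conversation going, so the model can tell the customer that the order system is unavailable and the harness does not crash.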

Nothing about the harness loop changed. We just made the tools honest.

Running the whole thing

It's now two processes — the store, and the agent that calls it:

# Terminal 1 — the store
cd ecommerce-store
uv run db.py                # once, to create + seed the database
uv run uvicorn main:app     # serves http://127.0.0.1:8000

# Terminal 2 — the agent
cd agent-harness
export ANTHROPIC_API_KEY=sk-ant-...
uv run shopbot.py

Now when ShopBot answers "Where's my order?", the order it reports really came out of a database. Stop the store, ask ShopBot the same question again, and you'll watch the error-handling path light up for real.

Recap

Mock tools were perfect for learning the harness; a real assistant needs real data.
We added a separate store service (SQLite + a FastAPI HTTP API) and pointed the tools at it — mirroring how agents call existing backends in the real world.
Going from mock to real meant changing only the tool function bodies; the schemas — and therefore the model's view — stayed identical.
Error handling, validation, and permissions all carry over unchanged, and now operate on genuine successes and failures.

That's a complete, real ShopBot: a harness loop driving validated, permissioned tools against a live backend. From here you can add more tools, more endpoints, and richer approval rules — the shape stays the same.