An LLM-powered finance copilot we built to run our own books. Ask "where did Q1 marketing spend go?" and get a sourced answer in plain English, not another dashboard you have to read and interpret yourself.
QuickBooks holds the books. Plaid pulls the bank feeds. Each is a source of truth in its own register, and "what did we actually spend in March?" can give different answers depending on which one you ask. The first deliverable was the canonical ledger: Postgres, with pgvector for retrieval, resolving the registers into a single set of transactions. Sync workers keep it current daily.
Once the ledger is the one place to look, every question that follows has a chance of producing the same answer twice in a row.
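To make the reconciliation idea concrete, here is a minimal sketch of collapsing records from two sources into one ledger entry. Everything in it is an assumption for illustration: the `SourceTxn` and `LedgerEntry` types, the `reconcile` function, and the matching rule (same posted date and same amount, at most one record per source) are hypothetical, not the production schema or logic.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SourceTxn:
    source: str        # e.g. "quickbooks" or "plaid" (hypothetical labels)
    source_id: str     # the record's id inside that source
    posted: date
    amount_cents: int  # negative means money out
    description: str


@dataclass
class LedgerEntry:
    posted: date
    amount_cents: int
    description: str
    # Every source record that backs this entry, as (source, source_id).
    source_refs: list[tuple[str, str]] = field(default_factory=list)


def reconcile(txns: list[SourceTxn]) -> list[LedgerEntry]:
    """Collapse matching source records into single ledger entries.

    Assumed matching rule: same posted date and same amount, with at
    most one record per source in any one entry.
    """
    entries: list[LedgerEntry] = []
    ordered = sorted(txns, key=lambda t: (t.posted, t.amount_cents, t.source))
    for txn in ordered:
        match = next(
            (
                e
                for e in entries
                if e.posted == txn.posted
                and e.amount_cents == txn.amount_cents
                and all(src != txn.source for src, _ in e.source_refs)
            ),
            None,
        )
        if match is None:
            match = LedgerEntry(txn.posted, txn.amount_cents, txn.description)
            entries.append(match)
        match.source_refs.append((txn.source, txn.source_id))
    return entries
```

Keeping `source_refs` on each entry is what lets a later answer point back at the original QuickBooks or Plaid record rather than only at the merged row.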
A LangGraph orchestrator runs the answer step. Given a question such as "where did Q1 marketing spend go?", it plans which slice of the ledger to pull, which vendors to consolidate, and which time window to scope. Only then does the LLM see a prompt, and that prompt comes pre-loaded with the rows that justify the answer. The plan is the judgment; the LLM's job is to explain it in plain English.
That sequencing is the whole point. Most LLM-on-data setups fail because the model gets a question and starts guessing. Here it gets a question, a plan, and the data, and produces an answer that's auditable rather than just plausible.
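The plan-then-prompt ordering can be sketched in plain Python. This is an illustration under stated assumptions, not the LangGraph graph itself: `Row`, `Plan`, `select_rows`, `build_prompt`, and the `[row N]` citation format are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    row_id: int
    posted: date
    vendor: str
    category: str
    amount_cents: int


@dataclass(frozen=True)
class Plan:
    category: str                   # which slice of the ledger to pull
    start: date                     # time window, inclusive
    end: date
    vendor_aliases: dict[str, str]  # raw vendor name -> consolidated name


def select_rows(ledger: list[Row], plan: Plan) -> list[Row]:
    """Apply the plan: scope by category and window, then consolidate vendors."""
    picked = []
    for r in ledger:
        if r.category == plan.category and plan.start <= r.posted <= plan.end:
            vendor = plan.vendor_aliases.get(r.vendor, r.vendor)
            picked.append(Row(r.row_id, r.posted, vendor, r.category, r.amount_cents))
    return picked


def build_prompt(question: str, rows: list[Row]) -> str:
    """Assemble the prompt only after the supporting rows are chosen."""
    lines = [
        f"[row {r.row_id}] {r.posted.isoformat()} {r.vendor} {r.amount_cents / 100:.2f}"
        for r in rows
    ]
    return (
        f"Question: {question}\n"
        "Answer using only the rows below and cite row ids.\n"
        + "\n".join(lines)
    )
```

The model never receives the bare question. By the time `build_prompt` runs, the rows are already fixed, so any number in the answer can be checked against a row id.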
The Accountant is a real production system we use every working day. Ask a question; get a plain-English answer with the underlying transactions cited row by row. Nothing is invented; every number traces back to a source transaction. We built it for ourselves because the commodity options stopped at "here's a dashboard, you figure it out." Now it doubles as the cleanest demo we have of what an AI workflow integration actually looks like in practice.
A financial copilot that drifts is a liability. We run a small eval suite over canonical question-answer pairs after every model or prompt change, watch the source-citation rate, and re-embed the retrieval index whenever the ledger schema shifts. New vendors, new accounts, and new categories all flow through the sync workers without breaking the queries written against the old shape. The system stays the system: useful daily, sourced always, never plausibly wrong.
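A source-citation rate of the kind described above could be computed along these lines. This is a hedged sketch: the `[row N]` citation format, the `EvalCase` type, and the rule that an answer counts as sourced only if it cites at least one row and every cited row is in the expected set are assumptions for illustration, not the actual eval suite.

```python
import re
from dataclasses import dataclass

# Assumed citation format inside answers: "[row 17]".
ROW_REF = re.compile(r"\[row (\d+)\]")


@dataclass(frozen=True)
class EvalCase:
    question: str
    answer: str                    # the model's answer under test
    valid_row_ids: frozenset[int]  # rows that legitimately support the answer


def is_sourced(case: EvalCase) -> bool:
    """True if the answer cites at least one row and only valid rows."""
    cited = {int(m) for m in ROW_REF.findall(case.answer)}
    return bool(cited) and cited <= case.valid_row_ids


def citation_rate(cases: list[EvalCase]) -> float:
    """Fraction of answers that are fully sourced; 0.0 for an empty suite."""
    if not cases:
        return 0.0
    return sum(is_sourced(c) for c in cases) / len(cases)
```

Running this after each model or prompt change gives one number to watch. A drop means answers have started arriving without rows behind them, or citing rows that do not support them.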
LangGraph orchestrates the query, planning which data to pull before the model ever sees a prompt, so answers stay grounded in real rows. Postgres holds the canonical ledger and, with pgvector, the retrieval index. QuickBooks and Plaid keep the books current. The hard part was never the model. It was making every answer traceable back to a transaction.
30 minutes, no pitch. Mike runs the call.