Problem
A chat box is not enough. Useful AI products need routing, grounding, tools, safety checks, and repeatable evaluations.
What I built
A static but product-realistic cockpit that exposes each stage of an LLM workflow so a reviewer can see the engineering decisions behind the answer.
Why it is technically interesting
It models the parts that make AI systems reliable: structured outputs, traces, context assembly, tool policies, and quality gates.
Reliability choices
The demo never exposes secrets, labels all mock data as mock, separates read-only discovery from side effects, and scores each response against eval rubrics before returning final output.
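One way to keep secrets out of displayed traces is a pattern scan before anything is rendered. The following is a minimal sketch, not the demo's actual implementation: the names `SECRET_PATTERNS`, `find_secrets`, and `redact` are hypothetical, and the three patterns are illustrative rather than a complete ruleset.

```python
import re

# Hypothetical, illustrative patterns; a real scanner would use a vetted ruleset.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[str]:
    """Return the name of every pattern that matches somewhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def redact(text: str) -> str:
    """Replace each match with a labeled placeholder so the text is safe to show."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text
```

A guard stage could call `find_secrets` to decide whether to block, and `redact` before writing anything to a trace view.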
Feature depth
- Intent router with structured JSON decisions.
- RAG context pack with cited source chunks.
- Tool planner with read/write risk labels.
- Guardrail checks for secrets, grounding, and approval boundaries.
- Evaluation harness for groundedness, safety, usefulness, and instruction following.
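The "structured JSON decisions" of the intent router could be enforced by parsing model output into a typed record and rejecting anything that does not fit. This is a sketch under assumptions: the field names (`intent`, `domains`, `risk`, `workflow`) and the three risk levels are hypothetical, chosen to match the routing description in this write-up.

```python
import json
from dataclasses import dataclass

# Assumed risk vocabulary; the write-up only shows "medium" explicitly.
ALLOWED_RISK = {"low", "medium", "high"}
REQUIRED_FIELDS = {"intent", "domains", "risk", "workflow"}


@dataclass(frozen=True)
class RouteDecision:
    intent: str
    domains: tuple[str, ...]
    risk: str
    workflow: tuple[str, ...]


def parse_route_decision(raw: str) -> RouteDecision:
    """Parse router output; raise ValueError on anything off-schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("route decision must be a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["intent"], str):
        raise ValueError("intent must be a string")
    if not isinstance(data["risk"], str) or data["risk"] not in ALLOWED_RISK:
        raise ValueError(f"unknown risk level: {data['risk']!r}")
    for key in ("domains", "workflow"):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
    return RouteDecision(
        intent=data["intent"],
        domains=tuple(data["domains"]),
        risk=data["risk"],
        workflow=tuple(data["workflow"]),
    )
```

Failing closed on a malformed decision means a confused router cannot silently select a workflow.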
AI incorporated
This project is the AI layer itself: it demonstrates how an assistant should route tasks, retrieve context, plan tools, verify output, and keep side effects behind approval gates.
It is framed as a responsible assistant layer: it explains its reasoning, keeps the human in control, and never presents mock data as real production data.
Functional architecture
The project is intentionally designed around clear state, inspectable data flow, and safe upgrade paths instead of a thin screenshot-only demo.
Request: Capture the user goal, constraints, and approval requirements before any tool use.
Route: Classify intent, domains, risk level, and required workflow with structured JSON.
Retrieve: Assemble the minimum relevant context pack with source snippets and citations.
Plan: Propose tool calls and separate read-only discovery from side-effect actions.
Guard: Check secrets, unsupported claims, unsafe actions, and missing human approval.
Act: Execute approved steps while recording inputs, outputs, and decision points.
Evaluate: Score groundedness, usefulness, instruction following, and safety before final output.
Respond: Return a concise answer with traceable reasoning and clear next actions.
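The stage sequence above can be sketched as an ordered list of functions that pass a state dictionary along, with a trace entry recorded per stage and an early stop when a stage blocks. This is an illustrative sketch only: `run_pipeline`, the stage functions, and the `blocked` and `approved` keys are hypothetical names, and only three of the eight stages are mocked to keep it short.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(request: dict, stages: list[tuple[str, Stage]]) -> tuple[dict, list[dict]]:
    """Run stages in order, recording each one; stop as soon as a stage blocks."""
    state = dict(request)
    trace: list[dict] = []
    for name, stage in stages:
        state = stage(state)
        trace.append({"stage": name, "blocked": bool(state.get("blocked"))})
        if state.get("blocked"):
            break
    return state, trace


def plan(state: dict) -> dict:
    # Mock plan: one read-only step and one side-effect step.
    steps = [
        {"tool": "read_file", "side_effect": False},
        {"tool": "write_file", "side_effect": True},
    ]
    return {**state, "plan": steps}


def guard(state: dict) -> dict:
    # Block when a planned step has side effects and no approval was captured.
    risky = [s["tool"] for s in state["plan"] if s["side_effect"]]
    if risky and not state.get("approved", False):
        return {**state, "blocked": True, "reason": f"approval required for {risky}"}
    return state


def act(state: dict) -> dict:
    # Mock execution: only records which tools would have run.
    return {**state, "executed": [s["tool"] for s in state["plan"]]}


STAGES = [("plan", plan), ("guard", guard), ("act", act)]
```

Calling `run_pipeline({"goal": "update copy"}, STAGES)` stops at the guard stage, while adding `"approved": True` to the request lets the act stage run. Because the trace is plain data, a cockpit view can render it stage by stage.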
Reviewer proof path
A hiring reviewer should be able to see not just a chatbot screen, but the engineering controls that make the AI workflow reliable.
Example trace
Input: "Improve my portfolio for AI engineering roles without breaking existing pages."
Route: domain = portfolio; risk = medium; workflow = inspect → edit → verify → deploy.
RAG: Pull existing project facts, site structure, and deployment notes before writing copy.
Tools: Read files and test links automatically; require approval for destructive changes.
Evals: Check factual grounding, role alignment, link health, and console errors before release.
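The tool policy in this trace (reads run automatically, destructive changes need approval) could be expressed as a registry in which every tool carries a risk label that is checked at call time. A minimal sketch follows, assuming hypothetical names (`Risk`, `Tool`, `ApprovalRequired`, `execute`, `REGISTRY`) and mock tools that return strings instead of touching any files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Risk(Enum):
    READ = "read"    # read-only discovery
    WRITE = "write"  # has side effects


@dataclass(frozen=True)
class Tool:
    name: str
    risk: Risk
    run: Callable[[str], str]


class ApprovalRequired(Exception):
    """Raised when a write-risk tool is called without explicit approval."""


def execute(tool: Tool, arg: str, approved: bool = False) -> str:
    """Run read tools freely; refuse write tools unless approval was given."""
    if tool.risk is Risk.WRITE and not approved:
        raise ApprovalRequired(f"{tool.name} has side effects and needs approval")
    return tool.run(arg)


# Mock tools: they return labeled strings and never touch the filesystem.
REGISTRY = {
    "read_file": Tool("read_file", Risk.READ, lambda path: f"(mock) contents of {path}"),
    "delete_file": Tool("delete_file", Risk.WRITE, lambda path: f"(mock) deleted {path}"),
}
```

Putting the risk label on the tool definition, rather than deciding per call, keeps the approval rule in one place and makes it easy to display in a plan view.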
What this proves
The demo models production AI patterns without exposing real keys or pretending mock data is live.
System design: explicit stages instead of one opaque prompt.
Backend thinking: trace API, provider adapter, vector store, and tool registry are defined as upgrade paths.
Safety: side effects are separated from read-only discovery and routed through approval gates.
Quality: outputs are evaluated for groundedness, usefulness, safety, and instruction following.
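The quality check could take the form of a gate that passes a response only when every rubric dimension meets a minimum score. The sketch below is illustrative: the four dimensions come from this write-up, but the 0.0 to 1.0 scale, the threshold values, and the names `THRESHOLDS`, `EvalResult`, and `quality_gate` are assumptions.

```python
from dataclasses import dataclass

# Assumed thresholds on a 0.0-1.0 scale; the write-up names the dimensions only.
THRESHOLDS = {
    "groundedness": 0.8,
    "usefulness": 0.7,
    "safety": 1.0,
    "instruction_following": 0.8,
}


@dataclass(frozen=True)
class EvalResult:
    passed: bool
    failures: tuple[str, ...]


def quality_gate(scores: dict[str, float]) -> EvalResult:
    """Pass only if every dimension is present and meets its threshold."""
    failures = tuple(
        name
        for name, minimum in THRESHOLDS.items()
        if scores.get(name, 0.0) < minimum  # a missing score counts as a failure
    )
    return EvalResult(passed=not failures, failures=failures)
```

Treating a missing score as a failure means a skipped evaluator blocks the response instead of letting it through unscored.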
Next production milestones
MVP: Static workflow cockpit and interaction model.
Backend: FastAPI trace API, provider adapter, vector store, and tool registry.
Production: Observability, eval dataset, red-team cases, auth, and rate limits.
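As one illustration of the provider adapter named in the Backend milestone, the workflow could depend on a small interface and use a deterministic mock until a real model client is connected. This is a sketch under assumptions: `Provider`, `MockProvider`, and `answer` are hypothetical names, and no real provider SDK is involved.

```python
from typing import Protocol


class Provider(Protocol):
    """Anything with a complete() method can act as a model provider."""

    def complete(self, prompt: str) -> str: ...


class MockProvider:
    """Deterministic stand-in so the demo never needs a real API key."""

    def complete(self, prompt: str) -> str:
        # Clearly labeled as mock output so it cannot pass for a live response.
        return f"[mock] {len(prompt)} chars received"


def answer(provider: Provider, question: str) -> str:
    """The workflow depends only on the Provider interface, not a vendor SDK."""
    return provider.complete(question)
```

Swapping in a real client later would mean writing one more class with the same `complete` method, leaving the rest of the workflow unchanged.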