Building an AI Agent Orchestration System That Shipped to 200 Users on Day One
How I architected Champion's five-agent investor simulation platform — multi-agent design patterns, second-brain memory systems, AWS infrastructure, and shipping to 200 users on launch day.
launch day looked scary on paper. 200 founders signed up and started pitching AI investors at once. but the backend held. no incidents. no latency spikes. no failed agent calls.
that wasn't luck. that was architecture.
i'm Mohd Mursaleen, a full-stack engineer based in Bengaluru. i was one of three core engineers at Champion (meetchamp.in). this is what i built, what worked, and what i would change.
What Champion Does
Champion lets founders upload pitch decks and company data, then practice investor conversations with AI personas. one persona can be a skeptical Series A VC. another can be an operator angel. another can be a strategic corporate investor.
the hard part is not generating one good reply. the hard part is keeping each persona coherent across long, multi-turn conversations while retrieval, synthesis, and memory all run together across multiple agents.
The Five-Agent Architecture
the core system is a five-agent pipeline orchestrated by a LangGraph state machine.
1. Intake Agent — Takes uploaded pitch inputs (PDF/slides/text), parses them, and extracts structured signals: business model, market size, traction, team context.
2. Persona Agent — Owns voice, thesis, and behavior for the investor persona. This is the agent the founder actually talks to.
3. Knowledge Agent — Handles retrieval from company knowledge + investor domain context. Uses Neo4j graph relationships + vector search.
4. Memory Agent — Tracks conversation facts and writes persistent memory ("Series A in 6 months", "team has 3 engineers") across sessions.
5. Synthesis Agent — Updates internal persona state after every founder turn: interest level, concerns, and next probes.
The State Machine Design
big design decision: agents don't call each other directly. they read/write a shared state object.
```python
from typing import TypedDict


class ConversationState(TypedDict):
    """Shared state passed between all agents in the pipeline."""
    session_id: str
    founder_message: str
    persona_id: str
    retrieved_context: list[str]
    memory_facts: list[str]
    persona_state: dict  # interest, concerns, etc.
    response: str
    turn_number: int
```

each agent is basically a pure function: state in, state update out. LangGraph routes execution.
this made two things easy:
- test agents in isolation
- debug a failure by replaying a single turn's input state
turn flow looked like this:
founder_input → knowledge_retrieval → memory_retrieval
↓ ↓
synthesis ←────────────────────────────────┘
↓
persona_response → memory_update
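the pure-function contract is easy to sketch without LangGraph at all. this is a simplified stand-in, not our production code — the node bodies here are toy placeholders, and the hand-rolled runner just merges each node's partial update into the shared state in the order the flow above defines:

```python
from typing import Callable, TypedDict


class ConversationState(TypedDict, total=False):
    """Minimal subset of the shared state, for illustration."""
    founder_message: str
    retrieved_context: list[str]
    memory_facts: list[str]
    response: str
    turn_number: int


# Each "agent" is a pure function: state in, partial update out.
def knowledge_retrieval(state: ConversationState) -> ConversationState:
    # Stand-in for Neo4j graph + vector search.
    return {"retrieved_context": [f"context for: {state['founder_message']}"]}


def memory_retrieval(state: ConversationState) -> ConversationState:
    # Stand-in for the Memory Agent's persistent-fact lookup.
    return {"memory_facts": ["Series A in 6 months"]}


def persona_response(state: ConversationState) -> ConversationState:
    facts = "; ".join(state["memory_facts"])
    return {"response": f"Given {facts}, how will you hit that milestone?"}


def run_turn(state: ConversationState,
             nodes: list[Callable[[ConversationState], ConversationState]]
             ) -> ConversationState:
    """Apply each node in order, merging its partial update into state."""
    for node in nodes:
        state = {**state, **node(state)}
    return {**state, "turn_number": state.get("turn_number", 0) + 1}
```

because every node only reads and writes the shared state, replaying a failed turn is just calling `run_turn` again with the captured input state.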
The "Second Brain" Memory System
memory was harder than agent logic. context windows are finite. but founders return across multiple sessions over multiple days. the persona still has to remember what happened earlier.
we used two layers:
Short-term: sliding window of last 12 turns, summarized every 4 turns to control token growth.
Long-term: Neo4j graph where Memory Agent writes persistent facts as typed relationships.
example: (Founder {name: "Priya"}) -[:MENTIONED]-> (Milestone {event: "Series A in 6 months", turn: 3}).
at each new turn, Knowledge Agent queries relevant long-term facts and injects them into retrieval context. this made memory feel continuous without burning the context window.
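the short-term layer is easy to sketch. assume a `summarize` callable (in production this was an LLM call; it's injected here so the shape stays testable) — a hypothetical simplified version, not the production implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

WINDOW = 12          # raw turns kept verbatim
SUMMARIZE_EVERY = 4  # fold overflow into the summary in batches of 4


@dataclass
class ShortTermMemory:
    """Sliding window of raw turns plus a rolling summary of older ones."""
    summarize: Callable[[str, list[str]], str]  # (old_summary, turns) -> new_summary
    turns: list[str] = field(default_factory=list)
    summary: str = ""
    _pending: list[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > WINDOW:
            # Oldest turn falls out of the window; queue it for summarization.
            self._pending.append(self.turns.pop(0))
        if len(self._pending) >= SUMMARIZE_EVERY:
            self.summary = self.summarize(self.summary, self._pending)
            self._pending = []

    def context(self) -> str:
        """What gets injected into the prompt: rolling summary + recent turns."""
        return "\n".join(filter(None, [self.summary, *self.turns]))
```

batching the summarization keeps the expensive summarize call off most turns, while the prompt only ever carries the summary plus at most 12 raw turns.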
AWS Infrastructure and Why We Used SQS
each investor session could run up to 45 minutes. we needed per-turn statefulness and horizontal scaling across sessions.
architecture:
- ALB (Application Load Balancer) routes WebSocket traffic to EC2
- SQS queues pipeline jobs so HTTP latency isn't tightly coupled to agent compute latency
- EC2 Auto Scaling adds instances when queue depth rises
SQS was the key on launch day. without it, slow runs would have blocked everyone behind them. with it, the API returned job IDs immediately and the frontend polled for results. users saw a 2-4s "typing" indicator instead of a frozen screen, which is better UX anyway.
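the handoff pattern looks like this in miniature. production used SQS and worker instances; this sketch swaps in an in-process `queue.Queue` as a stand-in so the submit/poll/worker shape is self-contained (all names here are illustrative, not our real handlers):

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}               # job_id -> {"status", "result"}
job_queue: queue.Queue = queue.Queue()   # in-process stand-in for SQS


def submit_turn(payload: dict) -> str:
    """API handler: enqueue and return a job id immediately.

    No agent compute happens on the HTTP path."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending", "result": None}
    job_queue.put((job_id, payload))
    return job_id


def poll(job_id: str) -> dict:
    """Frontend polls this while showing the 'typing' indicator."""
    return jobs[job_id]


def worker() -> None:
    """Consumer: runs the agent pipeline off the HTTP path."""
    while True:
        job_id, payload = job_queue.get()
        if job_id is None:  # shutdown sentinel
            break
        # ... run the five-agent pipeline here ...
        jobs[job_id] = {"status": "done", "result": f"reply to {payload['message']}"}
        job_queue.task_done()
```

the same decoupling is why queue depth works as an autoscaling signal: when pending jobs pile up, you add workers rather than dropping requests.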
What Broke During Testing
three real failure modes:
Agent timeouts: one slow retrieval call could cascade into synthesis timeout. fix: per-agent timeout + graceful fallback (shorter synthesis response when retrieval is slow).
Persona drift: after 20+ turns, some personas contradicted their own thesis. fix: persona anchor (immutable persona facts synthesis cannot overwrite).
Memory graph fan-out: too many facts per turn made retrieval noisy. fix: relevance filter dropping facts below 0.7 cosine similarity to current topic.
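the timeout fix follows a standard pattern: run the risky call under a hard budget and return a degraded result instead of letting the turn cascade. this sketch uses `concurrent.futures` with a placeholder retrieval call whose `delay` parameter simulates latency (illustrative code, not the real agent):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

RETRIEVAL_TIMEOUT_S = 0.1  # per-agent budget; illustrative value

# Shared pool so a timed-out call can finish in the background
# without blocking the turn that abandoned it.
_pool = ThreadPoolExecutor(max_workers=4)


def retrieve_context(query: str, delay: float) -> list[str]:
    """Stand-in for the Knowledge Agent's retrieval round-trip."""
    time.sleep(delay)  # simulate Neo4j / vector search latency
    return [f"context for {query}"]


def retrieve_with_fallback(query: str, delay: float = 0.0) -> tuple[list[str], bool]:
    """Run retrieval under a hard timeout; degrade instead of cascading.

    Returns (context, degraded). When degraded is True, downstream
    synthesis emits a shorter response rather than timing out the turn.
    """
    future = _pool.submit(retrieve_context, query, delay)
    try:
        return future.result(timeout=RETRIEVAL_TIMEOUT_S), False
    except FutureTimeout:
        return [], True
```

the important part is the explicit `degraded` flag: downstream agents get to choose a cheaper behavior instead of inheriting the timeout.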
Launch Day
peak was 200 concurrent sessions. SQS hit ~800 pending jobs in hour one. EC2 Auto Scaling added 3 instances. queue drained in 8 minutes. zero incidents.
system held because each component had a narrow job and explicit failure mode. slow Knowledge Agent? Persona Agent degrades gracefully. noisy memory retrieval? Synthesis Agent filters.
if you're working on similar systems and want the voice side, read my real-time speech-to-speech post.
Written by Mohd Mursaleen — AI agent engineer and full-stack developer based in Bengaluru, India. geekymd.me