Metadata

ID:

2025-11-25-gary-rostam-mahabadi-ai-architecture-walkthrough

Participants:

Call/Meeting Summary: AI Agent Architecture Deep Dive & Collaboration Discussion

Overview

Rostam hosted a LinkedIn Space event to walk through agent architecture fundamentals and real-world implementations. The conversation covered core agent components (system prompts, tools, RAG, evaluations), a detailed ticketing agent architecture walkthrough, advanced optimization techniques, and discussions about potential collaboration, education strategy, and building an AI consulting agency together.

Key Topics Discussed

Agent Architecture Fundamentals

Core Components of Every Agent:

System Prompt: Defines who the agent is, its role, and guardrails. Critical for preventing hallucinations and ensuring appropriate responses
Tools: What the agent has access to (RAG, databases, APIs, MCPs, HubSpot, Linear, live chat)
RAG (Retrieval Augmented Generation): Knowledge base access through vector databases
Evaluations & Observability: Using LangSmith to track performance, costs, latency, and accuracy

Rostam's Philosophy on Agents:

"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"

Key Insight on Agent Complexity:

"Anyone can build an agent. Agent is not difficult. It's the finesse and the expertise comes with how you architect the system."

Ticketing Agent Architecture Walkthrough

Goal: Reduce number of tickets in a support system (using HubSpot as example)

Four-Step Architecture:

Identify All Data:
- Extract all completed tickets (questions, answers, feedback)
- Build self-improving system to understand why tickets are generated
- Create knowledge base from historical Q&A pairs
Knowledge Base Building (RAG Pipeline):
- Take all documentation (technical setup, general questions, industry questions)
- Put into vector database for similarity search
- Rostam recommends RAGI (Raggy) for their reclassification and indexing capabilities
- When ticket is completed, extract core ideas and create 10 different ways to ask the question
- Run through RAG to see if answer exists in knowledge base
- If not in knowledge base → create Linear ticket to add to docs
- If in knowledge base but user query problem → gather failed queries to improve system prompt
Tool Definition:
- RAG access for knowledge base queries
- HubSpot integration to create tickets
- Live chat with human employees for immediate escalation
- All tools feed back into self-improvement cycle
Evals & Observability:
- Use LangSmith for tracing, annotations, datasets
- Track: time to first token, reasoning chain, tools used, costs per step, total latency
- Human-in-the-loop annotations to label why responses were good/bad
- Feed bad responses back into system prompt improvement cycle

Advanced Optimization - Redis Caching Strategy:

Generate top 10 Q&A pairs per category (technical, product, etc.) each morning via cron job
Store in Redis with vector similarities
When user asks question, check Redis first (95% similarity threshold)
If match found, return immediately without running RAG pipeline
Track which questions get asked most, remove unused ones, generate new ones
Provides fast answers for common questions while reducing RAG pipeline load

RAG Pipeline Optimization Levers:

Top K Value: How many chunks to retrieve (trade-off: more context = higher latency but potentially higher accuracy)
Similarity Ratio: How contextually similar results must be (higher = fewer results, less latency, but might miss relevant context)
Reclassification: Reorganize results by contextual relevance
Breadth vs Depth: Search across documents vs deep within documents
Metadata Filtering: Filter by patient data, internal docs, etc. using metadata keys
Partitioning: Separate knowledge bases by API key (e.g., customer-facing vs internal docs)

System Prompt Engineering & Guardrails

Critical Components:

User persona definition (who the agent is, what role it plays)
Tool usage instructions (when to use RAG, when NOT to use it)
Confidence intervals: "If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool"
Guardrails: Industry-specific restrictions (e.g., medical companies cannot provide medical advice via OpenAI API - it's illegal)
Test suites to break guardrails and improve system prompts cyclically

Self-Improving Guardrail System:

Daily test suite with 50+ creative ways to break guardrails
Feed broken guardrails back into system prompt improvement
Critical for preventing malicious access to system prompts, trade secrets, or inappropriate responses

Memory & State Management

Types of Memory:

Short-term memory: Information needed for every query (e.g., "I take Lipitor" - relevant to all medical questions)
Long-term memory: Historical context (e.g., "I've had back pain for 3 years" - relevant when asking about exercise routines)
Thread Summarization: Summarize last 10 messages to reduce token usage while maintaining context

Memory Implementation:

Rostam has used Mem0 but also built custom solutions with pgVector
Store memories in partitioned database with JSON objects
Extract knowledge events from conversations to store in long-term memory

Multi-Agent Architecture

Concept: Specialized agents with single responsibilities rather than general agents

Agent 1: Responsibility X (e.g., investor questions)
Agent 2: Responsibility Y (e.g., patient support for clinical trials)
Agent 3: Responsibility Z (e.g., patient feedback collection)

Routing System:

LLM with structured output determines which agent should handle query
Before each step, check if query is still relevant to current agent
If not relevant, break out and route to different agent
User never knows they're talking to different agents - feels like continuous conversation

Example Use Case - Cancer Clinic:

Patient support agent: Helps elderly patients (70-80 years old) who can't access doctors immediately, reducing trial dropout rates
Patient feedback agent: Generates questions on-the-fly to gather more data for clinical trials (currently doctors only see patients every 2-3 weeks)

Technical Stack & Tools

Vercel AI SDK (Rostam's Preferred):

Modular and provider-agnostic
Can switch between AI providers if one goes down (built-in fail-safe)
Supports multiple RAG databases (pgVector, Pinecone, Quadrant)
Built-in retry logic and step limits
Deploys well on Vercel but also supports GCP, AWS, Docker

LangChain & LangSmith Ecosystem:

LangChain: Prompt management and A/B testing (update prompts without redeploying)
LangSmith: Observability, tracing, annotations, datasets
LangGraph: Node-based workflow system with mermaid diagrams
Subgraphs: Complex nested agent systems
Best for enterprise-grade systems already using LangChain

Mastra: TypeScript-focused agentic building platform (Rostam getting into it)

RAGI (Raggy): Rostam's preferred RAG service

Out-of-the-box automation OR fine-grained control
Reclassification and indexing capabilities
Handles sentence splitting, vector storage, retrieval optimization

MCPs (Model Context Protocol): API integrations (GitHub, Neon, external databases)

Real-World Project: Medical QA Agent (Function)

Architecture Overview:

Thread title generation from user question
Pull patient medical data
Extract memories (e.g., medication changes, lab results)
RAG subgraph for relevant context
Assistant calls LLM with medical tools
Tools execute function calls
Cache management for patient data (check if cache broken, store or retrieve)

Complex Challenge - Longitudinal Data:

2 tests per year × 5 years × 180-1000+ biomarkers = 10,000+ data points
Problem: How to meaningfully answer "Am I getting better with my heart health?"
Solution: Pipeline to summarize longitudinal data into semantic text (clinical reports)
Classify questions as longitudinal vs general, pull from appropriate dataset
Rostam notes: "RAGI, no one solved this. They're all working on it right now. It's a really, really, really difficult problem."

Project Timeline:

3.5 weeks, 18-hour days
Released beta version of single agent, then built multi-agent system
Deadline pressure from investors wanting to launch on specific date
Rostam hired Natalie (Gauntlet cohort) and another person for Function

Agent Graph / Kevin Situation

Rostam Cut Ties with Kevin:

Kevin wanted Rostam as employee (not co-founder) with 1-3% equity
Rostam bringing in $100K+ contracts and co-founder-level work
Kevin offering $150K salary - Rostam: "I'm taking a pay cut for you. Like, no, no. I know what I'm worth"
Rostam: "I'm going to rebuild what he built, which I literally told him everything, like how to build everything. So I know how to build it all."

Rostam's New Direction:

Building his own marketplace/platform
Automating more of the pipeline
Solving scalability issues (Slack channels cost $5/user/month, not scalable)
Core ideology from original Agent Graph team (Roger, Adam, Pat, Lamar) but with his own flavor

Gary's Journey & Alpha Experience

Why Gary Left Alpha:

Proposed big education conference, Joe was too busy
Started being entrepreneurial, identifying pain points, matching builders with opportunities
Hosted event at Zach Levi's Ranch with 120 people (including Kanye West's manager) with week's notice
Reality check: Poor internal processes for starting new schools
Random middle manager seized authority over events, canceled Gary's event
Realization: "A lot of these people they're not actually there to really make an impact they're there to have a job and they see people as threats to their job"
Left and took spiritual sabbatical

Rostam's Response:

"Aren't you guys having the same goal? That's so stupid... I'm never scared of a job like I don't think I have to be ever scared of having a job in this like with the knowledge that we have"

Potential Collaboration & Business Strategy

Hackathon Partnership:

Rostam invited Gary to join December 13th NVIDIA hackathon team
Previous teammate left because he didn't know anything about agents/LLMs
Rostam wants clear communication and alignment throughout process
Prize: DJI device ($4,000) that can fine-tune up to 200B parameter models
NVIDIA offered Rostam podcast opportunity if he builds something cool

Education & Content Strategy Discussion:

Gary's Vision:

Best education arm that's also marketing flywheel and recruiting flywheel
Explain to people: "If you're this quality, this quality, this quality... explore this path"
Free workshops, courses, content
Differentiator: Real experience with billion-dollar companies, not just YouTubers teaching theory
Gary: "I ran a digital first education nonprofit for three years. Posted hundreds of videos online. And I've just been waiting for the next thing to be really excited about. To evangelize. That doesn't make me feel like a trendy sellout."

Rostam's Perspective:

Education can charge for courses AND train workforce
Problem: "Why should I choose this course over another?"
Differentiator: Real experience with actual companies, documented case studies
Started creating case scenarios on website but at high level (no mermaid diagrams, anonymized)
Concern: "I don't want to have a 3 billion dollar company coming out and like coming after me"

Potential Agency Model:

Standardized process/SOP even if agents are different
Certification system (like Gauntlet): Submit project, AI analyzes it, get badge
Division of labor: Rostam architects, Gary handles marketing/education/BD
Target: People who can create 5-figure+ budgets for digital transformation
Avoid: "People that created their app with vibe coding" (no-code/low-code builders)

Client Strategy:

Focus on big companies (not broke startups)
Work with VCs to get portfolio company access
Rostam's fee: $200/hour for projects
Marketplace model: Charge 20% of hours, developers get $80-150/hour
Colombian developers: $4-5K/month (Harvard of Colombia), wider spread, compound benefits

AITX Community & Events

Current State:

AITX is largest AI community in Austin (besides Fiesta which is more generalized)
90% of attendees building their own SaaS, not contracting/agencies
Someone buying house to host more events (CEO getting pressured to go to San Francisco)

Gap Identified:

No events for agent builders sharing architecture/case studies
No documentation of workflows, stacks, best practices
Google search for "best architectural design systems for multi-agent architectures" returns poor results
Opportunity: Conference of agent builders talking about architecture
Database of: "This is a medical company. This is how we built it. This is why we built it."

Rostam's Interest:

Wants to host talks at the house
Bring community/network together
But concerned: "I have something of value" - don't want to give away too much
Gary's perspective: Once agency/system is set up, it becomes recruiting tool

Future of AI & Technology

Agents vs. Neural Networks:

Rostam: "Agents are less sexy because it's already, there's no exponential growth in agent development right now"
Future direction: Moving toward neural networks and custom model building
"Agents are dead" - the future is small LLMs and diffusion models

Diffusion LLMs (Future of All LLMs):

Moving away from ARM (autoregressive modeling - one token at a time like ChatGPT)
Diffusion generates everything at once, then diffuses to human-readable
Google and Inception Labs leading (Inception has Mercury)
Mercury: 1,000 tokens per second on H100, accuracy of Sonnet 3.5
For 99% of applications, speed > 3-4% accuracy bumps
Rostam: "You should sign up for their beta. Diffusion's insane"

Market Timing:

Now is the time for AI consulting
But agents will become commoditized as people learn to build them
Long-term: Training/education market will be huge

Key Quotes

On Agent Architecture:

"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"

"Anyone can build an agent. Agent is not difficult. It's the finesse and the expertise comes with how you architect the system."

On Business Philosophy:

"I know what I'm worth, and I know what value I bring"

"I'm never scared of a job like I don't think I have to be ever scared of having a job in this like with the knowledge that we have"

On Education Strategy:

"The differentiator, for me, is clearly... That billion-dollar company. Right. And none of this is documented, by the way. So you have to attend this course if you don't want just random YouTuber"

On Market Opportunity:

"At the end of the day, it's about your network and this agent building stuff... it's all about who you know and how you can get contacts if you can get a lot of contacts you win"

On Future Technology:

"Agents are less sexy because it's already, there's no exponential growth in agent development right now... we're kind of shifting away from this agentic architecture to small LLMs, and then if you look up diffusion LLMs, that's going to be the future of all LLMs"

Action Items / Takeaways

For Gary:

Join Rostam for December 13th NVIDIA hackathon (team up)
Continue learning agent architecture fundamentals
Build sample agents to demonstrate capability
Consider education/content strategy for AI consulting
Explore AITX events and community

For Rostam:

Continue hosting LinkedIn Spaces/educational content
Build own marketplace platform (post-Agent Graph)
Consider education arm for agency (with or without Gary)
Meet with Thompson Reuters/Westlaw Head of AI this week
Host talks at new house venue for agent builder community

Potential Collaboration:

Hackathon partnership (December 13th)
Education/content creation (Gary's strength)
Agency building (Rostam architects, Gary handles BD/marketing)
Case study documentation (anonymized, high-level)
Community events for agent builders

Strategic Insights:

Focus on big companies, not startups (more impact, better pay, VC portfolio access)
Education is both revenue stream AND recruiting tool
Real experience with billion-dollar companies is key differentiator
Network and relationships matter more than technical skills alone
Now is the time for AI consulting, but agents will commoditize - need to stay ahead

Technical Learnings

RAG Optimization:

Redis caching for common questions (95% similarity threshold)
Top K, similarity ratio, reclassification, breadth vs depth are key levers
Metadata filtering and partitioning for access control
Longitudinal data is unsolved problem in RAG

System Design:

Multi-agent architecture with routing > general agents
Memory management (short-term vs long-term)
Thread summarization to reduce tokens
Self-improving systems with feedback loops
Guardrail testing and improvement cycles

Tool Ecosystem:

Vercel AI SDK for modularity and provider-agnostic setup
LangChain/LangSmith for enterprise systems
LangGraph for complex node-based workflows
RAGI for RAG pipelines
MCPs for API integrations

Metadata

Overview​

Key Topics Discussed​

Agent Architecture Fundamentals​

Ticketing Agent Architecture Walkthrough​

System Prompt Engineering & Guardrails​

Memory & State Management​

Multi-Agent Architecture​

Technical Stack & Tools​

Real-World Project: Medical QA Agent (Function)​

Agent Graph / Kevin Situation​

Gary's Journey & Alpha Experience​

Potential Collaboration & Business Strategy​

AITX Community & Events​

Future of AI & Technology​

Key Quotes​

Action Items / Takeaways​

Technical Learnings​