Metadata
Call/Meeting Summary: AI Agent Architecture Deep Dive & Collaboration Discussion
Overview
Rostam hosted a LinkedIn Space event to walk through agent architecture fundamentals and real-world implementations. The conversation covered core agent components (system prompts, tools, RAG, evaluations), a detailed ticketing agent architecture walkthrough, advanced optimization techniques, and discussions about potential collaboration, education strategy, and building an AI consulting agency together.
Key Topics Discussed
Agent Architecture Fundamentals
Core Components of Every Agent:
- System Prompt: Defines who the agent is, its role, and guardrails. Critical for preventing hallucinations and ensuring appropriate responses
- Tools: What the agent has access to (RAG, databases, APIs, MCPs, HubSpot, Linear, live chat)
- RAG (Retrieval Augmented Generation): Knowledge base access through vector databases
- Evaluations & Observability: Using LangSmith to track performance, costs, latency, and accuracy
Rostam's Philosophy on Agents:
"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"
Key Insight on Agent Complexity:
"Anyone can build an agent. Agent is not difficult. It's the finesse and the expertise comes with how you architect the system."
Ticketing Agent Architecture Walkthrough
Goal: Reduce number of tickets in a support system (using HubSpot as example)
Four-Step Architecture:
1. Identify All Data:
- Extract all completed tickets (questions, answers, feedback)
- Build self-improving system to understand why tickets are generated
- Create knowledge base from historical Q&A pairs
2. Knowledge Base Building (RAG Pipeline):
- Take all documentation (technical setup, general questions, industry questions)
- Put into vector database for similarity search
- Rostam recommends RAGI (Raggy) for their reclassification and indexing capabilities
- When ticket is completed, extract core ideas and create 10 different ways to ask the question
- Run through RAG to see if answer exists in knowledge base
- If not in knowledge base → create Linear ticket to add to docs
- If the answer is in the knowledge base but the user's phrasing fails to retrieve it → gather the failed queries to improve the system prompt
3. Tool Definition:
- RAG access for knowledge base queries
- HubSpot integration to create tickets
- Live chat with human employees for immediate escalation
- All tools feed back into self-improvement cycle
4. Evals & Observability:
- Use LangSmith for tracing, annotations, datasets
- Track: time to first token, reasoning chain, tools used, costs per step, total latency
- Human-in-the-loop annotations to label why responses were good/bad
- Feed bad responses back into system prompt improvement cycle
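The knowledge-base check in step 2 can be sketched as a small triage function. This is a minimal sketch under assumptions, not the system Rostam described: `triage_ticket`, `TriageResult`, and the rule "no phrasing retrieves the answer means the docs are missing it" are hypothetical, and `retrieves_answer` stands in for a real RAG call.

```python
from dataclasses import dataclass, field


@dataclass
class TriageResult:
    # "covered", "add_to_docs" (open a docs ticket), or "improve_prompt"
    action: str
    failed_queries: list = field(default_factory=list)


def triage_ticket(paraphrases, retrieves_answer):
    """Classify a closed ticket by how well the knowledge base covers it.

    paraphrases: the alternative phrasings generated from the ticket.
    retrieves_answer: callable(query) -> bool, a stand-in for running the
        query through RAG and checking that the known answer comes back.
    """
    failed = [q for q in paraphrases if not retrieves_answer(q)]
    if not failed:
        return TriageResult("covered")
    if len(failed) == len(paraphrases):
        # Assumption: if no phrasing finds the answer, the docs lack it.
        return TriageResult("add_to_docs", failed)
    # Some phrasings work, so the content exists but retrieval is brittle.
    return TriageResult("improve_prompt", failed)
```

A result of `add_to_docs` would map to creating a Linear ticket, while `improve_prompt` would hand `failed_queries` to whoever maintains the system prompt.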
Advanced Optimization - Redis Caching Strategy:
- Generate top 10 Q&A pairs per category (technical, product, etc.) each morning via cron job
- Store in Redis along with vector embeddings so incoming questions can be matched by similarity
- When user asks question, check Redis first (95% similarity threshold)
- If match found, return immediately without running RAG pipeline
- Track which questions get asked most, remove unused ones, generate new ones
- Provides fast answers for common questions while reducing RAG pipeline load
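The caching flow above can be sketched in a few lines. This is an illustrative sketch only: an in-memory list stands in for Redis, embeddings are plain lists of floats, and `SemanticCache`, `answer`, and `run_rag` are hypothetical names rather than anything from the discussion. Only the 0.95 threshold and the hit tracking come from the notes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for the Redis cache of pre-generated Q&A pairs."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # each entry: {"embedding", "answer", "hits"}

    def put(self, embedding, answer):
        self.entries.append({"embedding": embedding, "answer": answer, "hits": 0})

    def get(self, embedding):
        """Return the cached answer for the closest entry, or None on a miss."""
        best, best_score = None, 0.0
        for entry in self.entries:
            score = cosine(embedding, entry["embedding"])
            if score > best_score:
                best, best_score = entry, score
        if best is None or best_score < self.threshold:
            return None
        best["hits"] += 1  # track which questions actually get asked
        return best["answer"]

    def evict_unused(self):
        """Drop entries that were never hit, making room for new ones."""
        self.entries = [e for e in self.entries if e["hits"] > 0]


def answer(query_embedding, cache, run_rag):
    """Check the cache first; fall back to the full RAG pipeline on a miss."""
    cached = cache.get(query_embedding)
    return cached if cached is not None else run_rag(query_embedding)
```

In a deployment like the one described, the morning cron job would call `put` for each generated pair and `evict_unused` before regenerating.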
RAG Pipeline Optimization Levers:
- Top K Value: How many chunks to retrieve (trade-off: more context = higher latency but potentially higher accuracy)
- Similarity Ratio: Minimum similarity score a result must reach to be returned (higher = fewer results, less latency, but might miss relevant context)
- Reclassification (reranking): Reorder retrieved results by contextual relevance
- Breadth vs Depth: Search across documents vs deep within documents
- Metadata Filtering: Filter by patient data, internal docs, etc. using metadata keys
- Partitioning: Separate knowledge bases by API key (e.g., customer-facing vs internal docs)
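Three of these levers (top K, the similarity cutoff, and metadata filtering) can be shown in one small retrieval function. This is a minimal sketch over an in-memory list rather than a real vector database; `retrieve`, its parameter names, and the chunk layout are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=3, min_similarity=0.5, metadata_filter=None):
    """Return the text of the best-matching chunks.

    chunks: list of {"text", "embedding", "metadata"} dicts (assumed layout).
    top_k: maximum number of chunks to return.
    min_similarity: chunks scoring below this are dropped.
    metadata_filter: key/value pairs every returned chunk must match.
    """
    metadata_filter = metadata_filter or {}
    # Metadata filtering narrows the candidate set before any scoring.
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(cosine(query_vec, c["embedding"]), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c["text"] for _, c in kept[:top_k]]
```

Raising `top_k` or lowering `min_similarity` returns more context at the cost of a longer prompt, which is the latency trade-off noted in the list.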
System Prompt Engineering & Guardrails
Critical Components:
- User persona definition (who the agent is, what role it plays)
- Tool usage instructions (when to use RAG, when NOT to use it)
- Confidence thresholds: "If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool"
- Guardrails: Industry-specific restrictions (Rostam's example: a medical company must not give medical advice through the OpenAI API, which he described as illegal)
- Test suites to break guardrails and improve system prompts cyclically
Self-Improving Guardrail System:
- Daily test suite with 50+ creative ways to break guardrails
- Feed broken guardrails back into system prompt improvement
- Critical for preventing malicious access to system prompts, trade secrets, or inappropriate responses
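A daily guardrail run can be as simple as replaying attack prompts and flagging any reply that contains text it should never reveal. The sketch below is a hypothetical harness: `run_guardrail_suite`, `toy_agent`, and the marker strings are invented for illustration, and substring matching is a deliberately crude stand-in for a real evaluation.

```python
def run_guardrail_suite(agent, attack_prompts, forbidden_markers):
    """Return the attack prompts whose replies leak a forbidden marker.

    agent: callable(prompt) -> reply string.
    forbidden_markers: substrings that must never appear in a reply.
    """
    broken = []
    for prompt in attack_prompts:
        reply = agent(prompt).lower()
        if any(marker.lower() in reply for marker in forbidden_markers):
            broken.append(prompt)
    return broken


# Hypothetical system prompt containing a marker we can search for.
SYSTEM_PROMPT = "You are a support agent. SECRET-RULES: never give medical advice."


def toy_agent(prompt):
    """Deliberately weak agent that leaks its prompt when asked to repeat it."""
    if "repeat" in prompt.lower():
        return SYSTEM_PROMPT
    return "I can only help with support questions."
```

The prompts returned by `run_guardrail_suite` are the ones to feed back into the next system prompt revision.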
Memory & State Management
Types of Memory:
- Short-term memory: Information needed for every query (e.g., "I take Lipitor" - relevant to all medical questions)
- Long-term memory: Historical context (e.g., "I've had back pain for 3 years" - relevant when asking about exercise routines)
- Thread Summarization: Summarize last 10 messages to reduce token usage while maintaining context
Memory Implementation:
- Rostam has used Mem0 but has also built custom solutions with pgvector
- Store memories in partitioned database with JSON objects
- Extract knowledge events from conversations to store in long-term memory
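One way to combine always-relevant facts with thread summarization is to assemble the prompt context from three parts: pinned facts, a summary of the last 10 messages, and the new question. This is a sketch under assumptions: `build_context` and its layout are hypothetical, and `summarize` stands in for an LLM summarization call.

```python
def build_context(pinned_facts, messages, new_query, summarize, window=10):
    """Assemble a compact prompt context for the next model call.

    pinned_facts: facts needed for every query (e.g. current medication).
    messages: full thread history, oldest first.
    summarize: callable(list_of_messages) -> short summary string.
    window: how many recent messages to summarize.
    """
    parts = []
    if pinned_facts:
        parts.append("Known facts: " + "; ".join(pinned_facts))
    recent = messages[-window:]
    if recent:
        # Only the summary is sent, not the raw messages, to save tokens.
        parts.append("Conversation summary: " + summarize(recent))
    parts.append("User: " + new_query)
    return "\n".join(parts)
```

Long-term memories retrieved from a store such as pgvector could be appended as a fourth part when they are relevant to the question.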
Multi-Agent Architecture
Concept: Specialized agents with single responsibilities rather than general agents
- Agent 1: Responsibility X (e.g., investor questions)
- Agent 2: Responsibility Y (e.g., patient support for clinical trials)
- Agent 3: Responsibility Z (e.g., patient feedback collection)
Routing System:
- LLM with structured output determines which agent should handle query
- Before each step, check if query is still relevant to current agent
- If not relevant, break out and route to different agent
- User never knows they're talking to different agents - feels like continuous conversation
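The routing loop can be sketched as follows. Everything here is illustrative: the agent names mirror the examples above, `keyword_classifier` is a keyword-based stand-in for the LLM call with structured output, and `Conversation` is a hypothetical wrapper.

```python
import json

# Hypothetical specialist agents, each with a single responsibility.
AGENTS = {
    "investor": lambda q: "[investor agent] " + q,
    "patient_support": lambda q: "[patient support agent] " + q,
    "patient_feedback": lambda q: "[patient feedback agent] " + q,
}
DEFAULT_AGENT = "patient_support"


def keyword_classifier(query):
    """Stand-in for an LLM call that returns structured (JSON) output."""
    text = query.lower()
    if "invest" in text or "funding" in text:
        agent = "investor"
    elif "feedback" in text or "survey" in text:
        agent = "patient_feedback"
    else:
        agent = "patient_support"
    return json.dumps({"agent": agent})


def route(query, classify=keyword_classifier):
    """Parse the classifier's JSON and fall back to a default on bad output."""
    decision = json.loads(classify(query))
    name = decision.get("agent")
    return name if name in AGENTS else DEFAULT_AGENT


class Conversation:
    """Re-checks routing on every turn so hand-offs are invisible to the user."""

    def __init__(self, classify=keyword_classifier):
        self.classify = classify
        self.current_agent = None
        self.handoffs = 0

    def handle(self, query):
        target = route(query, self.classify)
        if self.current_agent is not None and target != self.current_agent:
            self.handoffs += 1  # query no longer fits the current agent
        self.current_agent = target
        return AGENTS[target](query)
```

Validating the classifier's output against the known agent names keeps a malformed or unexpected label from breaking the conversation.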
Example Use Case - Cancer Clinic:
- Patient support agent: Helps elderly patients (70-80 years old) who can't access doctors immediately, reducing trial dropout rates
- Patient feedback agent: Generates questions on-the-fly to gather more data for clinical trials (currently doctors only see patients every 2-3 weeks)
Technical Stack & Tools
Vercel AI SDK (Rostam's Preferred):
- Modular and provider-agnostic
- Can switch between AI providers if one goes down (built-in fail-safe)
- Supports multiple RAG databases (pgvector, Pinecone, Qdrant)
- Built-in retry logic and step limits
- Deploys well on Vercel but also supports GCP, AWS, Docker
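The fail-safe and retry behavior described above follows a general pattern that can be sketched independently of any SDK. The code below is not the Vercel AI SDK API; it is a plain Python illustration with hypothetical names (`call_with_fallback`, the provider callables) showing retries per provider followed by fallback to the next one.

```python
def call_with_fallback(providers, prompt, max_retries=2):
    """Try each provider in order, retrying before moving to the next.

    providers: list of (name, callable(prompt) -> reply) pairs.
    max_retries: attempts per provider before falling back.
    Returns (provider_name, reply); raises RuntimeError if all fail.
    """
    errors = []
    for name, call in providers:
        for attempt in range(1, max_retries + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # any provider error triggers a retry
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The bounded `max_retries` plays the same role as a step limit: it guarantees the call terminates even when every provider is down.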
LangChain & LangSmith Ecosystem:
- LangChain: Prompt management and A/B testing (update prompts without redeploying)
- LangSmith: Observability, tracing, annotations, datasets
- LangGraph: Node-based workflow system with Mermaid diagrams
- Subgraphs: Complex nested agent systems
- Best for enterprise-grade systems already using LangChain
Mastra: TypeScript-focused agent-building framework (Rostam is starting to use it)
RAGI (Raggy): Rostam's preferred RAG service
- Out-of-the-box automation OR fine-grained control
- Reclassification and indexing capabilities
- Handles sentence splitting, vector storage, retrieval optimization
MCPs (Model Context Protocol): API integrations (GitHub, Neon, external databases)
Real-World Project: Medical QA Agent (Function)
Architecture Overview:
- Thread title generation from user question
- Pull patient medical data
- Extract memories (e.g., medication changes, lab results)
- RAG subgraph for relevant context
- Assistant calls LLM with medical tools
- Tools execute function calls
- Cache management for patient data (check whether the cache has been invalidated, then store or retrieve)
Complex Challenge - Longitudinal Data:
- 2 tests per year × 5 years × 180-1000+ biomarkers = 1,800 to 10,000+ data points
- Problem: How to meaningfully answer "Am I getting better with my heart health?"
- Solution: Pipeline to summarize longitudinal data into semantic text (clinical reports)
- Classify questions as longitudinal vs general, pull from appropriate dataset
- Rostam notes: "RAGI, no one solved this. They're all working on it right now. It's a really, really, really difficult problem."
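The "summarize into semantic text, then classify the question" idea can be illustrated with a toy version. This is a rough sketch, not the pipeline built for the project: the function names, the keyword cues, and the sample readings are all hypothetical, and a real system would likely use an LLM for both steps.

```python
def summarize_biomarker(name, readings, unit, lower_is_better=True):
    """Turn a series of (date, value) readings into one readable sentence.

    readings: chronologically ordered list of (date_string, value) pairs.
    lower_is_better: direction that counts as improvement for this marker.
    """
    first_date, first = readings[0]
    last_date, last = readings[-1]
    change = last - first
    if change == 0:
        trend = "was unchanged"
    elif (change < 0) == lower_is_better:
        trend = "improved"
    else:
        trend = "worsened"
    return (
        f"{name} {trend} from {first} {unit} ({first_date}) "
        f"to {last} {unit} ({last_date}) across {len(readings)} tests."
    )


def is_longitudinal(question):
    """Crude keyword check for questions about change over time."""
    cues = ("getting better", "getting worse", "over time", "trend", "improv")
    text = question.lower()
    return any(cue in text for cue in cues)
```

Sentences produced this way can be indexed like any other document, so a question flagged by `is_longitudinal` retrieves trend summaries instead of thousands of raw values.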
Project Timeline:
- 3.5 weeks, 18-hour days
- Released beta version of single agent, then built multi-agent system
- Deadline pressure from investors wanting to launch on specific date
- Rostam hired Natalie (Gauntlet cohort) and another person for Function
Agent Graph / Kevin Situation
Rostam Cut Ties with Kevin:
- Kevin wanted Rostam as employee (not co-founder) with 1-3% equity
- Rostam bringing in $100K+ contracts and co-founder-level work
- Kevin offering $150K salary - Rostam: "I'm taking a pay cut for you. Like, no, no. I know what I'm worth"
- Rostam: "I'm going to rebuild what he built, which I literally told him everything, like how to build everything. So I know how to build it all."
Rostam's New Direction:
- Building his own marketplace/platform
- Automating more of the pipeline
- Solving scalability issues (Slack channels cost $5/user/month, not scalable)
- Core ideology from original Agent Graph team (Roger, Adam, Pat, Lamar) but with his own flavor
Gary's Journey & Alpha Experience
Why Gary Left Alpha:
- Proposed a big education conference, but Joe was too busy
- Started being entrepreneurial, identifying pain points, matching builders with opportunities
- Hosted an event at Zach Levi's Ranch with 120 people (including Kanye West's manager) on a week's notice
- Reality check: Poor internal processes for starting new schools
- Random middle manager seized authority over events, canceled Gary's event
- Realization: "A lot of these people they're not actually there to really make an impact they're there to have a job and they see people as threats to their job"
- Left and took spiritual sabbatical
Rostam's Response:
"Aren't you guys having the same goal? That's so stupid... I'm never scared of a job like I don't think I have to be ever scared of having a job in this like with the knowledge that we have"
Potential Collaboration & Business Strategy
Hackathon Partnership:
- Rostam invited Gary to join December 13th NVIDIA hackathon team
- Previous teammate left because he didn't know anything about agents/LLMs
- Rostam wants clear communication and alignment throughout process
- Prize: NVIDIA DGX device ($4,000) that can fine-tune up to 200B parameter models
- NVIDIA offered Rostam podcast opportunity if he builds something cool
Education & Content Strategy Discussion:
Gary's Vision:
- Best education arm that's also marketing flywheel and recruiting flywheel
- Explain to people: "If you're this quality, this quality, this quality... explore this path"
- Free workshops, courses, content
- Differentiator: Real experience with billion-dollar companies, not just YouTubers teaching theory
- Gary: "I ran a digital first education nonprofit for three years. Posted hundreds of videos online. And I've just been waiting for the next thing to be really excited about. To evangelize. That doesn't make me feel like a trendy sellout."
Rostam's Perspective:
- Education can charge for courses AND train workforce
- Problem: "Why should I choose this course over another?"
- Differentiator: Real experience with actual companies, documented case studies
- Started creating case scenarios on website but at high level (no mermaid diagrams, anonymized)
- Concern: "I don't want to have a 3 billion dollar company coming out and like coming after me"
Potential Agency Model:
- Standardized process/SOP even if agents are different
- Certification system (like Gauntlet): Submit project, AI analyzes it, get badge
- Division of labor: Rostam architects, Gary handles marketing/education/BD
- Target: People who can create 5-figure+ budgets for digital transformation
- Avoid: "People that created their app with vibe coding" (no-code/low-code builders)
Client Strategy:
- Focus on big companies (not broke startups)
- Work with VCs to get portfolio company access
- Rostam's fee: $200/hour for projects
- Marketplace model: Charge 20% of hours, developers get $80-150/hour
- Colombian developers (from the "Harvard of Colombia"): $4-5K/month, which leaves a wider margin spread and compounding benefits
AITX Community & Events
Current State:
- AITX is largest AI community in Austin (besides Fiesta which is more generalized)
- 90% of attendees building their own SaaS, not contracting/agencies
- Someone is buying a house to host more events (the CEO is being pressured to move to San Francisco)
Gap Identified:
- No events for agent builders sharing architecture/case studies
- No documentation of workflows, stacks, best practices
- Google search for "best architectural design systems for multi-agent architectures" returns poor results
- Opportunity: Conference of agent builders talking about architecture
- Database of: "This is a medical company. This is how we built it. This is why we built it."
Rostam's Interest:
- Wants to host talks at the house
- Bring community/network together
- But concerned: "I have something of value" - don't want to give away too much
- Gary's perspective: Once agency/system is set up, it becomes recruiting tool
Future of AI & Technology
Agents vs. Neural Networks:
- Rostam: "Agents are less sexy because it's already, there's no exponential growth in agent development right now"
- Future direction: Moving toward neural networks and custom model building
- "Agents are dead" - the future is small LLMs and diffusion models
Diffusion LLMs (Future of All LLMs):
- Moving away from ARM (autoregressive modeling: one token at a time, like ChatGPT)
- Diffusion drafts the whole output at once, then iteratively refines it into human-readable text
- Google and Inception Labs leading (Inception has Mercury)
- Mercury (per Rostam): about 1,000 tokens per second on an H100, with accuracy comparable to Sonnet 3.5
- For 99% of applications, speed matters more than a 3-4% accuracy bump
- Rostam: "You should sign up for their beta. Diffusion's insane"
Market Timing:
- Now is the time for AI consulting
- But agents will become commoditized as people learn to build them
- Long-term: Training/education market will be huge
Key Quotes
On Agent Architecture:
"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"
"Anyone can build an agent. Agent is not difficult. It's the finesse and the expertise comes with how you architect the system."
On Business Philosophy:
"I know what I'm worth, and I know what value I bring"
"I'm never scared of a job like I don't think I have to be ever scared of having a job in this like with the knowledge that we have"
On Education Strategy:
"The differentiator, for me, is clearly... That billion-dollar company. Right. And none of this is documented, by the way. So you have to attend this course if you don't want just random YouTuber"
On Market Opportunity:
"At the end of the day, it's about your network and this agent building stuff... it's all about who you know and how you can get contacts if you can get a lot of contacts you win"
On Future Technology:
"Agents are less sexy because it's already, there's no exponential growth in agent development right now... we're kind of shifting away from this agentic architecture to small LLMs, and then if you look up diffusion LLMs, that's going to be the future of all LLMs"
Action Items / Takeaways
For Gary:
- Join Rostam for December 13th NVIDIA hackathon (team up)
- Continue learning agent architecture fundamentals
- Build sample agents to demonstrate capability
- Consider education/content strategy for AI consulting
- Explore AITX events and community
For Rostam:
- Continue hosting LinkedIn Spaces/educational content
- Build own marketplace platform (post-Agent Graph)
- Consider education arm for agency (with or without Gary)
- Meet with Thomson Reuters/Westlaw Head of AI this week
- Host talks at new house venue for agent builder community
Potential Collaboration:
- Hackathon partnership (December 13th)
- Education/content creation (Gary's strength)
- Agency building (Rostam architects, Gary handles BD/marketing)
- Case study documentation (anonymized, high-level)
- Community events for agent builders
Strategic Insights:
- Focus on big companies, not startups (more impact, better pay, VC portfolio access)
- Education is both revenue stream AND recruiting tool
- Real experience with billion-dollar companies is key differentiator
- Network and relationships matter more than technical skills alone
- Now is the time for AI consulting, but agents will commoditize - need to stay ahead
Technical Learnings
RAG Optimization:
- Redis caching for common questions (95% similarity threshold)
- Top K, similarity ratio, reclassification, breadth vs depth are key levers
- Metadata filtering and partitioning for access control
- Longitudinal data is an unsolved problem in RAG
System Design:
- Multi-agent architecture with routing is preferred over general agents
- Memory management (short-term vs long-term)
- Thread summarization to reduce tokens
- Self-improving systems with feedback loops
- Guardrail testing and improvement cycles
Tool Ecosystem:
- Vercel AI SDK for modularity and provider-agnostic setup
- LangChain/LangSmith for enterprise systems
- LangGraph for complex node-based workflows
- RAGI for RAG pipelines
- MCPs for API integrations