Rostam's Detailed Process for Constructing Agents
November 25, 2025
Core Philosophy
"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"
Key Insight: Anyone can build an agent. The difficulty isn't in building one—it's in the finesse and expertise of how you architect the system.
The Four Fundamental Components
Every production agent has four core components:
- System Prompt - Defines who the agent is, its role, and guardrails
- Tools - What the agent has access to (RAG, databases, APIs, MCPs)
- RAG (Retrieval Augmented Generation) - Knowledge base access through vector databases
- Evaluations & Observability - Tracking performance, costs, latency, and accuracy
Step-by-Step Construction Process
Worked example: a support ticketing agent
Phase 1: System Prompt Engineering
Purpose: Define the agent's identity, capabilities, and boundaries.
Key Elements:
1. User Persona Definition
- Who the agent is (e.g., "You are an agent of X, Y, Z company")
- What role it should play
- What services it covers (technical docs, product docs, industry knowledge)
2. Tool Usage Instructions
- When to use RAG vs. when NOT to use it
- Confidence intervals: "If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool"
- When to escalate to human support
- When to create tickets vs. answer directly
3. Guardrails
- Industry-specific restrictions (e.g., a healthcare company's agent must not give medical advice through an LLM API; doing so can violate regulations and provider usage policies)
- Stay relevant to company's vertical
- Prevent hallucination and fake data generation
- Test suites with 50+ creative ways to break guardrails
- Self-improving guardrail system that runs daily
Example Guardrail Instruction:
"You should not use RAG tool when you have the general knowledge to answer the question. You should not use a tool when they ask a question that's not relevant to our vertical. You should use the tool if you don't have a high confidence interval of having the necessary data to answer in a way without creating fake data."
Phase 2: Tool Definition & Integration
Available Tool Categories:
1. RAG (Knowledge Base)
- Internal documentation
- Historical Q&A pairs
- Industry knowledge
2. Database Access
- Structured queries (Postgres, etc.)
- Patient records, user data
- Real-time information retrieval
3. External APIs
- HubSpot (ticket creation)
- Linear (development tickets)
- MCPs (Model Context Protocol) for GitHub, Neon, etc.
4. Live Chat / Human Escalation
- Transfer to human support
- Immediate response for urgent issues
Tool Selection Criteria:
- What does the agent need to accomplish?
- What systems does the company already use?
- What's the escalation path?
- How do tools feed back into self-improvement?
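A minimal sketch of how tools like these might be declared with the Vercel AI SDK's `generateText`/`tool` helpers (v4-style API with zod schemas). `searchKnowledgeBase` and `createHubSpotTicket` are hypothetical wrappers around your RAG service and the HubSpot API, and the step limit is just an example value.

```typescript
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical wrappers around your RAG service and the HubSpot API.
declare function searchKnowledgeBase(query: string, topK: number): Promise<string[]>;
declare function createHubSpotTicket(subject: string, body: string): Promise<string>;
declare const SYSTEM_PROMPT: string; // from the system-prompt sketch above

export async function runAgent(userQuestion: string) {
  return generateText({
    model: openai("gpt-4o"),
    system: SYSTEM_PROMPT,
    prompt: userQuestion,
    maxSteps: 5, // cap the tool-call loop so costs cannot snowball
    tools: {
      searchKnowledgeBase: tool({
        description: "Retrieve relevant chunks from the company knowledge base.",
        parameters: z.object({ query: z.string(), topK: z.number().default(5) }),
        execute: async ({ query, topK }) => searchKnowledgeBase(query, topK),
      }),
      createTicket: tool({
        description: "Open a HubSpot support ticket when the question cannot be answered.",
        parameters: z.object({ subject: z.string(), body: z.string() }),
        execute: async ({ subject, body }) => createHubSpotTicket(subject, body),
      }),
    },
  });
}
```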
Phase 3: RAG Pipeline Construction
Step 1: Data Identification
- Where does the data live? (HubSpot, databases, documentation)
- What format is it in?
- What's the update frequency?
Step 2: Knowledge Base Building
- Extract all relevant documentation
- Process historical Q&A pairs
- Create vector embeddings
- Store in vector database (pgvector, Pinecone, Qdrant, or Ragie)
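A sketch of the embed-and-store step, assuming pgvector with a `kb_chunks` table and the Vercel AI SDK's `embedMany` helper. The table name, column names, and embedding model are assumptions for illustration, not prescribed by the notes.

```typescript
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Assumes: CREATE TABLE kb_chunks (id serial, content text, embedding vector(1536));
export async function indexChunks(chunks: string[]) {
  // Embed all chunks in one batched call.
  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: chunks,
  });
  // Store each chunk alongside its embedding.
  for (let i = 0; i < chunks.length; i++) {
    await pool.query(
      "INSERT INTO kb_chunks (content, embedding) VALUES ($1, $2::vector)",
      [chunks[i], JSON.stringify(embeddings[i])]
    );
  }
}
```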
Step 3: RAG Optimization Levers
Top K Value:
- How many chunks to retrieve
- Trade-off: More context = higher latency but potentially higher accuracy
- Can retrieve neighboring chunks around each hit (e.g., 5 before and 5 after) so partial matches come back as full paragraphs
Similarity Ratio:
- How contextually similar results must be
- Higher ratio = fewer results, less latency, but might miss relevant context
- Lower ratio = more results, more latency, potentially less accurate
Reclassification:
- Reorganize results by contextual relevance
- Pull most relevant to top of results
- Can run multiple parallel API calls and find overlap
Breadth vs. Depth:
- Search across documents vs. deep within documents
- Trade-off: Depth increases latency, reduces results, but can increase accuracy
Metadata Filtering:
- Filter by patient data, internal docs, customer-facing docs
- Use LLM to determine which metadata filters match user query
- Quick operation using small models (4o-mini, nano models)
Partitioning:
- Separate knowledge bases by API key
- Customer-facing vs. internal documentation
- Patient records (each patient only sees their own data)
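A retrieval sketch that exposes three of the levers above: top K, a minimum similarity, and a metadata filter. It assumes the same hypothetical `kb_chunks` table with an added `audience` column; pgvector's `<=>` operator returns cosine distance, so `1 - distance` is used as similarity.

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Levers: topK (how many chunks), minSimilarity (similarity threshold),
// audience (metadata filter / partition). Column names are assumptions.
export async function retrieve(
  queryEmbedding: number[],
  opts: { topK: number; minSimilarity: number; audience: "internal" | "customer" }
) {
  const { rows } = await pool.query(
    `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
       FROM kb_chunks
      WHERE audience = $2
        AND 1 - (embedding <=> $1::vector) >= $3
      ORDER BY embedding <=> $1::vector
      LIMIT $4`,
    [JSON.stringify(queryEmbedding), opts.audience, opts.minSimilarity, opts.topK]
  );
  // Raising minSimilarity trims results (less latency, risk of missing context);
  // raising topK does the opposite.
  return rows as { content: string; similarity: number }[];
}
```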
Step 4: Query Alignment
- Before running RAG, align user's question to dataset
- Generate 10 different ways to ask the question
- Run all 10 through RAG to see if answer exists
- If answer exists but wasn't retrieved = RAG pipeline issue
- If answer doesn't exist = knowledge base issue
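A sketch of this alignment check: generate ten phrasings with the AI SDK's `generateObject`, run each through retrieval, and report whether the knowledge base can answer. `embedQuery` and `retrieve` stand in for the (hypothetical) helpers from the earlier sketches; the model choice and thresholds are illustrative.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

declare function embedQuery(q: string): Promise<number[]>; // hypothetical single-query embed helper
declare function retrieve(
  embedding: number[],
  opts: { topK: number; minSimilarity: number; audience: "internal" | "customer" }
): Promise<{ content: string; similarity: number }[]>;

export async function alignAndCheck(question: string) {
  // Ask a small model for 10 alternative phrasings of the user's question.
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ variations: z.array(z.string()).length(10) }),
    prompt: `Rewrite this support question ten different ways: "${question}"`,
  });

  for (const variant of object.variations) {
    const hits = await retrieve(await embedQuery(variant), {
      topK: 5, minSimilarity: 0.75, audience: "customer",
    });
    if (hits.length > 0) return { answerable: true as const, variant, hits };
  }
  // No variation retrieved anything: either a RAG pipeline issue or a knowledge base gap.
  return { answerable: false as const };
}
```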
Advanced Optimization: Redis Caching Strategy
For common questions, implement a Redis cache:
1. Morning Cron Job:
- Generate top 10 Q&A pairs per category (technical, product, etc.)
- Use LLM to create questions based on documentation ambiguity
- Store in Redis with vector similarities
2. Query Time:
- Check Redis first (95% similarity threshold)
- If match found, return immediately (bypasses RAG pipeline)
- Tally which questions get asked
3. Maintenance:
- Remove questions that never get asked
- Generate new questions based on gaps
- Provides insights into why people ask certain questions
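A sketch of the query-time cache check, assuming the morning cron has written entries shaped like `{ question, answer, embedding }` under `qa:*` keys. Cosine similarity is computed in application code here to keep the sketch dependency-light (a RediSearch vector index would be the heavier-duty alternative); the 0.95 threshold comes from the notes.

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Plain cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Entries are written by the morning cron as JSON: { question, answer, embedding }.
export async function checkCache(queryEmbedding: number[]): Promise<string | null> {
  const keys = await redis.keys("qa:*"); // acceptable for a few hundred cached pairs
  for (const key of keys) {
    const entry = JSON.parse((await redis.get(key)) ?? "{}");
    if (entry.embedding && cosine(queryEmbedding, entry.embedding) >= 0.95) {
      await redis.incr(`${key}:hits`); // tally which cached questions actually get asked
      return entry.answer as string;   // bypass the RAG pipeline entirely
    }
  }
  return null;
}
```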
Phase 4: Evaluations & Observability
Using LangSmith (Recommended):
Tracing Capabilities:
- Time to first token (critical industry measurement)
- Reasoning chain / thinking mode
- Tools used
- Costs per step
- Total latency
- Total cost per API call
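One way to get these traces is to wrap the agent entry point with LangSmith's `traceable` helper; each request then shows up as a trace with latency, nested LLM/tool calls, and token usage. `embedQuery`, `checkCache`, and `runAgent` refer to the earlier hypothetical helpers, and LangSmith tracing is assumed to be configured via environment variables.

```typescript
import { traceable } from "langsmith/traceable";

declare function embedQuery(q: string): Promise<number[]>;
declare function checkCache(embedding: number[]): Promise<string | null>;
declare function runAgent(q: string): Promise<{ text: string }>;

// Assumes LANGSMITH_TRACING and LANGSMITH_API_KEY are set in the environment.
export const answerQuestion = traceable(
  async (userQuestion: string) => {
    const cached = await checkCache(await embedQuery(userQuestion));
    if (cached) return cached;           // Redis hit: bypass the RAG pipeline
    const result = await runAgent(userQuestion);
    return result.text;
  },
  { name: "support-agent", run_type: "chain" }
);
```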
Annotations:
- Thumbs up/down from users
- Human-in-the-loop labeling: Why was response good/bad?
- Feed bad responses back into system prompt improvement
Self-Improving System:
- Collect bad response with annotation (why it was bad)
- Take system prompt, tools, question, answer, expected response
- Feed into LLM: "How do you improve the system prompt to account for this?"
- Generalize improvements (don't overfit to specific question)
- Update system prompt
- Test and iterate
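A sketch of the prompt-improvement step under the same assumptions as the earlier sketches: the bad exchange, its annotation, and the current system prompt are handed to a model that is asked for a generalized revision rather than a patch for the single question. The schema and prompt wording are illustrative.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

export async function improveSystemPrompt(input: {
  systemPrompt: string;
  question: string;
  badAnswer: string;
  expectedAnswer: string;
  annotation: string; // human note on why the response was bad
}) {
  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: z.object({
      revisedSystemPrompt: z.string(),
      rationale: z.string(),
    }),
    prompt: [
      "Here is an agent system prompt and an exchange that was rated bad.",
      `System prompt:\n${input.systemPrompt}`,
      `Question: ${input.question}`,
      `Agent answer: ${input.badAnswer}`,
      `Expected answer: ${input.expectedAnswer}`,
      `Annotation: ${input.annotation}`,
      "Revise the system prompt so this class of failure is avoided.",
      "Generalize the fix; do not hard-code this specific question.",
    ].join("\n\n"),
  });
  return object; // review and test before deploying the revised prompt
}
```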
Test Suites:
- Concurrent user testing (can it handle 10,000 concurrent users?)
- Cost analysis (how much for 10,000 active users asking 3 questions per second?)
- Guardrail testing (50+ creative ways to break through)
- Latency testing under load
Complete Example: Ticketing Agent Architecture
Business Goal
Reduce the number of tickets in a support system (using HubSpot as an example) while understanding why tickets are generated in the first place.
Architecture Overview
User Question → Agent → [RAG Check] → [Answer or Escalate]
↓
[Self-Improvement Cycle]
↓
[Completed Ticket] → [Extract Insights] → [Update Knowledge Base]
Detailed Implementation
Step 1: Identify All Data
Data Sources:
- All tickets ever opened in HubSpot
- All completed tickets with answers
- User feedback (satisfactory or not)
- Question and answer pairs
Purpose:
- Build self-improving system
- Understand why tickets are generated
- Create knowledge base from historical data
Step 2: Knowledge Base Building (RAG Pipeline)
Process:
1. Extract Completed Tickets:
- Get all tickets with questions, answers, and feedback
- Filter for satisfactory responses only
2. AI Extraction:
- Run tickets through AI to extract:
- Core question the user asked
- Core idea/context
- Satisfactory answer
- Generate 10 different ways to ask the same question
3. RAG Check:
- Run all 10 question variations through RAG
- Determine if answer exists in knowledge base
- If exists but wasn't retrieved = RAG pipeline issue
- If doesn't exist = knowledge base gap
4. Decision Tree:

Ticket Completed
  ↓
Extract Core Ideas (AI)
  ↓
Generate 10 Question Variations
  ↓
Run Through RAG
  ↓
Does the answer exist in the knowledge base?
  │
  ├─ No → Knowledge base issue
  │        → Create Linear ticket "Add to Knowledge Base" with:
  │          - Question
  │          - Answer
  │          - Suggested location in docs
  │
  └─ Yes → The failure was on the retrieval/query side
           ├─ RAG pipeline issue (answer exists but was not retrieved)
           │    → Optimize retrieval parameters
           └─ User query problem (question not aligned to dataset)
                → Gather failed queries
                → Improve system prompt (align user queries to dataset)
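A sketch of the "Extract Core Ideas (AI)" step above. The `CompletedTicket` shape is a simplified stand-in for whatever the HubSpot API actually returns, and the schema mirrors the fields listed in the decision tree.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Simplified ticket shape; not the real HubSpot schema.
interface CompletedTicket { subject: string; thread: string; satisfactory: boolean; }

export async function extractFromTicket(ticket: CompletedTicket) {
  if (!ticket.satisfactory) return null; // only learn from answers users accepted

  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({
      coreQuestion: z.string(),
      context: z.string(),
      answer: z.string(),
      questionVariations: z.array(z.string()).length(10),
    }),
    prompt: [
      "Extract the core question, its context, the satisfactory answer,",
      "and 10 alternative phrasings of the question from this support ticket:",
      "",
      ticket.subject,
      ticket.thread,
    ].join("\n"),
  });
  return object;
}
```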
Step 3: Tool Definition
Tools Available:
1. RAG Tool:
- Access to knowledge base
- Use when: Don't have high confidence in general knowledge
- Don't use when: Have general knowledge OR question not relevant to vertical
2. HubSpot Integration:
- Create tickets for users
- Use when: Cannot answer question after RAG check
- Ask user: "Would you like to be transferred to a human or open a ticket?"
3. Live Chat / Human Transfer:
- Immediate human support
- Use when: User requests or urgent issue
- Feeds back into knowledge base when resolved
4. Linear Integration (Background Agent):
- Create tickets to add content to knowledge base
- Triggered when: New question answered by human
- Includes: Question, answer, suggested location in docs
Step 4: System Prompt for Ticketing Agent
User Persona:
"You are an agent of [Company Name]. Users will be coming to you to ask questions about our services. Our services include documents for technical documents, product documents and general industry knowledge documents. Your role is to answer these questions, given the context."
Tool Usage Instructions:
"If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool to gather more context so that you can answer the question better without hallucinating, without making things up, without generating fake data."
"You should not use RAG tool when you have the general knowledge to answer the question. You should not use a tool when they ask a question that's not relevant to our vertical."
"You should use HubSpot if you cannot answer a question. Do not run RAG, but ask the user if you would like to be transferred to a human or would you like for the agent to go ahead and open up a ticket for them?"
Guardrails:
- Stay relevant to company's vertical
- Do not provide information outside of documentation
- Escalate appropriately
- Never hallucinate or make up information
Step 5: Self-Improvement Cycle
When Ticket is Completed:
1. Extract Information:
- Question user asked
- Answer that was satisfactory
- Context around the question
2. Generate Question Variations:
- Create 10 different ways to ask the question
- Run through RAG to see if answer exists
3. Determine Issue Type:
- Knowledge Base Issue: Answer doesn't exist → Create Linear ticket to add to docs
- RAG Pipeline Issue: Answer exists but wasn't retrieved → Optimize RAG parameters
- User Query Issue: Question not aligned to dataset → Improve system prompt
4. Feed Back into System:
- New content added to knowledge base
- RAG pipeline optimized
- System prompt improved
- Agent gets better over time
Circular Flow:
User Question
↓
Agent Tries to Answer (RAG)
↓
Cannot Answer → Create Ticket
↓
Human Answers Ticket
↓
Ticket Completed
↓
Extract & Process
↓
Update Knowledge Base
↓
Future Questions Answered Automatically
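A sketch of the issue-type routing from step 3 of the cycle, with `createLinearIssue` as a hypothetical wrapper around the Linear API; the three labels match the categories listed above.

```typescript
type IssueType = "knowledge_base_gap" | "rag_pipeline_issue" | "query_alignment_issue";

// Hypothetical wrapper around the Linear API.
declare function createLinearIssue(title: string, body: string): Promise<void>;

export async function routeImprovement(
  extracted: { coreQuestion: string; answer: string },
  flags: {
    answerExistsInDocs: boolean;      // is the content in the knowledge base at all?
    retrievedByAnyVariation: boolean; // did any of the 10 phrasings retrieve it?
  }
): Promise<IssueType> {
  if (!flags.answerExistsInDocs) {
    // Knowledge base gap: open a Linear ticket so the content gets added to the docs.
    await createLinearIssue(
      "Add to knowledge base",
      `Q: ${extracted.coreQuestion}\nA: ${extracted.answer}\nSuggested location: <docs section>`
    );
    return "knowledge_base_gap";
  }
  if (!flags.retrievedByAnyVariation) {
    return "rag_pipeline_issue"; // content exists but retrieval missed it: tune top K, similarity, reranking
  }
  return "query_alignment_issue"; // retrievable when rephrased: improve how the system prompt aligns user queries
}
```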
Step 6: Advanced Optimizations
Redis Caching for Common Questions:
1. Morning Generation:
- LLM generates top 10 Q&A pairs per category
- Based on documentation ambiguity
- Stored in Redis with vector embeddings
2. Query Time:
- Check Redis first (95% similarity)
- If match: Return immediately (bypass RAG)
- Tally usage
3. Insights:
- Track which questions are asked most
- Identify documentation gaps
- Remove unused questions, generate new ones
Benefits:
- Faster response times for common questions
- Reduced RAG pipeline load
- Insights into user behavior
- Data-driven documentation improvements
Step 7: Evaluations & Observability
Metrics to Track:
1. Performance:
- Time to first token
- Total response latency
- Accuracy rate
- Ticket reduction percentage
2. Costs:
- Cost per API call
- Cost per step (RAG, LLM calls, tool usage)
- Total cost for 10,000 concurrent users
3. Quality:
- User satisfaction (thumbs up/down)
- Human annotations (why good/bad)
- Guardrail effectiveness
- Hallucination rate
4. System Health:
- RAG retrieval accuracy
- Knowledge base coverage
- Tool usage patterns
- Error rates
Improvement Process:
- Collect bad response with annotation
- Analyze: System prompt issue? RAG issue? Tool issue?
- Feed into improvement cycle
- Test and validate
- Deploy updated system
Technical Stack Recommendations
Preferred Stack (Rostam's Choice)
Vercel AI SDK:
- Modular and provider-agnostic
- Easy to build in a fail-safe (retry against another provider if one goes down)
- Supports multiple RAG databases
- Built-in retry logic and step limits
- Easy deployment (Vercel, GCP, AWS, Docker)
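The fail-safe mentioned above is straightforward because every provider exposes the same `generateText` interface; a minimal sketch of one way to do it (the model IDs are just examples):

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Try the primary provider; if the call throws (outage, rate limit), fall
// back to the next one in the list.
export async function generateWithFallback(system: string, prompt: string) {
  const models = [openai("gpt-4o"), anthropic("claude-3-5-sonnet-latest")];
  let lastError: unknown;
  for (const model of models) {
    try {
      return await generateText({ model, system, prompt });
    } catch (err) {
      lastError = err; // provider failed: try the next one
    }
  }
  throw lastError;
}
```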
Ragie (pronounced "Raggy"):
- Out-of-the-box automation OR fine-grained control
- Reclassification and indexing
- Handles sentence splitting, vector storage, retrieval
- Rostam's preferred RAG service
LangSmith:
- Observability and tracing
- Annotations and datasets
- Cost tracking
- Performance monitoring
Alternative Stacks
LangChain + LangSmith + LangGraph:
- Best for enterprise systems already using LangChain
- Node-based workflows with mermaid diagrams
- Subgraphs for complex nested systems
- Prompt management and A/B testing
Mastra:
- TypeScript-focused agentic building platform
- Up-and-coming option
Key Principles & Best Practices
1. Never Build General Agents
"You never want to build general agents. They are the shittiest piece of garbage out there."
Instead: Build specialized agents with single responsibilities. Use multi-agent architecture with routing.
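A minimal routing sketch in the same TypeScript style: a small, cheap model classifies the query, and a narrow specialized agent handles it. The agent names and categories are placeholders.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical specialized agents, each with a single responsibility.
declare function runBillingAgent(q: string): Promise<string>;
declare function runTechnicalAgent(q: string): Promise<string>;
declare function runProductAgent(q: string): Promise<string>;

const agents = {
  billing: runBillingAgent,
  technical: runTechnicalAgent,
  product: runProductAgent,
} as const;

export async function routeQuestion(question: string) {
  // Routing is a cheap, fast decision, so a small model is enough.
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ category: z.enum(["billing", "technical", "product"]) }),
    prompt: `Classify this support question: "${question}"`,
  });
  return agents[object.category](question);
}
```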
2. Always Think About Self-Improvement
Every component should feed back into making the system better:
- Completed tickets → Knowledge base updates
- Bad responses → System prompt improvements
- User queries → RAG optimization
- Common questions → Redis caching
3. Consider Trade-offs at Every Step
- Latency vs. Accuracy
- Cost vs. Performance
- Breadth vs. Depth
- Automation vs. Human-in-the-loop
4. Test Everything
- Guardrail testing (50+ creative ways to break)
- Concurrent user testing
- Cost analysis under load
- Latency testing
- Accuracy validation
5. Document Architecture Decisions
- Why you chose certain tools
- Trade-offs you made
- Optimization levers you're using
- How the system improves over time
6. Start Simple, Add Complexity Gradually
- Begin with basic RAG + system prompt
- Add tools as needed
- Implement optimizations based on data
- Scale based on actual usage patterns
Common Pitfalls to Avoid
- Over-engineering: Don't build complex multi-agent systems when a simple agent will do
- Ignoring Guardrails: Industry-specific restrictions are critical (legal, ethical, safety)
- Skipping Evaluations: You can't improve what you don't measure
- No Self-Improvement: Static systems become outdated quickly
- Poor Query Alignment: User questions must align with your dataset structure
- Ignoring Costs: Agent calls can snowball quickly (RAG → LLM → Tools → More RAG)
- No Human-in-the-Loop: Fully automated systems miss nuance and context
Scaling Considerations
Horizontal Scaling
- Deploy across multiple regions/data centers
- Use sticky sessions to maintain context
- Load balance across instances
Performance Optimization
- Redis caching for common questions
- Thread summarization to reduce tokens
- Parallel processing where possible
- Smart retry logic
Cost Management
- Monitor costs per step
- Set maximum step limits
- Use smaller models for simple tasks
- Cache aggressively
Quality Assurance
- Continuous guardrail testing
- Regular accuracy audits
- User feedback loops
- A/B testing system prompts
Conclusion
Building production-grade agents requires:
- Clear architecture - System prompt, tools, RAG, evaluations
- Self-improvement cycles - Every component feeds back
- Careful optimization - Trade-offs at every step
- Continuous testing - Guardrails, performance, accuracy
- Real-world experience - Learn from actual implementations
The ticketing agent example demonstrates how to think through:
- Business goals → Technical architecture
- Data identification → Knowledge base building
- Tool selection → Integration points
- Optimization → Performance and cost
- Evaluation → Continuous improvement
Remember: The finesse is in the architecture, not the basic building blocks. Anyone can build an agent—experts architect systems that improve over time.