Rostam's Detailed Process for Constructing Agents
November 25, 2025
Core Philosophy
"An agent is a human being. What would you task a human being with doing, and how would you instruct that human being to do that task with the given resources?"
Key Insight: Anyone can build an agent. The difficulty isn't in building one—it's in the finesse and expertise of how you architect the system.
The Four Fundamental Components
Every production agent has four core components:
- System Prompt - Defines who the agent is, its role, and guardrails
- Tools - What the agent has access to (RAG, databases, APIs, MCPs)
- RAG (Retrieval Augmented Generation) - Knowledge base access through vector databases
- Evaluations & Observability - Tracking performance, costs, latency, and accuracy
Step-by-Step Construction Process
Worked example: a support ticketing agent
Phase 1: System Prompt Engineering
Purpose: Define the agent's identity, capabilities, and boundaries.
Key Elements:
1. User Persona Definition
- Who the agent is (e.g., "You are an agent of X, Y, Z company")
- What role it should play
- What services it covers (technical docs, product docs, industry knowledge)
2. Tool Usage Instructions
- When to use RAG vs. when NOT to use it
- Confidence intervals: "If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool"
- When to escalate to human support
- When to create tickets vs. answer directly
3. Guardrails
- Industry-specific restrictions (e.g., a healthcare company's agent must not give medical advice through an LLM API; doing so can violate regulations and provider usage policies)
- Stay relevant to company's vertical
- Prevent hallucination and fake data generation
- Test suites with 50+ creative ways to break guardrails
- Self-improving guardrail system that runs daily
Example Guardrail Instruction:
"You should not use RAG tool when you have the general knowledge to answer the question. You should not use a tool when they ask a question that's not relevant to our vertical. You should use the tool if you don't have a high confidence interval of having the necessary data to answer in a way without creating fake data."
Phase 2: Tool Definition & Integration
Available Tool Categories:
1. RAG (Knowledge Base)
- Internal documentation
- Historical Q&A pairs
- Industry knowledge
2. Database Access
- Structured queries (Postgres, etc.)
- Patient records, user data
- Real-time information retrieval
3. External APIs
- HubSpot (ticket creation)
- Linear (development tickets)
- MCPs (Model Context Protocol) for GitHub, Neon, etc.
4. Live Chat / Human Escalation
- Transfer to human support
- Immediate response for urgent issues
Tool Selection Criteria:
- What does the agent need to accomplish?
- What systems does the company already use?
- What's the escalation path?
- How do tools feed back into self-improvement?
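A minimal sketch of how tools like these might be declared with the Vercel AI SDK's `generateText`/`tool` helpers (v4-style API with zod schemas). `searchKnowledgeBase` and `createHubSpotTicket` are hypothetical wrappers around your RAG service and the HubSpot API, and the step limit is just an example value.

```typescript
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical wrappers around your RAG service and the HubSpot API.
declare function searchKnowledgeBase(query: string, topK: number): Promise<string[]>;
declare function createHubSpotTicket(subject: string, body: string): Promise<string>;
declare const SYSTEM_PROMPT: string; // from the system-prompt sketch above

export async function runAgent(userQuestion: string) {
  return generateText({
    model: openai("gpt-4o"),
    system: SYSTEM_PROMPT,
    prompt: userQuestion,
    maxSteps: 5, // cap the tool-call loop so costs cannot snowball
    tools: {
      searchKnowledgeBase: tool({
        description: "Retrieve relevant chunks from the company knowledge base.",
        parameters: z.object({ query: z.string(), topK: z.number().default(5) }),
        execute: async ({ query, topK }) => searchKnowledgeBase(query, topK),
      }),
      createTicket: tool({
        description: "Open a HubSpot support ticket when the question cannot be answered.",
        parameters: z.object({ subject: z.string(), body: z.string() }),
        execute: async ({ subject, body }) => createHubSpotTicket(subject, body),
      }),
    },
  });
}
```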
Phase 3: RAG Pipeline Construction
Step 1: Data Identification
- Where does the data live? (HubSpot, databases, documentation)
- What format is it in?
- What's the update frequency?
Step 2: Knowledge Base Building
- Extract all relevant documentation
- Process historical Q&A pairs
- Create vector embeddings
- Store in vector database (pgvector, Pinecone, Qdrant, or Ragie)
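A sketch of the embed-and-store step, assuming pgvector with a `kb_chunks` table and the Vercel AI SDK's `embedMany` helper. The table name, column names, and embedding model are assumptions for illustration, not prescribed by the notes.

```typescript
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Assumes: CREATE TABLE kb_chunks (id serial, content text, embedding vector(1536));
export async function indexChunks(chunks: string[]) {
  // Embed all chunks in one batched call.
  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: chunks,
  });
  // Store each chunk alongside its embedding.
  for (let i = 0; i < chunks.length; i++) {
    await pool.query(
      "INSERT INTO kb_chunks (content, embedding) VALUES ($1, $2::vector)",
      [chunks[i], JSON.stringify(embeddings[i])]
    );
  }
}
```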
Step 3: RAG Optimization Levers
Top K Value:
- How many chunks to retrieve
- Trade-off: More context = higher latency but potentially higher accuracy
- Can retrieve neighboring chunks around each hit (e.g., 5 before and 5 after) so partial matches come back as full paragraphs
Similarity Ratio:
- How contextually similar results must be
- Higher ratio = fewer results, less latency, but might miss relevant context
- Lower ratio = more results, more latency, potentially less accurate
Reclassification:
- Reorganize results by contextual relevance
- Pull most relevant to top of results
- Can run multiple parallel API calls and find overlap
Breadth vs. Depth:
- Search across documents vs. deep within documents
- Trade-off: Depth increases latency, reduces results, but can increase accuracy
Metadata Filtering:
- Filter by patient data, internal docs, customer-facing docs
- Use LLM to determine which metadata filters match user query
- Quick operation using small models (4o-mini, nano models)
Partitioning:
- Separate knowledge bases by API key
- Customer-facing vs. internal documentation
- Patient records (each patient only sees their own data)
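A retrieval sketch that exposes three of the levers above: top K, a minimum similarity, and a metadata filter. It assumes the same hypothetical `kb_chunks` table with an added `audience` column; pgvector's `<=>` operator returns cosine distance, so `1 - distance` is used as similarity.

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Levers: topK (how many chunks), minSimilarity (similarity threshold),
// audience (metadata filter / partition). Column names are assumptions.
export async function retrieve(
  queryEmbedding: number[],
  opts: { topK: number; minSimilarity: number; audience: "internal" | "customer" }
) {
  const { rows } = await pool.query(
    `SELECT content, 1 - (embedding <=> $1::vector) AS similarity
       FROM kb_chunks
      WHERE audience = $2
        AND 1 - (embedding <=> $1::vector) >= $3
      ORDER BY embedding <=> $1::vector
      LIMIT $4`,
    [JSON.stringify(queryEmbedding), opts.audience, opts.minSimilarity, opts.topK]
  );
  // Raising minSimilarity trims results (less latency, risk of missing context);
  // raising topK does the opposite.
  return rows as { content: string; similarity: number }[];
}
```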
Step 4: Query Alignment
- Before running RAG, align user's question to dataset
- Generate 10 different ways to ask the question
- Run all 10 through RAG to see if answer exists
- If answer exists but wasn't retrieved = RAG pipeline issue
- If answer doesn't exist = knowledge base issue
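A sketch of this alignment check: generate ten phrasings with the AI SDK's `generateObject`, run each through retrieval, and report whether the knowledge base can answer. `embedQuery` and `retrieve` stand in for the (hypothetical) helpers from the earlier sketches; the model choice and thresholds are illustrative.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

declare function embedQuery(q: string): Promise<number[]>; // hypothetical single-query embed helper
declare function retrieve(
  embedding: number[],
  opts: { topK: number; minSimilarity: number; audience: "internal" | "customer" }
): Promise<{ content: string; similarity: number }[]>;

export async function alignAndCheck(question: string) {
  // Ask a small model for 10 alternative phrasings of the user's question.
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ variations: z.array(z.string()).length(10) }),
    prompt: `Rewrite this support question ten different ways: "${question}"`,
  });

  for (const variant of object.variations) {
    const hits = await retrieve(await embedQuery(variant), {
      topK: 5, minSimilarity: 0.75, audience: "customer",
    });
    if (hits.length > 0) return { answerable: true as const, variant, hits };
  }
  // No variation retrieved anything: either a RAG pipeline issue or a knowledge base gap.
  return { answerable: false as const };
}
```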
Advanced Optimization: Redis Caching Strategy
For common questions, implement a Redis cache:
1. Morning Cron Job:
- Generate top 10 Q&A pairs per category (technical, product, etc.)
- Use LLM to create questions based on documentation ambiguity
- Store in Redis with vector similarities
2. Query Time:
- Check Redis first (95% similarity threshold)
- If match found, return immediately (bypasses RAG pipeline)
- Tally which questions get asked
3. Maintenance:
- Remove questions that never get asked
- Generate new questions based on gaps
- Provides insights into why people ask certain questions
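A sketch of the query-time cache check, assuming the morning cron has written entries shaped like `{ question, answer, embedding }` under `qa:*` keys. Cosine similarity is computed in application code here to keep the sketch dependency-light (a RediSearch vector index would be the heavier-duty alternative); the 0.95 threshold comes from the notes.

```typescript
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Plain cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Entries are written by the morning cron as JSON: { question, answer, embedding }.
export async function checkCache(queryEmbedding: number[]): Promise<string | null> {
  const keys = await redis.keys("qa:*"); // acceptable for a few hundred cached pairs
  for (const key of keys) {
    const entry = JSON.parse((await redis.get(key)) ?? "{}");
    if (entry.embedding && cosine(queryEmbedding, entry.embedding) >= 0.95) {
      await redis.incr(`${key}:hits`); // tally which cached questions actually get asked
      return entry.answer as string;   // bypass the RAG pipeline entirely
    }
  }
  return null;
}
```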
Phase 4: Evaluations & Observability
Using LangSmith (Recommended):
Tracing Capabilities:
- Time to first token (critical industry measurement)
- Reasoning chain / thinking mode
- Tools used
- Costs per step
- Total latency
- Total cost per API call
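One way to get these traces is to wrap the agent entry point with LangSmith's `traceable` helper; each request then shows up as a trace with latency, nested LLM/tool calls, and token usage. `embedQuery`, `checkCache`, and `runAgent` refer to the earlier hypothetical helpers, and LangSmith tracing is assumed to be configured via environment variables.

```typescript
import { traceable } from "langsmith/traceable";

declare function embedQuery(q: string): Promise<number[]>;
declare function checkCache(embedding: number[]): Promise<string | null>;
declare function runAgent(q: string): Promise<{ text: string }>;

// Assumes LANGSMITH_TRACING and LANGSMITH_API_KEY are set in the environment.
export const answerQuestion = traceable(
  async (userQuestion: string) => {
    const cached = await checkCache(await embedQuery(userQuestion));
    if (cached) return cached;           // Redis hit: bypass the RAG pipeline
    const result = await runAgent(userQuestion);
    return result.text;
  },
  { name: "support-agent", run_type: "chain" }
);
```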
Annotations:
- Thumbs up/down from users
- Human-in-the-loop labeling: Why was response good/bad?
- Feed bad responses back into system prompt improvement
Self-Improving System:
- Collect bad response with annotation (why it was bad)
- Take system prompt, tools, question, answer, expected response
- Feed into LLM: "How do you improve the system prompt to account for this?"
- Generalize improvements (don't overfit to specific question)
- Update system prompt
- Test and iterate
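A sketch of the prompt-improvement step under the same assumptions as the earlier sketches: the bad exchange, its annotation, and the current system prompt are handed to a model that is asked for a generalized revision rather than a patch for the single question. The schema and prompt wording are illustrative.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

export async function improveSystemPrompt(input: {
  systemPrompt: string;
  question: string;
  badAnswer: string;
  expectedAnswer: string;
  annotation: string; // human note on why the response was bad
}) {
  const { object } = await generateObject({
    model: openai("gpt-4o"),
    schema: z.object({
      revisedSystemPrompt: z.string(),
      rationale: z.string(),
    }),
    prompt: [
      "Here is an agent system prompt and an exchange that was rated bad.",
      `System prompt:\n${input.systemPrompt}`,
      `Question: ${input.question}`,
      `Agent answer: ${input.badAnswer}`,
      `Expected answer: ${input.expectedAnswer}`,
      `Annotation: ${input.annotation}`,
      "Revise the system prompt so this class of failure is avoided.",
      "Generalize the fix; do not hard-code this specific question.",
    ].join("\n\n"),
  });
  return object; // review and test before deploying the revised prompt
}
```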
Test Suites:
- Concurrent user testing (can it handle 10,000 concurrent users?)
- Cost analysis (how much for 10,000 active users asking 3 questions per second?)
- Guardrail testing (50+ creative ways to break through)
- Latency testing under load
Complete Example: Ticketing Agent Architecture
Business Goal
Reduce the number of tickets in a support system (using HubSpot as an example) while understanding why tickets are generated in the first place.
Architecture Overview
User Question → Agent → [RAG Check] → [Answer or Escalate]
↓
[Self-Improvement Cycle]
↓
[Completed Ticket] → [Extract Insights] → [Update Knowledge Base]
Detailed Implementation
Step 1: Identify All Data
Data Sources:
- All tickets ever opened in HubSpot
- All completed tickets with answers
- User feedback (satisfactory or not)
- Question and answer pairs
Purpose:
- Build self-improving system
- Understand why tickets are generated
- Create knowledge base from historical data
Step 2: Knowledge Base Building (RAG Pipeline)
Process:
1. Extract Completed Tickets:
- Get all tickets with questions, answers, and feedback
- Filter for satisfactory responses only
2. AI Extraction:
- Run tickets through AI to extract:
- Core question the user asked
- Core idea/context
- Satisfactory answer
- Generate 10 different ways to ask the same question
3. RAG Check:
- Run all 10 question variations through RAG
- Determine if answer exists in knowledge base
- If exists but wasn't retrieved = RAG pipeline issue
- If doesn't exist = knowledge base gap
4. Decision Tree:

Ticket Completed
  ↓
Extract Core Ideas (AI)
  ↓
Generate 10 Question Variations
  ↓
Run Through RAG
  ↓
Does the answer exist in the knowledge base?
  │
  ├─ No → Knowledge base issue
  │        → Create Linear ticket "Add to Knowledge Base" with:
  │          - Question
  │          - Answer
  │          - Suggested location in docs
  │
  └─ Yes → The failure was on the retrieval/query side
           ├─ RAG pipeline issue (answer exists but was not retrieved)
           │    → Optimize retrieval parameters
           └─ User query problem (question not aligned to dataset)
                → Gather failed queries
                → Improve system prompt (align user queries to dataset)
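A sketch of the "Extract Core Ideas (AI)" step above. The `CompletedTicket` shape is a simplified stand-in for whatever the HubSpot API actually returns, and the schema mirrors the fields listed in the decision tree.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Simplified ticket shape; not the real HubSpot schema.
interface CompletedTicket { subject: string; thread: string; satisfactory: boolean; }

export async function extractFromTicket(ticket: CompletedTicket) {
  if (!ticket.satisfactory) return null; // only learn from answers users accepted

  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({
      coreQuestion: z.string(),
      context: z.string(),
      answer: z.string(),
      questionVariations: z.array(z.string()).length(10),
    }),
    prompt: [
      "Extract the core question, its context, the satisfactory answer,",
      "and 10 alternative phrasings of the question from this support ticket:",
      "",
      ticket.subject,
      ticket.thread,
    ].join("\n"),
  });
  return object;
}
```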
Step 3: Tool Definition
Tools Available:
1. RAG Tool:
- Access to knowledge base
- Use when: Don't have high confidence in general knowledge
- Don't use when: Have general knowledge OR question not relevant to vertical
2. HubSpot Integration:
- Create tickets for users
- Use when: Cannot answer question after RAG check
- Ask user: "Would you like to be transferred to a human or open a ticket?"
3. Live Chat / Human Transfer:
- Immediate human support
- Use when: User requests or urgent issue
- Feeds back into knowledge base when resolved
4. Linear Integration (Background Agent):
- Create tickets to add content to knowledge base
- Triggered when: New question answered by human
- Includes: Question, answer, suggested location in docs
Step 4: System Prompt for Ticketing Agent
User Persona:
"You are an agent of [Company Name]. Users will be coming to you to ask questions about our services. Our services include documents for technical documents, product documents and general industry knowledge documents. Your role is to answer these questions, given the context."
Tool Usage Instructions:
"If you don't have high confidence that you have the knowledge within your general knowledge, please utilize the RAG tool to gather more context so that you can answer the question better without hallucinating, without making things up, without generating fake data."
"You should not use RAG tool when you have the general knowledge to answer the question. You should not use a tool when they ask a question that's not relevant to our vertical."
"You should use HubSpot if you cannot answer a question. Do not run RAG, but ask the user if you would like to be transferred to a human or would you like for the agent to go ahead and open up a ticket for them?"
Guardrails:
- Stay relevant to company's vertical
- Do not provide information outside of documentation
- Escalate appropriately
- Never hallucinate or make up information
Step 5: Self-Improvement Cycle
When Ticket is Completed:
1. Extract Information:
- Question user asked
- Answer that was satisfactory
- Context around the question
2. Generate Question Variations:
- Create 10 different ways to ask the question
- Run through RAG to see if answer exists
3. Determine Issue Type:
- Knowledge Base Issue: Answer doesn't exist → Create Linear ticket to add to docs
- RAG Pipeline Issue: Answer exists but wasn't retrieved → Optimize RAG parameters
- User Query Issue: Question not aligned to dataset → Improve system prompt
4. Feed Back into System:
- New content added to knowledge base
- RAG pipeline optimized
- System prompt improved
- Agent gets better over time
Circular Flow:
User Question
↓
Agent Tries to Answer (RAG)
↓
Cannot Answer → Create Ticket
↓
Human Answers Ticket
↓
Ticket Completed
↓
Extract & Process
↓
Update Knowledge Base
↓
Future Questions Answered Automatically
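A sketch of the issue-type routing from step 3 of the cycle, with `createLinearIssue` as a hypothetical wrapper around the Linear API; the three labels match the categories listed above.

```typescript
type IssueType = "knowledge_base_gap" | "rag_pipeline_issue" | "query_alignment_issue";

// Hypothetical wrapper around the Linear API.
declare function createLinearIssue(title: string, body: string): Promise<void>;

export async function routeImprovement(
  extracted: { coreQuestion: string; answer: string },
  flags: {
    answerExistsInDocs: boolean;      // is the content in the knowledge base at all?
    retrievedByAnyVariation: boolean; // did any of the 10 phrasings retrieve it?
  }
): Promise<IssueType> {
  if (!flags.answerExistsInDocs) {
    // Knowledge base gap: open a Linear ticket so the content gets added to the docs.
    await createLinearIssue(
      "Add to knowledge base",
      `Q: ${extracted.coreQuestion}\nA: ${extracted.answer}\nSuggested location: <docs section>`
    );
    return "knowledge_base_gap";
  }
  if (!flags.retrievedByAnyVariation) {
    return "rag_pipeline_issue"; // content exists but retrieval missed it: tune top K, similarity, reranking
  }
  return "query_alignment_issue"; // retrievable when rephrased: improve how the system prompt aligns user queries
}
```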
Step 6: Advanced Optimizations
Redis Caching for Common Questions:
1. Morning Generation:
- LLM generates top 10 Q&A pairs per category
- Based on documentation ambiguity
- Stored in Redis with vector embeddings
2. Query Time:
- Check Redis first (95% similarity)
- If match: Return immediately (bypass RAG)
- Tally usage
3. Insights:
- Track which questions are asked most
- Identify documentation gaps
- Remove unused questions, generate new ones
Benefits:
- Faster response times for common questions
- Reduced RAG pipeline load
- Insights into user behavior
- Data-driven documentation improvements
Step 7: Evaluations & Observability
Metrics to Track:
1. Performance:
- Time to first token
- Total response latency
- Accuracy rate
- Ticket reduction percentage
2. Costs:
- Cost per API call
- Cost per step (RAG, LLM calls, tool usage)
- Total cost for 10,000 concurrent users
3. Quality:
- User satisfaction (thumbs up/down)
- Human annotations (why good/bad)
- Guardrail effectiveness
- Hallucination rate
4. System Health:
- RAG retrieval accuracy
- Knowledge base coverage
- Tool usage patterns
- Error rates
Improvement Process:
- Collect bad response with annotation
- Analyze: System prompt issue? RAG issue? Tool issue?
- Feed into improvement cycle
- Test and validate
- Deploy updated system
Technical Stack Recommendations
Preferred Stack (Rostam's Choice)
Vercel AI SDK:
- Modular and provider-agnostic
- Easy to build in a fail-safe (retry against another provider if one goes down)
- Supports multiple RAG databases
- Built-in retry logic and step limits
- Easy deployment (Vercel, GCP, AWS, Docker)
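The fail-safe mentioned above is straightforward because every provider exposes the same `generateText` interface; a minimal sketch of one way to do it (the model IDs are just examples):

```typescript
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

// Try the primary provider; if the call throws (outage, rate limit), fall
// back to the next one in the list.
export async function generateWithFallback(system: string, prompt: string) {
  const models = [openai("gpt-4o"), anthropic("claude-3-5-sonnet-latest")];
  let lastError: unknown;
  for (const model of models) {
    try {
      return await generateText({ model, system, prompt });
    } catch (err) {
      lastError = err; // provider failed: try the next one
    }
  }
  throw lastError;
}
```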
Ragie (pronounced "Raggy"):
- Out-of-the-box automation OR fine-grained control
- Reclassification and indexing
- Handles sentence splitting, vector storage, retrieval
- Rostam's preferred RAG service
LangSmith:
- Observability and tracing
- Annotations and datasets
- Cost tracking
- Performance monitoring
Alternative Stacks
LangChain + LangSmith + LangGraph:
- Best for enterprise systems already using LangChain
- Node-based workflows with mermaid diagrams
- Subgraphs for complex nested systems
- Prompt management and A/B testing
Mastra:
- TypeScript-focused agentic building platform
- Up-and-coming option
Key Principles & Best Practices
1. Never Build General Agents
"You never want to build general agents. They are the shittiest piece of garbage out there."
Instead: Build specialized agents with single responsibilities. Use multi-agent architecture with routing.
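A minimal routing sketch in the same TypeScript style: a small, cheap model classifies the query, and a narrow specialized agent handles it. The agent names and categories are placeholders.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical specialized agents, each with a single responsibility.
declare function runBillingAgent(q: string): Promise<string>;
declare function runTechnicalAgent(q: string): Promise<string>;
declare function runProductAgent(q: string): Promise<string>;

const agents = {
  billing: runBillingAgent,
  technical: runTechnicalAgent,
  product: runProductAgent,
} as const;

export async function routeQuestion(question: string) {
  // Routing is a cheap, fast decision, so a small model is enough.
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ category: z.enum(["billing", "technical", "product"]) }),
    prompt: `Classify this support question: "${question}"`,
  });
  return agents[object.category](question);
}
```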
2. Always Think About Self-Improvement
Every component should feed back into making the system better:
- Completed tickets → Knowledge base updates
- Bad responses → System prompt improvements
- User queries → RAG optimization
- Common questions → Redis caching
3. Consider Trade-offs at Every Step
- Latency vs. Accuracy
- Cost vs. Performance
- Breadth vs. Depth
- Automation vs. Human-in-the-loop
4. Test Everything
- Guardrail testing (50+ creative ways to break)
- Concurrent user testing
- Cost analysis under load
- Latency testing
- Accuracy validation
5. Document Architecture Decisions
- Why you chose certain tools
- Trade-offs you made
- Optimization levers you're using
- How the system improves over time
6. Start Simple, Add Complexity Gradually
- Begin with basic RAG + system prompt
- Add tools as needed
- Implement optimizations based on data
- Scale based on actual usage patterns
Common Pitfalls to Avoid
- Over-engineering: Don't build complex multi-agent systems when a simple agent will do
- Ignoring Guardrails: Industry-specific restrictions are critical (legal, ethical, safety)
- Skipping Evaluations: You can't improve what you don't measure
- No Self-Improvement: Static systems become outdated quickly
- Poor Query Alignment: User questions must align with your dataset structure
- Ignoring Costs: Agent calls can snowball quickly (RAG → LLM → Tools → More RAG)
- No Human-in-the-Loop: Fully automated systems miss nuance and context
Scaling Considerations
Horizontal Scaling
- Deploy across multiple regions/data centers
- Use sticky sessions to maintain context
- Load balance across instances
Performance Optimization
- Redis caching for common questions
- Thread summarization to reduce tokens
- Parallel processing where possible
- Smart retry logic
Cost Management
- Monitor costs per step
- Set maximum step limits
- Use smaller models for simple tasks
- Cache aggressively
Quality Assurance
- Continuous guardrail testing
- Regular accuracy audits
- User feedback loops
- A/B testing system prompts
Conclusion
Building production-grade agents requires:
- Clear architecture - System prompt, tools, RAG, evaluations
- Self-improvement cycles - Every component feeds back
- Careful optimization - Trade-offs at every step
- Continuous testing - Guardrails, performance, accuracy
- Real-world experience - Learn from actual implementations
The ticketing agent example demonstrates how to think through:
- Business goals → Technical architecture
- Data identification → Knowledge base building
- Tool selection → Integration points
- Optimization → Performance and cost
- Evaluation → Continuous improvement
Remember: The finesse is in the architecture, not the basic building blocks. Anyone can build an agent—experts architect systems that improve over time.