Discovering 10 game-changing insights for agentic RAG success
If you’ve been experimenting with large language models (LLMs) recently, chances are you’ve heard of Retrieval-Augmented Generation (RAG) and agentic systems. For the past two years, we've worked extensively on building and refining these systems across various projects, collecting valuable insights along the way! But first, let’s do a quick overview of agentic RAG.
What is Agentic RAG?
Agentic RAG enhances AI outputs by combining information retrieval with decision-making, allowing it to tackle complex queries with a proactive, context-rich approach. Beyond conversational upgrades, agentic RAG is a powerhouse for report generation, research, and data exploration. It autonomously gathers, structures, and synthesizes relevant information, verifying facts and cross-referencing sources to reduce hallucinations (the term for AI-generated but inaccurate information). This makes it a reliable partner for delivering insightful, accurate, and actionable information across various tasks.
For instance, we've leveraged agentic RAG to analyze engineering documents, identifying critical material specifications and tolerance levels, saving engineers hours of manual review. We also empowered non-analysts to query real estate data using natural language, dynamically generating reports that answer complex questions without requiring technical expertise.
In July 2024, I presented on retrieval-augmented generation (RAG) at the WeAreDevelopers World Conference in Berlin and later at Mila, the Quebec AI Institute. Over the past year, numerous discussions with clients have deepened my understanding of the complexities and possibilities these systems offer. With all of this in mind, I compiled ten essential insights—takeaways for anyone looking to improve their approach to RAG and agentic AI.
1. Data is king
In generative AI (GenAI) workflows, data quality is everything. Companies that maintain clean, well-formatted data and thorough documentation succeed with AI faster. Clean data allows models to reach their full potential without getting lost in messy, ambiguous inputs. Additionally, well-documented processes give a tremendous amount of guidance to these agents, and fostering a good documentation culture is incredibly rewarding. For instance, we built an agentic system for business intelligence (BI) data that navigated “gold tier” (high quality) tables effectively, thanks to detailed documentation. If your data is a mess, no amount of cutting-edge tech will make it sing!
2. Naive RAG is never enough
Our clients often ask why a simple semantic search isn’t sufficient. The answer lies in query complexity. Many queries span multiple sources or documents, and some require a sequence of reasoning steps, where one answer builds on the previous one. Imagine asking a series of questions naively, without considering the answers that came before: it's like solving a puzzle with only half the clues. Simply retrieving data to augment a model response isn't enough. Agentic RAG uses “cognitive architectures” to bind all the moving parts (it’s like giving your RAG process a brain). In agentic RAG, this is also known as planning. Common approaches include ReAct (Reason + Act) and the OODA loop, which consists of four main stages:
- Observe (gather information)
- Orient (analyze the situation)
- Decide (select the best action)
- Act (execute and monitor the outcome)
This iterative process helps overcome the limitations of one-pass, input-process-output data flows typical of many current LLMs and AI systems. LangGraph, LlamaIndex, or custom orchestration can help fill this gap, providing the cohesion necessary to support more nuanced, agentic behaviours effectively.
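The four stages above can be sketched as a plain Python loop. This is a minimal illustration, not a production agent: the toy corpus, the substring-based `retrieve` function, and the hand-written `decide` rule are all hypothetical stand-ins for a real index and an LLM planning call.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy corpus standing in for a real index (hypothetical data).
CORPUS = [
    "Bracket B7 is made of aluminium 6061.",
    "Parts made of aluminium 6061 have a tolerance of 0.05 mm.",
]


def retrieve(keyword: str) -> list[str]:
    """Hypothetical retriever: substring match instead of vector search."""
    return [doc for doc in CORPUS if keyword.lower() in doc.lower()]


@dataclass
class AgentState:
    question: str
    next_query: str
    notes: list[str] = field(default_factory=list)  # facts gathered so far
    answer: Optional[str] = None


def decide(state: AgentState) -> str:
    """Stand-in for an LLM planning call: choose the next action from the notes."""
    joined = " ".join(state.notes)
    if "tolerance" in joined:
        return "answer"
    if "aluminium 6061" in joined:
        # The follow-up query builds on what the previous step found.
        state.next_query = "tolerance"
    return "search"


def run_agent(question: str, first_query: str, max_steps: int = 5) -> AgentState:
    state = AgentState(question=question, next_query=first_query)
    for _ in range(max_steps):
        observations = retrieve(state.next_query)                # Observe
        new_facts = [o for o in observations if o not in state.notes]
        state.notes.extend(new_facts)                            # Orient
        action = decide(state)                                   # Decide
        if action == "answer":                                   # Act
            state.answer = state.notes[-1]
            break
    return state


result = run_agent("What is the tolerance of bracket B7?", first_query="B7")
print(result.answer)
```

A single retrieval on "B7" would only surface the material. The second pass, driven by what the first pass found, is what reaches the tolerance.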
3. RAG pipelines vary wildly
We've worked on numerous projects with organizations, each with unique needs for their RAG pipeline. This process always starts with an in-depth interview with the domain expert—the person who best understands how to explore their data effectively. For some clients, the latest data is essential; for others, proximity-based retrieval or keyword-specific pulls are a priority. RAG, as an umbrella concept, offers flexibility in both what you retrieve and how you retrieve it, depending on business goals. Sometimes GraphRAG works best for fact-heavy data, while other times it’s about gathering information from images or diagrams. The key lies in choosing the retrieval strategy that aligns best with each specific use case.
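To illustrate how the same question can call for different retrieval strategies, here is a small sketch with two interchangeable retrievers behind one interface. The documents, dates, and strategy names are invented for the example, and a real pipeline would likely use a search engine or vector store rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    published: date


# Hypothetical documents, made up for this example.
DOCS = [
    Doc("Q1 report", "Condo prices rose in the downtown core.", date(2024, 3, 31)),
    Doc("Q2 report", "Rental vacancy fell across the region.", date(2024, 6, 30)),
    Doc("Zoning memo", "New zoning rules affect condo developments.", date(2023, 11, 15)),
]


def by_recency(query: str, docs: list[Doc], k: int) -> list[Doc]:
    """Ignore the query text and return the k most recent documents."""
    return sorted(docs, key=lambda d: d.published, reverse=True)[:k]


def by_keyword(query: str, docs: list[Doc], k: int) -> list[Doc]:
    """Rank documents by how many query words appear in their text."""
    words = query.lower().split()
    scored = [(sum(w in d.text.lower() for w in words), d) for d in docs]
    hits = [(score, d) for score, d in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in hits[:k]]


STRATEGIES = {"recency": by_recency, "keyword": by_keyword}


def retrieve(query: str, strategy: str, k: int = 2) -> list[Doc]:
    """Dispatch to the retrieval strategy chosen for this use case."""
    return STRATEGIES[strategy](query, DOCS, k)


print([d.title for d in retrieve("condo", "recency")])
print([d.title for d in retrieve("condo", "keyword")])
```

The two calls return different documents for the same query, which is why the choice of strategy should follow from the interview with the domain expert rather than from a default.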
4. The real work is in scaling and sustaining agentic systems
Building a working agentic system is challenging, but keeping it performing well over time is even harder. In our experience, content ingestion and indexing take up about 40% of the work, agent development about 20%, and evaluation and monitoring the remaining 40%. Ensuring your pipeline consistently delivers valuable results demands tooling and constant oversight; it's like finding a needle in a haystack that’s constantly growing. Evaluation itself is a creative process: other agentic systems can be used to monitor responses, and numerous techniques are available for this. Tools like LangSmith, Langfuse, and Helicone (for observability) and Ragas (for evaluation) are fantastic for efficient setup, especially when iterating and experimenting.
5. Micro-agent orchestration leads to better performance
We've learned that multiple micro-agents, each handling a specific task, are more effective than a single, monolithic agent with an extensive toolkit. Micro-agents operate like distributed systems: smaller, domain-specific agents are easier to refine, manage, and combine, enabling adaptable and powerful orchestration. Frameworks like AutoGen, CrewAI, and OpenAI’s experimental Swarm framework support agent orchestration, but in many cases, DIY solutions are the best fit. See point 9 for more details.
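As a rough sketch of the idea, the snippet below routes each task to a narrow agent through a small registry. Everything here is hypothetical: the agent functions return canned strings where real agents would call a model with their own tools, and the keyword-based `classify` stands in for an LLM intent classifier.

```python
from typing import Callable, Optional

Agent = Callable[[str], str]


def spec_agent(task: str) -> str:
    """Narrow agent: only handles material specifications."""
    return f"[spec-agent] extracting specifications for: {task}"


def report_agent(task: str) -> str:
    """Narrow agent: only drafts reports."""
    return f"[report-agent] drafting report for: {task}"


# Registry of micro-agents keyed by the intent they serve (hypothetical intents).
AGENTS: dict[str, Agent] = {
    "specification": spec_agent,
    "report": report_agent,
}


def classify(task: str) -> Optional[str]:
    """Stand-in for an LLM intent classifier: first matching keyword wins."""
    lowered = task.lower()
    return next((intent for intent in AGENTS if intent in lowered), None)


def orchestrate(task: str) -> str:
    """Send the task to the matching micro-agent, or escalate if none fits."""
    intent = classify(task)
    if intent is None:
        return "[orchestrator] no agent matched; escalating to a human"
    return AGENTS[intent](task)


print(orchestrate("Find the specification for bracket B7"))
print(orchestrate("Generate a quarterly report on vacancies"))
print(orchestrate("Book a meeting room"))
```

Adding a capability means registering one more small function, which can be tested and refined without touching the others.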
6. Don’t overdo prompt engineering
Prompt engineering only gets you so far. Instead of accounting for every edge case and spending endless hours tweaking prompts, focus on building architectures that handle uncertainty programmatically. Chain-of-thought prompting and few-shot examples remain valuable, but over-relying on prompt tweaks is unsustainable in the long term. It’s often more effective to use features beyond prompting, like function calling and structured generation frameworks such as Outlines and Guidance. Adding self-reflection and correction procedures to your agentic flow often yields better results than aiming for the perfect zero-shot response. See Corrective RAG (CRAG) for an example of reflection.
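One way to handle uncertainty in code rather than in the prompt is a validate-and-retry loop. The sketch below is a generic correction loop, not an implementation of CRAG, and it does not use Outlines or Guidance. `fake_llm` is a hypothetical stub that only returns valid JSON once it has been shown the validation error, and the field names are invented for the example.

```python
import json


def fake_llm(prompt: str) -> str:
    """Hypothetical model stub: answers in free text until given feedback."""
    if "Previous error" in prompt:
        return '{"material": "aluminium 6061", "tolerance_mm": 0.05}'
    return "The material is aluminium 6061 with a tolerance of 0.05 mm."


def validate(raw: str) -> dict:
    """Check the output programmatically instead of trusting the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on free text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("tolerance_mm"), (int, float)):
        raise ValueError("tolerance_mm must be a number")
    return data


def generate_with_correction(prompt: str, llm=fake_llm, max_attempts: int = 3):
    """Call the model, validate, and feed any error back for another attempt."""
    attempt_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = llm(attempt_prompt)
        try:
            return validate(raw), attempt
        except ValueError as err:
            # Reflection step: tell the model what went wrong and try again.
            attempt_prompt = (
                f"{prompt}\nPrevious error: {err}\nReturn valid JSON only."
            )
    raise RuntimeError("no valid output after retries")


result, attempts = generate_with_correction(
    "Extract the material and tolerance as JSON."
)
print(result, attempts)
```

The original prompt stays short. The edge case (free-text output) is caught by the validator and corrected on the second attempt instead of being anticipated in ever longer instructions.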
7. Balancing cost, speed, and accuracy
When building agents, every decision will come down to balancing cost, speed, and accuracy. It's similar to the project management triangle of quality, cost, and speed, but applied to the agentic pipeline. Adding more steps to your flow may enhance accuracy, but it also raises costs. Alternatively, a faster, less expensive pipeline with basic models often sacrifices accuracy. These trade-offs should be carefully evaluated during proof-of-concept phases. A highly accurate agentic workflow is easy to showcase with multiple steps and state-of-the-art (SOTA) models, but planning with an eye on traffic, usage, and cost per query will prevent costly missteps.
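A back-of-the-envelope calculation makes this trade-off concrete during a proof of concept. The sketch below compares a one-step pipeline with a three-step one. All token counts, prices, latencies, and traffic figures are made-up placeholders, not real vendor rates, so substitute your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_tokens: int
    output_tokens: int
    usd_per_1k_input: float   # hypothetical price, not a real vendor rate
    usd_per_1k_output: float  # hypothetical price, not a real vendor rate
    latency_s: float          # hypothetical measured latency

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens / 1000 * self.usd_per_1k_input
                + self.output_tokens / 1000 * self.usd_per_1k_output)


def summarize(steps: list[Step], queries_per_month: int) -> dict[str, float]:
    """Total cost and latency, assuming the steps run one after another."""
    per_query = sum(s.cost_usd for s in steps)
    return {
        "usd_per_query": per_query,
        "usd_per_month": per_query * queries_per_month,
        "latency_s": sum(s.latency_s for s in steps),
    }


# One call to a small, cheap model.
lean = [Step("answer", 2000, 300, 0.0005, 0.0015, 1.0)]

# Plan, answer, and reflect with a larger, pricier model.
thorough = [
    Step("plan", 500, 100, 0.005, 0.015, 1.5),
    Step("answer", 4000, 500, 0.005, 0.015, 4.0),
    Step("reflect", 1500, 200, 0.005, 0.015, 2.0),
]

for label, pipeline in (("lean", lean), ("thorough", thorough)):
    print(label, summarize(pipeline, queries_per_month=100_000))
```

Under these invented numbers the thorough pipeline costs roughly 29 times more per query and takes 7.5 seconds instead of 1. Whether the accuracy gain justifies that depends on the expected traffic.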
8. Self-consistency is an underrated metric
Sometimes, the simplest solutions work best: to achieve consistent results, try sampling multiple outputs and checking whether they converge. Agreement across attempts is a useful signal of reliability. It requires more compute, but the accuracy gains can be large: if each sample is correct 90% of the time and errors are independent, a majority vote over five samples is correct about 99% of the time.
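To make the arithmetic concrete, here is a small sketch of majority voting along with the probability that the majority is right. It rests on an idealised assumption: each sample is independently correct with the same probability, and the vote only counts as correct when a strict majority of samples is correct. Real model samples are correlated, so actual gains are usually smaller. The sample answers are invented for the example.

```python
from collections import Counter
from math import comb


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the share of samples that agree with it."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)


def majority_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct,
    when each sample is correct with probability p (use an odd n)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Five hypothetical samples for the same question.
samples = ["0.05 mm", "0.05 mm", "0.5 mm", "0.05 mm", "0.05 mm"]
answer, agreement = majority_vote(samples)
print(answer, agreement)

for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.9, n), 4))
```

With p = 0.9, three samples give about 97.2% and five give about 99.1%. The agreement score (0.8 in the example) can also be logged as a per-query confidence signal.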
9. DIY orchestration over fancy frameworks
It’s tempting to jump into using orchestration frameworks like LlamaIndex or LangGraph, but these frameworks evolve rapidly, making it challenging to keep up with new integrations and features. For proof of concept, they can be useful, but beyond that, building your own flows is often simpler and more efficient. If you're comfortable with coding, orchestrating agentic flows yourself usually requires fewer lines and offers more control. Frameworks have their place, but don’t underestimate the power of simplicity in your architecture.
10. The need for agentic systems is real
Agentic systems are the glue between human-intensive processes and efficient automation. Imagine an agent tailored to your specific needs and armed with your internal data—it’s like a secret weapon for efficiency. The return on investment is substantial, especially when it reduces repetitive labour and enables focus on creative, high-value work. Clients often avoid AI tools like Copilot or Salesforce agents because they lack integration control, requiring a custom solution—a domain expert agent capable of navigating their unique landscape. Recently, we’ve seen a wave of interest in AI for business process transformation. It’s likely that in the future, every business will have a public-facing agent, contributing to an interconnected “agentic web” where intelligent agents work seamlessly across domains.
Ready to explore agentic solutions together?
We've been developing agent-powered applications for a while now and have seen the remarkable impact they can have on productivity and growth. If you're interested in building a solution that bridges AI with your unique workflows, let's connect and see how we can bring it to life.