When a customer service agent autonomously queries order databases, retrieves return policies, and synthesizes answers, it needs governed access to multiple data sources across your organization. Building agentic AI applications on a modern data mesh requires fine-grained access control enforced at every layer of the data interaction chain. AI agents that autonomously discover database schemas, construct SQL queries, and synthesize data from multiple sources expose governance gaps that the single-checkpoint model built for Retrieval Augmented Generation (RAG) can’t address. Organizations need controls from tool discovery through query execution to response synthesis.
In an earlier post, "Build secure RAG applications with AWS serverless data lakes," we showed how to enforce fine-grained access control (FGAC) over RAG by filtering vector search results using metadata such as business domain and security classification. That approach worked because RAG's data interaction was simple: retrieve chunks from a pre-built vector index, filter by metadata, and present results.
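The single-checkpoint model can be sketched as a post-filter over search results. This minimal Python sketch assumes each chunk carries hypothetical `business_domain` and `security_classification` metadata keys and a hypothetical three-level classification ranking. The earlier post's actual field names and filter mechanism may differ, and a production system would more likely push the filter into the vector query itself.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """A retrieved text chunk plus the metadata used for access filtering."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)

# Hypothetical ordering of classification labels, least to most sensitive.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2}

def filter_chunks(chunks, allowed_domains, max_classification, top_k=3):
    """Drop chunks outside the caller's domains or above its clearance, then rank."""
    ceiling = CLASSIFICATION_RANK[max_classification]
    permitted = [
        c for c in chunks
        if c.metadata.get("business_domain") in allowed_domains
        # Unknown or missing labels rank above every ceiling, so they fail closed.
        and CLASSIFICATION_RANK.get(c.metadata.get("security_classification"), 99) <= ceiling
    ]
    return sorted(permitted, key=lambda c: c.score, reverse=True)[:top_k]

results = [
    Chunk("Return window is 30 days", 0.91,
          {"business_domain": "support", "security_classification": "public"}),
    Chunk("Q3 margin by region", 0.95,
          {"business_domain": "finance", "security_classification": "confidential"}),
    Chunk("Escalation runbook", 0.80,
          {"business_domain": "support", "security_classification": "internal"}),
]
visible = filter_chunks(results, allowed_domains={"support"}, max_classification="internal")
print([c.text for c in visible])  # ['Return window is 30 days', 'Escalation runbook']
```

All of the governance logic lives in one function applied at one boundary, which is exactly why it stops being sufficient once an agent takes several independent actions.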
This post shows how to build a governed, serverless data mesh on AWS that provides the secure, scalable data foundation production agentic AI requires. The architecture extends the original with three key changes:
The following diagram illustrates the end-to-end flow from customer request through governed data access and back. Each layer enforces its own authorization controls, so a failure or misconfiguration in any one layer cannot by itself expose unauthorized data.
The diagram shows four layers: the Agent Layer (AgentCore Runtime and a LangGraph agent), the Gateway Layer (request and response interceptors), the Tools Layer (four Lambda-backed MCP tools: get_user_tables, get_schema, run_query, and kb_search), and the Governed Data Mesh (S3 Tables, Athena, Lake Formation, and S3 Vectors). Arrows trace the data flow from the customer through the agent to the governed data sources.
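The division of labor in the Gateway Layer can be illustrated without any AWS dependencies. The sketch below is plain Python and does not use the AgentCore Gateway interceptor API; the role names, the role-to-tool policy, the redacted column, and the `call_tool` and `fake_run_query` helpers are all hypothetical. The request interceptor decides whether a tool may be invoked at all, and the response interceptor filters what the tool returns before the agent sees it.

```python
class AccessDenied(Exception):
    """Raised when the request interceptor blocks a tool call."""

# Hypothetical policy: which of the four MCP tools each role may invoke.
TOOL_POLICY = {
    "support_agent": {"get_user_tables", "get_schema", "run_query", "kb_search"},
    "kb_only": {"kb_search"},
}

# Hypothetical per-role columns that must never reach the agent.
REDACTED_COLUMNS = {"support_agent": {"card_number"}}

def request_interceptor(role, tool_name):
    """Block the call before it reaches the tool if the role may not use it."""
    if tool_name not in TOOL_POLICY.get(role, set()):
        raise AccessDenied(f"{role} may not call {tool_name}")

def response_interceptor(role, rows):
    """Remove redacted columns from tool output before it returns to the agent."""
    hidden = REDACTED_COLUMNS.get(role, set())
    return [{k: v for k, v in row.items() if k not in hidden} for row in rows]

def call_tool(role, tool_name, tool_fn, **kwargs):
    """Wrap a tool invocation with both interceptors."""
    request_interceptor(role, tool_name)
    return response_interceptor(role, tool_fn(**kwargs))

def fake_run_query(sql):
    """Stand-in for the Lambda-backed run_query tool; ignores the SQL text."""
    return [{"order_id": 42, "status": "shipped", "card_number": "4111-xxxx"}]

rows = call_tool("support_agent", "run_query", fake_run_query, sql="SELECT * FROM orders")
print(rows)  # [{'order_id': 42, 'status': 'shipped'}]

denied = False
try:
    call_tool("kb_only", "run_query", fake_run_query, sql="SELECT * FROM orders")
except AccessDenied:
    denied = True
```

Keeping the two interceptors separate means a permitted call can still have its output trimmed, so tool-level and data-level decisions stay independent.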
The RAG architecture enforced governance at a single checkpoint: metadata-filtered vector retrieval. That approach served RAG workloads well. Agentic patterns introduce additional steps, creating a multi-step chain where each step requires its own authorization decision. In RAG, the system queries one pre-built vector index with metadata filters at retrieval time. In agentic AI, the system discovers which tables exist, understands schemas, constructs SQL, retrieves from vector stores, and synthesizes results.
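The shift from one checkpoint to a chain of decisions can be sketched as follows. This is a minimal, hypothetical model: grants are (action, resource) pairs, and the table, knowledge base, and action names are illustrative, not tied to any specific AWS permission model. Each data-touching step calls `authorize` on its own, so a caller who can see that a table exists but lacks SELECT rights is stopped before any query would run, and synthesis only receives outputs that passed the earlier checks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    """One permission: an action allowed on a named resource (hypothetical model)."""
    action: str
    resource: str

class Unauthorized(Exception):
    """Raised when any step of the chain fails its authorization check."""

def authorize(grants, action, resource):
    """A single authorization decision, made fresh at each step that touches data."""
    if Grant(action, resource) not in grants:
        raise Unauthorized(f"{action} denied on {resource}")

def run_chain(grants, catalog, table, kb):
    """Hypothetical five-step agent chain; returns a trace of completed steps."""
    trace = []
    # Step 1: discovery lists only tables the caller may describe.
    tables = [t for t in catalog if Grant("describe", t) in grants]
    trace.append(("discover", tables))
    # Step 2: reading the schema is its own decision, not implied by discovery.
    authorize(grants, "describe", table)
    trace.append(("schema", table))
    # Step 3: building the SQL string is harmless; executing it (not done here) needs SELECT.
    sql = f"SELECT order_id, status FROM {table}"
    authorize(grants, "select", table)
    trace.append(("query", sql))
    # Step 4: vector retrieval is checked against the knowledge base itself.
    authorize(grants, "kb_read", kb)
    trace.append(("retrieve", kb))
    # Step 5: synthesis only combines outputs that passed steps 1-4.
    trace.append(("synthesize", len(trace)))
    return trace

catalog = ["sales.orders", "finance.ledger"]
support = {
    Grant("describe", "sales.orders"),
    Grant("select", "sales.orders"),
    Grant("kb_read", "return_policies"),
}
trace = run_chain(support, catalog, "sales.orders", "return_policies")
print([step for step, _ in trace])  # ['discover', 'schema', 'query', 'retrieve', 'synthesize']

# A caller who may describe the table but not SELECT from it is stopped at step 3.
stopped_at = None
try:
    run_chain({Grant("describe", "sales.orders")}, catalog, "sales.orders", "return_policies")
except Unauthorized as err:
    stopped_at = str(err)
print(stopped_at)  # select denied on sales.orders
```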
A metadata filter at a single retrieval boundary cannot govern this five-step chain. Vector databases typically synchronize permissions from the source system periodically, so revocations aren't reflected immediately, an unacceptable gap when an agent acts autonomously on data. Complex identity-based permissions such as role hierarchies, attribute-based access, and row-level filters also can't be expressed as simple metadata key-value pairs on vector chunks.
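The revocation gap can be reproduced with a toy model. In this sketch (hypothetical classes, not a real vector database API), the index copies reader lists into its metadata only when `sync()` runs, so a revocation made between syncs is invisible to metadata-filtered search. Re-checking each hit against the authoritative store at query time, as `live_search` does, closes that gap. The sketch covers only revocation latency, not role hierarchies or row-level filters.

```python
class PermissionStore:
    """Authoritative record of who may read each document (hypothetical)."""

    def __init__(self):
        self._readers = {}

    def grant(self, doc_id, user):
        self._readers.setdefault(doc_id, set()).add(user)

    def revoke(self, doc_id, user):
        self._readers.get(doc_id, set()).discard(user)

    def readers(self, doc_id):
        # Return a copy so callers cannot mutate the authoritative state.
        return set(self._readers.get(doc_id, set()))

    def can_read(self, doc_id, user):
        return user in self._readers.get(doc_id, set())


class VectorIndex:
    """Copies reader sets into per-document metadata only when sync() runs."""

    def __init__(self, store, doc_ids):
        self._store = store
        self._doc_ids = list(doc_ids)
        self._metadata = {}
        self.sync()

    def sync(self):
        # Snapshot current permissions; later changes stay invisible until the next sync.
        self._metadata = {d: self._store.readers(d) for d in self._doc_ids}

    def search(self, user):
        """Metadata-filtered retrieval that trusts the (possibly stale) snapshot."""
        return [d for d in self._doc_ids if user in self._metadata[d]]


def live_search(index, store, user):
    """Re-check every hit against the authoritative store at query time."""
    return [d for d in index.search(user) if store.can_read(d, user)]


store = PermissionStore()
store.grant("refund-policy", "alice")
store.grant("vip-pricing", "alice")
index = VectorIndex(store, ["refund-policy", "vip-pricing"])

store.revoke("vip-pricing", "alice")        # revoked between syncs
stale = index.search("alice")               # ['refund-policy', 'vip-pricing']
fresh = live_search(index, store, "alice")  # ['refund-policy']
index.sync()
after_sync = index.search("alice")          # ['refund-policy']
```

The live check fails closed: a revoked document disappears immediately, while a newly granted one only becomes searchable after the next sync.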
