This post demonstrates how agentic AI assistants from Amazon Quick transform data analytics into a self-service capability, using Amazon Simple Storage Service (Amazon S3) for storage, Amazon SageMaker and AWS Glue for the lakehouse, and Amazon Athena for serverless SQL querying across multiple storage formats (S3 Tables, Iceberg, and Parquet).
Modern enterprises face mounting challenges in extracting actionable insights from data lakes and lakehouses that span petabytes of structured and unstructured data. Traditional analytics requires specialized technical expertise in SQL, data modeling, and business intelligence tools, creating bottlenecks that slow decision-making across retail, financial services, healthcare, travel and hospitality, manufacturing, and many other industries. This architecture demonstrates how agentic AI assistants from Amazon Quick transform data analytics into a self-service capability. It enables business users to query complex structured datasets through intuitive natural language interfaces and to combine the results with unstructured data to find insights that improve business outcomes.
To demonstrate the functionality, we built a lakehouse using the TPC-H dataset as our foundation. This integrated architecture uses Amazon Simple Storage Service (Amazon S3) for storage, Amazon SageMaker and AWS Glue for the lakehouse, Amazon Athena for serverless SQL querying across multiple storage formats (S3 Tables, Iceberg, and Parquet), and multiple Quick features to build dashboards and conversational AI agents that provide natural language access to data insights. Through knowledge bases integrated using Amazon Quick spaces, this solution democratizes lakehouse data access for business users while preserving enterprise-grade security, governance frameworks, and the scalability required for data-driven decision-making across the organization.
The following diagram shows the overall design and the corresponding data flow that we implemented for this post.
Figure 1: Overall design diagram
Refer to the following steps for the detailed end-to-end data flow and user interaction capabilities.
In this section, we will reproduce many data lake features by working with external tables, which allow querying data stored in Amazon S3 without loading it into a managed storage layer. We will explore Open Table Format (OTF) tables using Apache Iceberg to work with tables that support ACID transactions. We will also use Amazon S3 Tables, a managed offering, to showcase how AWS natively supports Iceberg-compatible table management directly within S3, simplifying lakehouse architecture at scale. Throughout these exercises, we will use the industry-standard TPC-H dataset, a benchmark workload representing a realistic business data model with orders, customers, and line items, so that our examples are both meaningful and reproducible.
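To make the difference between an external table and an Iceberg table concrete, here is a minimal sketch that builds the two Athena DDL statements for the TPC-H orders table. The database name, table names, bucket, and prefixes are hypothetical placeholders, and the column types are one reasonable mapping of the TPC-H schema rather than the exact definitions used in this post. S3 Tables are not covered in this sketch.

```python
# Sketch only: database, table, and S3 locations below are hypothetical placeholders.
# Column names follow the TPC-H "orders" table; the types are an assumed mapping.
TPCH_ORDERS_COLUMNS = [
    ("o_orderkey", "bigint"),
    ("o_custkey", "bigint"),
    ("o_orderstatus", "string"),
    ("o_totalprice", "double"),
    ("o_orderdate", "date"),
    ("o_orderpriority", "string"),
    ("o_clerk", "string"),
    ("o_shippriority", "int"),
    ("o_comment", "string"),
]


def _column_list(columns):
    """Render (name, type) pairs as an indented, comma-separated column list."""
    return ",\n".join(f"  {name} {dtype}" for name, dtype in columns)


def external_parquet_ddl(database, table, location, columns=TPCH_ORDERS_COLUMNS):
    """DDL for an external table over Parquet files that already exist in S3."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{_column_list(columns)}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def iceberg_ddl(database, table, location, columns=TPCH_ORDERS_COLUMNS):
    """DDL for an Athena-managed Iceberg table (no EXTERNAL keyword)."""
    return (
        f"CREATE TABLE {database}.{table} (\n"
        f"{_column_list(columns)}\n"
        ")\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


if __name__ == "__main__":
    print(external_parquet_ddl(
        "tpch_demo", "orders_parquet", "s3://amzn-s3-demo-bucket/tpch/orders/"))
    print()
    print(iceberg_ddl(
        "tpch_demo", "orders_iceberg", "s3://amzn-s3-demo-bucket/iceberg/orders/"))
```

The external table only registers a schema over files that stay where they are, while the Iceberg statement creates a table whose metadata and data Athena manages under the given location, which is what makes transactional writes possible.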
We will use Amazon Athena for data preparation. If this is your first time using Amazon Athena, you’ll need to create an Amazon S3 bucket to store your query results, because Athena requires an S3 output location before you can run queries. Follow the official AWS getting started guide to complete this one-time setup: Getting Started with Amazon Athena. Alternatively, you can use the managed query results feature.
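If you prefer to submit queries programmatically, the query result location can be passed with each request. The following is a minimal sketch, assuming the boto3 Athena client and its `start_query_execution` call; the bucket, database, and workgroup names are hypothetical. Leaving out the output location is intended for a workgroup that already has a result location or managed query results configured, which is an assumption about your setup.

```python
# Sketch only: bucket, database, and workgroup names are hypothetical placeholders.

def build_query_request(sql, database, output_location=None, workgroup="primary"):
    """Build keyword arguments for the Athena StartQueryExecution API.

    When output_location is None, no ResultConfiguration is sent; this assumes
    the workgroup itself defines where results go (for example, through
    managed query results).
    """
    request = {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
    }
    if output_location is not None:
        request["ResultConfiguration"] = {"OutputLocation": output_location}
    return request


def run_query(athena_client, request):
    """Submit the query and return its execution ID for later status polling."""
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   athena = boto3.client("athena")
#   request = build_query_request(
#       "SELECT COUNT(*) FROM orders_parquet",
#       database="tpch_demo",
#       output_location="s3://amzn-s3-demo-bucket/athena-results/",
#   )
#   query_id = run_query(athena, request)
```

Keeping the request builder separate from the client call makes it easy to check the request shape without touching AWS.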