Embed the world: Multimodal AI for searchable aerial imagery at scale | AI BriefWire

Source: AWS Machine Learning Blog · Infrastructure · Core AI

AWS built a multimodal AI system that makes aerial imagery searchable, using Amazon Bedrock and Amazon OpenSearch Serverless. The team evaluated embedding models, fusion strategies, captioning, and search methods, and found that Amazon Nova Multimodal Embeddings achieved the highest F1 scores on both benchmark queries. The work evolved into Vexcel Intelligence, a searchable imagery product for geospatial semantic search at scale.

Published: Monday, June 22, 2026 at 6:32 PM

Original article excerpt

Server-side extracted preview paragraphs from the original source.

In this post, we walk through the problem space, our architecture on Amazon Bedrock and Amazon OpenSearch Serverless, the evaluation methodology we built on OpenStreetMap ground truth, four experiments that compared embedding models, fusion strategies, captioning, and search methods, and the practical guidance you can apply when building a similar system. You’ll learn which design choices move the needle for geospatial semantic search, including why Amazon Nova Multimodal Embeddings delivered the highest F1 scores across both benchmark queries in our evaluation. The work described here evolved into Vexcel Intelligence, a searchable imagery product.
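
The excerpt reports F1 scores measured against OpenStreetMap ground truth but does not show how they were computed. As a minimal sketch, assuming each query returns a set of tile IDs and the ground truth is the set of tiles known to contain the feature, the standard precision, recall, and F1 calculation could look like this. The function name and the tile IDs are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Score one query: retrieved tile IDs versus ground-truth tile IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example data (not from the article).
retrieved_tiles = ["tile_a", "tile_b", "tile_c", "tile_d"]
ground_truth_tiles = ["tile_b", "tile_c", "tile_e"]

precision, recall, f1 = precision_recall_f1(retrieved_tiles, ground_truth_tiles)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=0.67 f1=0.57
```

F1 penalizes both missed tiles and false hits, which is presumably why it suits a search benchmark where either kind of error matters.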

Turning a library of aerial imagery into a natural-language-searchable knowledge base is a problem that touches every industry that relies on geospatial data — insurance, real estate, government, infrastructure, and agriculture. The traditional path requires either manual tile-by-tile inspection or training a bespoke computer vision model for each new question. Multimodal embeddings, large language model (LLM) captioning, and vector search on AWS offer a faster alternative: index once, then query using natural language.
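
The "index once, then query" pattern can be sketched in a few lines. This is a toy illustration, not the architecture from the article: `toy_embed` is a hypothetical stand-in (a normalized bag of words) for a real multimodal embedding model, the in-memory dictionary stands in for a vector store such as OpenSearch Serverless, and the tile captions are invented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def build_index(tile_captions):
    """Index once: embed every tile caption and keep the vectors."""
    return {tile_id: toy_embed(caption) for tile_id, caption in tile_captions.items()}


def search(index, query, k=2):
    """Query many times: embed the query and return the k most similar tiles."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, vec), tile_id) for tile_id, vec in index.items()]
    scored.sort(reverse=True)
    return [tile_id for _, tile_id in scored[:k]]


# Invented captions, as an LLM captioning step might produce them.
captions = {
    "tile_001": "residential houses with swimming pool",
    "tile_002": "parking lot next to warehouse",
    "tile_003": "farm fields with irrigation",
}
index = build_index(captions)
results = search(index, "swimming pool", k=1)
print(results)  # prints: ['tile_001']
```

In a real system the embedding call and the nearest-neighbor lookup would be served by managed services, but the shape stays the same: the expensive embedding of the imagery happens once, and each new question costs only one query embedding plus a vector search.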