Amazon SageMaker AI Async Inference now supports inline request payloads | AI BriefWire

Original article excerpt

Today, we’re announcing inline payload support for Amazon SageMaker AI Async Inference. Customers can now send inference payloads directly in the request body of the InvokeEndpointAsync API, removing the need to upload input data to Amazon Simple Storage Service (Amazon S3) before each invocation.

For payloads up to 128,000 bytes, this removes an entire network round-trip (the separate S3 upload), simplifies client-side code, and reduces the operational surface area of asynchronous inference workloads.
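Because the inline limit is expressed in bytes, a client that handles mixed payload sizes needs a routing check before each call. The following is a minimal sketch, assuming the 128,000-byte limit is inclusive and that larger payloads still take the S3 upload path. `fits_inline` and `choose_transport` are hypothetical helper names, not part of any AWS SDK.

```python
from typing import Union

# Inline payload limit stated in the announcement (bytes, not characters).
INLINE_PAYLOAD_LIMIT_BYTES = 128_000


def fits_inline(payload: Union[bytes, str],
                limit: int = INLINE_PAYLOAD_LIMIT_BYTES) -> bool:
    """Return True if the encoded payload is at most `limit` bytes.

    Text is measured after UTF-8 encoding, so multi-byte characters
    count as more than one byte each.
    """
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    return len(data) <= limit


def choose_transport(payload: Union[bytes, str]) -> str:
    """Hypothetical router: 'inline' for small payloads, 's3' otherwise."""
    return "inline" if fits_inline(payload) else "s3"


print(choose_transport('{"text": "hello"}'))  # inline
```

Measuring encoded bytes rather than string length matters for non-ASCII text: 64,001 copies of "é" is only 64,001 characters but 128,002 bytes, so it would be routed to S3.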

In this post, we explain the motivation behind this feature, walk through the customer experience before and after, and show you how to start using inline payloads today.

You can use Amazon SageMaker AI Async Inference to queue inference requests and process them asynchronously. It’s a good fit for workloads with large payloads, variable traffic, or tolerance for seconds-to-minutes latency. It supports automatic scaling to zero, making it cost-efficient for bursty or batch-style workloads.
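Scaling to zero is configured through Application Auto Scaling rather than on the endpoint itself. The following is a minimal sketch that builds the keyword arguments for `register_scalable_target`, assuming the default `AllTraffic` variant name and an arbitrary maximum of four instances. `scale_to_zero_target` is a hypothetical helper, and a complete setup also attaches a scaling policy (for example, one that tracks the `ApproximateBacklogSizePerInstance` CloudWatch metric), which is omitted here.

```python
from typing import Any, Dict


def scale_to_zero_target(endpoint_name: str,
                         variant_name: str = "AllTraffic",
                         max_capacity: int = 4) -> Dict[str, Any]:
    """Keyword arguments for Application Auto Scaling's register_scalable_target.

    MinCapacity=0 lets an async endpoint variant scale in to zero instances
    when its queue is empty. A scaling policy must still be attached separately.
    """
    if max_capacity < 1:
        raise ValueError("max_capacity must be at least 1")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_capacity,
    }
```

With AWS credentials configured, the result can be passed as `boto3.client("application-autoscaling").register_scalable_target(**scale_to_zero_target("my-async-endpoint"))`, where `my-async-endpoint` is a placeholder endpoint name.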

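Before this launch, a client first uploaded its input to Amazon S3 and then called InvokeEndpointAsync with that object's S3 URI in the InputLocation parameter. The endpoint processes the request asynchronously and writes the output to a configured S3 output location. The client either polls that location or is notified through Amazon Simple Notification Service (Amazon SNS).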
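On the client side, polling the output location means retrying a read until the result object exists. The following is a minimal sketch, assuming a caller-supplied `fetch(bucket, key)` function that returns `None` while the object is still missing. With boto3, such a function could wrap `get_object` and treat a `NoSuchKey` error as `None`. `parse_s3_uri` and `poll_output` are hypothetical helpers, and the timeout counts only time spent sleeping.

```python
import time
from typing import Callable, Optional, Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both bucket and key: {uri!r}")
    return bucket, key


def poll_output(fetch: Callable[[str, str], Optional[bytes]],
                output_uri: str,
                interval: float = 2.0,
                timeout: float = 300.0,
                sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch(bucket, key) until it returns bytes or the timeout is reached.

    `fetch` returns None while the result object does not exist yet.
    Only time spent in `sleep` counts toward the timeout.
    """
    bucket, key = parse_s3_uri(output_uri)
    waited = 0.0
    while True:
        result = fetch(bucket, key)
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError(f"no output at {output_uri} after {timeout}s")
        sleep(interval)
        waited += interval
```

Subscribing to SNS notifications avoids this loop entirely, at the cost of running a consumer for the notifications.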

This two-step pattern works well for large payloads such as images, audio, and multi-MB documents. But for customers whose inputs are only a few kilobytes and who need longer processing times than real-time inference allows, the mandatory S3 upload added unnecessary complexity.
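The two-step pattern can be sketched as follows. This is a minimal illustration that assumes boto3-style `s3` and `sagemaker-runtime` clients are passed in, which means the flow can also be exercised offline with stand-ins. `invoke_via_s3` and the `async-inputs/` key prefix are hypothetical. `put_object` and `invoke_endpoint_async` with `InputLocation` are existing boto3 operations.

```python
import uuid
from typing import Any, Dict


def invoke_via_s3(s3_client: Any, runtime_client: Any, endpoint_name: str,
                  bucket: str, payload: bytes,
                  content_type: str = "application/json") -> Dict[str, str]:
    """Stage the input in S3, then invoke the async endpoint with its URI.

    s3_client / runtime_client are expected to behave like
    boto3.client("s3") and boto3.client("sagemaker-runtime").
    """
    key = f"async-inputs/{uuid.uuid4()}"  # hypothetical key layout
    # Step 1: upload the input; this is a full network round-trip on its own.
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload,
                         ContentType=content_type)
    # Step 2: the invocation carries only a pointer to the staged object.
    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType=content_type,
    )
    # The result is written asynchronously to OutputLocation.
    return {"InferenceId": response["InferenceId"],
            "OutputLocation": response["OutputLocation"]}
```

For a payload of a few kilobytes, step 1 is the extra round-trip and S3 dependency that inline payloads remove.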
