Original article excerpt
Server-side extracted preview paragraphs from the original source.
In this post, we walk through how Huntington built a scalable AWS solution to detect and redact Personally Identifiable Information (PII) and Payment Card Industry (PCI) data from over 400 million documents, reducing processing time from years to just a few months while achieving 95%+ redaction accuracy.
When your document repository contains hundreds of millions of files accumulated over nearly a decade, how do you systematically find and redact sensitive customer data without taking years to complete? This was the challenge facing The Huntington National Bank (Huntington), a top 10 bank in the United States.
Since 2015, Huntington’s document management system has securely stored hundreds of millions of documents on-premises. In 2025, as part of a proactive compliance initiative, Huntington set out to process the documents in this system and redact sensitive data. These documents come in different formats, so the solution needed flexibility to handle varied file types while delivering the throughput required to process millions of documents quickly.
Original estimates indicated this effort would take years. However, by designing a scalable redaction workflow using Amazon Textract, Amazon SageMaker, AWS Step Functions, and AWS Lambda, Huntington reduced this timeline to months.
Before examining the technical implementation, let’s look at the core requirements Huntington established for this project. If you’re facing a similar large-scale document processing challenge, these requirements can serve as a starting point for your own solution design:
Huntington’s first objective was to move documents from an on-premises file share to an Amazon Simple Storage Service (Amazon S3) bucket. Moving documents is straightforward, but this effort required transferring over 400 million documents, encrypted in transit and at rest. To accomplish this, Huntington used AWS DataSync, AWS Direct Connect, Amazon S3, and AWS Key Management Service (AWS KMS).
AWS DataSync can be deployed as an agent in your on-premises data center to monitor a configured source, such as an SMB file share. While getting documents to AWS was critical for processing, AWS DataSync also supports syncing data back to on-premises, which was another key requirement for this project.
