AWS S3 Inventory Overview

What is S3 Inventory?

Amazon S3 Inventory is a feature that produces a regularly updated list of the objects in an S3 bucket, or in a subset of the bucket defined by a prefix, together with metadata about each object. For buckets that hold large numbers of objects, reading this scheduled report is more efficient than enumerating the bucket with the S3 List API operations.

Use Cases for S3 Inventory

Auditing the replication and encryption status of objects
Identifying unencrypted objects
Counting the number of objects in a bucket
Calculating the total storage size of all previous object versions
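
The use cases above can be sketched against a single inventory file. The following is a minimal Python sketch, not a definitive implementation: the column order in COLUMNS and the rows in SAMPLE_CSV are made up for illustration, and it assumes a decompressed CSV inventory file with no header row whose real column order would be taken from manifest.json.

```python
import csv
import io

# Hypothetical column order; in practice it comes from "fileSchema" in manifest.json.
COLUMNS = ["Bucket", "Key", "VersionId", "IsLatest", "IsDeleteMarker", "Size", "EncryptionStatus"]

# Made-up rows standing in for one decompressed inventory CSV file.
SAMPLE_CSV = '''"example-bucket","logs/a.txt","v2","true","false","100","SSE-S3"
"example-bucket","logs/a.txt","v1","false","false","80","SSE-S3"
"example-bucket","data/b.bin","v1","true","false","250","NOT-SSE"
"example-bucket","data/c.bin","v3","true","false","40","SSE-KMS"
"example-bucket","data/c.bin","v2","false","false","60","NOT-SSE"
'''


def summarize(csv_text, columns=COLUMNS):
    """Count current objects, sum noncurrent version sizes, list unencrypted current keys."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    summary = {"current_objects": 0, "noncurrent_bytes": 0, "unencrypted_keys": []}
    for row in reader:
        if row["IsDeleteMarker"] == "true":
            continue  # delete markers carry no data, so skip them
        if row["IsLatest"] == "true":
            summary["current_objects"] += 1
            if row["EncryptionStatus"] == "NOT-SSE":
                summary["unencrypted_keys"].append(row["Key"])
        else:
            summary["noncurrent_bytes"] += int(row["Size"])
    return summary


summary = summarize(SAMPLE_CSV)
print(summary)
```

With the sample rows above, the summary reports three current objects, 140 bytes held by previous versions, and one unencrypted key (data/b.bin).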

Features

Output formats: CSV, ORC, and Apache Parquet
Inventory schedule: Can be generated daily or weekly
Data analysis: Reports can be queried with standard SQL using Amazon Athena, Amazon Redshift Spectrum, and other tools such as Presto, Apache Hive, and Apache Spark
S3 Select: Can filter the rows of an individual inventory file with a SQL expression (S3 Select reads CSV and Parquet, not ORC)
Compliance: Supports auditing and reporting for business, compliance, and regulatory needs
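
The format and schedule options above map onto the inventory configuration that is submitted to S3. The sketch below builds such a configuration as a plain Python dictionary in the shape expected by boto3's put_bucket_inventory_configuration call. The helper name build_inventory_configuration, the bucket names, and the chosen optional fields are assumptions for illustration; the actual AWS call is left commented out because it needs credentials and real buckets.

```python
def build_inventory_configuration(config_id, destination_bucket_arn,
                                  output_format="CSV", frequency="Daily", prefix=None):
    """Build an InventoryConfiguration dict (hypothetical helper, not part of boto3)."""
    if output_format not in ("CSV", "ORC", "Parquet"):
        raise ValueError("output_format must be CSV, ORC, or Parquet")
    if frequency not in ("Daily", "Weekly"):
        raise ValueError("frequency must be Daily or Weekly")

    config = {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "All",  # "Current" would skip previous versions
        "Schedule": {"Frequency": frequency},
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,
                "Format": output_format,
            }
        },
        # Illustrative choice of optional metadata columns.
        "OptionalFields": ["Size", "LastModifiedDate", "ReplicationStatus", "EncryptionStatus"],
    }
    if prefix:
        # Restrict the inventory to a subset of the bucket.
        config["Filter"] = {"Prefix": prefix}
    return config


config = build_inventory_configuration(
    "weekly-parquet",
    "arn:aws:s3:::example-destination-bucket",  # placeholder ARN
    output_format="Parquet",
    frequency="Weekly",
    prefix="logs/",
)

# To apply it (requires AWS credentials and existing buckets):
# import boto3
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="example-source-bucket", Id=config["Id"], InventoryConfiguration=config)
```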

Inventory Report Structure

Each inventory report is delivered with two manifest files: manifest.json, which describes the inventory data files, and manifest.checksum, which holds the MD5 hash of manifest.json.
The manifest.json file includes:
- Source and destination buckets
- Manifest version and file format
- File schema (column names)
- List of files included in the inventory
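
As a rough illustration of the structure above, the following Python sketch reads a manifest and checks it against its checksum. The manifest content, object keys, sizes, and MD5 values are made up, and the checksum is computed locally to stand in for the contents of manifest.checksum. Splitting fileSchema on commas assumes a CSV-format inventory; ORC and Parquet inventories describe their schema differently.

```python
import hashlib
import json

# Made-up manifest standing in for a downloaded manifest.json.
MANIFEST_JSON = b'''{
  "sourceBucket": "example-source-bucket",
  "destinationBucket": "arn:aws:s3:::example-destination-bucket",
  "version": "2016-11-30",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, Size, EncryptionStatus",
  "files": [
    {"key": "inventory/data/part-0001.csv.gz", "size": 2048,
     "MD5checksum": "00000000000000000000000000000001"},
    {"key": "inventory/data/part-0002.csv.gz", "size": 1024,
     "MD5checksum": "00000000000000000000000000000002"}
  ]
}'''


def parse_manifest(manifest_bytes):
    """Return source bucket, file format, column names, and (key, size) pairs."""
    manifest = json.loads(manifest_bytes)
    columns = [name.strip() for name in manifest["fileSchema"].split(",")]
    data_files = [(entry["key"], entry["size"]) for entry in manifest["files"]]
    return manifest["sourceBucket"], manifest["fileFormat"], columns, data_files


def matches_checksum(manifest_bytes, checksum_text):
    """Compare the MD5 hash of manifest.json with the text of manifest.checksum."""
    return hashlib.md5(manifest_bytes).hexdigest() == checksum_text.strip().lower()


source_bucket, file_format, columns, data_files = parse_manifest(MANIFEST_JSON)

# Stand-in for the downloaded manifest.checksum file.
checksum_text = hashlib.md5(MANIFEST_JSON).hexdigest() + "\n"
is_intact = matches_checksum(MANIFEST_JSON, checksum_text)

print(source_bucket, file_format, columns)
for key, size in data_files:
    print(key, size)
print("manifest intact:", is_intact)
```

The column names recovered here are what a reader of the data files would use as headers, since the schema is carried by the manifest.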

Important Notes

The destination bucket for inventory reports must be in the same AWS region as the source bucket.