AWS S3 Inventory Overview
What is S3 Inventory?
Amazon S3 Inventory is a feature that provides a regularly updated list of the objects in an S3 bucket, or in a subset of the bucket defined by a prefix, together with metadata about each object. For buckets with large numbers of objects, it is more efficient than repeatedly calling the S3 List API operations.
Use Cases for S3 Inventory
- Auditing the replication and encryption status of objects
- Identifying unencrypted objects
- Counting the number of objects in a bucket
- Calculating the total storage size of all previous object versions
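The use cases above can be sketched against a CSV inventory file. The example below is a minimal illustration, not a definitive implementation: the column list, the sample rows, and the helper name `summarize_inventory` are assumptions made up for this sketch. In a real report the column order comes from the manifest, and CSV inventory files carry no header row.

```python
import csv
import io

def summarize_inventory(csv_text, schema):
    """Summarize inventory rows; `schema` is the ordered list of column names."""
    entry_count = 0          # one row per object version listed in the report
    unencrypted_keys = []    # keys whose EncryptionStatus is NOT-SSE
    noncurrent_bytes = 0     # total size of versions that are not the latest
    for values in csv.reader(io.StringIO(csv_text)):
        if not values:
            continue
        row = dict(zip(schema, values))
        entry_count += 1
        if row.get("EncryptionStatus") == "NOT-SSE":
            unencrypted_keys.append(row["Key"])
        # Skip rows with an empty Size (for example, delete markers).
        if row.get("IsLatest") == "false" and row.get("Size"):
            noncurrent_bytes += int(row["Size"])
    return {
        "entry_count": entry_count,
        "unencrypted_keys": unencrypted_keys,
        "noncurrent_bytes": noncurrent_bytes,
    }

# Hypothetical schema and sample rows, for illustration only.
SCHEMA = ["Bucket", "Key", "VersionId", "IsLatest", "IsDeleteMarker", "Size", "EncryptionStatus"]
SAMPLE = (
    '"example-bucket","logs/a.txt","v2","true","false","100","SSE-S3"\n'
    '"example-bucket","logs/a.txt","v1","false","false","80","NOT-SSE"\n'
    '"example-bucket","data/b.bin","v1","true","false","500","NOT-SSE"\n'
)

summary = summarize_inventory(SAMPLE, SCHEMA)
print(summary)
```

Reading the rows with `csv.reader` rather than splitting on commas keeps quoted fields intact if a key contains a comma.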
Features
- Output formats: CSV, ORC, and Apache Parquet
- Inventory schedule: Can be generated daily or weekly
- Data analysis: Can be queried using tools like Amazon Athena, Amazon Redshift Spectrum, Presto, Apache Hive, and Apache Spark
- S3 Select: Can be used to generate filtered reports from the inventory files
- Compliance: Supports auditing and reporting for business, compliance, and regulatory needs
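As a rough sketch of how the format and schedule options fit together, the snippet below builds an inventory configuration dictionary in the shape accepted by boto3's `put_bucket_inventory_configuration`. The helper name `build_inventory_config`, the bucket ARN, the configuration ID, and the chosen optional fields are assumptions for illustration. The snippet only constructs the dictionary and does not call AWS.

```python
ALLOWED_FORMATS = {"CSV", "ORC", "Parquet"}
ALLOWED_FREQUENCIES = {"Daily", "Weekly"}

def build_inventory_config(config_id, dest_bucket_arn, fmt="Parquet",
                           frequency="Daily", prefix=None):
    """Build an inventory configuration dict (hypothetical helper)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency}")
    config = {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": frequency},
        "Destination": {
            "S3BucketDestination": {"Bucket": dest_bucket_arn, "Format": fmt}
        },
        # Extra metadata columns to include in the report.
        "OptionalFields": ["Size", "LastModifiedDate",
                           "EncryptionStatus", "ReplicationStatus"],
    }
    if prefix:
        # Restrict the inventory to keys under this prefix.
        config["Filter"] = {"Prefix": prefix}
    return config

# Example values are placeholders, not real resources.
config = build_inventory_config(
    "weekly-logs-inventory",
    "arn:aws:s3:::example-destination-bucket",
    fmt="CSV",
    frequency="Weekly",
    prefix="logs/",
)
print(config)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `InventoryConfiguration=config` alongside `Bucket=` and `Id=` to `put_bucket_inventory_configuration` on an S3 client.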
Inventory Report Structure
- Manifest files (manifest.checksum and manifest.json) contain metadata about the inventory data.
- The manifest.json includes:
  - Source and destination buckets
  - Manifest version and file format
  - File schema (column names)
  - List of files included in the inventory
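To make the manifest structure concrete, here is a minimal sketch that parses a manifest.json and pulls out the file format, the column names, and the data file keys. The sample manifest is invented for illustration: the bucket names, file key, size, and checksum are placeholders. Splitting `fileSchema` on commas assumes a CSV-format inventory; ORC and Parquet manifests describe the schema differently.

```python
import json

# Hypothetical manifest contents; every value here is a placeholder.
SAMPLE_MANIFEST = """
{
  "sourceBucket": "example-source-bucket",
  "destinationBucket": "arn:aws:s3:::example-destination-bucket",
  "version": "2016-11-30",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, VersionId, IsLatest, IsDeleteMarker, Size, EncryptionStatus",
  "files": [
    {
      "key": "inventory/data/part-0001.csv.gz",
      "size": 2147,
      "MD5checksum": "0123456789abcdef0123456789abcdef"
    }
  ]
}
"""

def read_manifest(text):
    """Return (file_format, column_names, data_file_keys) from manifest JSON."""
    manifest = json.loads(text)
    # For CSV inventories the schema is a comma-separated list of column names.
    columns = [name.strip() for name in manifest["fileSchema"].split(",")]
    data_keys = [entry["key"] for entry in manifest["files"]]
    return manifest["fileFormat"], columns, data_keys

file_format, columns, data_keys = read_manifest(SAMPLE_MANIFEST)
print(file_format, columns, data_keys)
```

The column list recovered here is what gives meaning to each field in the headerless CSV data files listed under `files`.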
Important Notes
- The destination bucket for inventory reports must be in the same AWS Region as the source bucket.