Amazon S3

Amazon S3 Metadata

Accelerate data discovery with near real-time object metadata

Object metadata in Amazon S3

Every object in Amazon S3 carries metadata that describes it. There are four types of object metadata in S3:

    System-defined metadata is automatically captured by S3 and includes properties such as an object's creation time, size, storage class, and encryption status. System-defined metadata is always present and maintained by S3.

    User-defined metadata consists of custom key-value pairs you set at upload time, such as a department name or project code. User-defined metadata is immutable after upload and limited in size.

    Object tags are key-value labels you can add, modify, or delete at any time. Tags integrate with IAM policies, lifecycle rules, cost allocation, and S3 analytics, making them ideal for access control and operational workflows.

    Annotations let you attach rich, large-scale business context to any object at any time. Annotations support formats like JSON, XML, and YAML, with up to 1 GB per object. Annotations are mutable, share the same durability and consistency as the object, and are managed through their own set of S3 APIs.
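
As a rough illustration of how two of these metadata types are set, the sketch below builds the request arguments for the S3 PutObject and PutObjectTagging operations as exposed by boto3 (`put_object` and `put_object_tagging`). The bucket name, object key, and helper function names are hypothetical. The 2 KB and 10-tag limits are assumptions based on the S3 documentation at the time of writing and should be checked against current documentation. The annotation APIs are not shown because this page does not name them.

```python
from typing import Dict

# Assumed limits, based on S3 documentation at the time of writing.
MAX_USER_METADATA_BYTES = 2 * 1024
MAX_TAGS_PER_OBJECT = 10


def build_put_object_args(bucket: str, key: str, body: bytes,
                          user_metadata: Dict[str, str]) -> dict:
    """Build keyword arguments for boto3's s3.put_object (hypothetical helper)."""
    # Size is measured as the UTF-8 bytes of all keys plus all values.
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in user_metadata.items())
    if size > MAX_USER_METADATA_BYTES:
        raise ValueError(
            f"user-defined metadata is {size} bytes; "
            f"assumed limit is {MAX_USER_METADATA_BYTES}"
        )
    return {"Bucket": bucket, "Key": key, "Body": body,
            "Metadata": dict(user_metadata)}


def build_tagging_args(bucket: str, key: str, tags: Dict[str, str]) -> dict:
    """Build keyword arguments for boto3's s3.put_object_tagging (hypothetical helper)."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"{len(tags)} tags; assumed limit is {MAX_TAGS_PER_OBJECT}")
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    return {"Bucket": bucket, "Key": key, "Tagging": {"TagSet": tag_set}}


# Hypothetical bucket, key, and values for illustration only.
put_args = build_put_object_args(
    "example-bucket", "reports/q1.csv", b"a,b\n1,2\n",
    {"department": "finance", "project-code": "p-1234"},
)
tag_args = build_tagging_args(
    "example-bucket", "reports/q1.csv", {"classification": "internal"},
)
```

With boto3 installed and credentials configured, these dictionaries could be passed as `s3.put_object(**put_args)` and `s3.put_object_tagging(**tag_args)`, where `s3 = boto3.client("s3")`. Because user-defined metadata is fixed at upload, changing it later means copying the object, whereas tags can be replaced by calling `put_object_tagging` again.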

Surface, store, and query all your metadata in one place

Amazon S3 Metadata brings all four types of object metadata together into a single, queryable experience. S3 Metadata automatically surfaces and stores metadata for objects in your S3 buckets, including system-defined details, user-defined metadata, object tags, and annotations, and makes it queryable so you can find the data you need for business analytics, real-time inference applications, AI agents, and more.

S3 Metadata stores this information in fully managed, read-only Apache Iceberg tables that you can query with Amazon Athena and other Iceberg-compatible tools. S3 Metadata provides three table types. Journal tables capture object-level events and annotation changes in near real-time, enabling event-driven workflows and change tracking. Live inventory tables provide a continuously updated view of all objects and their current metadata across your bucket. Annotation tables store annotations in a queryable format so you can search across all annotations at scale.
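
To make the journal table idea concrete, here is a minimal sketch that builds a SQL query for recent object-level events. The database name `aws_s3_metadata`, the table name `journal`, and the column names (`bucket`, `key`, `record_type`, `record_timestamp`) are assumptions for illustration and should be checked against the schema of your own metadata tables. The helper function is hypothetical.

```python
import re
from datetime import datetime

# Conservative pattern for identifiers that are interpolated into the query.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_journal_query(database: str, table: str, since: datetime,
                        limit: int = 100) -> str:
    """Return SQL selecting journal records newer than `since` (hypothetical helper)."""
    for name in (database, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    since_literal = since.strftime("%Y-%m-%d %H:%M:%S")
    # Column names below are assumed, not taken from this page.
    return (
        "SELECT bucket, key, record_type, record_timestamp\n"
        f'FROM "{database}"."{table}"\n'
        f"WHERE record_timestamp > TIMESTAMP '{since_literal}'\n"
        "ORDER BY record_timestamp DESC\n"
        f"LIMIT {int(limit)}"
    )


# Hypothetical database and table names for illustration only.
sql = build_journal_query("aws_s3_metadata", "journal", datetime(2025, 1, 1))
print(sql)
```

The resulting string could be submitted to Amazon Athena, for example through boto3's `athena.start_query_execution(QueryString=sql, ...)`, or run from any other Iceberg-compatible engine. Because the tables are read-only, only `SELECT` statements apply.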

S3 Metadata automatically populates metadata for both new and existing objects, giving you a comprehensive, always-current view of your data without building or maintaining separate metadata systems. You can also use natural language to search objects by their metadata using agents in Amazon SageMaker Unified Studio, or any IDE with the S3 Tables MCP server.

Benefits

    S3 Metadata is designed to create and manage metadata for all objects in your S3 buckets, both existing objects and new uploads, providing a comprehensive view of your data.

    Quickly find and retrieve the data you need, even across trillions of objects in S3. We update the metadata hourly so you can easily understand your latest storage landscape.

    Attach up to 1 GB of mutable metadata per object using annotations. Store AI-generated summaries, technical specifications, compliance details, or any contextual information, eliminating the need for separate metadata management systems.

    Access your metadata through live inventory tables, journal tables, and annotation tables in managed S3 Tables, with built-in support for Apache Iceberg.

    Analyze metadata using familiar services like Amazon Athena, Amazon Redshift, and Amazon EMR through the S3 Tables integration with Amazon SageMaker. Query annotations using SQL, or using natural language through the Model Context Protocol (MCP) server. S3 Metadata is compatible with popular open source tools.

Use cases

    Track and manage AI-generated videos, images, and documents with rich annotations including their origin, creation time, the AI model used with Amazon Bedrock, confidence scores, and processing lineage, all stored directly with your objects.

    Use annotations to catalog all data with rich business context for easier discovery and utilization. Attach transcripts, scene descriptions, technical specifications, and licensing information directly to media files without separate databases.

    Improve data organization and compliance by attaching regulatory metadata, audit trails, data lineage, and compliance status directly to objects. Query across petabytes to identify data subject to specific regulations or retention policies.

    Analyze object metadata across your entire storage footprint to identify opportunities for cost savings and performance improvements.

    Quickly identify and analyze relevant datasets for business intelligence and decision-making.
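
As one possible illustration of the storage analysis use case above, the sketch below pairs an aggregate SQL query over a live inventory table with a small Python function that computes the same summary locally from query result rows. The database and table names (`aws_s3_metadata`, `inventory`) and the `storage_class` and `size` columns are assumptions, and the sample rows are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Assumed table and column names; adjust to your own metadata table schema.
STORAGE_CLASS_SUMMARY_SQL = """
SELECT storage_class, COUNT(*) AS object_count, SUM(size) AS total_bytes
FROM "aws_s3_metadata"."inventory"
GROUP BY storage_class
ORDER BY total_bytes DESC
""".strip()


def summarize_by_storage_class(rows: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Count objects and total bytes per storage class (hypothetical helper)."""
    summary = defaultdict(lambda: {"object_count": 0, "total_bytes": 0})
    for row in rows:
        entry = summary[row["storage_class"]]
        entry["object_count"] += 1
        entry["total_bytes"] += int(row["size"])
    return dict(summary)


# Made-up rows standing in for inventory query results.
sample_rows = [
    {"key": "logs/a.gz", "storage_class": "STANDARD", "size": 1200},
    {"key": "logs/b.gz", "storage_class": "STANDARD", "size": 800},
    {"key": "archive/c.tar", "storage_class": "GLACIER", "size": 5000},
]
summary = summarize_by_storage_class(sample_rows)
print(summary)
```

At real scale the aggregation would normally run inside the query engine using the SQL above. The Python function is only meant to show what the query computes.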