Query cross-account Amazon DynamoDB tables using Amazon Athena Federated Query

by Satyanarayana Adimula | on

Amazon DynamoDB  is ideal for applications that need a flexible NoSQL database with low read and write latencies and the ability to scale storage and throughput up or down as needed without code changes or downtime. You can use DynamoDB for use cases including mobile apps, gaming, digital ad serving, live voting, audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for Amazon S3 objects, e-commerce shopping carts, and web session management.

What if you have the need to allow other Amazon Web Services accounts to query your DynamoDB table? What if other accounts need to join data on your DynamoDB table with their data stored in data sources like Amazon CloudWatch, Amazon DocumentDB, Amazon Redshift, Amazon OpenSearch, MySQL, PostgreSQL connected with Athena data source connectors , and Amazon S3?

Amazon Athena cross-account federated query enables you to run SQL queries across data stored in relational, non-relational, object, and custom data sources where data source and its connector are in different Amazon Web Services accounts from the user querying the data. There are no new charges for querying connectors in another account, but Athena’s standard rates for data scanned, Lambda usage, and other services apply.

This post will demonstrate Athena in an Amazon Web Services account accessing a DynamoDB table of another Amazon Web Services account by using the Athena cross-account federated query. It also explains deploying Amazon Athena DynamoDB connector using Amazon Web Services Serverless Application Repository and setting up Athena cross-account federation between two accounts for the Demo.

Walkthrough

The solution has the following steps to demonstrate Athena cross-account federated query:

  • Set up Athena federation – To deploy a Lambda function for the data source connector and connect it to a data source.
  • Set up Athena cross-account federation – To set up IAM permissions for Athena cross-account federation.
  • Test Athena cross-account federated query – To show a demo of how an Amazon Web Services account can share its DynamoDB table as an Athena data source with another Amazon Web Services account.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • Two Amazon Web Services Accounts
  • Amazon Web Services resources: Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon DynamoDB, Amazon Web Services Lambda

Data source connectors

A data source connector is a piece of code that can translate between your target data source and Athena. Athena uses data source connectors that run on Amazon Web Services Lambda to run federated queries. You can think of a connector as an extension of Athena’s query engine.

Connectors use Apache Arrow as the format for returning data requested in a query, which enables connectors to be implemented in languages such as C, C++, Java, Python, and Rust.

Athena uses data source connectors that run on Amazon Web Services Lambda to run federated queries. Since connectors are processed in Lambda, they can be used to access data from any data source on the cloud or on premises that is accessible from Lambda

To use a connector in your Athena queries, deploy it to your account using one of the following ways:

  • Amazon Web Services Serverless Application Repository
  • Athena and Lambda consoles

This blog uses the Amazon Web Services Serverless Application Repository to deploy the Amazon Athena DynamoDB connector .

After you deploy data source connectors, the connector is associated with a catalog that you can specify in SQL queries. You can combine SQL statements from multiple catalogs and span multiple data sources with a single query. When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates. Based on the user submitting the query, connectors can provide or restrict access to specific data elements.

Architecture

  • Amazon Web Services Account-A has a DynamoDB table called Music.
  • Account-A has an Athena data source connector to federate into DynamoDB.
  • Amazon Web Services Account-B has Analysts who need to query the DynamoDB table.
  • Account-A is sharing the Athena data source with Account-B by using Athena cross-account federated query.

The following figure shows Amazon Athena cross-account federation for Account-B to access DynamoDB in Account-A.

To demonstrate the Athena cross-account federation, create a sample DynamoDB table called music in Account-A. Follow the instructions at Getting started with DynamoDB to create the table Music and load thesample data.

Set up Athena federation

Preparing to create federated queries is a two-part process: deploying a Lambda function for the data source connector and connecting the Lambda function to a data source. For more details, see Enabling cross-account federated queries .

Deploy AthenaDynamoDBConnector using Amazon Web Services Serverless Application Repository

  1. Sign in as an administrator to Amazon Web Services Account-A.
  2. Open the Serverless Application Repository.
  3. In the navigation pane, choose Available applications.
  4. Select the option Show apps that create custom IAM roles or resource policies.
  5. In the search box, type the name of the connector AthenaDynamoDBConnector.

  6. Choosing a connector opens the Lambda function’s Application details page in the Amazon Web Services Lambda console.
  7. On the right side of the details page, for Application settings, fill in the required information.
    • Application name – Name of Amazon Web Services CloudFormation Stack to deploy the connector: AthenaDynamoDBConnector.
    • AthenaCatalogName – It is the catalog name to create in Athena. It is also the name of the Lambda function. Give it in lowercase: acct1dynamodb.
    • SpillBucket – Specify an existing S3 bucket (spill-bucket) in your account to receive data from any large response payloads that exceed Lambda function response size limits.
  8. Select I acknowledge that this app creates custom IAM roles and resource policies. For more information, choose the Info link.
  9. At the bottom right of the Application settings section, choose Deploy.
  10. Serverless Application Repository will create an Amazon Web Services CloudFormation stack to deploy the connector.
  11. When the deployment is complete, you will see the Lambda function in the Resources section of the Amazon Web Services CloudFormation stack. Note down the Lambda function name.

Connect Athena to the data source

  1. Go to Athena console in Account-A.
  2. Choose Data sources. Click Create Data source.
  3. In Choose data source, search for Amazon DynamoDB and select it.
  4. In Data source details, give a Data source name acct1dynamodb
  5. For Lambda function in the Connection details section, choose the name of the function acct1dynamodb from the dropdown.
  6. On the Review and create page, review the data source details, and then choose Create data source.

  7. You will see the data source acctdynamodb in the Data sources.
  8. Go to Query editor. Choose the Data Source acct1dynamodb from the dropdown.
  9. You will see all the tables in the shared data source.
  10. Run the following SQL in Athena Query editor 
    SELECT songtitle, albumtitle, cast(awards as int) as awards 
    FROM "acct1dynamodb"."default"."music" 
    WHERE artist = 'Acme Band' 
    limit 2;
  11. Verify Athena federation works.

Set up Athena cross-account federation

In Account-A: Set up IAM permissions for cross-account

  1. Sign in as an administrator to Account-A.
  2. On the S3 spill bucket (of the Lambda function), grant GetObject and ListBucket permissions to the IAM user analyst of Account-B.

Note: Replace Account-B-id with your actual Amazon Web Services cross-account id to which you want to share the DynamoDB table. Replace spill-bucket with your actual S3 bucket in Account-A.

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": ["arn:aws:iam::Account-B-id:user/analyst"]
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
             ],
            "Resource": [
                "arn:aws:s3::: spill-bucket",
                "arn:aws:s3::: spill-bucket/*"
            ]
        }
     ]
 }
  1. Grant InvokeFunction on Lambda function acct1dynamodb to IAM user analyst of Account-B.

Note: Replace Account-A-id with your actual Amazon Web Services account id where you have the DynamoDB table. Replace Account-B-id with your actual Amazon Web Services cross-account id to which you want to share the DynamoDB table.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CrossAccountInvocationStatement",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::Account-B-id:user/analyst"]
      }, 
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:aws-region:Account-A-id:function:acct1dynamodb"
    }
  ]
}
  1. Go to the Lambda function acct1dynamodb. Choose Configuration and Permissions.

  1. Go to Resource-based policy and Add permissions.

When you save the above permissions, you can see them under Policy statements in Resource-based policy of the Lambda function.

In Account-B: Set up IAM permissions for cross-account

  1. Sign in as an administrator to Amazon Web Services Account-B.
  2. Create IAM role called AthenaCrossAccountFederated-Account-A-id for Account-A to assume. Add the following inline policy to the role.

Note: Replace Account-B-id with your actual Amazon Web Services cross-account id to which you want to share the DynamoDB table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "athena:CreateDataCatalog",
            "Resource": "arn:aws:athena:aws-region:Account-B-id:datacatalog/*"
        }
    ]
}
  1. Grant permission to the IAM user analyst to invoke the Lambda function acct1dymanodb of Account-A

Note: Replace Account-A-id with your actual Amazon Web Services account id where you have the DynamoDB table.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:aws-region:Account-A-id:function:acct1dynamodb"
        }
    ]
}

Share the Athena Data source with Account-B

After the permissions are in place, you share a data connector in your account (Account-A) with another account (Account-B). Account-A retains full control and ownership of the connector. When Account-A makes configuration changes to the connector, the updated configuration applies to the shared connector in Account-B.

  1. Sign in as an administrator to Account-A.
  2. On Athena, go to Data sources, choose data source acct1dynamodb you want to share. Go to the Share option in the top right corner.

  1. For Account ID, enter the Account-B-id to share your data source with Account-B and click Share.

Test Athena cross-account federated query: Access the shared data source from Account-B

  1. Sign in as IAM user analyst to Account-B.
  2. In Athena, go to Data sources. You will see the data source acct1dynamodb.

  1. Go to Query editor. Choose the Data Source acct1dynamodb from the dropdown.

  1. You will see all the tables in the shared data source.

  1. Run the following SQL in Athena Query editor
SELECT songtitle, albumtitle, cast(awards as int) as awards 
FROM "acct1dynamodb"."default"."music" 
WHERE artist = 'Acme Band' 
limit 2;

  1. Athena cross-account federated has worked! This validates that user analyst in Account-B can see the data of the DynamoDB table of Account-A.

Clean up

To avoid incurring future charges, delete the following resources that were provisioned for this demo:

  • S3 spill bucket used in Amazon Web Services Lambda
  • Lambda function used for the data source connector
  • Sample DynamoDB table

Conclusion

In this post, we saw how you can access a cross-account DynamoDB table using Athena Federated Query to query the data in place. You can execute a single SQL query to join this data across data sources like Amazon CloudWatch, Amazon DocumentDB, Amazon Redshift, Amazon OpenSearch, MySQL, PostgreSQL, Oracle, SQL Server, HBase, Redis, BigQuery, Snowflake, Teradata with Athena data source connectors and Amazon S3.


About the author

Satya Adimula is a Senior Data Architect at Amazon Web Services based in Boston. With extensive experience in data and analytics, Satya helps organizations derive their business insights from the data at scale.