Adding AWS Kinesis and Kinesis Firehose to an existing DynamoDB

We are looking to add Kinesis Streams and Kinesis Firehose to migrate data from our DynamoDB operational data store to S3.

I have created the Kinesis Stream and the Kinesis Firehose Delivery Stream to send the data to an S3 bucket. All Insert, Modify and Remove events are being captured, transformed and written to the S3 bucket under the prefix data/[YEAR]/[MONTH]/[DAY].

My question is about the data that was already in the DynamoDB table before Kinesis was enabled. What is the best way to migrate that data to S3? I understand that you can run an Export to S3 from the DynamoDB table, but that puts the data into a predefined folder.

Any idea on the best approach here?



Solution 1:[1]

The formats of the DynamoDB Stream and the DynamoDB Export are different because they serve slightly different use cases. Nevertheless, it is possible to create a single view over both. If you want to run analytical queries on the data that you exported from DynamoDB into S3, you probably want to use Athena as your SQL engine.

  1. Export the data from DynamoDB using DynamoDB Export (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataExport.html)
  2. Create an Athena table on that export
  3. Enable DynamoDB Stream into S3 through Firehose (https://aws.amazon.com/blogs/big-data/build-seamless-data-streaming-pipelines-with-amazon-kinesis-data-streams-and-amazon-kinesis-data-firehose-for-amazon-dynamodb-tables/)
  4. Create an Athena table on that stream
  5. Create a unified view over both tables so they can be queried together
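Step 1 can also be started from code. The following is a minimal sketch using boto3's `export_table_to_point_in_time`, assuming point-in-time recovery is enabled on the table. The table ARN, bucket name and prefix are placeholders, and the helper names are hypothetical.

```python
def build_export_request(table_arn, bucket, prefix):
    """Build the keyword arguments for export_table_to_point_in_time."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,  # the export lands under this prefix
        "ExportFormat": "DYNAMODB_JSON",
    }


def start_export(request):
    """Start the export and return its ARN (needs AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("dynamodb")
    response = client.export_table_to_point_in_time(**request)
    return response["ExportDescription"]["ExportArn"]


if __name__ == "__main__":
    # Placeholder values; replace them with your own table and bucket.
    request = build_export_request(
        "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
        "example-bucket",
        "export/",
    )
    print(request)
```

The export still writes into its own `AWSDynamoDB/<export id>/data/` layout under the chosen prefix, so a dedicated prefix such as `export/` keeps it apart from the `data/[YEAR]/[MONTH]/[DAY]` objects written by Firehose, and each location can back its own Athena table.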

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Guy