Storing Firehose-transferred files in S3 under custom directory names

We primarily bulk-transfer incoming clickstream data through the Kinesis Firehose service. Our system is a multi-tenant SaaS platform. The incoming clickstream data is stored in S3 through Firehose. By default, all the files are stored under directories named according to a given date format. I would like to specify the directory path for the data files, either in the Firehose panel or through the API, in order to segregate the customer data.

For example, this is the directory structure that I would like to have in S3 for customers A, B and C:

/A/2017/10/12/

/B/2017/10/12/

/C/2017/10/12/

How can I do it?



Solution 1:[1]

AWS Firehose supports dynamic partitioning.

It can be done in two ways: with the inline JQ parser or with a Lambda function.
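For the Lambda route, a transformation function can attach a partition key to each record, which the S3 prefix then references as `!{partitionKeyFromLambda:customer_id}`. The following is a minimal sketch, not a definitive implementation. It assumes each record is a JSON object with a `customer_id` field; that field name is an assumption for illustration and does not come from the question.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation handler that tags each record with a partition key."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            customer_id = str(payload["customer_id"])  # assumed field name
        except (ValueError, KeyError, TypeError):
            # Unparseable record or missing tenant: mark it as failed so
            # Firehose delivers it under the error output prefix instead.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": record["data"],  # payload is passed through unchanged
            "metadata": {"partitionKeys": {"customer_id": customer_id}},
        })
    return {"records": output}
```

The payload itself is not modified here; only the `metadata.partitionKeys` entry is added, so the tenant can be changed or renamed without touching the stored data format.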

Example (inline JQ parser; the prefix has to be a single string without line breaks):

"ExtendedS3DestinationConfiguration": {
    "BucketARN": "arn:aws:s3:::my-logs-prod",
    "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/device=!{partitionKeyFromQuery:device}/year=!{partitionKeyFromQuery:year}/month=!{partitionKeyFromQuery:month}/day=!{partitionKeyFromQuery:day}/hour=!{partitionKeyFromQuery:hour}/"
}
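To get the `/A/2017/10/12/` layout from the question with the inline JQ parser, the destination configuration could look roughly like the sketch below, written for boto3. The `customer_id` field, the `errors/` prefix and the buffering values are assumptions chosen for illustration; the role and bucket ARNs are placeholders you would supply yourself.

```python
def build_s3_destination(bucket_arn, role_arn):
    """Return an ExtendedS3DestinationConfiguration dict partitioned by customer."""
    return {
        "BucketARN": bucket_arn,
        "RoleARN": role_arn,
        # Customer key extracted by JQ, followed by the delivery date.
        "Prefix": "!{partitionKeyFromQuery:customer_id}/!{timestamp:yyyy/MM/dd}/",
        # Assumed location for records that fail partitioning.
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "MetadataExtraction",
                "Parameters": [
                    # JQ expression that reads the assumed customer_id field.
                    {"ParameterName": "MetadataExtractionQuery",
                     "ParameterValue": "{customer_id: .customer_id}"},
                    {"ParameterName": "JsonParsingEngine",
                     "ParameterValue": "JQ-1.6"},
                ],
            }],
        },
    }


def create_stream(name, bucket_arn, role_arn):
    """Create the delivery stream; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("firehose")
    return client.create_delivery_stream(
        DeliveryStreamName=name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration=build_s3_destination(bucket_arn, role_arn),
    )
```

Keeping the dictionary builder separate from the API call makes it possible to inspect or unit-test the configuration without touching AWS.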

Solution 2:[2]

You can separate your directories by configuring the S3 Prefix. In the console, this is done during setup when you set the S3 bucket name.

Using the CLI, you set the prefix in the --s3-destination-configuration option, as shown here:

http://docs.aws.amazon.com/cli/latest/reference/firehose/create-delivery-stream.html

Note, however, that you can only set one prefix per Firehose Delivery Stream, so if you pass all of your clickstream data through one Firehose Delivery Stream, you will not be able to send the records to different prefixes.
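One way to work within that limit is to give each customer a separate delivery stream with its own static prefix and to route records on the producer side. The sketch below assumes a hypothetical `clickstream-<customer>` naming scheme and takes the Firehose client as a parameter; only `put_record` is a real boto3 call, the helper names are made up for illustration.

```python
import json


def stream_name_for(customer_id):
    """Map a customer to its own delivery stream (hypothetical naming scheme)."""
    return f"clickstream-{customer_id}"


def static_prefix_for(customer_id):
    """Static S3 prefix to configure on that customer's delivery stream."""
    return f"{customer_id}/"


def send_click(client, customer_id, event):
    """Send one click event to the delivery stream owned by customer_id."""
    return client.put_record(
        DeliveryStreamName=stream_name_for(customer_id),
        # Newline-delimited JSON keeps records separable inside the S3 object.
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
```

With a static prefix such as `A/`, Firehose appends its default date-based path after it, so objects end up under `A/2017/10/12/` followed by an hour directory. A call would look like `send_click(boto3.client("firehose"), "A", {"page": "/home"})`. This approach means one stream per tenant, which may not scale well for a large number of customers.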

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nishu Tayal
Solution 2 devonlazarus