How to write Kinesis events as batches using AWS Lambda?

Currently, I have a Kinesis data stream with one shard that receives records from an external source.

Each record is treated as a single event and is stored in an S3 bucket in CSV format by a Python Lambda function. However, it is possible to receive more than 1,000 records in just a couple of minutes, which means writing 1,000 CSV files.

Is it possible, using Lambda, to accumulate events in batches and write a single CSV file, say, every 5 minutes, regardless of the number of records? Or to accumulate a certain number of records and then write them to a single CSV file?

Thanks!



Solution 1:[1]

This is the problem that Kinesis Firehose exists to solve.

However, it may not be the right choice in this situation unless you don't care about headers, or each message includes a header row and you strip that header from all records but the first in a batch using a transformation Lambda.
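To make the transformation-Lambda idea concrete, here is a minimal sketch of a Firehose data-transformation handler that strips a leading header line from each record's CSV payload. It assumes every incoming record is UTF-8 CSV text whose first line is a header row (an assumption for illustration; Firehose itself imposes no such format):

```python
import base64

def handler(event, context):
    """Firehose transformation sketch: drop the header line from each
    record's CSV payload before delivery. Records reduced to nothing
    are marked Dropped."""
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        # Keep everything after the first newline (i.e. discard the header row).
        body = payload.split("\n", 1)[1] if "\n" in payload else ""
        output.append({
            "recordId": record["recordId"],
            "result": "Ok" if body else "Dropped",
            "data": base64.b64encode(body.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Note that a transformation Lambda sees records individually, so this strips the header from every record; you would supply the header once by other means (for example, when reading the delivered objects back).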

Instead, I would configure the event source mapping with a batch size and a maximum batching window: the trigger waits up to N seconds to accumulate M records, then sends them all to the Lambda in a single invocation.
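With the event source mapping's BatchSize and MaximumBatchingWindowInSeconds set (the window maxes out at 300 seconds, which matches the 5-minute ask), the handler receives many records per invocation and can write them as one CSV object. A sketch, assuming each record's payload is a flat JSON object and a hypothetical bucket name:

```python
import base64
import csv
import io
import json
from datetime import datetime, timezone

BUCKET = "my-output-bucket"  # hypothetical bucket for illustration

def records_to_csv(event):
    """Decode every Kinesis record in one batched invocation into a
    single CSV string, with a header row taken from the first record."""
    buf = io.StringIO()
    writer = None
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if writer is None:
            writer = csv.DictWriter(buf, fieldnames=sorted(payload))
            writer.writeheader()
        writer.writerow(payload)
    return buf.getvalue()

def handler(event, context):
    body = records_to_csv(event)
    if not body:
        return
    # boto3 imported here so the module also loads where boto3 isn't
    # installed; it is preinstalled in the Lambda runtime.
    import boto3
    key = f"events/{datetime.now(timezone.utc).isoformat()}.csv"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
```

This turns "one file per record" into "one file per batch" with no new infrastructure beyond the trigger configuration.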

You can also use tumbling windows in the event source mapping. This feature is newer, and I haven't tried it, but it appears that you need to modify your Lambda to implement it. I would only turn to this if the other options don't do what you need.
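For completeness, a sketch of what that modification looks like: with tumbling windows, Lambda passes JSON-serializable state (capped at 1 MB) between invocations within a window and flags the final invocation, so the handler can accumulate rows and flush once per window. The `flush` sink here is a hypothetical placeholder:

```python
import base64

def flush(rows):
    # Placeholder sink: in a real function this would write one CSV
    # object to S3.
    print(f"flushing {len(rows)} rows")

def handler(event, context):
    """Tumbling-window sketch: accumulate record payloads in window
    state and write them out once, on the window's final invocation."""
    state = event.get("state") or {}
    rows = state.get("rows", [])
    for record in event.get("Records", []):
        rows.append(base64.b64decode(record["kinesis"]["data"]).decode("utf-8"))
    if event.get("isFinalInvokeForWindow"):
        # Last invocation for this window: flush and reset the state.
        flush(rows)
        return {"state": {}}
    # Mid-window invocation: carry the accumulated rows forward.
    return {"state": {"rows": rows}}
```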

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Parsifal