How do I parse with regular expressions only on filtered lines in CloudWatch Logs Insights?

Is there a way to restructure this CloudWatch Logs Insights query so that it runs faster?

fields @timestamp, @message
| filter @message like /NewProductRequest/
| parse @message /.*"productType":\s*"(?<productType>\w+)"/
| stats count(*) by productType

I am running it over a limited period (1 day's worth of logs). It is taking very long to run.

When I remove the parse command and count(*) the filtered lines, there are only 2,500 matches out of 20,000,000 lines, and the query returns in several seconds.

With the parse command, the query takes >15 minutes. I can see the throughput drop from ~1GBps to ~2MBps.

Running a parse regexp on 2,500 filtered lines should be negligible. It takes less than 2 seconds if I download the filtered results to my MacBook and run the regexp in Python.

This leads me to believe that CloudWatch is running the parse command on every line in the log, and not just on the filtered lines.

Is there a way to restructure my query so that the parse command runs after my filter command (effectively parsing 2.5k lines instead of 20 million)?

amazon-cloudwatch amazon-cloudwatchlogs aws-cloudwatch-log-insights

Solution 1:[1]

Removing the .* at the beginning of the expression improves performance. This works if your pattern only needs to find a string that may be preceded by any sequence of characters (.*). It does not help if your regexp begins with anything other than .*.
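
As a rough local check of that change, the Python sketch below compares the pattern from the question with and without the leading .* on a made-up log line. The sample line and the helper name extract_product_type are assumptions for illustration only, since the real log format is not shown in the question. Python writes named groups as (?P<name>...) rather than the (?<name>...) form used in the query. The sketch also assumes the parse command searches within the line rather than requiring the pattern to match from the start of the line, which is what the answer above implies.

```python
import re

# Hypothetical sample log line; the real log format is not shown in the question.
SAMPLE_LINE = (
    '2021-01-01T00:00:00Z NewProductRequest '
    '{"id": 7, "productType": "book", "qty": 2}'
)

# Pattern from the question, translated to Python's named-group syntax.
WITH_PREFIX = re.compile(r'.*"productType":\s*"(?P<productType>\w+)"')

# Same pattern with the leading .* removed, as the answer suggests.
WITHOUT_PREFIX = re.compile(r'"productType":\s*"(?P<productType>\w+)"')


def extract_product_type(pattern, line):
    """Return the captured productType, or None when the line does not match."""
    match = pattern.search(line)
    return match.group("productType") if match else None


if __name__ == "__main__":
    # Both patterns capture the same value when the key appears once per line.
    print(extract_product_type(WITH_PREFIX, SAMPLE_LINE))
    print(extract_product_type(WITHOUT_PREFIX, SAMPLE_LINE))
```

One caveat: the two patterns agree only when "productType" appears once in a line. With several occurrences, the greedy .* version captures the last one, while the version without the prefix captures the first.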

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	eshalev
