How do I parse by regular expressions only on filtered lines in CloudWatch Logs Insights?
Is there a way to restructure this CloudWatch Logs Insights query so that it runs faster?
fields @timestamp, @message
| filter @message like /NewProductRequest/
| parse @message /.*"productType":\s*"(?<productType>\w+)"/
| stats count(*) by productType
I am running it over a limited period (one day's worth of logs), and it takes a very long time to run.
When I remove the parse command and just count(*) the filtered lines, there are only 2,500 matches out of 20,000,000 lines, and the query returns in several seconds.
With the parse command, the query takes >15 minutes. I can see the throughput drop from ~1GBps to ~2MBps.
Running a parse regexp on 2,500 filtered lines should be negligible. It takes less than 2 seconds if I download the filtered results to my MacBook and run the regexp in Python.
This leads me to believe that CloudWatch is running the parse command on every line in the log, not just the filtered lines.
Is there a way to restructure my query so that the parse command runs after my filter command (effectively parsing 2.5k lines instead of 20 million)?
Solution 1:[1]
Removing the .* at the beginning of the expression improves performance. If you are only searching for a string that starts after an arbitrary character sequence (.*), this solution will work for you. It does not help if your regexp begins with anything other than .*
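To check locally that dropping the leading .* does not change what gets extracted, something like the following Python sketch may help. The sample log lines are made up for illustration (the real log format is not shown in the question), and Python spells named groups as (?P<name>...) rather than the (?<name>...) form used in the query. It assumes the key appears at most once per line; with several occurrences, the greedy .* version would capture the last one and the shorter version the first.

```python
import re

# Hypothetical sample lines; the actual log format is an assumption.
SAMPLE_LINES = [
    '2021-01-01 INFO NewProductRequest {"id": 1, "productType": "book"}',
    '2021-01-01 INFO NewProductRequest {"id": 2, "productType":"toy"}',
    '2021-01-01 INFO Heartbeat {"status": "ok"}',
]

# Original pattern (with leading .*) and the shortened one from the answer.
# Python requires (?P<name>...) for named groups.
WITH_PREFIX = re.compile(r'.*"productType":\s*"(?P<productType>\w+)"')
WITHOUT_PREFIX = re.compile(r'"productType":\s*"(?P<productType>\w+)"')


def extract(pattern, line):
    """Return the captured productType, or None when the line does not match."""
    match = pattern.search(line)
    return match.group("productType") if match else None


with_prefix = [extract(WITH_PREFIX, line) for line in SAMPLE_LINES]
without_prefix = [extract(WITHOUT_PREFIX, line) for line in SAMPLE_LINES]

# Both patterns should extract the same values on these samples.
print(with_prefix == without_prefix)
```

This only confirms that the two expressions are equivalent on the sample data; it says nothing about how CloudWatch executes the query internally.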
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | eshalev |