'REGEXP_EXTRACT in Impala
I am trying to figure out how to extract customer ID from string that looks loke this:
{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
I am trying to extract customer ID from strings that contain error code 101 with following code:
select regexp_extract(field, '\"customer_id":"(.*)', 0) from table_name
where field rlike '"error_code":"101"'
But this gives me a following result:
"customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
Expected result:
5b0e9b23e423b0d33c9f7ddfd
Could you please help me with this?
Solution 1:[1]
You can use below regex:
"customer_id":"(\w+)"
Demo : https://regex101.com/r/MEOGw8/39
Test:
{"param":"success","value":"10","level":"0","error_code":"101","customer_id":"5b0e9b23e423b0d33c9f7ddfd", "purchases": "13", "last_activity_ts": "123523465"}
Match:
Match 1
Full match 63-104 `"customer_id":"5b0e9b23e423b0d33c9f7ddfd"`
Group 1. 78-103 `5b0e9b23e423b0d33c9f7ddfd`
SQL Statement:
select regexp_extract(field, '"customer_id":"(\w+)"',1, 1) from table_name
where field rlike '"error_code":"101"'
Solution 2:[2]
Your regex matches from "customer_id":"
until the end of the line because you use .*
which will match any charcter zero or more times and you use 0
as the last parameter of regexp_extract.
which refers to the entire extracted string.
To match what is between the double quotes, you could match not a double quote and capture that in a group ([^"]+)
using a negated character class:
Or you might specify the character ranges in a character class, repeat it one or more times ([a-f0-9]+)
and capture that in a group:
Your value is in the first capturing group which I think you could specify using 1
as the third parameter for regexp_extract.
regexp_extract(field, '"customer_id":"([a-f0-9]+)"', 1)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Wiktor Stribiżew |
Solution 2 |