Hive SQL regexp_extract (number)_(number)

I'm new to Hive SQL and I'm trying to extract a value from the column col_a of the table df, which is in this format: \\\"id\\\":\\\"101_12345\\\" I only need to extract 101_12345, but the underscore makes this hard. I tried regexp_extract(col_a, '(\\d+)[_](\\d+)'), but it only outputs 101. Could I get some help with the regex? Thank you.



Solution 1:[1]

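Simple solution: you don't need the two pairs of parentheses.

Here's a working pattern: '\\d+[_]\\d+'. Note that regexp_extract takes an optional third argument, the group index, which defaults to 1. Pass 0 to get the complete match, e.g. regexp_extract(col_a, '\\d+[_]\\d+', 0).

When you put tokens into parentheses, the regex engine captures what they match as a group, separate from the complete match. So the result comprises the complete match (group 0) plus two extra groups (1 and 2) holding the digits before and after the underscore. Your call returned only 101 because, with no index given, regexp_extract returns group 1. If you don't need the pieces separately, just remove the parentheses and ask for group 0.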

In the future, if you want to group part of a regex but don't want it returned as a separate capture, use a non-capturing group, written (?:...).
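
To see how the group index changes the result, here is a minimal sketch. It assumes a Hive version that allows SELECT without a FROM clause (0.13 or later), and it uses a simplified string literal in place of col_a; the column aliases are illustrative names, not anything from the question.

```sql
-- Assumption: a simplified sample value stands in for col_a.
SELECT
  -- index 0: the complete match, expected 101_12345
  regexp_extract('"id":"101_12345"', '(\\d+)_(\\d+)', 0) AS whole_match,
  -- index 1: the first capture group, expected 101
  regexp_extract('"id":"101_12345"', '(\\d+)_(\\d+)', 1) AS first_group,
  -- index 2: the second capture group, expected 12345
  regexp_extract('"id":"101_12345"', '(\\d+)_(\\d+)', 2) AS second_group,
  -- one group around everything, default index 1, expected 101_12345
  regexp_extract('"id":"101_12345"', '(\\d+_\\d+)') AS single_group
;
```

Either the explicit index 0 or the single wrapping group should give the full 101_12345; which one to use is mostly a matter of taste.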

Here's a demo of what your code resulted in, hosted at regex101.com

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Robo Mop