Java Regex Groups within groups

I've been struggling with regex for an hour now. Below is the text I want to parse.

AddAgenda("Gangster's agenda", 
{ 
    TEAM_HITMAN, 
    TEAM_POLICE
}, 
{   
    TEAM_GANG,
    TEAM_MAFIA,
    TEAM_GANGSTER
})

I would like to capture the agenda name and each team INDIVIDUALLY from each pair of curly braces. One thing to know is that I don't know how many teams each of these pairs contains.

Basically, I want this:

Group [1]:
    Gangster's agenda
Group [2]:
    Group [0]: TEAM_HITMAN
    Group [1]: TEAM_POLICE
Group [3]:
    Group [0]: TEAM_GANG
    Group [1]: TEAM_MAFIA
    Group [2]: TEAM_GANGSTER

But I've only come up with this:

AddAgenda\(\"([^"]+)\",\s*\{(\s*([\w_]+,))*

Which produces this:

Group [0]:
    [0]: AddAgenda("Gangster's agenda", 
{ 
    TEAM_MOB, 
    TEAM_POLICE,
Group [1]:
    [0]: Gangster's agenda
Group [2]:
   [0]:  
    TEAM_POLICE,
Group [3]:
    [0]: TEAM_POLICE,


Solution 1:[1]

This is my attempt:

AddAgenda\(\"([^"]+)\",\s*\{(\s*([\w_]+)\s*,?\s*([\w_]+)\s*)\},\s*\{\s*(([\w_]+)\s*,?\s*([\w_]+)?\s*,?\s*([\w_]+)?)\s*\}\s*\)

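Looking at your question: groups 3 and 4 (TEAM_HITMAN, TEAM_POLICE) are contained in group 2, and groups 6, 7 and 8 (TEAM_GANG, TEAM_MAFIA, TEAM_GANGSTER) are contained in group 5. Note that this hardcodes the number of teams in each pair of braces.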
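A minimal Java sketch of this single-regex approach, run against the input from the question. Assumptions: the class name SingleRegexDemo and its match helper are hypothetical, and the second opening brace is written as \{ because java.util.regex rejects a bare { that does not start a quantifier ("Illegal repetition"). The sketch also makes the limitation visible: two teams in the first block and at most three in the second.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleRegexDemo {
    // Solution 1's pattern with the second opening brace escaped as \{.
    // Group 1 = name, groups 3-4 = first block, groups 6-8 = second block.
    static final Pattern AGENDA = Pattern.compile(
            "AddAgenda\\(\"([^\"]+)\",\\s*\\{(\\s*([\\w_]+)\\s*,?\\s*([\\w_]+)\\s*)\\},"
            + "\\s*\\{\\s*(([\\w_]+)\\s*,?\\s*([\\w_]+)?\\s*,?\\s*([\\w_]+)?)\\s*\\}\\s*\\)");

    // The text from the question, reproduced as a Java string.
    static final String INPUT = "AddAgenda(\"Gangster's agenda\", \n"
            + "{ \n"
            + "    TEAM_HITMAN, \n"
            + "    TEAM_POLICE\n"
            + "}, \n"
            + "{   \n"
            + "    TEAM_GANG,\n"
            + "    TEAM_MAFIA,\n"
            + "    TEAM_GANGSTER\n"
            + "})";

    // Returns a matcher positioned on the first AddAgenda call, or throws.
    static Matcher match(String input) {
        Matcher m = AGENDA.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no AddAgenda call found");
        }
        return m;
    }

    public static void main(String[] args) {
        Matcher m = match(INPUT);
        System.out.println("name:   " + m.group(1));
        System.out.println("first:  " + m.group(3) + ", " + m.group(4));
        System.out.println("second: " + m.group(6) + ", " + m.group(7) + ", " + m.group(8));
    }
}
```

A fourth team in the second block would make the whole pattern fail to match, which is why the rest of this answer moves to a two-step approach.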

The problem with using only one regex is that it cannot generate a new capture group automatically for each team. To see why, recall that you can apply quantifiers to patterns: \d+ matches one or more digits. Separately, the regex (\d) captures a single digit into group 1.

So what happens if you put the two together into this regex?

(\d)+

Each pair of capturing parentheses in a pattern defines exactly one group. In (\d)+, repeating the match does not create new groups; every iteration refers back to the same group 1. If you apply (\d)+ to 1234, group 1 will contain 4, the last capture.

In a nutshell, group 1 is overwritten every time the regex iterates through the capturing parentheses.
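A short Java sketch of this behaviour (RepeatedGroupDemo, lastCapture and groupCount are hypothetical names): the pattern declares exactly one group no matter how often it repeats, and after matching 1234 that group holds only the final digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    private static final Pattern DIGITS = Pattern.compile("(\\d)+");

    // Matches the whole input with (\d)+ and returns what group 1 holds afterwards.
    static String lastCapture(String input) {
        Matcher m = DIGITS.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("input is not all digits: " + input);
        }
        return m.group(1); // only the final iteration survives
    }

    // Number of capture groups the pattern defines; independent of any input.
    static int groupCount() {
        return DIGITS.matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("1234")); // prints 4
        System.out.println(groupCount());        // prints 1
    }
}
```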

Using two regexes, you can divide the problem into two parts. First, match the three parameters of AddAgenda. Then split each of the two parameters in curly braces into individual teams.

The first regular expression could be:

AddAgenda\("([^"]+)",\s*\{\s*([^}]+)\},\s*\{\s*([^}]+)\s*\}\)
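A minimal sketch of this first step in Java, run against the input from the question (FirstStepDemo and splitCall are hypothetical names). Groups 2 and 3 still contain the commas, line breaks and the trailing newline before each closing brace; that leftover text is what the second regex cleans up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstStepDemo {
    // The answer's first regex: agenda name plus the raw text of each brace block.
    static final Pattern CALL = Pattern.compile(
            "AddAgenda\\(\"([^\"]+)\",\\s*\\{\\s*([^}]+)\\},\\s*\\{\\s*([^}]+)\\s*\\}\\)");

    // Returns {name, firstBlock, secondBlock}; the blocks are not split yet.
    static String[] splitCall(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no AddAgenda call found");
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String input = "AddAgenda(\"Gangster's agenda\", \n{ \n    TEAM_HITMAN, \n    TEAM_POLICE\n}, \n"
                + "{   \n    TEAM_GANG,\n    TEAM_MAFIA,\n    TEAM_GANGSTER\n})";
        for (String part : splitCall(input)) {
            System.out.println("[" + part + "]");
        }
    }
}
```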

The second regular expression is very simple:

([\w_]+)

In Java, you could then loop over the matches of the second regex in each brace block:

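import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s is one brace block captured by the first regex (group 2 or group 3)
Matcher m = Pattern.compile("[\\w_]+").matcher(s);
while (m.find()) {
    System.out.println(m.group()); // one team name per match
}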
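Putting both steps together, here is a minimal sketch that returns the structure asked for in the question (the name plus one list per brace block) for any number of teams. TwoStepParser, Agenda, teams and parse are hypothetical names; the two patterns are the ones from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoStepParser {
    // Step 1: name and raw text of both brace blocks.
    private static final Pattern CALL = Pattern.compile(
            "AddAgenda\\(\"([^\"]+)\",\\s*\\{\\s*([^}]+)\\},\\s*\\{\\s*([^}]+)\\s*\\}\\)");
    // Step 2: one team name per match.
    private static final Pattern TEAM = Pattern.compile("[\\w_]+");

    // Hypothetical result holder: agenda name and one team list per block.
    static final class Agenda {
        final String name;
        final List<String> first;
        final List<String> second;

        Agenda(String name, List<String> first, List<String> second) {
            this.name = name;
            this.first = first;
            this.second = second;
        }
    }

    // Applies step 2 to one block, collecting every team in order.
    static List<String> teams(String block) {
        List<String> result = new ArrayList<>();
        Matcher m = TEAM.matcher(block);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Runs step 1, then step 2 on each captured block.
    static Agenda parse(String input) {
        Matcher m = CALL.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no AddAgenda call found");
        }
        return new Agenda(m.group(1), teams(m.group(2)), teams(m.group(3)));
    }

    public static void main(String[] args) {
        String input = "AddAgenda(\"Gangster's agenda\", \n{ \n    TEAM_HITMAN, \n    TEAM_POLICE\n}, \n"
                + "{   \n    TEAM_GANG,\n    TEAM_MAFIA,\n    TEAM_GANGSTER\n})";
        Agenda a = parse(input);
        System.out.println(a.name);   // Gangster's agenda
        System.out.println(a.first);  // [TEAM_HITMAN, TEAM_POLICE]
        System.out.println(a.second); // [TEAM_GANG, TEAM_MAFIA, TEAM_GANGSTER]
    }
}
```

Because the team count is handled by the find() loop rather than by capture groups, a block with one team or ten teams needs no change to either pattern.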

Solution 2:[2]

Something like this?

\"(.*)\"|\s([A-Z_].*)
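To see what this pattern captures, a sketch that runs it with java.util.regex over the question's input (AlternationDemo and captures are hypothetical names). Each match fills either group 1 (the quoted name) or group 2 (whitespace followed by the rest of a line that starts with an uppercase letter or underscore). Because . stops at the line break, team names keep any trailing comma or space, and the teams from both brace blocks come back as one flat sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Solution 2's pattern: quoted string in group 1, or a line-rest in group 2.
    static final Pattern P = Pattern.compile("\"(.*)\"|\\s([A-Z_].*)");

    // Collects whichever group participated in each match, in order.
    static List<String> captures(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "AddAgenda(\"Gangster's agenda\", \n{ \n    TEAM_HITMAN, \n    TEAM_POLICE\n}, \n"
                + "{   \n    TEAM_GANG,\n    TEAM_MAFIA,\n    TEAM_GANGSTER\n})";
        for (String c : captures(input)) {
            System.out.println("[" + c + "]"); // brackets make trailing commas/spaces visible
        }
    }
}
```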

https://regex101.com/r/6vJpXe/1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Glorfindel
Solution 2 Martijn Burger