Java Regex Groups within groups
I've been struggling with regex for an hour now. Just below is the text I want to parse.
AddAgenda("Gangster's agenda",
{
TEAM_HITMAN,
TEAM_POLICE
},
{
TEAM_GANG,
TEAM_MAFIA,
TEAM_GANGSTER
})
I would like to capture the agenda name and each team INDIVIDUALLY from each pair of curly braces. One thing to know is that I don't know how many teams there are within each of these pairs.
Basically, I want this:
Group [1]:
Gangster's agenda
Group [2]:
Group [0]: TEAM_HITMAN
Group [1]: TEAM_POLICE
Group [3]:
Group [0]: TEAM_GANG
Group [1]: TEAM_MAFIA
Group [2]: TEAM_GANGSTER
But I've only come up with this:
AddAgenda\(\"([^"]+)\",\s*\{(\s*([\w_]+,))*
Which produces this:
Group [0]:
[0]: AddAgenda("Gangster's agenda",
{
TEAM_MOB,
TEAM_POLICE,
Group [1]:
[0]: Gangster's agenda
Group [2]:
[0]:
TEAM_POLICE,
Group [3]:
[0]: TEAM_POLICE,
Solution 1:[1]
AddAgenda\(\"([^"]+)\",\s*\{(\s*([\w_]+)\s*,?\s*([\w_]+)\s*)\},\s*\{\s*(([\w_]+)\s*,?\s*([\w_]+)?\s*,?\s*([\w_]+)?)\s*\}\s*\)
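Looking at your question: groups 3 and 4 are contained in group 2, and groups 6, 7 and 8 (TEAM_GANG, TEAM_MAFIA, TEAM_GANGSTER) are contained in group 5. Note that this pattern is written for a fixed number of teams (two in the first pair of braces, up to three in the second), so it does not cover the "unknown number of teams" requirement.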
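To make the fixed-group idea concrete, here is a rough Java sketch that compiles Solution 1's pattern as a string literal and prints each group for the text from the question. The class name FixedGroupsDemo is made up for illustration, [\w_] is shortened to \w (which already includes the underscore), and every literal brace is escaped, since Java's Pattern throws a PatternSyntaxException on a bare { that does not start a quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: Solution 1's single regex with fixed capture groups.
class FixedGroupsDemo {
    // Same structure as Solution 1, as a Java string literal with every literal
    // brace escaped. \w already includes '_', so [\w_] is written as \w.
    static final Pattern AGENDA = Pattern.compile(
        "AddAgenda\\(\"([^\"]+)\",\\s*\\{(\\s*(\\w+)\\s*,?\\s*(\\w+)\\s*)\\},"
        + "\\s*\\{\\s*((\\w+)\\s*,?\\s*(\\w+)?\\s*,?\\s*(\\w+)?)\\s*\\}\\s*\\)");

    // The text from the question, with its line breaks kept.
    static final String SAMPLE = "AddAgenda(\"Gangster's agenda\",\n{\nTEAM_HITMAN,\nTEAM_POLICE\n},\n"
        + "{\nTEAM_GANG,\nTEAM_MAFIA,\nTEAM_GANGSTER\n})";

    public static void main(String[] args) {
        Matcher m = AGENDA.matcher(SAMPLE);
        if (!m.find()) {
            System.out.println("no match");
            return;
        }
        // Groups 3-4 are nested in group 2; groups 6-8 are nested in group 5.
        // An optional team group that did not participate prints as null.
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("Group " + i + ": " + m.group(i));
        }
    }
}
```

Because the groups are baked into the pattern, a call whose second list has four teams, such as AddAgenda("x",{A,B},{C,D,E,F}), finds no match at all.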
The problem with using only one regex is that it can't create a new capture group automatically for each TEAM. To see why, remember that you can apply quantifiers to patterns: \d+ matches one or more digits. Separately, the regex (\d) captures a single digit into group 1.
So what happens if you put the two together into this regex?
(\d)+
Each pair of capturing parentheses in a pattern defines exactly one group. In (\d)+, repeating the match does not create new groups; every repetition writes to the same group again. If you apply (\d)+ to 1234, group 1 will contain 4, the last capture.
In a nutshell, group 1 is overwritten every time the regex iterates through the capturing parentheses.
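A small Java sketch of that behaviour, using a made-up class RepeatedGroupDemo: after (\d)+ matches 1234, group 1 only holds the final digit, whereas looping find() over the unquantified (\d) yields each digit separately. That second form is the same trick used for the team names later in this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a quantified group keeps only its last capture.
class RepeatedGroupDemo {
    // Applies (\d)+ to the whole input and returns what group 1 holds afterwards.
    static String lastCapture(String digits) {
        Matcher m = Pattern.compile("(\\d)+").matcher(digits);
        if (!m.matches()) {
            throw new IllegalArgumentException("not all digits: " + digits);
        }
        return m.group(1); // each repetition overwrote the same group
    }

    // Runs the unquantified (\d) once per digit with find(), keeping every capture.
    static List<String> allCaptures(String digits) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile("(\\d)").matcher(digits);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("1234")); // prints 4
        System.out.println(allCaptures("1234")); // prints [1, 2, 3, 4]
    }
}
```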
Using two regexes, you can divide the problem into two parts: first match the three parameters of AddAgenda, then split each of the two curly-brace parameters into individual teams.
The first regular expression could be:
AddAgenda\("([^"]+)",\s*\{\s*([^}]+)\},\s*\{\s*([^}]+)\s*\}\)
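Here is a hedged Java sketch of step one, with that regex written as a Java string literal (backslashes doubled, quotes escaped). The class name AgendaStepOne is hypothetical, and the sample input is the text from the question with its line breaks kept. Group 1 gives the agenda name and groups 2 and 3 give the raw team lists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: step one of the two-regex approach.
class AgendaStepOne {
    // The first regex as a Java string literal: group 1 is the agenda name,
    // groups 2 and 3 are the raw contents of the two brace lists.
    static final Pattern CALL = Pattern.compile(
        "AddAgenda\\(\"([^\"]+)\",\\s*\\{\\s*([^}]+)\\},\\s*\\{\\s*([^}]+)\\s*\\}\\)");

    // The text from the question, with its line breaks kept.
    static final String SAMPLE = "AddAgenda(\"Gangster's agenda\",\n{\nTEAM_HITMAN,\nTEAM_POLICE\n},\n"
        + "{\nTEAM_GANG,\nTEAM_MAFIA,\nTEAM_GANGSTER\n})";

    public static void main(String[] args) {
        Matcher m = CALL.matcher(SAMPLE);
        if (m.find()) {
            System.out.println("name:   " + m.group(1));
            // [^}]+ also swallows the newline before each '}', so trim it off.
            System.out.println("list 1: " + m.group(2).trim());
            System.out.println("list 2: " + m.group(3).trim());
        }
    }
}
```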
The second regular expression is very simple:
([\w_]+)
In Java, you could then run the following loop over each captured list, here called s:
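import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s is one captured brace list (group 2 or 3 of the first regex)
Matcher m = Pattern.compile("[\\w_]+").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}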
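Putting both steps together, a possible end-to-end sketch might look like this. AgendaParser, Agenda, teams and parse are hypothetical names; the result holds the agenda name plus one list per pair of braces, roughly mirroring the nested groups the question asks for, and each list grows to however many teams the braces contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser class combining both regexes from the answer.
class AgendaParser {
    // Step 1: agenda name plus the raw contents of both brace lists.
    static final Pattern CALL = Pattern.compile(
        "AddAgenda\\(\"([^\"]+)\",\\s*\\{\\s*([^}]+)\\},\\s*\\{\\s*([^}]+)\\s*\\}\\)");
    // Step 2: one team name per find().
    static final Pattern TEAM = Pattern.compile("[\\w_]+");

    // Hypothetical result type: the name and one list of teams per brace pair.
    static final class Agenda {
        final String name;
        final List<List<String>> teamLists;

        Agenda(String name, List<List<String>> teamLists) {
            this.name = name;
            this.teamLists = teamLists;
        }
    }

    // Collects every team in one brace body, however many there are.
    static List<String> teams(String body) {
        List<String> result = new ArrayList<>();
        Matcher m = TEAM.matcher(body);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Runs step 1 once, then step 2 on each captured brace body.
    static Agenda parse(String call) {
        Matcher m = CALL.matcher(call);
        if (!m.find()) {
            throw new IllegalArgumentException("no AddAgenda(...) call found");
        }
        List<List<String>> lists = new ArrayList<>();
        lists.add(teams(m.group(2)));
        lists.add(teams(m.group(3)));
        return new Agenda(m.group(1), lists);
    }

    public static void main(String[] args) {
        Agenda a = parse("AddAgenda(\"Gangster's agenda\",\n{\nTEAM_HITMAN,\nTEAM_POLICE\n},\n"
            + "{\nTEAM_GANG,\nTEAM_MAFIA,\nTEAM_GANGSTER\n})");
        System.out.println(a.name);             // Gangster's agenda
        System.out.println(a.teamLists.get(0)); // [TEAM_HITMAN, TEAM_POLICE]
        System.out.println(a.teamLists.get(1)); // [TEAM_GANG, TEAM_MAFIA, TEAM_GANGSTER]
    }
}
```

Since the number of teams is decided by the find() loop rather than by the pattern, AddAgenda("x",{A,B,C,D},{E}) parses into [A, B, C, D] and [E] without any change to the regexes.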
Solution 2:[2]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Glorfindel |
Solution 2 | Martijn Burger |