How to write a regex capture group which matches a character 3 or 4 times before a delimiter?

I'm trying to write a regex that splits a string into elements on a delimiter. The regex also needs to ensure each match contains at least 3 colons (:), ideally 4.

Here's an example string:

"Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, Chess, misc:White:Coke:Florida:A, :::U"

From this, there should be 4 matches:

  • Checkers, etc:Blue::C
  • Backgammon, I say:Green::Pepsi:P
  • Chess, misc:White:Coke:Florida:A
  • :::U

Here's what I've tried so far:

([^:]*:[^:]*){3,4}(?:, )

Regex 101 at: https://regex101.com/r/O8iacP/8

I tried setting up a non-capturing group for the ", " delimiter.

Then I tried matching a group made of any characters that aren't a :, then a :, then any characters that aren't a :, with that group repeated 3 or 4 times.

The code I'm using to iterate over these groups is:

String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "([^:]*:[^:]*){3,4}(?:, )";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

Any help is appreciated!

Edit

Using @Casimir's regex, it's working. I had to change the above code to use group(0) like this:

String line = "Checkers, etc:Blue::C, Backgammon, I say::Pepsi:P, Chess:White:Coke:Florida:A, :::U";
String pattern = "(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher matcher = r.matcher(line);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}

Now prints:

Checkers, etc:Blue::C
Backgammon, I say::Pepsi:P
Chess:White:Coke:Florida:A
:::U

Thanks again!



Solution 1:[1]

I suggest this pattern:

(?![\\s,])(?:[^:]*:){3}\\S*(?![^,])

The negative lookaheads prevent the match from starting or ending on a delimiter. The first one rejects a match that would start with whitespace or a comma. The second one forces the match to be followed by a comma or by the end of the string (that is, not followed by a character that isn't a comma).


Note that the pattern doesn't have capture groups, so the result is the whole match (or group 0).
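
To illustrate why group 0 is the one to read, here is a small sketch of what happens when a capture group is repeated: Java keeps only the last repetition in group(1), while group() (the same as group(0)) returns the whole match. The class and method names below are made up for this example, and the short "a:b:c:" input is just an assumption chosen to keep the output easy to follow.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns {whole match, group 1} for the first match, or null if nothing matches.
    // Hypothetical helper, only here to compare group() with group(1).
    static String[] wholeAndLastGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        // The capture group is repeated 3 times by the {3} quantifier.
        String[] r = wholeAndLastGroup("([^:]*:){3}", "a:b:c:");
        System.out.println(r[0]); // a:b:c:  (whole match, group 0)
        System.out.println(r[1]); // c:      (only the last repetition)
    }
}
```

This is most likely why printing group(1) in the question's code didn't give the full element.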

Solution 2:[2]

You might use

(?:[^,:]+, )?[^:,]*(?::+[^:,]+)+
  • (?:[^,:]+, )? Optionally match 1+ any char except a , or : followed by , and space
  • [^:,]* Match 0+ any char except : or ,
  • (?: Non Capturing group
    • :+[^:,]+ Match 1+ : and 1+ times any char except : and ,
  • )+ Close group and repeat 1+ times
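
A minimal Java sketch of how this pattern could be applied to the example string from the question. The class and method names (Solution2Demo, split) are made up for illustration. The trim() call is an addition of mine: since [^,:] also accepts a space, the matches after the first one seem to pick up the space that follows the previous comma. Also note that, as far as I can tell, this pattern only requires one or more colon groups and doesn't itself enforce the 3-or-4 colon count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Solution2Demo {
    // Pattern from this answer, unchanged (no backslashes, so no extra escaping is needed).
    static final Pattern ITEM = Pattern.compile("(?:[^,:]+, )?[^:,]*(?::+[^:,]+)+");

    // Collects every whole match (group 0), trimmed of surrounding whitespace.
    static List<String> split(String line) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(line);
        while (m.find()) {
            items.add(m.group().trim()); // trim() is an assumption, see the note above
        }
        return items;
    }

    public static void main(String[] args) {
        String line = "Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, "
                + "Chess, misc:White:Coke:Florida:A, :::U";
        for (String item : split(line)) {
            System.out.println(item);
        }
    }
}
```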


Solution 3:[3]

You seem to be making it harder than it needs to be with the trailing (?:, ) group (which won't be satisfied at the end of the line anyway).

([^:]*:){3}[^:,]*:?[^:,]*

Match up to and including the first 3 colons, then add , to the negated character classes so the rest of the match stops at the next delimiter, with an optional 4th colon in between.
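
Here is a rough Java sketch of this pattern run against the example string from the question. The names Solution3Demo and split are hypothetical. One thing I noticed while tracing it: because [^:]* also consumes commas and spaces, every match after the first appears to begin with the ", " delimiter, so the sketch strips that prefix with replaceFirst. That cleanup step is my assumption and not part of the answer's pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Solution3Demo {
    // Pattern from this answer, unchanged.
    static final Pattern ITEM = Pattern.compile("([^:]*:){3}[^:,]*:?[^:,]*");

    // Collects each whole match, dropping a leading ", " left over from the previous element.
    static List<String> split(String line) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(line);
        while (m.find()) {
            // group() is the whole match; group(1) would only hold the last "xxx:" repetition.
            items.add(m.group().replaceFirst("^, ", ""));
        }
        return items;
    }

    public static void main(String[] args) {
        String line = "Checkers, etc:Blue::C, Backgammon, I say:Green::Pepsi:P, "
                + "Chess, misc:White:Coke:Florida:A, :::U";
        for (String item : split(line)) {
            System.out.println(item);
        }
    }
}
```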

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 The fourth bird
Solution 3 Jeff Y