Mimicking negative lookbehind to match a pattern not immediately preceded by a specific character in JavaScript regex

I have this regex in JavaScript:

/0x[A-F0-9]{2}/g

I would like to modify it so that it matches only when the previous character IS NOT a \. Something like this:

  • 0x60 -> true
  • \0x60 -> false

I came up with something like this, but it's not working properly:

/[^\\]0x[A-F0-9]{2}/g

It matches after ANY character except \, and it includes that character in the match:

  • a0x50 -> true, including "a"
  • _0x50 -> true, including "_"
  • ...
  • \0x50 -> false

Regex example: regex101, followed by a Plnkr.

Is it possible to achieve that? Thanks.



Solution 1:[1]

The main idea is to put the pattern you would normally place in a negative lookbehind into an optional capturing group, and then check whether that group matched. If it did, discard the match; otherwise, use it.

If you need to match and collect substrings, use

var re = /(\\?)0x[A-F0-9]{2}/gi;
var str = '\\0x50 0x60 asdasda0x60';
var res = [];
var m; // declared explicitly to avoid creating an implicit global
while ((m = re.exec(str)) !== null) {
  // Group 1 is empty when there is no backslash before 0x..
  if (!m[1]) {
    res.push(m[0]);
  }
}
document.body.innerHTML = "TEST: " + str + "<br/>";
document.body.innerHTML += "RES: " + JSON.stringify(res, null, 4) + "<br/>";

If you need to replace only those strings that have no \ before the 0x.., use a callback within the replace method to check if Group 1 matched. If it did, replace with the whole match, and if not, just replace with the pattern you need.

var re = /(\\?)0x[A-F0-9]{2}/gi;
var str = '\\0x50 0x60 asdasda0x60';
var res = str.replace(re, function(m, group1) {
  // group1 is "\" for escaped occurrences: put the whole match back unchanged
  return group1 ? m : "NEW_VAL";
});
document.body.innerHTML = "TEST: " + str + "<br/>";
document.body.innerHTML += "RES: " + res + "<br/>";
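The collecting variant above could be wrapped into a small reusable function. This is only a sketch: the name `matchUnescaped` and its parameters are made up for illustration, and it assumes the pattern is passed as a regex source string that contains no capturing groups of its own.

```javascript
// Hypothetical helper (not part of the original answer): returns all matches of
// `source` that are NOT immediately preceded by a backslash.
function matchUnescaped(str, source, flags) {
  // Force the "g" flag so that exec() advances through the string.
  const f = (flags || '').replace('g', '') + 'g';
  // Group 1 captures an optional backslash in front of the pattern.
  const re = new RegExp('(\\\\?)(?:' + source + ')', f);
  const res = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    if (m[0] === '') re.lastIndex++; // guard against zero-length matches
    if (!m[1]) res.push(m[0]);       // keep only matches without the backslash
  }
  return res;
}

console.log(matchUnescaped('\\0x50 0x60 asdasda0x60', '0x[A-F0-9]{2}', 'i'));
```

Because the backslash is optional rather than required, adjacent matches are still found one after another.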

Solution 2:[2]

JavaScript didn't support lookbehinds before ES2018, and as you already noticed, the following consumes an extra character (the character before 0x):

/[^\\]0x[A-F0-9]{2}/g
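A quick way to see the consumed character is to run the pattern against a few inputs. The sample strings below are made up for illustration.

```javascript
// The negated class [^\\] must match a real character, so that character
// becomes part of every match.
const str = 'a0x50 _0x50 \\0x50';
const matches = str.match(/[^\\]0x[A-F0-9]{2}/g);
console.log(matches); // the "a" and "_" are included in the results

// It also means a value at the very start of the string cannot match at all,
// because there is no preceding character to consume.
const atStart = '0x50'.match(/[^\\]0x[A-F0-9]{2}/g);
console.log(atStart);
```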

You can do some ugly hacks like:

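'\\0x25 0x60'.match(/([^\\]|^)0x[A-F0-9]{2}/g).map(function(val) {
  // Keep only the last four characters (0xHH); slice(1) would cut off the
  // "0" whenever the match starts at the beginning of the string
  return val.slice(-4);
});
// ['0x60']

which will consume the leading character but remove it through an iteration over the matches array.

This, however, makes inputs like 0x600x60 give ['0x60'] instead of ['0x60', '0x60'], because the first character of the second value is consumed as the "preceding" character.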
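The adjacency problem can be reproduced with a short script. For comparison, the sketch also runs the optional-group pattern from Solution 1 through `String.prototype.matchAll` (ES2020), which is an assumption about the target environment and not something the original answer used.

```javascript
// Consuming approach: the second value is missed because its leading "0"
// would have to act as the preceding character.
const consuming = '0x600x60'.match(/([^\\]|^)0x[A-F0-9]{2}/g);
console.log(consuming);

// Optional-group approach: nothing in front of 0x has to be consumed,
// so both adjacent values are found.
const optional = Array.from('0x600x60'.matchAll(/(\\?)0x[A-F0-9]{2}/g))
  .filter(function (m) { return !m[1]; }) // drop matches preceded by a backslash
  .map(function (m) { return m[0]; });
console.log(optional);
```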

Solution 3:[3]

You could match both the bad and the good.
This would keep it aligned on all the good so you wouldn't miss any.

(?:\\0x[A-F0-9]{2}|(0x[A-F0-9]{2}))

In this case, only the good show up in capture group 1.

 (?:
      \\ 0x [A-F0-9]{2}     # Bad
   |  
      ( 0x [A-F0-9]{2} )    # (1), Good
 )
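In JavaScript, this pattern could be used roughly as follows. The sample string and the variable names are chosen for illustration; the outer non-capturing group is dropped because it is not needed around a top-level alternation.

```javascript
// First alternative swallows the escaped ("bad") form; the second one
// captures the unescaped ("good") form in group 1.
const re = /\\0x[A-F0-9]{2}|(0x[A-F0-9]{2})/g;
const str = '\\0x50 0x60 asdasda0x60 0x700x80';
const good = [];
let m;
while ((m = re.exec(str)) !== null) {
  // Group 1 is undefined whenever the "bad" alternative matched.
  if (m[1] !== undefined) good.push(m[1]);
}
console.log(good);
```

Since the "bad" alternative consumes the backslash together with the value, the scan stays aligned and adjacent values such as 0x700x80 are both captured.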

Solution 4:[4]

This will do it, although each match also includes the preceding character when there is one:

(?:[^\\]|^)0x[A-F0-9]{2}

var myregexp = /(?:[^\\]|^)0x[A-F0-9]{2}/mg;
// Backslashes must be doubled inside a string literal; a bare \0 is a NUL escape
var subject = '0x60 \\0x99 0x60 \\0x99 0x60 0x60';
var match = myregexp.exec(subject);
while (match != null) {
	for (var i = 0; i < match.length; i++) {
		document.body.innerHTML += match[i]+ "<br/>";
	}
	match = myregexp.exec(subject);
}

Regex Explanation:

(?:[^\\]|^)0x[A-F0-9]{2}

Match the regular expression below «(?:[^\\]|^)»
   Match this alternative (attempting the next alternative only if this one fails) «[^\\]»
      Match any character that is NOT the backslash character «[^\\]»
   Or match this alternative (the entire group fails if this one fails to match) «^»
      Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return, line feed, line separator, paragraph separator) «^»
Match the character string “0x” literally «0x»
Match a single character present in the list below «[A-F0-9]{2}»
   Exactly 2 times «{2}»
   A character in the range between “A” and “F” «A-F»
   A character in the range between “0” and “9” «0-9»

Solution 5:[5]

You're in luck if you're using Node, or are willing to turn on a browser flag (from here):

Lookbehind assertions are currently in a very early stage in the TC39 specification process. However, because they are such an obvious extension to the RegExp syntax, we decided to prioritize their implementation. You can already experiment with lookbehind assertions by running V8 version 4.9 or later with --harmony, or by enabling experimental JavaScript features (use about:flags) in Chrome from version 49 onwards.

Now of course it's just

/(?<!\\)0x[A-F0-9]{2}/g
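If the code may run on an engine without lookbehind support, the regex can be built inside a try/catch so that the syntax error is caught at runtime instead of breaking the whole script. This is a sketch only; the function name `findUnescapedHex` and the `null` return convention are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the unescaped 0xHH values, or null when the
// engine cannot compile a lookbehind assertion.
function findUnescapedHex(str) {
  let re;
  try {
    // Built from a string so that unsupported syntax throws here at runtime
    // rather than as a parse error for the whole file.
    re = new RegExp('(?<!\\\\)0x[A-F0-9]{2}', 'g');
  } catch (e) {
    return null; // caller can fall back to one of the other approaches
  }
  return str.match(re) || [];
}

console.log(findUnescapedHex('\\0x50 0x60 asdasda0x600x70'));
```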

There are other approaches for simulating lookbehinds in this answer. My favorite one is reversing the string and using a lookahead.

var re = /[A-F0-9]{2}x0(?!\\)/g;
var str = "0x60 \\0x33"; // backslash doubled; a bare \0 in a string literal is a NUL escape

function reverse(s) { return s.split('').reverse().join(''); }

document.write(reverse(str).match(re).map(reverse));

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew
Solution 2 Andreas Louv
Solution 3
Solution 4
Solution 5 Community