Regex to search for unique last names in XML
I have last names in an XML file, and I would like to capture only the unique ones. I am starting from this other Stack Overflow answer: "Only match unique string occurrences". With the regex below, I am not able to match my strings so that it returns one Adams and one Yellow.
\b(.*<LastName>(.*)<\/LastName>)\b(?![\s\S]*\b\1\b)
<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>
Solution 1:[1]
Does this work for you?
/<LastName>(\w+)<\/LastName>(?!.*<LastName>\1<\/LastName>)/gsm
(note the flags, they're important)
The issue was that the (.*) you used to match the name allowed it to match across multiple lines. I replaced it with \w+ so it only matches word characters (depending on your needs, something a little more international might be needed, though).
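To make the behaviour concrete, here is a minimal sketch of the same pattern in Python, run against the three sample lines from the question. The choice of Python and the variable names are mine, not part of the original answer. re.DOTALL stands in for the s flag; the m flag only changes how ^ and $ behave, and this pattern uses neither, so it is left out here.

```python
import re

# Sample data from the question.
xml = """<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>"""

# re.DOTALL lets "." in the lookahead cross newlines, so the lookahead
# can see a later duplicate anywhere in the rest of the input.
pattern = re.compile(
    r"<LastName>(\w+)</LastName>(?!.*<LastName>\1</LastName>)",
    re.DOTALL,
)

# findall returns the contents of group 1 for every match.
unique_last_names = pattern.findall(xml)
print(unique_last_names)  # ['Adams', 'Yellow']
```

Note that the match kept for a duplicated name is its last occurrence, because earlier occurrences are rejected by the lookahead.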
Solution 2:[2]
You can capture the name of the tag and its content, then use the backreferences in the negative lookahead. A lazy search .*? for the tag's content helps here.
<(LastName)>(.*?)<\/\1>(?![\s\S]*?<\1>\2<\/\1>)
Test on regex101 here
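For anyone who wants to try this outside regex101, here is a rough Python sketch of the same pattern applied to the sample data from the question. The variable names are only illustrative. No flags are set, so . stops at line breaks and the lazy .*? cannot run past the end of one tag's line, while [\s\S] in the lookahead still scans across lines.

```python
import re

# Sample data from the question.
xml = """<LastName>Adams</LastName>
<LastName>Adams</LastName>
<LastName>Yellow</LastName>"""

# Group 1 is the tag name, group 2 is the tag's content.
# The lookahead rejects a match if the same tag with the same
# content appears again anywhere later in the input.
tag_pattern = re.compile(r"<(LastName)>(.*?)</\1>(?![\s\S]*?<\1>\2</\1>)")

# Collect the content (group 2) of each surviving match.
unique_names = [m.group(2) for m in tag_pattern.finditer(xml)]
print(unique_names)  # ['Adams', 'Yellow']
```

Unlike \w+, the .*? here also accepts names containing spaces, hyphens, or apostrophes, as long as the whole tag sits on one line.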
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | isaactfa |
Solution 2 | LukStorms |