How to match a string that starts and ends with the same letter?

How do I write a regular expression that matches strings made up of the characters {x,y} that must start and end with the same letter? For example:

  • xyyyxyx
  • yxyxyxy


Solution 1:[1]

This regex works:

^([xy])[xy]*\1$|^[xy]$

I tested it on regexr with

xyyyxyx
yxyxyxy
x
y
xyyyxyy
yxyxyxx
xyzyxx
z

and it only matched the first four.
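The same check can be reproduced outside regexr. The following is a minimal sketch, assuming Python's built-in re module (the original answer used regexr, not Python); the names PATTERN, candidates and matched are only illustrative.

```python
import re

# Pattern from the answer above: a captured first letter that must reappear
# as the last letter, or a lone x / y.
PATTERN = re.compile(r"^([xy])[xy]*\1$|^[xy]$")

# The eight test strings listed in the answer.
candidates = ["xyyyxyx", "yxyxyxy", "x", "y", "xyyyxyy", "yxyxyxx", "xyzyxx", "z"]

# Keep only the strings the pattern accepts.
matched = [s for s in candidates if PATTERN.match(s)]
print(matched)  # ['xyyyxyx', 'yxyxyxy', 'x', 'y']
```

The second alternative, ^[xy]$, is what lets the one-letter strings through; the backreference branch alone needs at least two characters.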

Solution 2:[2]

This should work for you:

^(x|y).*\1$

This regex will match a string that starts and ends with the same letter (as the post title suggests), but it does not restrict the string to x and y characters. It will match any string that starts and ends with the same one of the letters listed in the parentheses.

It will match strings that start and end with the same letter, x or y (as the OP specified):

xyyyxyx (match)
yxyxyxy (match)
zxyxyxz (no match)
xyxyxyy (no match)

But it will also match strings with any characters in between (not limited to only x and y):

xgjyhdtfx (match)
yjsaudgty (match)
xuhgrey (no match)
yudgfsx (no match)
yaaay (match)

Working regex example:

https://regex101.com/r/TER7zI/1
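To see the difference between "any characters in between" and "only x and y in between", here is a rough sketch assuming Python's re module. The strict pattern is a hypothetical variant that swaps .* for [xy]*; it is not part of the original answer, and the helper classify is likewise only illustrative.

```python
import re

# Pattern from the answer: anything may appear between the two end letters.
loose = re.compile(r"^(x|y).*\1$")
# Assumed variant (not from the answer): only x or y may appear in between.
strict = re.compile(r"^(x|y)[xy]*\1$")

def classify(s):
    """Return (loose_matches, strict_matches) for the string s."""
    return bool(loose.match(s)), bool(strict.match(s))

for s in ["xyyyxyx", "xgjyhdtfx", "xuhgrey", "x"]:
    print(s, classify(s))
```

Both patterns reject a lone x, since each needs one character for the group and another for the backreference.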

Solution 3:[3]

I'm bad with regex but this would work I think

^(([x][xy]*[x])|([y][xy]*[y])|[xy])$

Solution 4:[4]

The following regex works in sed:

^\(.\).*\1$

It matches any line of two or more characters that starts and ends with the same character, whatever that character is.

Solution 5:[5]

This regex should work:

^([xy])(?:.*?\1)?$

Online Demo: http://regex101.com/r/kN0yQ4
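The optional non-capturing group is what allows a single letter to match. A small sketch, assuming Python's re module (the answer itself only links a regex101 demo); the sample strings and the names pattern and results are illustrative choices, not taken from the answer.

```python
import re

# Pattern from the answer: first letter, then optionally "anything (lazy)
# followed by the same letter" before the end of the string.
pattern = re.compile(r"^([xy])(?:.*?\1)?$")

# Map each sample string to whether the pattern accepts it.
samples = ["x", "xyx", "xy", "xzx", ""]
results = {s: bool(pattern.match(s)) for s in samples}
print(results)
# {'x': True, 'xyx': True, 'xy': False, 'xzx': True, '': False}
```

Note that xzx is accepted: like the other .* based patterns, this one does not restrict the characters between the first and last letter.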

Solution 6:[6]

You can use:

/^([xy]|[xy]).*\1$/

I've tested that with the following test cases:

xyyyxyx (match)
yxyxyxy (match)
x (no match)
y (no match)
xyyyxyy (no match)
yxyxyxx (no match)
xyzyxx (match)
z (no match)

Solution 7:[7]

^(a).*(a)$|^(b).*(b)$

This works. I have tested it with:

  • aba - true

  • abbb - false

  • bab - true

  • abababa - true

  • aaaabbbbbaaa - true

Explanation:

1st Alternative ^(a).*(a)$

  • ^ asserts position at start of a line

1st Capturing Group (a)

  • a matches the character a with index 97 in decimal (61 in hexadecimal or 141 in octal) literally (case sensitive)
  • . matches any character (except for line terminators)
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

2nd Capturing Group (a)

  • a matches the character a with index 97 in decimal (61 in hexadecimal or 141 in octal) literally (case sensitive)
  • $ asserts position at the end of a line

2nd Alternative ^(b).*(b)$

  • ^ asserts position at start of a line

3rd Capturing Group (b)

  • b matches the character b with index 98 in decimal (62 in hexadecimal or 142 in octal) literally (case sensitive)
  • . matches any character (except for line terminators)
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

4th Capturing Group (b)

  • b matches the character b with index 98 in decimal (62 in hexadecimal or 142 in octal) literally (case sensitive)
  • $ asserts position at the end of a line

Global pattern flags

  • g modifier: global. All matches (don't return after first match)
  • m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
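The explanation above can be checked with a few lines of code. This is a minimal sketch assuming Python's re module rather than the regex101 tester the explanation appears to come from; it uses no g or m flags, because each test string is matched on its own. The names pattern, tests and results are illustrative.

```python
import re

# Pattern from the answer: one alternative per letter, no backreference.
pattern = re.compile(r"^(a).*(a)$|^(b).*(b)$")

# Test strings from the answer.
tests = ["aba", "abbb", "bab", "abababa", "aaaabbbbbaaa"]
results = {s: bool(pattern.match(s)) for s in tests}
print(results)
# {'aba': True, 'abbb': False, 'bab': True, 'abababa': True, 'aaaabbbbbaaa': True}
```

Because the end letter is spelled out in each alternative instead of referenced with \1, this form also suits regex engines that lack backreference support. The trade-off is one alternative per allowed letter, and a one-letter string such as a does not match.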

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1
Solution 2
Solution 3
Solution 4  unxnut
Solution 5  anubhava
Solution 6  S.B
Solution 7  Laurel