How to grep a string ending in a specific punctuation mark

I'm trying to grep strings that end in a dash in R, but having trouble. I've worked out how to grep strings ending in any punctuation mark, maybe not the best way but this worked:

grep("\\#[[:print:]]+[[:punct:]]$",c)

Can't for the life of me work out how to grep strings that end specifically in a dash, for example the part before the second dash in these strings:

 - # (piano) - not this.
 - # hello hello - not this either.

I'd like to sub all the stuff between the dashes (and including the dashes) with nothing "" and leave the text to the right of the second dash, which ends in a full stop. So, I would like the output to be (for example, based on the example above):

not this.

and

not this either.

Any help would be appreciated.

Thank you!

Maro


UPDATE: Hi again everyone, I'm just updating my original question again:

So what I had in my original data was these three examples (I tried to simplify in my original post above, but I think it might be helpful for you all to see what I was actually dealing with):

  1. - # (Piano) - no, and neither can you.
  2. - # (Piano) - uh-huh.
  3. - # Many dreams ago - Try it again.

(numbers 1-3 are for the purposes of making things clearer, they are not part of the strings)

I was trying to find a way to delete all the stuff between and including the two dashes, and leave all the stuff after the second dash, so I wanted my output to be:

  1. no, and neither can you.
  2. uh-huh.
  3. Try it again.

I ended up using this:

gsub("-[[:blank:]]#[[:blank:]]\\(?[A-Z][a-z]*\\)?[[:blank:]]-", "", c)

which helped me get 1. and 2. in one go. But this didn't help with 3 - I thought by including the question mark after the open and close bracket (which I thought meant 'optional') this would help me get all three targets, but for some reason it didn't. To then get 3, I just ended up targeting that specific string i.e. - # Many dreams ago -, by using:

gsub("- # Many dreams ago -", "", c)

I'm new to this, so not the best solution I'm sure.
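
A possible reason the first pattern misses string 3: `[A-Z][a-z]*` matches a single capitalised word, so it stops at the first space in "Many dreams ago". The optional brackets are not the issue. The sketch below checks this with `grepl()` and then tries a minimal tweak. The vector name `x`, the widened class `[A-Za-z ]*`, and the trailing `[[:blank:]]*` are assumptions for illustration, not part of the original attempt.

```r
# The three strings from the update. `x` is used instead of `c`
# so the vector does not shadow base R's c() function.
x <- c("- # (Piano) - no, and neither can you.",
       "- # (Piano) - uh-huh.",
       "- # Many dreams ago - Try it again.")

# Original pattern: [A-Z][a-z]* only covers one capitalised word.
original <- "-[[:blank:]]#[[:blank:]]\\(?[A-Z][a-z]*\\)?[[:blank:]]-"
matched_original <- grepl(original, x)

# Tweak (assumption): allow letters and spaces between the optional
# brackets, and also consume the blanks after the second dash.
tweaked <- "-[[:blank:]]#[[:blank:]]\\(?[A-Za-z ]*\\)?[[:blank:]]-[[:blank:]]*"
cleaned <- sub(tweaked, "", x)

print(matched_original)
print(cleaned)
```

With this input, `matched_original` should be TRUE for the first two strings and FALSE for the third, while `cleaned` keeps only the text after the second dash for all three. Prefixes containing digits or other punctuation would need a wider character class.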

In my original post (this has been edited a couple of times) I included square brackets around the three strings, which explains some of the answers I originally received from members of the community. Apologies for the confusion!

Thanks everyone - if there's anything that doesn't make sense, please let me know, and I'll try to clarify.

Maro



Solution 1:[1]

If you want to stay between the square brackets, you can start the match at #, then use the negated character class [^][]* to match zero or more characters other than an opening or closing square bracket, and finally match a -. Because the class is greedy, the match extends to the last - inside the brackets.

Replace the match with an empty string.

c <- "[- # (piano) - not this.]"
sub("#[^][]*-", "", c)

Output

[1] "[-  not this.]"
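
To see exactly which part of the string the pattern removes, the match can be extracted with `regmatches()` and `regexpr()`. This is only an inspection sketch. The vector name `s` and the second string (one of the updated examples wrapped in square brackets) are assumptions added for illustration.

```r
# First string is from the answer; the second is a hypothetical
# bracketed version of an example from the question's update.
s <- c("[- # (piano) - not this.]",
       "[- # (Piano) - uh-huh.]")

pattern <- "#[^][]*-"

# regexpr() locates the first match in each string,
# regmatches() returns the matched text itself.
removed <- regmatches(s, regexpr(pattern, s))
print(removed)
```

For the first string the match is `# (piano) -`. For the second it runs on to `# (Piano) - uh-`, because the greedy class extends to the last dash, so text that itself contains a hyphen gets partly removed with this pattern.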

For a more specific match of that string format, you can match the whole string including the square brackets, the #, and the text ending in a full stop, and capture the part you want to keep.

In the replacement use the capture group value.

c <- c("[- # (piano) - not this.]", "[- # hello hello - not this either.]")
sub("\\[[^][#]*#[^][]*-\\s*([^][]*\\.)]", "\\1", c)

Output

[1] "not this."        "not this either."
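
The update to the question says the real strings have no square brackets. A minimal sketch of the same negated-class idea for that case follows, assuming the part between the two dashes never contains a dash itself. The vector name `x` and the pattern are illustrative, not taken from the original answer.

```r
# The three unbracketed strings from the question's update.
x <- c("- # (Piano) - no, and neither can you.",
       "- # (Piano) - uh-huh.",
       "- # Many dreams ago - Try it again.")

# ^-             the leading dash at the start of the string
# [[:space:]]*#  optional whitespace, then the hash
# [^-]*-         everything up to and including the next dash
# [[:space:]]*   whitespace after that dash
result <- sub("^-[[:space:]]*#[^-]*-[[:space:]]*", "", x)
print(result)
```

Using `[^-]*` instead of a greedy match to the last dash means the removal stops at the second dash, so a hyphen later in the text (as in "uh-huh.") is left alone.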

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1