Delete a word from a string that contains hashtags
I have already done a lot of "filtering" with regular expressions to remove unwanted characters from a string. This is what I am using:
const regexpHashtag = /(?:^|\s)(?:#)([a-zA-Z\d]+)/g;
const regexpUrl = /(?:https?|ftp):\/\/[\n\S]+/g;
const regexpEmoji = /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g;
const regexpQuotes = /['"]+/g;
tweetText = tweetText.replace(regexpHashtag, '');
tweetText = tweetText.replace(regexpUrl, '');
tweetText = tweetText.replace(regexpEmoji, '');
tweetText = tweetText.replace(regexpQuotes, '');
But there are still cases where a hashtag persists. For example, before filtering:
Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO) 📸☀️☀️☀️#Setnja #Ilidza #Malaaleja
After filtering:
Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO) ️️️#Setnja
"#Setnja" is the word causing my problem. Is it because there are emoji symbols right before it? The other hashtags, "#Ilidza #Malaaleja", are removed. How can I improve my regexp to delete this word too? Thanks.
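One way to check what actually sits in front of "#Setnja" is to print the code point just before it. The sketch below writes the emoji as escape sequences, assuming the tweet uses the common encoding of "☀️" as U+2600 followed by U+FE0F. It shows that the preceding character is U+FE0F rather than whitespace, which is why the "(?:^|\s)" prefix of the hashtag pattern never matches there.

```javascript
// Sample tweet from the question; the emoji are spelled as escapes so the
// invisible U+FE0F (VARIATION SELECTOR-16) after each U+2600 is visible here.
const tweet =
  "Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO) " +
  "\u{1F4F8}\u2600\uFE0F\u2600\uFE0F\u2600\uFE0F#Setnja #Ilidza #Malaaleja";

// Take everything before "#Setnja" and look at its last code point.
const before = tweet.slice(0, tweet.indexOf("#Setnja"));
const lastCodePoint = [...before].pop().codePointAt(0);
console.log(lastCodePoint.toString(16)); // fe0f

// The question's hashtag pattern needs ^ or whitespace right before '#'.
const withBoundary = /(?:^|\s)#Setnja/.test(tweet);
const withoutBoundary = /#Setnja/.test(tweet);
console.log(withBoundary, withoutBoundary); // false true
```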
Solution 1:[1]
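Your logic assumes that a hashtag is always preceded by whitespace or the start of the string, but in your text "#Setnja" follows the emoji directly. More precisely, the character right before it is U+FE0F, the invisible variation selector that is part of "☀️" (your emoji pattern does not remove it, hence the leftover characters in your output), so "(?:^|\s)" cannot match there. Remove the whitespace boundary check on the left-hand side: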
// No whitespace check before '#', so a hashtag glued to an emoji is matched too.
const regexpHashtag = /#[a-zA-Z\d]+/g;
const regexpUrl = /(?:https?|ftp):\/\/[\n\S]+/g;
const regexpEmoji = /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g;
const regexpQuotes = /['"]+/g;
let tweetText = "Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO) 📸☀️☀️☀️#Setnja #Ilidza #Malaaleja";
tweetText = tweetText.replace(regexpHashtag, '');
tweetText = tweetText.replace(regexpUrl, '');
tweetText = tweetText.replace(regexpEmoji, '');
tweetText = tweetText.replace(regexpQuotes, '');
console.log(tweetText);
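The filters above still leave the three invisible U+FE0F characters and a few stray spaces where the hashtags used to be. Below is a minimal sketch of one way to tidy that up. It assumes that a lone U+FE0F is never wanted and that collapsing every whitespace run (newlines included) to a single space is acceptable. The function name cleanTweet is hypothetical.

```javascript
// Hypothetical helper: the answer's filters plus two extra cleanup steps.
function cleanTweet(text) {
  const regexpHashtag = /#[a-zA-Z\d]+/g;
  const regexpUrl = /(?:https?|ftp):\/\/[\n\S]+/g;
  const regexpEmoji =
    /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g;
  // Assumption: a lone VARIATION SELECTOR-16 (U+FE0F) is never wanted.
  const regexpVariationSelector = /\uFE0F/g;
  const regexpQuotes = /['"]+/g;

  return text
    .replace(regexpHashtag, "")
    .replace(regexpUrl, "")
    .replace(regexpEmoji, "")
    .replace(regexpVariationSelector, "")
    .replace(regexpQuotes, "")
    .replace(/\s+/g, " ") // collapse whitespace runs left by the removals
    .trim();
}

// Sample tweet from the question, emoji written as escapes.
const sample =
  "Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO) " +
  "\u{1F4F8}\u2600\uFE0F\u2600\uFE0F\u2600\uFE0F#Setnja #Ilidza #Malaaleja";
const cleaned = cleanTweet(sample);
console.log(cleaned);
// Pogledajte prizore koje je naš fotograf danas zabilježio na Ilidži (FOTO)
```

Because the tweets are in Bosnian, note that [a-zA-Z\d] stops at letters such as "ž" or "č", so "#Ilidža" would leave "ža" behind. If that matters, a Unicode-aware pattern such as /#[\p{L}\p{N}_]+/gu (Unicode property escapes, ES2018) is one option.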
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Tim Biegeleisen |