JavaScript: find all occurrences of a word in a text document
I'm trying to write a JavaScript function that finds the indices of all occurrences of a word in a text document. Currently this is what I have:
// Finds all occurrences of string 'needle' in string 'haystack'
function getMatches(haystack, needle) {
    if (needle && haystack) {
        var matches = [], ind = 0, l = needle.length;
        var t = haystack.toLowerCase();
        var n = needle.toLowerCase();
        while (true) {
            ind = t.indexOf(n, ind);
            if (ind == -1) break;
            matches.push(ind);
            ind += l;
        }
        return matches;
    }
}
However, this also matches the word when it is part of a longer word. For example, if the needle is "book" and the haystack is "Tom wrote a book. The book's name is Facebook for dummies", the result contains the indices of 'book', 'book's' and 'Facebook', when I want only the index of 'book'. How can I accomplish this? Any help is appreciated.
Solution 1:[1]
Here's the regex I propose to fix your problem:
/\bbook\b((?!\W(?=\w))|(?=\s))/gi
Try it with the exec() method. The regexp also skips longer words that start with the needle, such as "booklet". Note that the function below takes the needle as its first argument:
function getMatches(needle, haystack) {
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}
Edit
I've edited the regexp to account for words like "booklet" as well. I've also reformatted my answer to be similar to your function.
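The function above builds its pattern from the raw needle, so a needle containing regex metacharacters (for example "a.b") would be interpreted as a pattern rather than as literal text. The following is a minimal sketch of one way to guard against that, assuming a hypothetical helper `escapeRegExp` and a hypothetical function name `getMatchesEscaped` that are not part of the original answer. It also returns early for an empty needle, which would otherwise produce zero-length matches.

```javascript
// Hypothetical helper (not in the original answer): escapes regex
// metacharacters so the needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same pattern as in the answer, with the needle escaped first.
function getMatchesEscaped(needle, haystack) {
  if (!needle || !haystack) return [];
  var re = new RegExp(
    "\\b" + escapeRegExp(needle) + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi");
  var match, result = [];
  while ((match = re.exec(haystack)) !== null) {
    result.push(match.index);
  }
  return result;
}

var text = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(getMatchesEscaped("book", text)); // [12]
```

With the sentence from the question, only the standalone "book" at index 12 is reported; "book's" and "Facebook" are rejected by the pattern.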
Solution 2:[2]
Try this:
function getMatches(searchStr, str) {
    var ind = 0, searchStrL = searchStr.length;
    var index, matches = [];
    str = str.toLowerCase();
    searchStr = searchStr.toLowerCase();
    while ((index = str.indexOf(searchStr, ind)) > -1) {
        matches.push(index);
        ind = index + searchStrL;
    }
    return matches;
}
indexOf returns the position of the first occurrence of "book":
var str = "Tom wrote a book. The book's name is Facebook for dummies";
var n = str.indexOf("book");
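An indexOf loop on its own still reports "book's" and "Facebook", because indexOf knows nothing about word boundaries. One possible way to keep the loop and filter out partial matches is to inspect the characters on either side of each hit. The sketch below assumes that letters, digits, underscores and apostrophes count as part of a word (so that "book's" is rejected, as the question asks); `isWordChar` and `getWholeWordMatches` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: treats letters, digits, underscore and apostrophe
// as "part of a word". This definition is an assumption, chosen so that
// "book's" does not count as a match for "book".
function isWordChar(ch) {
  return ch !== undefined && /[\w']/.test(ch);
}

function getWholeWordMatches(haystack, needle) {
  var matches = [];
  if (!haystack || !needle) return matches;
  var t = haystack.toLowerCase();
  var n = needle.toLowerCase();
  var ind = 0;
  while ((ind = t.indexOf(n, ind)) !== -1) {
    var before = t[ind - 1];        // undefined at the start of the text
    var after = t[ind + n.length];  // undefined at the end of the text
    if (!isWordChar(before) && !isWordChar(after)) {
      matches.push(ind);
    }
    ind += n.length;
  }
  return matches;
}

var str = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(str.indexOf("book"));              // 12
console.log(getWholeWordMatches(str, "book")); // [12]
```

The hits at 22 ("book's") and 41 ("Facebook") are found by indexOf but dropped by the neighbor check.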
Solution 3:[3]
I don't know what is going on there but I can offer a better solution using a regex.
function getMatches(haystack, needle) {
    var regex = new RegExp(needle.toLowerCase(), 'g'),
        result = [],
        match; // declared locally to avoid creating an implicit global
    haystack = haystack.toLowerCase();
    while ((match = regex.exec(haystack)) != null) {
        result.push(match.index);
    }
    return result;
}
Usage:
getMatches('hello hi hello hi hi hi hello hi hello john hi hi', 'hi');
Result => [6, 15, 18, 21, 30, 44, 47]
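Concerning your book vs. books problem, you just need to provide "book " with a trailing space.
Or, inside the function, you could do:
needle = ' ' + needle + ' ';
Note that a trailing space alone still matches "Facebook ". With the padded needle, each reported index points at the leading space rather than the word, and occurrences next to punctuation or at the very start or end of the text are not matched.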
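The space-padding idea can also be expressed in the regex itself, which avoids two side effects of literally adding spaces: consuming the space between adjacent matches and reporting the index of the space instead of the word. The following is a rough sketch under those assumptions; `escapeRegExp` and `getSpaceDelimitedMatches` are hypothetical names, and "word" here means "delimited by whitespace or the start or end of the text".

```javascript
// Hypothetical helper: escapes regex metacharacters in the needle.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches the needle only when it is delimited by whitespace or by the
// start/end of the text. The trailing delimiter is a lookahead, so it is
// not consumed and adjacent matches ("hi hi hi") are all found.
function getSpaceDelimitedMatches(haystack, needle) {
  if (!haystack || !needle) return [];
  var regex = new RegExp(
    "(^|\\s)" + escapeRegExp(needle) + "(?=\\s|$)", "gi");
  var match, result = [];
  while ((match = regex.exec(haystack)) !== null) {
    // Skip the captured leading delimiter so the index points at the word.
    result.push(match.index + match[1].length);
  }
  return result;
}

console.log(getSpaceDelimitedMatches(
  "hello hi hello hi hi hi hello hi hello john hi hi", "hi"));
// [6, 15, 18, 21, 30, 44, 47]
```

For the sentence in the question this variant returns an empty array, because the only standalone "book" is followed by a period rather than whitespace. Punctuation would need to be handled separately.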
Solution 4:[4]
The easiest way might be to use the text.match(regex) method. For example, you can write something like this for a case-insensitive search:
"This is a test. This is a Test.".match(/test/gi)
Result:
['test', 'Test']
Or this one for case sensitive scenarios:
"This is a test. This is a Test.".match(/test/g)
Result:
['test']
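Note that match() with the g flag returns the matched strings, not their positions, whereas the question asks for indices. One way to recover the positions is String.prototype.matchAll, which yields a match object (including its index) for every hit. This is a minimal sketch; `getMatchIndices` is a hypothetical name, and matchAll is assumed to be available (ES2020 or later).

```javascript
// Returns the start index of every match of `regex` in `text`.
// matchAll throws a TypeError unless the regex has the global (g) flag.
function getMatchIndices(text, regex) {
  return Array.from(text.matchAll(regex), function (m) {
    return m.index;
  });
}

var sample = "This is a test. This is a Test.";
console.log(getMatchIndices(sample, /test/gi)); // [10, 26]
console.log(getMatchIndices(sample, /test/g));  // [10]
```

Adding \b on both sides of the pattern (for example /\btest\b/gi) restricts the hits to whole words in the regex sense, though "book's" would still count as a match for /\bbook\b/ because the apostrophe is a non-word character.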
let myControlValue=document.getElementById('myControl').innerText;
document.getElementById('searchResult').innerText=myControlValue.match(/test/gi)
<p id='myControl'>This is a test. Just a Test
</p>
<span><b>Search Result:</b></span>
<div id='searchResult'></div>
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | |
Solution 4 | Mohamad Bahmani |