'Finding how many times a given string is a substring of another string using scala
what is the elegant way for finding how many times a given string is a substring of another string using scala?
The below test cases that should make clear what are the requirements:
import org.scalatest.FunSuite
class WordOccurrencesSolverTest extends FunSuite {
private val solver = new WordOccurrencesSolver()
test("solve for a in a") {
assert(solver.solve("a", "a") === 1)
}
test("solve for b in a") {
assert(solver.solve("b", "a") === 0)
}
test("solve for a in aa") {
assert(solver.solve("a", "aa") === 2)
}
test("solve for b in ab") {
assert(solver.solve("b", "ab") === 1)
}
test("solve for ab in ab") {
assert(solver.solve("ab", "ab") === 1)
}
test("solve for ab in abab") {
assert(solver.solve("ab", "abab") === 2)
}
test("solve for aa in aaa") {
assert(solver.solve("aa", "aaa") === 2)
}
}
Here is my solution to the problem of which I am not particularly proud:
class WordOccurrencesSolver {
def solve(word: String, text: String): Int = {
val n = word.length
def solve(acc: Int, word: String, sb: String): Int = sb match {
case _ if sb.length < n => acc
case _ if sb.substring(0, n) == word => solve(acc + 1, word, sb.tail)
case _ => solve(acc, word, sb.tail)
}
solve(0, word, text)
}
}
I assume there must be a clean one liner which takes advantage of Scala's higher order functions instead of recursion and match/case clause.
Solution 1:[1]
If you are looking for an idiomatic Scala solution then you can use the sliding
to create a sliding window iterator and count the wondows which are equal to your target String.
This solution while being functional also offers you an acceptable performence.
def countOccurrences(src: String, tgt: String): Int =
src.sliding(tgt.length).count(window => window == tgt)
Solution 2:[2]
You can probably use this java function:
StringUtils.countMatches(stringToFindMatchesIn, keyWordToFind );
This will return the number of occurences of the keyword in the string
Solution 3:[3]
If somebody else is looking for a very efficient and semi-idomatic solution, use
extension (s: String) def countSubstr(sub: String, overlap: Boolean = false): Int = {
if (sub.isEmpty) throw IllegalArgumentException("substring must not be empty")
val d = if (overlap) 1 else sub.length
@tailrec
def count(pos: Int, acc: Int): Int =
s.indexOf(sub, pos) match {
case -1 => acc
case j => count(j + d, acc + 1)
}
count(0, 0)
}
For Scala 2 compatibility or if you do not like extensions, use
def countSubstr(s: String, sub: String, overlap: Boolean = false): Int = {
...
Runtime is O(s.length)
for overlap = false
.
There are no object allocations, the @tailrec
method gets optimized to a jump and the match
to an if
.
As your last example suggests, you wanted to allow overlaps, so it is slightly slower (but worst case is O(s.length*sub.length)
).
If you are looking for a (slower, but may still be faster than sliding
) one-liner this may be for you:
def countSubstr(s: String, regex: String): Int =
s.split(regex, -1).length - 1
Note:
- Second argument is a regex, this may lead to unexpected results and be slower if a real regex is used.
- It does not count overlapping substrings, so it fails on the last test.
- The
-1
as second argument tosplit
is important. Otherwise, substrings at the end would be ignored.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | sarveshseri |
Solution 2 | iPhantomGuy |
Solution 3 | anselm |