'Finding how many times a given string is a substring of another string using scala

what is the elegant way for finding how many times a given string is a substring of another string using scala?

The below test cases that should make clear what are the requirements:

import org.scalatest.FunSuite

class WordOccurrencesSolverTest extends FunSuite {

  private val solver = new WordOccurrencesSolver()

  test("solve for a in a") {
    assert(solver.solve("a", "a") === 1)
  }

  test("solve for b in a") {
    assert(solver.solve("b", "a") === 0)
  }

  test("solve for a in aa") {
    assert(solver.solve("a", "aa") === 2)
  }

  test("solve for b in ab") {
    assert(solver.solve("b", "ab") === 1)
  }

  test("solve for ab in ab") {
    assert(solver.solve("ab", "ab") === 1)
  }

  test("solve for ab in abab") {
    assert(solver.solve("ab", "abab") === 2)
  }

  test("solve for aa in aaa") {
    assert(solver.solve("aa", "aaa") === 2)
  }
}

Here is my solution to the problem of which I am not particularly proud:

class WordOccurrencesSolver {

  def solve(word: String, text: String): Int = {
    val n = word.length
    def solve(acc: Int, word: String, sb: String): Int = sb match {
      case _ if sb.length < n => acc
      case _ if sb.substring(0, n) == word => solve(acc + 1, word, sb.tail)
      case _ => solve(acc, word, sb.tail)
    }
    solve(0, word, text)
  }

}

I assume there must be a clean one liner which takes advantage of Scala's higher order functions instead of recursion and match/case clause.



Solution 1:[1]

If you are looking for an idiomatic Scala solution then you can use the sliding to create a sliding window iterator and count the wondows which are equal to your target String.

This solution while being functional also offers you an acceptable performence.

def countOccurrences(src: String, tgt: String): Int =
  src.sliding(tgt.length).count(window => window == tgt)

Solution 2:[2]

You can probably use this java function:

StringUtils.countMatches(stringToFindMatchesIn, keyWordToFind );

This will return the number of occurences of the keyword in the string

Solution 3:[3]

If somebody else is looking for a very efficient and semi-idomatic solution, use

extension (s: String) def countSubstr(sub: String, overlap: Boolean = false): Int = {
  if (sub.isEmpty) throw IllegalArgumentException("substring must not be empty")
  val d = if (overlap) 1 else sub.length
  @tailrec
  def count(pos: Int, acc: Int): Int =
    s.indexOf(sub, pos) match {
      case -1 => acc
      case j => count(j + d, acc + 1)
    }
  count(0, 0)
}

For Scala 2 compatibility or if you do not like extensions, use

def countSubstr(s: String, sub: String, overlap: Boolean = false): Int = {
  ...

Runtime is O(s.length) for overlap = false. There are no object allocations, the @tailrec method gets optimized to a jump and the match to an if.

As your last example suggests, you wanted to allow overlaps, so it is slightly slower (but worst case is O(s.length*sub.length)).


If you are looking for a (slower, but may still be faster than sliding) one-liner this may be for you:

  def countSubstr(s: String, regex: String): Int =
    s.split(regex, -1).length - 1

Note:

  • Second argument is a regex, this may lead to unexpected results and be slower if a real regex is used.
  • It does not count overlapping substrings, so it fails on the last test.
  • The -1 as second argument to split is important. Otherwise, substrings at the end would be ignored.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 sarveshseri
Solution 2 iPhantomGuy
Solution 3 anselm