C#: How to know which elements of a list are substrings of a string?

If I have a list of strings like

var MyList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};

is there an efficient way to determine which elements of that list are contained in the following string?

"substring1 is the substring2 document that was processed electronically"

In this case, the result should be:

var MySubList = new List<string>
{
    "substring1", "substring2"
};
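For comparison, the literal "is this element a substring of the text?" check can be sketched with LINQ's Where and String.Contains. This is not the approach used in the answer below: it scans the text once per list element (roughly O(n·m) for n elements and a text of length m), and it also finds matches inside longer words (for example, "process" would match "processed"). The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var myList = new List<string>
{
    "substring1", "substring2", "substring3", "substring4", "substring5"
};
var text = "substring1 is the substring2 document that was processed electronically";

// String.Contains(string) performs an ordinal, case-sensitive substring search.
// Keep every list element that occurs anywhere in the text, in list order.
var mySubList = myList
    .Where(item => text.Contains(item))
    .ToList();

Console.WriteLine(string.Join(", ", mySubList)); // prints "substring1, substring2"
```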


Solution 1:[1]

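  1. Split the text on spaces (so the patterns are matched as whole words, not as arbitrary substrings)
  2. Sort the words alphabetically (the patterns list must be sorted the same way; the example list already is)
  3. Remove duplicates to get a list of unique words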
var words = Text.Split(" ").OrderBy(word => word).Distinct().ToList();
  4. Create an accumulator collection for the matches
  5. Create two index variables (one for the words, one for the patterns)
List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
  6. Iterate through both lists until you reach the end of either one
while(patternIdx < patterns.Count && wordIdx < words.Count)
{

}
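  7. Compare the current pattern with the current word
  8. Advance the index variable(s) based on the result: if the pattern sorts after the word, move to the next word; if it sorts before, move to the next pattern; if they are equal, record a match and advance both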
int comparison = string.Compare(patterns[patternIdx],words[wordIdx]);
switch(comparison)
{
    case > 0: wordIdx++; break;
    case < 0: patternIdx++; break;
    default: 
    {
        matches.Add(patterns[patternIdx]); 
        wordIdx++;
        patternIdx++;
        break;
    }
}

Here I've used a switch statement with the relational patterns (case > 0, case < 0) introduced in C# 9.
If you can't use C# 9, an if ... else if ... else block works just as well.
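Without C# 9, the comparison step can be written with if ... else if ... else instead. The sketch below wraps the loop in a hypothetical local function, FindMatches, and sorts the patterns explicitly, because the walk assumes both lists share the ordering that string.Compare uses. For brevity it uses top-level statements, which are themselves a C# 9 feature; on an older compiler, move the local function and the calls into an ordinary static Main method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: two-pointer walk over two lists sorted with the same ordering.
// Uses only if/else, so the loop itself does not need C# 9 relational patterns.
List<string> FindMatches(List<string> sortedPatterns, List<string> sortedWords)
{
    var matches = new List<string>();
    int patternIdx = 0, wordIdx = 0;
    while (patternIdx < sortedPatterns.Count && wordIdx < sortedWords.Count)
    {
        int comparison = string.Compare(sortedPatterns[patternIdx], sortedWords[wordIdx]);
        if (comparison > 0)
        {
            wordIdx++;      // word sorts before the pattern: move to the next word
        }
        else if (comparison < 0)
        {
            patternIdx++;   // pattern sorts before the word: it cannot appear later
        }
        else
        {
            matches.Add(sortedPatterns[patternIdx]); // equal: record the match
            wordIdx++;
            patternIdx++;
        }
    }
    return matches;
}

var text = "substring1 is the substring2 document that was processed electronically";
// OrderBy's default comparer and string.Compare both use the current culture, so the orderings agree.
var words = text.Split(' ').OrderBy(x => x).Distinct().ToList();
var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" }
    .OrderBy(x => x).ToList();

var result = FindMatches(patterns, words);
Console.WriteLine(string.Join(", ", result)); // prints "substring1, substring2"
```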


For the sake of completeness, here is the whole code:

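using System;
using System.Collections.Generic;
using System.Linq;

var Text = "substring1 is the substring2 document that was processed electronically";
var words = Text.Split(" ").OrderBy(x => x).Distinct().ToList();
// The patterns must be sorted the same way as the words for the two-pointer walk to work
var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" }.OrderBy(x => x).ToList();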

List<string> matches = new();
int patternIdx = 0, wordIdx = 0;
while(patternIdx < patterns.Count && wordIdx < words.Count)
{
    int comparison = string.Compare(patterns[patternIdx], words[wordIdx]);
    switch(comparison)
    {
        case > 0: wordIdx++; break;
        case < 0: patternIdx++; break;
        default: 
        {
            matches.Add(patterns[patternIdx]); 
            wordIdx++;
            patternIdx++;
            break;
        }
    }
}
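If the patterns do not need to be kept in sorted order, a hash set is a common alternative to the sort-and-walk approach (a different technique, not a refinement of the answer's code). Building the set is linear in the number of words and each pattern lookup is O(1) on average, so nothing needs to be sorted and the results keep the patterns' original order. Like the answer's code, this sketch only matches whole, space-separated words; punctuation attached to a word (such as "substring2,") would prevent a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var text = "substring1 is the substring2 document that was processed electronically";
var patterns = new List<string> { "substring1", "substring2", "substring3", "substring4", "substring5" };

// Put every space-separated word into a hash set with ordinal, case-sensitive equality.
var wordSet = new HashSet<string>(text.Split(' '), StringComparer.Ordinal);

// Keep the patterns, in their original order, that occur as whole words.
var matches = patterns.Where(p => wordSet.Contains(p)).ToList();

Console.WriteLine(string.Join(", ", matches)); // prints "substring1, substring2"
```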


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Csala