Splitting a Rust string into a Vec of its substrings with contiguous identical characters

I have a String or a &str of ASCII characters and I want to split it into a Vec of all substrings of contiguous identical characters (for example, "aabbca" would become ["aa","bb","c","a"]).

I could build a function that iterates over the individual characters and gradually builds a Vec of strings, but I have the feeling that I'd be reinventing the wheel. Is there a more idiomatic way to achieve this?

Here's my intuitive (and current) solution, implemented for &str:

fn split_cont_chars(source:&str) -> Vec<String> {
    let mut answer: Vec<String> = Vec::new();

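    // Take the first character; an empty input has no runs, so return early
    // instead of panicking on unwrap().
    let mut head_char = match source.chars().next() {
        Some(c) => c,
        None => return answer,
    };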
    let mut counter: usize = 1;

    for c in source.chars().skip(1) {
        if c == head_char {
            counter += 1;
        }
        else {
            answer.push(head_char.to_string().repeat(counter));
            head_char = c;
            counter = 1;
        }
    }
    answer.push(head_char.to_string().repeat(counter));

    answer
}

This works as intended, but it is much more verbose than typical Rust code that tackles iterative problems like this one.



Solution 1:[1]

There doesn't seem to be a more functional translation of the original solution, but there is a more idiomatic one:

struct LetterSequence {
    char_type: char,
    len: usize
}

impl LetterSequence {
    fn new(a:char, b:usize) -> Self {
        LetterSequence{char_type:a, len:b}
    }
    fn to_string(&self) -> String {
        self.char_type.to_string().repeat(self.len)
    }
}

fn split_char_struct(source:&str) -> Vec<LetterSequence> {
    let mut answer: Vec<LetterSequence> = Vec::new();

    let mut seq_count: usize = 1;
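    // An empty input has no runs, so return early instead of panicking on unwrap().
    let mut head_char: char = match source.chars().next() {
        Some(c) => c,
        None => return answer,
    };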

    for c in source.chars().skip(1) {
        if c == head_char {
            seq_count += 1;
        }
        else {
            answer.push(LetterSequence::new(head_char, seq_count));
            head_char = c;
            seq_count = 1;
        }
    }
    answer.push(LetterSequence::new(head_char, seq_count));

    answer
}

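With the help of the LetterSequence struct, each run is stored as a (character, length) pair, so no String is allocated until to_string is called. This avoids building one String per run up front, which in the worst case (no two adjacent characters equal) would mean as many Strings as there are characters in the starting &str.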
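As a possible refinement, and only a sketch: implementing std::fmt::Display instead of writing an inherent to_string method gives you to_string for free and also lets the struct be used directly in format! and println!. The struct below repeats the fields from the answer above; the Display implementation is an addition, not part of the original answer.

```rust
use std::fmt;

/// A run of `len` copies of `char_type` (same fields as the struct above).
struct LetterSequence {
    char_type: char,
    len: usize,
}

impl fmt::Display for LetterSequence {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Write the character `len` times straight into the formatter,
        // without building an intermediate String.
        for _ in 0..self.len {
            write!(f, "{}", self.char_type)?;
        }
        Ok(())
    }
}
```

With this in place, LetterSequence { char_type: 'a', len: 3 }.to_string() produces "aaa" through the blanket ToString implementation that every Display type gets.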

Solution 2:[2]

You could use itertools::Itertools::group_by (deprecated in favor of chunk_by in recent itertools releases) over s.chars() with the identity function as the key, but it is inherently inefficient: when it comes time to collect each group's chars, you have to allocate a new String for every result. I think the only way to get slices into the original String is to find the slice boundaries yourself. (If itertools offered a group_by_indices function, you could use it to construct the slices, but AFAIK it does not.)

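// Returns the runs of identical characters as slices borrowed from `s`,
// so no new Strings are allocated.
fn char_runs(s: &str) -> Vec<&str> {
    let mut slices = Vec::new();
    // char_indices yields (byte offset, char) pairs, so the offsets are
    // always valid slice boundaries, even for multi-byte characters.
    let mut it = s.char_indices();
    let (mut slice_start, mut prev_char) = match it.next() {
        Some(pair) => pair,
        // Empty input: there are no runs.
        None => return slices,
    };

    for (i, c) in it {
        if c != prev_char {
            // The character changed: close the current run just before `i`.
            slices.push(&s[slice_start..i]);
            slice_start = i;
            prev_char = c;
        }
    }

    // The last run extends to the end of the string.
    slices.push(&s[slice_start..]);

    slices
}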

fn main() {
    let strings = ["", "a", "aa", "aab", "aabb", "aabbca", "aa??bb"];
    for s in strings {
        println!("{:?} => {:?}", s, char_runs(s));
    }

    // "" => []
    // "a" => ["a"]
    // "aa" => ["aa"]
    // "aab" => ["aa", "b"]
    // "aabb" => ["aa", "bb"]
    // "aabbca" => ["aa", "bb", "c", "a"]
    // "aa??bb" => ["aa", "??", "bb"]
}
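Since the question restricts the input to ASCII, a shorter variant is possible with the standard library's slice::chunk_by, applied to the string's bytes. This is a sketch under two assumptions that are not part of the original answer: a toolchain of Rust 1.77 or newer (where slice::chunk_by is stable), and ASCII-only input. The function name ascii_runs is made up for illustration. For non-ASCII input the byte-wise grouping would cut multi-byte characters apart, so the function checks is_ascii first and panics otherwise.

```rust
/// Splits an ASCII string into runs of identical characters, borrowing from `s`.
/// Assumes Rust 1.77+ (for `slice::chunk_by`) and ASCII-only input.
fn ascii_runs(s: &str) -> Vec<&str> {
    // Byte-wise grouping is only valid when every character is one byte.
    assert!(s.is_ascii(), "ascii_runs expects ASCII-only input");

    s.as_bytes()
        // Keep extending the current chunk while neighbouring bytes are equal.
        .chunk_by(|a, b| a == b)
        // Every chunk of an ASCII string is itself valid UTF-8.
        .map(|run| std::str::from_utf8(run).expect("ASCII is valid UTF-8"))
        .collect()
}
```

For example, ascii_runs("aabbca") yields ["aa", "bb", "c", "a"], and an empty input yields an empty Vec. Like char_runs, it returns slices into the original string rather than allocating a String per run; char_runs remains the safer choice when the input might contain non-ASCII characters.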

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Mortensen
Solution 2 mcarton