Splitting a Rust string into a Vec of its substrings with contiguous identical characters
I have a string or a &str of ASCII characters and I want to separate it into a Vec of all substrings with contiguous identical characters (for example, "aabbca" would become ["aa","bb","c","a"]).
I could build a function that iterates over the individual characters and gradually builds a Vec of strings, but I have the feeling that I'd be reinventing the wheel. Is there a more idiomatic way to achieve this?
Here's my intuitive (and current) solution, implemented for &str:
fn split_cont_chars(source: &str) -> Vec<String> {
    let mut answer: Vec<String> = Vec::new();
    // Note: unwrap() panics if `source` is empty.
    let mut head_char = source.chars().next().unwrap();
    let mut counter: usize = 1;
    for c in source.chars().skip(1) {
        if c == head_char {
            counter += 1;
        } else {
            // The run ended: store it and start counting the next one.
            answer.push(head_char.to_string().repeat(counter));
            head_char = c;
            counter = 1;
        }
    }
    // Store the final run.
    answer.push(head_char.to_string().repeat(counter));
    answer
}
This works as intended for non-empty input, but it is much more verbose than the average Rust code that tackles iterative problems like this one.
Solution 1:[1]
There doesn't seem to be a more functional translation of the original solution, but there is a more idiomatic one:
struct LetterSequence {
    char_type: char,
    len: usize,
}
impl LetterSequence {
    fn new(a: char, b: usize) -> Self {
        LetterSequence { char_type: a, len: b }
    }
    fn to_string(&self) -> String {
        self.char_type.to_string().repeat(self.len)
    }
}
fn split_char_struct(source: &str) -> Vec<LetterSequence> {
    let mut answer: Vec<LetterSequence> = Vec::new();
    let mut seq_count: usize = 1;
    let mut head_char: char = source.chars().next().unwrap();
    for c in source.chars().skip(1) {
        if c == head_char {
            seq_count += 1;
        } else {
            answer.push(LetterSequence::new(head_char, seq_count));
            head_char = c;
            seq_count = 1;
        }
    }
    answer.push(LetterSequence::new(head_char, seq_count));
    answer
}
With the help of the LetterSequence struct, we avoid having to maintain potentially as many mutable strings as the total length of the starting &str.
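The same run-length idea can be sketched without the initial unwrap(), so that an empty input yields an empty Vec instead of panicking. This is only an illustration: plain (char, usize) tuples stand in for LetterSequence, and the names char_run_lengths and expand_runs are hypothetical, not part of the answer above.

```rust
/// Collects each run of identical characters as a (character, length) pair.
/// Returns an empty Vec for an empty input.
fn char_run_lengths(source: &str) -> Vec<(char, usize)> {
    let mut runs: Vec<(char, usize)> = Vec::new();
    for c in source.chars() {
        match runs.last_mut() {
            // Same character as the current run: extend it.
            Some((prev, len)) if *prev == c => *len += 1,
            // First character, or a different one: start a new run.
            _ => runs.push((c, 1)),
        }
    }
    runs
}

/// Expands the pairs back into owned strings, one allocation per run.
fn expand_runs(runs: &[(char, usize)]) -> Vec<String> {
    runs.iter()
        .map(|&(c, len)| c.to_string().repeat(len))
        .collect()
}
```

Keeping the runs as pairs defers the String allocations until (and unless) the expanded text is actually needed.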
Solution 2:[2]
You could use itertools::Itertools::group_by (renamed chunk_by in itertools 0.13) over s.chars() (with the identity function as the key), but it'll inherently be inefficient: when it comes time to collect the chars into a String, you'll have to allocate a String for each result. I think the only way to get slices into the original String is to find the slices yourself. (If itertools offered a group_by_indices function, then you could use it to construct the slices, but AFAIK it does not.)
fn char_runs(s: &str) -> Vec<&str> {
    let mut slices = Vec::new();
    let mut it = s.char_indices();
    let (mut slice_start, mut prev_char) = match it.next() {
        Some(pair) => pair,
        None => return slices,
    };
    for (i, c) in it {
        if c != prev_char {
            slices.push(&s[slice_start..i]);
            slice_start = i;
            prev_char = c;
        }
    }
    slices.push(&s[slice_start..]);
    slices
}
fn main() {
    let strings = ["", "a", "aa", "aab", "aabb", "aabbca", "aa??bb"];
    for s in strings {
        println!("{:?} => {:?}", s, char_runs(s));
    }
    // "" => []
    // "a" => ["a"]
    // "aa" => ["aa"]
    // "aab" => ["aa", "b"]
    // "aabb" => ["aa", "bb"]
    // "aabbca" => ["aa", "bb", "c", "a"]
    // "aa??bb" => ["aa", "??", "bb"]
}
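Since the question restricts the input to ASCII, a shorter variant may be possible with the standard library alone. The sketch below assumes Rust 1.77 or later, where the slice method chunk_by is stable, and groups the string's bytes rather than its chars. The function name ascii_runs and the choice to return None for non-ASCII input are assumptions made for this example.

```rust
/// Splits an ASCII string into borrowed runs of identical characters.
/// Returns None if the input contains non-ASCII characters, because
/// grouping by byte would then cut through multi-byte characters.
fn ascii_runs(s: &str) -> Option<Vec<&str>> {
    if !s.is_ascii() {
        return None;
    }
    s.as_bytes()
        // Adjacent bytes stay in the same chunk while they are equal.
        .chunk_by(|a, b| a == b)
        // Every chunk of an ASCII string is itself valid UTF-8.
        .map(|run| std::str::from_utf8(run).ok())
        .collect()
}
```

Like char_runs, this borrows from the original string and allocates only the outer Vec. For "aabbca" it returns Some(["aa", "bb", "c", "a"]), and for an empty string it returns Some of an empty Vec.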
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Peter Mortensen |
Solution 2 | mcarton |