How to compose two calls to Regex::replace_all?

With the regex and the replacement fixed, Regex::replace_all effectively has the signature fn(text: &str) -> Cow<str>. How can two such calls be composed as f(g(x)) in a function that keeps the same signature?

Here's some code I'm trying to write. This has the two calls separated out into two functions, but I couldn't get it working in one function either. Here's my lib.rs in a fresh Cargo project:

#![allow(dead_code)]

//! Plaintext and HTML manipulation.

use lazy_static::lazy_static;
use regex::Regex;
use std::borrow::Cow;

lazy_static! {
    static ref DOUBLE_QUOTED_TEXT: Regex = Regex::new(r#""(?P<content>[^"]+)""#).unwrap();
    static ref SINGLE_QUOTE:       Regex = Regex::new(r"'").unwrap();
}


fn add_typography(text: &str) -> Cow<str> {
    add_double_quotes(&add_single_quotes(text)) // Error! "returns a value referencing data owned by the current function"
}

fn add_double_quotes(text: &str) -> Cow<str> {
    DOUBLE_QUOTED_TEXT.replace_all(text, "“$content”")
}

fn add_single_quotes(text: &str) -> Cow<str> {
    SINGLE_QUOTE.replace_all(text, "’")
}


#[cfg(test)]
mod tests {
    use crate::{add_typography};

    #[test]
    fn converts_to_double_quotes() {
        assert_eq!(add_typography(r#""Hello""#), "“Hello”");
    }

    #[test]
    fn converts_a_single_quote() {
        assert_eq!(add_typography("Today's Menu"), "Today’s Menu");
    }
}

Here's the best I could come up with, but this will get ugly fast when chaining three or four functions:

fn add_typography(input: &str) -> Cow<str> {
    match add_single_quotes(input) {
        Cow::Owned(output) => add_double_quotes(&output).into_owned().into(),
        _                  => add_double_quotes(input),
    }
}


Solution 1:[1]

A Cow contains maybe-owned data.

From what replace_all does, we can infer that it returns borrowed data (Cow::Borrowed) only if no substitution happened; otherwise it has to return new, owned data (Cow::Owned).
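To make the borrowed-versus-owned behaviour concrete without pulling in the regex crate, here is a minimal sketch. The function curl_apostrophes is a hypothetical stand-in written with std only; it is assumed to mirror how replace_all behaves, not taken from the regex crate itself.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a `replace_all`-style call (assumption: it
/// mirrors the "borrow when unchanged, allocate when changed" behaviour).
fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    if text.contains('\'') {
        // A substitution is needed, so a new String has to be allocated.
        Cow::Owned(text.replace('\'', "’"))
    } else {
        // Nothing to replace: hand the input back without copying it.
        Cow::Borrowed(text)
    }
}
```

Calling curl_apostrophes("Menu") gives back a Cow::Borrowed that points at the input, while curl_apostrophes("Today's Menu") gives back a Cow::Owned holding a freshly allocated String.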

The problem arises when the inner call makes a substitution but the outer one does not. In that case, the outer call will simply pass its input through as Cow::Borrowed, but it borrows from the Cow::Owned value returned by the inner call, whose data now belongs to a Cow temporary that is local to add_typography(). The function would therefore return a Cow::Borrowed, but would borrow from the temporary, and that's obviously not memory-safe.
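The problematic combination can be observed directly as long as both values stay inside one function. The sketch below uses hypothetical std-only stand-ins (replace_if_present, inner_pass, outer_pass) in place of the two regex calls; the names and the helper are assumptions made for illustration.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for `replace_all`: borrows when nothing matches.
fn replace_if_present<'t>(text: &'t str, from: &str, to: &str) -> Cow<'t, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

fn inner_pass(text: &str) -> Cow<'_, str> {
    replace_if_present(text, "'", "’")
}

fn outer_pass(text: &str) -> Cow<'_, str> {
    replace_if_present(text, "\"", "”")
}

/// Returns true when the outer result merely borrows from a String that is
/// owned by the inner result, which is the case that cannot be returned.
fn outer_borrows_from_temporary(text: &str) -> bool {
    let inner_result = inner_pass(text);
    // `&inner_result` coerces to `&str`; a Borrowed result points into it.
    let outer_result = outer_pass(&inner_result);
    matches!(
        (&inner_result, &outer_result),
        (Cow::Owned(_), Cow::Borrowed(_))
    )
}
```

For an input such as "Today's Menu" the inner pass allocates and the outer pass changes nothing, so the outer Cow borrows from a local value. Returning it would leave a dangling reference, which is what the compiler error is guarding against.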

In other words, add_typography() can only return borrowed data when neither call made a substitution. What we need is a helper that propagates ownedness through the call layers: once one stage has produced an owned Cow, the result of every later stage must be owned as well.

We can construct a .map() extension method on top of Cow that does exactly this:

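use std::borrow::{Borrow, Cow};

/// Extension trait that adds a `map` method to `Cow`.
trait CowMapExt<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    /// Applies `f` to the contained data, keeping the result owned
    /// whenever `self` is already owned.
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>;
}

impl<'a, B> CowMapExt<'a, B> for Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>,
    {
        match self {
            // Still borrowing the original input, so the result of `f` can be returned as-is.
            Cow::Borrowed(v) => f(v),
            // `v` is local to this call; force the result to be owned so nothing borrows from it.
            Cow::Owned(v) => Cow::Owned(f(v.borrow()).into_owned()),
        }
    }
}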

Now your call site can stay nice and clean:

fn add_typography(text: &str) -> Cow<str> {
    add_single_quotes(text).map(add_double_quotes)
}
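Since the question worries about chaining three or four functions, here is a sketch of a three-stage chain built on the same extension trait. The trait is repeated so the block is self-contained. The stages (curl_apostrophes, ellipses, single_spaces) and the replace_if_present helper are hypothetical std-only stand-ins for regex-based functions, and the chained function is named add_typography_3 here only to keep it apart from the two-stage version.

```rust
use std::borrow::{Borrow, Cow};

trait CowMapExt<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>;
}

impl<'a, B> CowMapExt<'a, B> for Cow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
{
    fn map<F>(self, f: F) -> Self
    where
        F: for<'b> FnOnce(&'b B) -> Cow<'b, B>,
    {
        match self {
            Cow::Borrowed(v) => f(v),
            Cow::Owned(v) => {
                // Pin the borrow to `&B` so the right `Borrow` impl is used.
                let borrowed: &B = v.borrow();
                Cow::Owned(f(borrowed).into_owned())
            }
        }
    }
}

/// Hypothetical stand-in for `replace_all`: borrows when nothing matches.
fn replace_if_present<'t>(text: &'t str, from: &str, to: &str) -> Cow<'t, str> {
    if text.contains(from) {
        Cow::Owned(text.replace(from, to))
    } else {
        Cow::Borrowed(text)
    }
}

fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    replace_if_present(text, "'", "’")
}

fn ellipses(text: &str) -> Cow<'_, str> {
    replace_if_present(text, "...", "…")
}

fn single_spaces(text: &str) -> Cow<'_, str> {
    replace_if_present(text, "  ", " ")
}

/// Each extra stage is one more `.map(...)`, with no nested `match`.
fn add_typography_3(text: &str) -> Cow<'_, str> {
    curl_apostrophes(text).map(ellipses).map(single_spaces)
}
```

One trade-off of this design: in the owned branch, a later stage that changes nothing still goes through into_owned(), which copies the string once. If that copy matters, the owned branch could instead inspect the result of f and reuse the existing String when f returns Cow::Borrowed.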

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
