How can I deserialize a type where all the fields are default values as a None instead?

I have to deserialize JSON blobs where in some places the absence of an entire object is encoded as an object with the same structure but all of its fields set to default values (empty strings and zeroes).

extern crate serde_json; // 1.0.27
#[macro_use] extern crate serde_derive; // 1.0.78
extern crate serde; // 1.0.78

#[derive(Debug, Deserialize)]
struct Test<T> {
    text: T,
    number: i32,
}

#[derive(Debug, Deserialize)]
struct Outer {
    test: Option<Test<String>>,
}

#[derive(Debug, Deserialize)]
enum Foo { Bar, Baz }
#[derive(Debug, Deserialize)]
struct Outer2 {
    test: Option<Test<Foo>>,
}

fn main() {
    println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": { "text": "abc", "number": 42 } }"#).unwrap());
    // good: Outer { test: Some(Test { text: "abc", number: 42 }) }

    println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": null }"#).unwrap());
    // good: Outer { test: None }

    println!("{:?}", serde_json::from_str::<Outer>(r#"{ "test": { "text": "", "number": 0 } }"#).unwrap());
    // bad: Outer { test: Some(Test { text: "", number: 0 }) }
    // should be: Outer { test: None }

    println!("{:?}", serde_json::from_str::<Outer2>(r#"{ "test": { "text": "Bar", "number": 42 } }"#).unwrap());
    // good: Outer2 { test: Some(Test { text: Bar, number: 42 }) }

    println!("{:?}", serde_json::from_str::<Outer2>(r#"{ "test": { "text": "", "number": 0 } }"#).unwrap());
    // bad: error
    // should be: Outer2 { test: None }
}

I would handle this after deserialization but as you can see this approach is not possible for enum values: no variant matches the empty string so the deserialization fails entirely.

How can I teach this to serde?



Solution 1:[1]

There are two things that need to be solved here: replacing Some(value) with None if value is all defaults, and handling the empty string case for Foo.

The first part is easy. The Deserialize implementation for Option unconditionally produces Some whenever the input value isn't null, so you need a custom deserialization function that replaces Some(value) with None when the value equals some sentinel, such as the default (this is the answer proposed by Issac, but implemented correctly here):

use serde::{Deserialize, Deserializer};

fn none_if_all_default<'de, T, D>(deserializer: D) -> Result<Option<T>, D::Error>
where
    T: Deserialize<'de> + Default + Eq,
    D: Deserializer<'de>,
{
    Option::deserialize(deserializer).map(|opt| match opt {
        Some(value) if value == T::default() => None,
        opt => opt,
    })
}

// Test<T> must also derive Default, PartialEq and Eq for the comparison to compile.
#[derive(Deserialize)]
struct Outer<T: Eq + Default> {
    #[serde(deserialize_with = "none_if_all_default")]
    #[serde(bound(deserialize = "T: Deserialize<'de>"))]
    test: Option<Test<T>>,
}

This solves the first half of your problem, with Option<Test<String>>. This will work for any deserializable type that is Eq + Default.

The enum case is much more tricky; the problem you're faced with is that Foo simply won't deserialize from a string other than "Bar" or "Baz". I don't really see a good solution for this other than adding a third "dead" variant to the enum:

#[derive(PartialEq, Eq, Deserialize)]
enum Foo {
    Bar,
    Baz,

    #[serde(rename = "")]
    Absent,
}

impl Default for Foo { fn default() -> Self { Self::Absent } }

From a data-modeling point of view, this problem exists because the deserializer has to account for the possibility that you'll get JSON like this:

{ "test": { "text": "", "number": 42 } }

In this case, clearly Outer { test: None } is not the correct result, but it still needs a value to store in Foo, or else return a deserialization error.

If you want it to be the case that "" is valid text only if number is 0, you could do something significantly more elaborate and probably overkill for your needs, compared to just using Absent. You'd need to use an untagged enum, which can store either a "valid" Test or an "all empty" Test, and then create a version of your struct that only deserializes default values:

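use std::marker::PhantomData;
use serde::de::Error; // brings `custom` into scope
use serde::{Deserialize, Deserializer};

struct MustBeDefault<T> {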
    marker: PhantomData<T>
}

impl<'de, T> Deserialize<'de> for MustBeDefault<T>
where
    T: Deserialize<'de> + Eq + Default
{
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>
    {
        match T::deserialize(deserializer)? == T::default() {
            true => Ok(MustBeDefault { marker: PhantomData }),
            false => Err(D::Error::custom("value must be default"))
        }
    }
}

// All fields need to be generic in order to use this solution.
// Like I said, this is radically overkill.
#[derive(Deserialize)]
struct Test<T, U> {
    text: T,
    number: U,
}

#[derive(Deserialize)]
#[serde(untagged)]
enum MaybeDefaultedTest<T> {
    AllDefault(Test<EmptyString, MustBeDefault<i32>>),
    Normal(Test<T, i32>),
}

// `EmptyString` is a type that only deserializes from empty strings;
// its implementation is left as an exercise to the reader.
// You'll also need to convert from MaybeDefaultedTest<T> to Option<Test<T, i32>>;
// this is also left as an exercise to the reader.

It is now possible to write MaybeDefaultedTest<Foo> (with Foo having only the Bar and Baz variants), which will deserialize from things like {"text": "", "number": 0} or {"text": "Baz", "number": 10} or {"text": "Baz", "number": 0}, but will fail to deserialize from {"text": "", "number": 10}.

Again, for the third time, this solution is probably radically overkill (especially if your real-world use case involves more than 2 fields in the Test struct), and so unless you have very intense data modeling requirements, you should go with adding an Absent variant to Foo.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
