Is allowing library users to embed arbitrary data in your structures a correct usage of std::mem::transmute?

A library I'm working on stores various data structures in a graph-like manner. I'd like to let users store metadata ("annotations") in nodes, so they can retrieve them later. Currently, they have to create their own data structure which mirrors the library's, which is very inconvenient.

I'm placing very few constraints on what an annotation can be, because I do not know what users will want to store in the future. The rest of this question is about my current attempt at solving this use case, but I'm open to completely different implementations as well.


User annotations are represented with a trait:

pub trait Annotation {
    fn some_important_method(&self);
}

This trait contains a few methods (all on &self) which are important for the domain, but these are always trivial to implement for users. The real data of an annotation implementation cannot be retrieved this way.

I can store a list of annotations this way:

pub struct Node {
    // ...
    annotations: Vec<Box<dyn Annotation>>,
}

I'd like to let the user retrieve whatever implementation they previously added to a list, something like this:

impl Node {
    fn annotations_with_type<T>(&self) -> Vec<&T>
    where
        T: Annotation,
    {
        // ??
    }
}

I originally aimed to convert dyn Annotation to dyn Any and then use downcast_ref; however, trait upcasting coercion is unstable.

Another solution would be to require each Annotation implementation to store its TypeId, compare it with annotations_with_type's type parameter's TypeId, and std::mem::transmute the resulting &dyn Annotation to &T… but the documentation of transmute is quite scary and I honestly don't know whether that's one of the allowed cases in which it is safe. I definitely would have done some kind of void * in C.

Of course it's also possible that there's a third (safe) way to go through this. I'm open to suggestions.



Solution 1:[1]

What you are describing is commonly solved by TypeMaps, allowing a type to be associated with some data.

If you are open to using a library, you might consider looking into using an existing implementation, such as https://crates.io/crates/typemap_rev, to store data. For example:

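use typemap_rev::{TypeMap, TypeMapKey};

struct MyAnnotation;

impl TypeMapKey for MyAnnotation {
    type Value = String;
}

let mut map = TypeMap::new();
// `insert` takes a `MyAnnotation::Value`, i.e. a `String`, not a `&str`.
map.insert::<MyAnnotation>(String::from("Some Annotation"));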

If you are curious: under the hood, it uses a HashMap<TypeId, Box<(dyn Any + Send + Sync)>> to store the data. To retrieve data, it calls downcast_ref on the Any type, which is stable. You could also follow this pattern to implement it yourself if needed.
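To illustrate that pattern, here is a minimal std-only sketch. It is not the code of typemap_rev; the names Key and SimpleTypeMap are made up for this example, and it only shows the idea of keying boxed Any values by TypeId and getting them back with downcast_ref.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical key trait: each key type names the value type stored under it.
pub trait Key: 'static {
    type Value: Any + Send + Sync;
}

/// Hypothetical minimal type map: holds at most one value per key type.
#[derive(Default)]
pub struct SimpleTypeMap {
    items: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl SimpleTypeMap {
    /// Stores `value` under the key type `K`, replacing any previous value.
    pub fn insert<K: Key>(&mut self, value: K::Value) {
        self.items.insert(TypeId::of::<K>(), Box::new(value));
    }

    /// Returns the value stored under `K`, if any.
    pub fn get<K: Key>(&self) -> Option<&K::Value> {
        self.items
            .get(&TypeId::of::<K>())
            // `downcast_ref` is the stable, safe way back to the concrete type.
            .and_then(|boxed| boxed.downcast_ref::<K::Value>())
    }
}

pub struct MyAnnotation;

impl Key for MyAnnotation {
    type Value = String;
}
```

Note that a map like this holds one value per key type, so if a node needs several annotations of the same type, the value type would have to be a collection.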

Solution 2:[2]

You don't have to worry whether this is valid - because it doesn't compile (playground):

error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
 --> src/main.rs:7:18
  |
7 |     _ = unsafe { std::mem::transmute::<&dyn Annotation, &i32>(&*v) };
  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: source type: `&dyn Annotation` (128 bits)
  = note: target type: `&i32` (64 bits)

The error message should be clear, I hope: &dyn Trait is a fat pointer with a size of 2*size_of::<usize>(). &T, on the other hand, is a thin pointer (as long as T: Sized) the size of a single usize, and you cannot transmute between types of different sizes.
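You can check the sizes yourself. The following is a small sketch with a stand-in Annotation trait and a hypothetical helper named reference_sizes; the numbers reflect how current compilers lay out these references, not a documented guarantee.

```rust
use std::mem::size_of;

// Stand-in for the trait from the question.
trait Annotation {}

/// Returns (size of a thin reference, size of a trait-object reference) in bytes.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&i32>(), size_of::<&dyn Annotation>())
}
```

On a 64-bit target this gives 8 and 16 bytes, which matches the 64 and 128 bits in the compiler error.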

You can work around that with transmute_copy(), but it will just make things worse: it will work, but it is unsound and is not guaranteed to work in any way. It may become UB in future Rust versions. This is because the only guaranteed thing (as of now) for &dyn Trait references is:

Pointers to unsized types are sized. The size and alignment is guaranteed to be at least equal to the size and alignment of a pointer.

Nothing guarantees the order of the fields. It can be (data_ptr, vtable_ptr) (as it is now, which is why transmute_copy() works) or (vtable_ptr, data_ptr). Nothing is even guaranteed about the contents: the reference might not contain a data pointer at all (though I doubt anybody will ever do something like that). transmute_copy() copies the bytes from the beginning, meaning that for the code to work the data pointer has to be there and has to come first (which it currently does). For the code to be sound, this needs to be guaranteed (which it is not).

So what can we do? Let's check how Any does its magic:

// SAFETY: caller guarantees that T is the correct type
unsafe { &*(self as *const dyn Any as *const T) }

So it uses as for the conversion. Does it work? Certainly. But that only means std can do it, because std can do things that are not guaranteed and rely on how things work in practice. We shouldn't. So, is it guaranteed?

I don't have a firm answer, but I'm pretty sure the answer is no. I have found no authoritative source that guarantees the behavior of casts from unsized to sized pointers.

Edit: @CAD97 pointed out on Zulip that the reference promises that *[const|mut] T as *[const|mut] V where V: Sized will be a pointer-to-pointer cast, and that can be read as a guarantee that this will work.

But I still feel fine with relying on that. Because, unlike the transmute_copy(), people are doing it. In production. And there is no better way in stable. So the chance it will become undefined behavior is very low. It is much more likely to be defined.
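For what it's worth, here is a sketch of how the cast approach could look for the question's Node. Everything beyond Annotation, Node and annotations_with_type is an assumption made for the example: the concrete_type_id method, the annotate helper, and the Label and Weight types. I made the trait unsafe to implement because the cast is only sound if concrete_type_id reports the real type.

```rust
use std::any::TypeId;

/// The question's trait plus a hypothetical type-identity hook.
/// Unsafe to implement: the cast below trusts `concrete_type_id`,
/// so implementors must not override it with something else.
pub unsafe trait Annotation: 'static {
    fn some_important_method(&self);

    /// Reports the concrete type behind a `dyn Annotation`.
    fn concrete_type_id(&self) -> TypeId {
        TypeId::of::<Self>()
    }
}

#[derive(Default)]
pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    /// Hypothetical helper to attach an annotation.
    pub fn annotate(&mut self, annotation: impl Annotation) {
        self.annotations.push(Box::new(annotation));
    }

    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        let mut found = Vec::new();
        for boxed in &self.annotations {
            let a: &dyn Annotation = &**boxed;
            if a.concrete_type_id() == TypeId::of::<T>() {
                // The unsized-to-sized cast keeps the data pointer and drops the vtable.
                let ptr = a as *const dyn Annotation as *const T;
                // SAFETY: the TypeId check above shows the value really is a `T`.
                found.push(unsafe { &*ptr });
            }
        }
        found
    }
}

// Example annotation types, made up for this sketch.
pub struct Label(pub String);
pub struct Weight(pub u32);

unsafe impl Annotation for Label {
    fn some_important_method(&self) {}
}

unsafe impl Annotation for Weight {
    fn some_important_method(&self) {}
}
```

This is the same shape as what Any does internally: compare type IDs first, then cast the pointer.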

Does a guaranteed way even exist? Well, yes and no. Yes, but only using the unstable pointer metadata API:

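#![feature(ptr_metadata)]

// Hypothetical helper. The caller must guarantee that the value behind `v` really is a `T`.
unsafe fn downcast_unchecked<T: Annotation>(v: &dyn Annotation) -> &T {
    let v = v as *const dyn Annotation;
    // `to_raw_parts` splits the fat pointer into (data pointer, metadata).
    let v: *const T = v.to_raw_parts().0.cast::<T>();
    // SAFETY: upheld by the caller, see above.
    unsafe { &*v }
}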

In conclusion, if you can use nightly features, I would prefer the pointer metadata API just to be extra safe. But in case you can't, I think the cast approach is fine.

Last point, there may be a crate that already does that. Prefer that, if it exists.
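If you would rather avoid both a crate and unsafe, there is also a well-known stable workaround, assuming you are fine with adding one more trivial method to the trait. The sketch below uses a hypothetical as_any method (plus made-up annotate, Label and Weight names) to hand out a &dyn Any, so the lookup becomes a plain downcast_ref.

```rust
use std::any::Any;

pub trait Annotation: 'static {
    fn some_important_method(&self);

    /// Hypothetical extra method; every implementation just returns `self`.
    fn as_any(&self) -> &dyn Any;
}

#[derive(Default)]
pub struct Node {
    annotations: Vec<Box<dyn Annotation>>,
}

impl Node {
    /// Hypothetical helper to attach an annotation.
    pub fn annotate(&mut self, annotation: impl Annotation) {
        self.annotations.push(Box::new(annotation));
    }

    pub fn annotations_with_type<T: Annotation>(&self) -> Vec<&T> {
        self.annotations
            .iter()
            // Keeps only the annotations whose concrete type is `T`.
            .filter_map(|a| a.as_any().downcast_ref::<T>())
            .collect()
    }
}

// Example annotation types, made up for this sketch.
pub struct Label(pub String);
pub struct Weight(pub u32);

impl Annotation for Label {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Annotation for Weight {
    fn some_important_method(&self) {}
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

The cost is one line of boilerplate per implementation, which users have to write themselves or get from a macro.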

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Shepmaster
Solution 2