Serialization without Boost.Serialization
I'm trying to implement a simple serialization/deserialization method for my code so that I can pass an object over the network using MPI. In an ideal world I would have used Boost.Serialization and Boost.MPI for that, but they are not installed on some of the clusters I have access to, so I'm considering doing this myself.
My strategy is to serialize every object into a std::stringstream object and then send a message via MPI_Send using MPI_CHAR as the datatype. In that case I would pass std::stringstream::str().c_str() as the pointer and std::stringstream::str().size()*sizeof(char) as the size of the message.
I've figured out how to serialize everything into a std::stringstream object. My deserialization method also takes a std::stringstream object and deserializes everything back. This works fine, except that I do not know how to create a std::stringstream object from an array of chars and avoid the extra copy from the array into the stream. Should I change my deserialization method to work directly with an array of char using memcpy instead?
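For concreteness, the strategy described above might look like the following sketch. The Particle type and the serialize, deserialize, to_message and from_message helpers are hypothetical names chosen for illustration, and the MPI call appears only in a comment so that the snippet compiles without MPI.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical payload type, used only for illustration.
struct Particle {
    int id;
    double mass;
};

// Text-based serialization into any output stream.
void serialize(std::ostream& os, const Particle& p) {
    os.precision(17);  // enough digits to round-trip a double
    os << p.id << ' ' << p.mass << ' ';
}

// Matching deserialization from any input stream.
Particle deserialize(std::istream& is) {
    Particle p{};
    is >> p.id >> p.mass;
    return p;
}

// Sender side: produce the bytes that would be handed to MPI_Send, e.g.
//   MPI_Send(msg.data(), static_cast<int>(msg.size()), MPI_CHAR, dest, tag, comm);
std::string to_message(const Particle& p) {
    std::ostringstream oss;
    serialize(oss, p);
    return oss.str();  // str() returns a copy of the buffer
}

// Receiver side: rebuild a stream from the received char array.
// This is the extra copy the question asks about.
Particle from_message(const char* data, std::size_t size) {
    std::istringstream iss(std::string(data, size));
    return deserialize(iss);
}
```

Because str() returns a new std::string on every call, it is probably safer to store the result in a named variable and take both the pointer and the size from that one object, rather than calling str() twice.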
Solution 1:[1]
The MPI way of doing this would be to use MPI_Pack and MPI_Unpack. Of course, that is C and might not be as convenient as something using C++ features. For a simple example see http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/src/bcast/C/pack/solution.html
Solution 2:[2]
Use an istrstream, which extracts from a char array. The header for this is <strstream>. And, yes, formally it's deprecated in the C++ Standard. The committee indulged in a great deal of wishful thinking in its early days. istrstream is not going to go away.
Solution 3:[3]
Although the accepted answer is the right way to go, the concrete question "except I do not know how to create a std::stringstream object from an array of chars" was not answered yet.
The answer is that effectively you can't.
Presumably you have an object already in memory and you want to see it as a stringstream. Even if you can make it work, the stringstream will not be able to grow further in the existing memory (which is a useful property of the stream buffer).
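That said, if the deserialization routine can accept a std::istream& instead of a std::stringstream&, a read-only view over the existing array is possible with a small custom stream buffer. The following is a minimal sketch under that assumption; ArrayBuf and sum_of_ints are hypothetical names, not part of any library.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>

// Read-only stream buffer that views an existing char array without copying.
class ArrayBuf : public std::streambuf {
public:
    ArrayBuf(const char* data, std::size_t size) {
        // setg() takes non-const pointers. The default std::streambuf
        // never writes through the get area, so the cast is only there
        // to satisfy the interface.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);  // begin, current, end of the get area
    }
};

// Example consumer: parse whitespace-separated integers directly from
// the array through a plain std::istream.
int sum_of_ints(const char* data, std::size_t size) {
    ArrayBuf buf(data, size);
    std::istream is(&buf);
    int total = 0;
    for (int x; is >> x;) {
        total += x;
    }
    return total;
}
```

The array does not need to be null-terminated because the size is passed explicitly, but it must outlive the stream. This sketch does not override seekoff or seekpos, so seekg and tellg will fail on such a stream. C++23 also provides std::ispanstream in <spanstream> for the same purpose, where a recent enough compiler is available.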
Also, there is not much point in making heroic attempts to avoid copying data that is going to be sent in an MPI message. MPI might end up copying the data internally anyway, for example to "pinned" memory, depending on the exact method of communication (buffered, unbuffered, asynchronous, etc.). That is my understanding.
If the data is too large to be copied at all (e.g. many gigabytes), then it is better to use MPI_Datatypes, which can be a real challenge to handle in generic ways.
BTW, the accepted answer is what my MPI library does internally (https://gitlab.com/correaa/boost-mpi3); it is probably what the original Boost.MPI does too.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Zulan |
Solution 2 | Pete Becker |
Solution 3 |