'Using 'memcpy()' inside a class with a union

I have a class foo that manages data using small buffer optimization (SBO).
When size < 16, the data is held locally (in buffer), otherwise it is stored on the heap, with reserved holding the allocated space.

class foo {
    static const int sbo_size = 16;

    long size = 0;
    char *ptr;

    union {
        char buffer[sbo_size];
        long reserved;
    };
public:

    foo()
    {
        for (int i = 0; i < sbo_size; ++i)
            buffer[i] = 0;
    }

    void clone(const foo &f)
    {
        // release 'ptr' if necessary

        if (f.size < sbo_size)
        {
            memcpy(this, &f, sizeof(foo));
            ptr = buffer;
        } else
        {
            // handle non-sbo case
        }
    }
};

Question about clone():
With the SBO case, it may not be clear for the compiler that union::buffer will be used.
is it correct to use memcpy and set ptr accordingly?



Solution 1:[1]

If you can use C++17, I would side-step any potential type-punning problems by using std::variant in place of a union.

Although this uses a small amount of storage internally to keep track of the current type it contains, it's probably a win overall as your ptr variable can disappear (although that should be inside your union anyway).

It's also typesafe, which a union is not (because std::get will throw if the variant doesn't contain the desired type) and will keep track of the type of data it contains simply by assigning to it.

The resulting class fragment might look something like this (no doubt this code can be improved):

class foo
{
private:
    static const size_t sbo_size = 16;
    using small_buf = std::array <char, sbo_size>;
    size_t size = 0;
    std::variant <small_buf, char *> buf = { };

public:
    void clone (const foo &f)
    {
        char **bufptr = std::get_if <char *> (&buf);
        if (bufptr)
            delete [] *bufptr;

        size = f.size;
        if (size < sbo_size)
            buf = std::get <small_buf> (f.buf);
        else
        {
            buf = new char [size];
            std::memcpy (std::get <char *> (buf), std::get <char *> (f.buf), size);
        }
    }
};

Notes:

  • You will see that I've used std::array instead of a C-style array because std:array has lots of nice features that C-style arrays do not

  • Why clone and not a copy constructor?

  • if you want foo to have an empty state (after being default constructed, say), then you can look into the strangely named std::monostate.

  • For raw storage, std::byte is probably to be preferred over char.

Fully worked example here.


Edit: To answer the question as posed, I am no language lawyer but it seems to me that, inside clone, the compiler has no clue what the active member of f might be as it has, in effect, been parachuted in from outer space.

In such circumstances, I would expect compiler writers to play it safe and set the active member of the union to "don't know" until some concrete information comes along. But (and it's a big but), I wouldn't like to bet my shirt on that. It's a complex job and compiler writers do make mistakes.

So, in a spirit of sharing, here's a slightly modified version of your original code which fixes that. I've also moved ptr inside your union since it clearly belongs there:

class foo {
    static const int sbo_size = 16;

    long size = 0;

    union {
        std::array <char, sbo_size> buffer;   // changing this
        char *ptr;
        long reserved;
    };
public:

    foo()
    {
        for (int i = 0; i < sbo_size; ++i)
            buffer[i] = 0;
    }

    void clone(const foo &f)
    {
        // release 'ptr' if necessary

        if (f.size < sbo_size)
        {
            buffer = f.buffer;                // lets me do this
            ptr = buffer.data ();
        } else
        {
            // handle non-sbo case
        }
    }
};

So you can see, by using std::array for buffer (rather than one of those hacky C-style arrays), you can directly assign to it (rather than having to resort to memcpy) and the compiler will then make that the active member of your union and you should be safe.

In conclusion, the question is actually rather meaningless since one shouldn't (ever) need to write code like that. But no doubt someone will immediately come up with something that proves me wrong.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1