Why does GCC include an "empty" XOR?

I have the following piece of code:

typedef struct {
        int x;
        int y;
        int z;
        int w;
} s32x4;

s32x4
f() {
        s32x4 v;
        v.x = 0;

        return v;
}

which generates (gcc -O2):

f:
        xor     eax, eax
        xor     edx, edx          ; this line is questionable
        ret

where clang outputs (clang -O2):

f:                                      # @f
        xor     eax, eax
        ret

Questions

  • Is there a reason why GCC inserts an XOR there?
  • If there isn't a good reason for it: can I somehow get rid of it?

Solution 1:[1]

You read a partly uninitialized struct object to return it, which is (arguably) Undefined Behaviour on the spot, even if the caller doesn't use the return value.

The 16-byte struct is returned in RDX:RAX in the x86-64 System V ABI (any larger and it would be returned in memory, with the caller passing a pointer to a return-value object). x and y share RAX, so xor eax, eax zeroes both of them; z and w live in RDX, which is entirely uninitialized here. GCC zeroes RDX anyway, while clang leaves whatever garbage was there.
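To make the register layout concrete, here is a minimal sketch that splits the struct into its two 8-byte halves, which correspond to what travels in RAX and RDX. The names halves and split_halves are hypothetical helpers, not part of the question, and the sketch assumes a little-endian target with 4-byte int and no padding (as on x86-64 System V).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
        int x;
        int y;
        int z;
        int w;
} s32x4;

/* Hypothetical helper type: the struct viewed as two 8-byte halves. */
typedef struct {
        uint64_t lo;   /* bytes 0..7:  x and y, the part returned in RAX */
        uint64_t hi;   /* bytes 8..15: z and w, the part returned in RDX */
} halves;

/* Hypothetical helper: copy the object representation into the two halves. */
static halves split_halves(s32x4 v) {
        halves h;
        /* Assumption: int is 4 bytes and the struct has no padding. */
        assert(sizeof v == sizeof h);
        memcpy(&h, &v, sizeof h);
        return h;
}
```

For v = {1, 2, 3, 4} on such a target, lo holds x in its low 32 bits and y in its high 32 bits, and hi does the same for z and w. In the question's f(), only x is assigned, so the whole hi half is uninitialized.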

GCC loves to break dependencies any time there might be a risk of coupling a false dependency into something (e.g. pxor xmm0,xmm0 before using the badly-designed cvtsi2sd xmm0, eax). Clang is more "aggressive" in leaving that out, sometimes even when there's only a tiny code-size benefit for doing so, e.g. using mov al, 1 instead of mov eax,1, or mov al, [rdi] instead of movzx eax, byte ptr [rdi].


The simplest form of what you're seeing is returning an uninitialized plain int, which shows the same difference between GCC and clang code-gen:

int foo(){
    int x;
    return x;
}

(Godbolt)

# clang 11.0.1 -O2
foo:
        # leaving EAX unwritten
        ret


# GCC 10.2 -O2
foo:
        xor     eax, eax        # return 0
        ret

Here clang "gets to" leave out a whole instruction. Of course it's undefined behaviour (reading an uninitialized object), so the standard allows literally anything, including ud2 (guaranteed to raise an illegal instruction exception), or omitting even the ret on the assumption that this code-path is unreachable, i.e. the function will never be called. Or to return 0xdeadbeef, or call any other function, if you have a malicious DeathStation 9000 C implementation.
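If the goal is simply well-defined code, one option is to give every member a value before returning. A minimal sketch, where f_initialized is a hypothetical name for a variant of the question's f():

```c
typedef struct {
        int x;
        int y;
        int z;
        int w;
} s32x4;

/* Hypothetical variant of f(): every member has a defined value. */
s32x4 f_initialized(void) {
        /* Members not named in the initializer are implicitly zero-initialized. */
        s32x4 v = { .x = 0 };
        return v;
}
```

Note that this does not remove the second XOR. It makes zeroing RDX required rather than optional, because the caller is now entitled to read z and w as 0.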

Solution 2:[2]

The easiest way of handling some corner cases where the Standard defines the behavior of programs that use the value of an uninitialized automatic variable is to initialize such values to zero. A compiler that does that will avoid the need for any other corner-case handling.

Consider, for example, how something like:

#include <string.h>
extern unsigned short volatile vv;
int test(int a, int mode)
{
    unsigned short x,y;

    if (mode)
        x=vv;
    memcpy(&y,&x,sizeof x);
    return y;
}

should be processed on a platform which uses 32-bit registers to hold all automatic objects of all integer types 32 bits and smaller. If mode is zero, this function should copy two unspecified byte values into the bytes comprising y and return that, causing it to hold an arbitrary number in the range 0-65535. On ARM GCC 4.5.4, however, this function would use the R0 register to hold x and y, without ever writing to that register in the 'mode==0' case. This would result in y behaving as though it holds whatever was passed as the first argument, even if that value was outside the range 0-65535. Later versions avoid this issue by pre-initializing R0 to zero (which is of course always in the range 0-65535).
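The effect of that pre-zeroing can be written out by hand at the source level. The following is only a sketch of the equivalent behavior, not what the compiler emits internally; test_prezeroed is a hypothetical name, and vv is given a definition here (the original only declares it) so the example links on its own.

```c
#include <string.h>

/* Stand-in definition so the sketch is self-contained. */
unsigned short volatile vv;

/* Hypothetical variant of test(): x starts at zero instead of indeterminate. */
int test_prezeroed(int a, int mode)
{
    unsigned short x = 0, y;

    (void)a;                    /* unused, kept to match the original signature */
    if (mode)
        x = vv;
    memcpy(&y, &x, sizeof x);   /* copies defined bytes on both paths */
    return y;                   /* always within 0-65535 */
}
```

With x zeroed up front, the mode==0 path returns 0 regardless of what was passed as a, so the result can never fall outside the range of unsigned short.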

I'm not sure if gcc's decision to zero things in the OP's example is a result of trying to preempt corner cases that might otherwise be problematic, but certainly some situations where it pre-zeroes things in cases not required by the Standard seem to stem from such a goal.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 supercat