How can some architectures guarantee that aligned memory operations are atomic?

As explained in the post "Why is integer assignment on a naturally aligned variable atomic on x86?":

Memory load/store on a byte value - and any correctly aligned value up to 64 bits is guaranteed to be atomic on x86.

But what if:

1- The data crosses a cache-line boundary. Assume I have short a = 1234; and the address of a is halfword-aligned, but for some reason the 2 bytes of data are split between two cache lines, so the CPU needs to do extra work to fetch and concatenate them. How can this remain atomic?

2- The value is paged out. Assume the value the CPU is trying to fetch is properly aligned but is not in cache or even in memory, so it has to be fetched all the way from disk. How is this still atomic?

I'd like to ask a third, related question while we are at it:

3- Why does the data need to be aligned to its data type at all? Why isn't it enough for it to lie within a single cache line, given that every memory load/store is done in cache-line blocks rather than in specific data sizes?
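
To make points 1 and 3 concrete, here is a minimal sketch of the address arithmetic involved. It assumes 64-byte cache lines (typical for x86, but an assumption here), and the helper names crosses_cache_line and is_naturally_aligned are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Common on x86, but not guaranteed. */
#define CACHE_LINE 64u

/* Hypothetical helper: true if the `size` bytes starting at `addr`
 * span two different cache lines. Requires size >= 1. */
bool crosses_cache_line(uintptr_t addr, size_t size) {
    uintptr_t first_line = addr / CACHE_LINE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE;
    return first_line != last_line;
}

/* Hypothetical helper: true if `addr` is a multiple of `size`,
 * i.e. the object is naturally aligned. Requires size >= 1. */
bool is_naturally_aligned(uintptr_t addr, size_t size) {
    return addr % size == 0;
}
```

Under that 64-byte assumption, crosses_cache_line(62, 2) is false (bytes 62 and 63 share a line) while crosses_cache_line(63, 2) is true (bytes 63 and 64 do not). A misaligned object can still sit inside one line, e.g. crosses_cache_line(1, 4) is false, which is the case point 3 asks about.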



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
