As explained in this post: Why is integer assignment on a naturally aligned variable atomic on x86? : Memory load/store on a byte value - and any correctly alig
Consider int t = 0; for( int i = 0; i < 8; i++ ) { for( int j = 0; j < 8; j++ ) { t = t + i*j; } } Ex: Create a branch history table in t =
I am confused between branching instructions BZ and BNZ. Can anybody, please, explain the concept and working of BZ and BNZ with an example?
Specifically is: mov %eax, %ds Slower than mov %eax, %ebx Or are they the same speed. I've researched online, but have been unable to find a definitive an
I know store buffer and invalidate queues are reasons that cause memory reordering. What I don't know is if Out-of-Order-Execution can cause memory reordering.
Which problems arise in the following assembly loop, if Predict Not Taken is chosen by default? Optimize the example to Predict not Taken. addi $s1, $zero, 1024
I'm reading the mail list about LKMM: Add volatile_if(). The control dependency is somewhat subtle since it is easily forgotten by us developers. So I wonder i
After serious development, CPUs gained many cores, gained distributed blocks of cores on multiple chiplets, numa systems, etc but still a piece of data has to p
For example, in case of 32-bit processors, a word is 4-byte long. Is it also possible to use 5-byte word or others?
Hello Forum – I have a few similar/related questions about SIMD intrinsic for which I searched online including stackoverflow but did not find good answer
After reading this post (answer on StackOverflow) (at the optimization section), I was wondering why conditional moves are not vulnerable for Branch Prediction
I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single tes
Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data (before the timed region) miraculously makes the l
I'm really confused on the difference between bubbles, stalls, and repeated decoding/fetching. My text is the Patterson text, 3rd edition. Example 1: add $3,
I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing add