LDUR and STUR in ARM v8
I've had a couple of courses that touched on ARMv8 assembly, but each teacher described the LDUR/STUR instructions in a different way, and now I'm pretty lost. Can someone help clarify?
If I had the instruction:
LDUR R3, [R1, #8]
I'll be putting the answer in R3, but what am I taking from R1 and how does the offset operate? Is it like a logical shift? The ARM manual describes it as "byte offset" but then doesn't describe how that offset functions on R1. Do I shift the value stored in R1 (say R1 has value 50 in it) or is there a memory address outside of the R1 that I need to be thinking about? Other sources say I need to think of R1 as an array somehow?
Solution 1:[1]
LDUR is Load Register (unscaled). It loads a value (32 bits or 64 bits) from an address plus an offset into a register. "Unscaled" means that in the machine code, the offset is not encoded as a scaled offset like the one ldr uses, i.e. no shift is applied to the immediate offset bits. The offset (simm, a signed immediate) is added to the base register Xn|SP.
Thus it's possible to use displacements that aren't a multiple of 4 or 8 with ldur, unlike with ldr.
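To make "no shift is applied" concrete, here is a minimal Python sketch of how the two offset fields could be computed. The helper names encode_ldr_imm12 and encode_ldur_imm9 are made up for illustration; the field widths (12 bits for the LDR unsigned-offset form, 9 bits for LDUR) follow the A64 encodings, and everything else about the instruction word is left out.

```python
def encode_ldr_imm12(offset, size):
    """Scaled form: the 12-bit field stores offset / size, so the byte offset
    must be a non-negative multiple of size (4 for Wt, 8 for Xt)."""
    if offset < 0 or offset % size != 0:
        return None                      # not encodable as LDR (unsigned offset)
    field = offset // size
    return field if field < (1 << 12) else None


def encode_ldur_imm9(offset):
    """Unscaled form: the 9-bit field stores the signed byte offset itself."""
    if not -256 <= offset <= 255:
        return None                      # out of range for LDUR
    return offset & 0x1FF                # 9-bit two's complement


print(encode_ldr_imm12(8, 8))   # 1   (8 bytes / 8-byte scale)
print(encode_ldur_imm9(8))      # 8   (stored as-is)
print(encode_ldr_imm12(3, 4))   # None (3 is not a multiple of 4)
print(encode_ldur_imm9(3))      # 3
print(encode_ldur_imm9(-1))     # 511 (0x1FF)
```

Both instructions add a plain byte offset to the base register at run time; the only difference sketched here is how that offset is stored in the instruction.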
These are the prototypes for LDUR:
-- loads a 32-bit value
LDUR <Wt>, [<Xn|SP>{, #<simm>}]
-- loads a 64-bit value
LDUR <Xt>, [<Xn|SP>{, #<simm>}]
STUR is Store (unscaled) Register and works in the same way but it stores the value in a register to memory.
These are the prototypes for STUR:
-- stores a 32-bit register
STUR <Wt>, [<Xn|SP>{, #<simm>}]
-- stores a 64-bit register
STUR <Xt>, [<Xn|SP>{, #<simm>}]
LDUR/STUR allow accessing 32/64-bit values at offsets that are not a multiple of the operand size. For example, a 32-bit value stored at address 0x52 when the base register holds 0x50.
In your example,
LDUR R3, [R1, #8]
this instruction will load into R3 the value pointed to by R1 plus 8 bytes. This is what the ARM Reference Manual means by byte offset.
So if R1 holds the value 0x50, this will load the value stored at address 0x58. The value of R1 will not be modified.
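As a rough illustration of that example, the following Python toy model treats memory as a byte array and registers as a dictionary. The names memory, regs and ldur64 are invented for this sketch, and little-endian byte order is assumed (the usual AArch64 configuration).

```python
# Toy model only: 'memory' and 'regs' are stand-ins, not a real emulator.
memory = bytearray(0x100)
memory[0x58:0x60] = (0x1122334455667788).to_bytes(8, "little")

regs = {"R1": 0x50, "R3": 0}


def ldur64(base, offset):
    """Return the 64-bit value stored at base + offset (offset is in bytes)."""
    address = base + offset              # 0x50 + 8 = 0x58, no shifting involved
    return int.from_bytes(memory[address:address + 8], "little")


regs["R3"] = ldur64(regs["R1"], 8)
print(hex(regs["R3"]))  # 0x1122334455667788
print(hex(regs["R1"]))  # 0x50, the base register is unchanged
```

So R1 is not shifted and is not an array itself; it simply holds an address, and the offset says how many bytes past that address to read.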
The instruction LDR R3, [R1, #8] (LDR (immediate), unsigned offset variant) performs the same operation; however, the prototype is different:
-- loads a 32-bit value
LDR <Wt>, [<Xn|SP>{, #<pimm>}]
-- loads a 64-bit value
LDR <Xt>, [<Xn|SP>{, #<pimm>}]
Here the immediate offset is pimm, whereas LDUR uses simm, so the offset is encoded in a different way. pimm is a positive (unsigned) offset, and its range differs between the 32-bit variant and the 64-bit variant.
In the 32 bit version:
- It ranges from 0 to 16380 and can only be a multiple of 4
In the 64 bit version:
- It ranges from 0 to 32760 and can only be a multiple of 8
This means that for some offsets, namely those that fall within both ranges, LDUR and LDR (immediate) perform the same operation.
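The overlap can be enumerated directly from the ranges quoted above. This is only a sketch; ldr_offsets and ldur_offsets are hypothetical helper names.

```python
def ldr_offsets(size):
    """Byte offsets accepted by LDR (unsigned offset): 0, size, ..., 4095 * size."""
    return set(range(0, 4095 * size + 1, size))


def ldur_offsets():
    """Byte offsets accepted by LDUR: every integer from -256 to 255."""
    return set(range(-256, 256))


shared_32 = sorted(ldr_offsets(4) & ldur_offsets())
shared_64 = sorted(ldr_offsets(8) & ldur_offsets())

print(len(shared_32), shared_32[0], shared_32[-1])  # 64 0 252
print(len(shared_64), shared_64[0], shared_64[-1])  # 32 0 248
```

For any offset in these shared sets, such as #8 in the question, either instruction reaches the same address.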
Solution 2:[2]
From Arm® A64 Instruction Set Architecture: Armv8, for Armv8-A architecture profile
if HaveMTEExt() then
    boolean is_load_store = MemOp_LOAD IN {MemOp_STORE, MemOp_LOAD};
    SetNotTagCheckedInstruction(is_load_store && n == 31);
bits(64) address;
bits(datasize) data;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];
address = address + offset;
data = Mem[address, datasize DIV 8, AccType_NORMAL];
X[t] = ZeroExtend(data, regsize);
This pseudo-code shows how the offset is applied: it is simply added to the base address.
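For readers more comfortable with Python than with Arm's pseudo-code, here is a loose translation. It is a sketch under several assumptions: the memory-tagging check and CheckSPAlignment() are omitted, memory is modelled as a dict from byte address to byte value, byte order is little-endian, and the function and parameter names are my own.

```python
MASK64 = (1 << 64) - 1


def ldur(X, SP, mem, t, n, offset, datasize, regsize=64):
    """X: list of 31 general registers, SP: stack pointer value,
    mem: dict mapping byte address -> byte value (missing bytes read as 0)."""
    if n == 31:
        address = SP                     # register number 31 selects SP as the base
    else:
        address = X[n]
    address = (address + offset) & MASK64    # bits(64) arithmetic wraps around

    data = 0
    for i in range(datasize // 8):       # assemble the bytes, little-endian
        data |= mem.get((address + i) & MASK64, 0) << (8 * i)

    X[t] = data & ((1 << regsize) - 1)   # ZeroExtend: upper bits are zero


X = [0] * 31
X[1] = 0x50
mem = {0x58: 0x78, 0x59: 0x56, 0x5A: 0x34, 0x5B: 0x12}

ldur(X, 0, mem, t=3, n=1, offset=8, datasize=32)
print(hex(X[3]))  # 0x12345678
print(hex(X[1]))  # 0x50
```

The only place the offset appears is the single addition to the base address, which matches the pseudo-code.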
Solution 3:[3]
In unsigned offset mode, LDR's imm must be a multiple of 4 bytes (32-bit register) or 8 bytes (64-bit register).
If imm % 4 == 0 (and imm is within STUR's range), stur Wt, [Xn|SP, #imm] is equivalent to str Wt, [Xn|SP, #imm] in A64.
If imm % 8 == 0 (and imm is within STUR's range), stur Xt, [Xn|SP, #imm] is equivalent to str Xt, [Xn|SP, #imm] in A64.
LDR cannot address offsets from -256 to 255 byte by byte while keeping the base register unchanged. That's what LDUR does.
Solution 4:[4]
Instruction Meaning
LDUR R3, [R1, #8]
if:
R1 = 50
then:
[R1, #8] = the value at address 58 (= 50 + 8)
LDUR R3, [R1, #8] = load the value at address 58 into register R3
Q&A
but what am I taking from R1 and how does the offset operate?
The offset #8 operates as a plain addition: R1 + #8 = 50 + 8 = 58
Is it like a logical shift?
No, there is no logical shift.
The ARM manual describes it as "byte offset" but then doesn't describe how that offset functions on R1.
ARM manual is here: LDUR, the full description is
LDUR Wt, [Xn|SP{, #simm}] ; 32-bit general registers ... simm Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0.
I had the same confusion as you before. Now it is clear:
Detail Explanation
LDR
Function: Load Register (immediate)
syntax
LDR Wt, [Xn|SP], #simm ; 32-bit general registers, Post-index
LDR Xt, [Xn|SP], #simm ; 64-bit general registers, Post-index
LDR Wt, [Xn|SP, #simm]! ; 32-bit general registers, Pre-index
LDR Xt, [Xn|SP, #simm]! ; 64-bit general registers, Pre-index
LDR Wt, [Xn|SP{, #pimm}] ; 32-bit general registers
LDR Xt, [Xn|SP{, #pimm}] ; 64-bit general registers
->
- 32-bit general registers
  - simm
    - Post-index: LDR Wt, [Xn|SP], #simm
    - Pre-index: LDR Wt, [Xn|SP, #simm]!
  - pimm
    - LDR Wt, [Xn|SP{, #pimm}]
- 64-bit general registers
  - simm
    - Post-index: LDR Xt, [Xn|SP], #simm
    - Pre-index: LDR Xt, [Xn|SP, #simm]!
  - pimm
    - LDR Xt, [Xn|SP{, #pimm}]
->
- 32-bit / 64-bit general registers
  - simm
    - Post-index: LDR Wt/Xt, [Xn|SP], #simm
    - Pre-index: LDR Wt/Xt, [Xn|SP, #simm]!
  - pimm
    - LDR Wt/Xt, [Xn|SP{, #pimm}]
Note:
- value range
  - simm: -256 ~ 255
  - pimm
    - 32-bit: 0 ~ 16380, and it must be a multiple of 4, that is: pimm % 4 == 0
    - 64-bit: 0 ~ 32760, and it must be a multiple of 8, that is: pimm % 8 == 0
LDUR
Function: Load register (unscaled offset)
syntax:
LDUR Wt, [Xn|SP{, #simm}] ; 32-bit general registers
LDUR Xt, [Xn|SP{, #simm}] ; 64-bit general registers
->
- 32-bit / 64-bit general registers
- LDUR Wt/Xt, [Xn|SP{, #simm}]
Note:
- value range
  - simm: -256 ~ 255
LDUR vs LDR
Let's talk about LDR first:
It supports three addressing forms:
- The first form: LDR Wt/Xt, [Xn|SP], #simm
  - called: Post-index
- The second form: LDR Wt/Xt, [Xn|SP, #simm]!
  - called: Pre-index
- The third form: LDR Wt/Xt, [Xn|SP{, #pimm}]
Note:
The form most commonly seen is the third, for example:
ldur q0, [x19, #0xa8]
that is, the last part has the shape:
[register name, #immediate]
LDUR supports only this third form (the first and second forms are not supported).
Then comes the difference:
- The third form of LDR: LDR Wt/Xt, [Xn|SP{, #pimm}]
  - pimm value range, depending on the case:
    - 32-bit: 0 ~ 16380, and pimm % 4 == 0 (a multiple of 4)
    - 64-bit: 0 ~ 32760, and pimm % 8 == 0 (a multiple of 8)
  - Key point: pimm MUST be a multiple of 4 or 8
- LDUR: LDUR Wt/Xt, [Xn|SP{, #simm}]
  - simm value range: -256 ~ 255
  - Important: simm does NOT need to be a multiple of 4 or 8
-> "must be a multiple of 4 (or 8)" is what is meant by: the offset (i.e. imm here) is scaled by 4 (or 8)
-> Here:
- scaled = the offset must be a multiple of 4 or 8
  - So addressing moves in steps of 4 bytes or 8 bytes
- unscaled = the offset does not have to be a multiple of 4 or 8
  - So addressing does not have to move in steps of 4 or 8 bytes
    - It can move one byte at a time, i.e. addressing byte by byte
->
- LDR = LoaD Register = load (scaled) register: load into a register from an address formed with a scaled immediate offset
- LDUR = LoaD Unscaled Register: load into a register from an address formed with an unscaled immediate offset
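Seen from the decoding side, "scaled by 4 (or 8)" means the stored field is multiplied by the access size to get the byte offset, while the unscaled field is used directly after sign extension. A small sketch, with made-up function names and assuming the 12-bit and 9-bit field widths of the A64 encodings:

```python
def ldr_byte_offset(imm12, size):
    """LDR (unsigned offset): the stored field counts units of the access size."""
    return imm12 * size


def ldur_byte_offset(imm9):
    """LDUR: the stored field is a 9-bit two's-complement byte count."""
    return imm9 - 0x200 if imm9 & 0x100 else imm9


print(ldr_byte_offset(1, 8))      # 8
print(ldr_byte_offset(4095, 4))   # 16380
print(ldur_byte_offset(8))        # 8
print(ldur_byte_offset(0x1FF))    # -1
```

This is why the LDR range ends at 16380 or 32760 and only hits multiples of 4 or 8, while LDUR covers every byte from -256 to 255.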
-> which is:
Syntax of LDR and LDUR:
LDR/LDUR Wt/Xt, [Xn|SP{, #imm}]
the core differences are:
- value of imm
  - The value range is different
    - LDR: relatively large (32-bit 0~16380, 64-bit 0~32760), and never negative
    - LDUR: relatively small, only -256~255, and can be positive or negative
  - The numerical requirements differ
    - LDR: imm must be a multiple of 4 or 8, depending on whether the register is 32-bit or 64-bit
      - 32-bit: imm%4==0
      - 64-bit: imm%8==0
    - LDUR: imm does not need to be a multiple of 4 or 8
What the Difference Means in Practice
With the difference identified:
- Q: What does the difference between LDR and LDUR mean for actual usage?
- A: If you want to address byte by byte (and the offset stays within -256 to 255), you can only use LDUR, because the minimum step of its imm is 1
  - LDR cannot be used for this, because the imm in LDR must be a multiple of 4 or 8.
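One place this shows up is packed data, where a 32-bit field can sit at an odd byte offset from the base pointer. The record layout below is hypothetical (a 1-byte tag followed by a 32-bit little-endian value), and Python's struct module stands in for the load.

```python
import struct

# Hypothetical packed record: 1-byte tag, then a 32-bit little-endian value.
record = struct.pack("<BI", 0x7F, 0xCAFEBABE)
FIELD_OFFSET = 1                          # the value starts 1 byte past the base

(value,) = struct.unpack_from("<I", record, FIELD_OFFSET)
print(hex(value))                         # 0xcafebabe

# Offset 1 is not a multiple of 4, so LDR Wt, [Xn, #1] has no unsigned-offset
# encoding, but it is within -256..255, so LDUR Wt, [Xn, #1] does.
print(FIELD_OFFSET % 4 == 0)              # False
print(-256 <= FIELD_OFFSET <= 255)        # True
```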
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | schspa |
Solution 3 | |
Solution 4 | crifan |