RISC-V build 32-bit constants with LUI and ADDI
LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros.
I found this in the manual. If I want to move 0xffffffff to a register, it seems all the code I need is:
LUI x2, 0xfffff
ADDI x2, x2, 0xfff
But there is a problem: ADDI sign-extends its 12-bit immediate, so 0xfff is extended to 0xffffffff. That leaves x2 holding 0xffffefff rather than 0xffffffff. What is a good way to move a 32-bit immediate into a register?
Solution 1:[1]
TL;DR: The 32-bit constant you want to load into x2 is 0xffffffff, which corresponds to -1. Since -1 is in the range [-2048, 2047], this constant can be loaded with a single instruction: addi x2, zero, -1. You can also use the li pseudoinstruction: li x2, -1, which the assembler in turn translates to addi x2, zero, -1.
Loading a 32-bit constant with a lui+addi sequence
In general, we need a lui+addi sequence (two instructions) for loading a 32-bit constant into a register. The lui instruction encodes a 20-bit immediate, whereas the addi instruction encodes a 12-bit immediate. lui and addi can be used to load the upper 20 bits and the lower 12 bits of a 32-bit constant, respectively.
Let $N$ be a 32-bit constant we want to load into a register: $N = n_{31} \dots n_0$. Then we can split this constant into its upper 20 bits and lower 12 bits, $N_U$ and $N_L$, respectively: $N_U = n_{31} \dots n_{12}$; $N_L = n_{11} \dots n_0$.
In principle, we encode $N_U$ in the immediate of lui and $N_L$ in the immediate of addi. However, there is a difficulty when the most significant bit of addi's 12-bit immediate is 1, because the immediate encoded in the addi instruction is sign-extended to 32 bits. In that case, addi adds $N_L - 4096$ to the destination register instead of $N_L$; $-4096$ (that is, $-2^{12}$) is the value whose upper 20 bits are 1s and whose lower 12 bits are 0s.
To compensate for the unwanted -4096 term, we can add 1 to lui's immediate. The LSB of lui's immediate corresponds to bit #12, so adding 1 to this immediate adds 4096 to the destination register, which cancels out the -4096 term.
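The split and the +1 compensation described above can be checked with a small model. This is only a sketch in Python, not assembler source: the helper names (sign_extend_12, split_lui_addi, run_lui_addi) are made up for illustration, and registers are modelled as plain integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def sign_extend_12(value):
    """Interpret the low 12 bits of value as a signed two's-complement number."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value

def split_lui_addi(n):
    """Return (upper20, lower12) so that lui+addi reproduces n modulo 2**32."""
    n &= MASK32
    lower = sign_extend_12(n)
    # Subtracting a negative `lower` carries into bit 12, which is the "+1".
    upper = ((n - lower) >> 12) & 0xFFFFF
    return upper, lower

def run_lui_addi(upper, lower):
    """Model `lui rd, upper` followed by `addi rd, rd, lower` on RV32."""
    rd = (upper << 12) & MASK32
    return (rd + lower) & MASK32

# The naive split from the question: upper = 0xfffff, lower field = 0xfff.
print(hex(run_lui_addi(0xFFFFF, sign_extend_12(0xFFF))))  # 0xffffefff

# The compensated split: upper becomes 0x12346 instead of 0x12345.
upper, lower = split_lui_addi(0x12345FFF)
print(hex(upper), lower)                 # 0x12346 -1
print(hex(run_lui_addi(upper, lower)))   # 0x12345fff
```

For 0xffffffff the same function returns (0, -1): the +1 wraps the upper part around to zero, which is why the lui is redundant for that particular constant.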
Loading a 32-bit constant with a single addi instruction
The issue explained above is due to the sign extension that the immediate in addi undergoes. The decision to sign-extend addi's immediate was probably made to allow loading small integers (between -2048 and 2047, both inclusive) with a single addi instruction. For example, if the immediate in addi were zero-extended instead of sign-extended, it wouldn't be possible to load a constant as frequent as -1 into a register with just a single instruction.
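To make the trade-off concrete, here is a tiny Python model of addi rd, zero, imm under both conventions. The function name and the sign_extend switch are hypothetical; real RISC-V hardware only implements the sign-extending behaviour.

```python
def addi_from_zero(imm12, sign_extend=True):
    """Model `addi rd, zero, imm` on RV32 for a raw 12-bit immediate field."""
    imm12 &= 0xFFF
    if sign_extend and imm12 & 0x800:
        imm12 -= 0x1000          # field values 0x800..0xfff become -2048..-1
    return imm12 & 0xFFFFFFFF

print(hex(addi_from_zero(0xFFF)))                     # 0xffffffff, i.e. -1
print(hex(addi_from_zero(0xFFF, sign_extend=False)))  # 0xfff, i.e. 4095
```

With sign extension the reachable values are -2048 to 2047; the hypothetical zero-extending variant would reach 0 to 4095 and no negative number at all.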
Loading a 32-bit constant with the li pseudoinstruction
In any case, you can always use the li pseudoinstruction for loading a 32-bit constant without having to care about the value of the constant. This pseudoinstruction can load any 32-bit number into a register, and it is therefore simpler to use and less error-prone than manually writing the lui+addi sequence.
If the number fits in addi's immediate field ([-2048, 2047]), the assembler will translate the li pseudoinstruction into just an addi instruction. Otherwise, li will be translated into a lui+addi sequence, and the complication explained above is handled automatically by the assembler.
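As a rough illustration of that decision, the following Python sketch produces the instruction text an assembler might emit for li on RV32. It is not how any particular assembler is implemented; expand_li is a hypothetical name, and the sketch additionally assumes the addi is dropped when the low 12 bits are zero.

```python
def expand_li(rd, n):
    """Return instruction strings that load the 32-bit value n into rd."""
    n &= 0xFFFFFFFF
    signed = n - (1 << 32) if n & 0x80000000 else n
    if -2048 <= signed <= 2047:
        # Fits in addi's sign-extended 12-bit immediate.
        return [f"addi {rd}, zero, {signed}"]
    # Low 12 bits as a signed value, then the compensated upper 20 bits.
    lower = (n & 0xFFF) - 0x1000 if n & 0x800 else n & 0xFFF
    upper = ((n - lower) >> 12) & 0xFFFFF
    insns = [f"lui {rd}, {upper}"]
    if lower != 0:
        insns.append(f"addi {rd}, {rd}, {lower}")
    return insns

print(expand_li("x2", 0xFFFFFFFF))  # ['addi x2, zero, -1']
print(expand_li("t0", 0x12345678))  # ['lui t0, 74565', 'addi t0, t0, 1656']
```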
Solution 2:[2]
The RISC-V assembler supports the pseudo-instruction li x2, 0xFFFFFFFF.
Let N be a signed, two's-complement 32-bit integer. A common-case implementation of li x2, N is:
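# sign-extend the low 12 bits (32-bit arithmetic, arithmetic right shift)
M = (N << 20) >> 20
# upper 20 bits, adjusted for the sign of M, as LUI's 20-bit immediate
K = ((N - M) >> 12) & 0xFFFFF
# load the upper 20 bits
LUI x2, K
# add the lower bits
ADDI x2, x2, M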
Of course, to load a short immediate, li can use addi x2, x0, imm. So li x2, 0xFFFFFFFF becomes addi x2, x0, -1.
Solution 3:[3]
I was going to say "use ORI instead of ADDI", but then I read the Instruction Set Manual and it turns out that doesn't work either, because all of the lower-12 immediate operands get sign-extended, even for logical operations.
AFAICT you have to bias the value you put into the upper 20 bits in a way that anticipates the effect of the instruction you use to set the lower 12 bits. So if you want to end up with a value X in the top 20 bits and you're going to use ADDI to set the lower 12 bits, and those lower 12 bits have a 1 in the leftmost position, you must do LUI (X+1) rather than LUI X. Similarly, if you are going to use XORI to set the lower 12 bits, and those lower 12 bits have a 1 in the leftmost position, you must do LUI (~X) (that is, the bitwise inverse of X) rather than LUI X.
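The XORI variant of that bias can be sanity-checked with a short Python model. The helper names here are invented for the sketch, and registers are again modelled as integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def split_lui_xori(n):
    """Return (upper20, lower12) so that lui+xori reproduces n on RV32."""
    n &= MASK32
    lower = n & 0xFFF
    upper = n >> 12
    if lower & 0x800:
        lower -= 0x1000              # xori will see this sign-extended
        upper = ~upper & 0xFFFFF     # pre-invert: xori flips all upper bits
    return upper, lower

def run_lui_xori(upper, lower):
    """Model `lui rd, upper` followed by `xori rd, rd, lower`."""
    rd = (upper << 12) & MASK32
    return rd ^ (lower & MASK32)     # masking gives the sign-extended pattern

upper, lower = split_lui_xori(0x12345FFF)
print(hex(upper), lower)                # 0xedcba -1
print(hex(run_lui_xori(upper, lower)))  # 0x12345fff
```

Here 0xedcba is the 20-bit inverse of 0x12345, so the sign-extended xori immediate flips the upper bits back to the intended value.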
But before you do any of that, I'd look to see whether your assembler already has some sort of "load immediate" pseudo-op or macro that will take care of this for you. If it doesn't, then see if you can write one :-)
It's not unusual for RISC processors to need this kind of extra effort from the programmer (or, more usually, from the compiler). The idea is "keep the hardware simple so it can go fast, and it doesn't matter if that makes it harder to construct the software".
Solution 4:[4]
In practice, just use an li pseudo-instruction, which gets the assembler to optimize to one instruction if possible (a single lui or a single addi) and otherwise does the math for you.
li t0, 0x12345678
li t1, 123
li t2, -1
li t3, 0xffffffff # same as -1 in 32-bit 2's complement
li t4, 1<<17
I separated each "group" with spaces. Only the first one (into t0) needed two instructions.
$ clang -c -target riscv32 rv.s # on my x86-64 Arch GNU/Linux desktop
$ llvm-objdump -d rv.o
...
00000000 <.text>:
0: 01 00 nop
2: 01 00 nop
4: b7 52 34 12 lui t0, 74565
8: 93 82 82 67 addi t0, t0, 1656
c: 13 03 b0 07 addi t1, zero, 123
10: fd 53 addi t2, zero, -1
12: 7d 5e addi t3, zero, -1
14: b7 0e 02 00 lui t4, 32
If you do want to do it manually, most assemblers for RISC-V (or at least GAS / clang) have %lo and %hi "macros" so you can lui dst, %hi(value) / addi dst, dst, %lo(value).
lui x9, %hi(0x12345678)
addi x9, x9, %lo(0x12345678)
lui x10, %hi(0xFFFFFFFF)
addi x10, x10, %lo(0xFFFFFFFF)
Assemble with clang and disassemble with llvm-objdump again:
18: b7 54 34 12 lui s1, 74565
1c: 93 84 84 67 addi s1, s1, 1656
20: 37 05 00 00 lui a0, 0
24: 7d 15 addi a0, a0, -1
Note that lui a0, 0 is a silly waste of an instruction that results from naively using hi/lo on 0xffffffff without realizing that the whole thing fits in a sign-extended 12-bit immediate.
There are good use-cases for manual %hi/%lo, especially for addresses, where you have one aligned "anchor" point and want to load or store to some label after that:
lui t0, %hi(symbol)
lw t1, %lo(symbol)(t0)
lw t2, %lo(symbol2)(t0)
addi t3, t0, %lo(symbol3) # also put an address in a register
...
sw t1, %lo(symbol)(t0)
So instead of wasting instructions doing a separate lui for each symbol, if you know they're in the same 2k-aligned block you can reference them all relative to one base with the assembler's help. Or actually anywhere in a 4k window with the 4k-aligned "anchor" in the middle, since %lo can be negative.
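Whether a set of addresses can share one lui can be worked out from the usual definitions of the two operators: %hi(x) is (x + 0x800) >> 12 and %lo(x) is the low 12 bits taken as signed. The Python sketch below models that; hi20, lo12 and can_share_lui are hypothetical helper names, and the addresses are made up.

```python
def hi20(addr):
    """Model %hi(addr): upper 20 bits, rounded so a signed %lo compensates."""
    return ((addr + 0x800) >> 12) & 0xFFFFF

def lo12(addr):
    """Model %lo(addr): low 12 bits interpreted as a signed value."""
    low = addr & 0xFFF
    return low - 0x1000 if low & 0x800 else low

def can_share_lui(*addrs):
    """True if every address is reachable from the same lui result."""
    return len({hi20(a) for a in addrs}) == 1

# Same 4k window centred on the anchor 0x10001000:
print(can_share_lui(0x10000800, 0x10001000, 0x100017FF))  # True
# One byte below that window, so a different %hi is needed:
print(can_share_lui(0x100007FF, 0x10000800))              # False
```

The same helpers reproduce the disassembly above: hi20(0xFFFFFFFF) is 0 and lo12(0xFFFFFFFF) is -1, matching lui a0, 0 / addi a0, a0, -1.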
(The PC-relative version of this with auipc is just as efficient but looks a little different; see "What do %pcrel_hi and %pcrel_lo actually do?". %pcrel_lo actually references a %pcrel_hi relocation to find out the actual target symbol as well as the location of the relative reference.)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ネロク |
Solution 2 | |
Solution 3 | |
Solution 4 | Peter Cordes |