What is the difference, if any, between LONG and FAR jumps in Assembly?
I'm looking at some practice code for assembly, and the assignment is basically to replace one jump point with another.
The original jmp is a SHORT jmp, and the target I need to reach is out of range for this instruction.
I have three options now: remove 'SHORT', insert 'LONG', or insert 'FAR'.
If there's documentation anywhere that indicates the differences between them, I haven't found it yet. Can anyone be of assistance here?
Solution 1:[1]
I'm assuming your question pertains to the x86 architecture; you haven't specified in your question.
A SHORT jump is a jump to a particular offset from the current instruction pointer address. A LONG jump can use a larger offset value, and so can jump further away from the current instruction pointer address. Both of these jump types are usually relative - that is, the operand is an offset from the current instruction pointer (though in assembly source, you normally provide the target label - the assembler or linker then computes the offset). Neither of them jumps to a different code segment, so they are both 'near' jumps.
A FAR jump specifies both a segment and offset, which are both absolute in the sense that they specify the required code segment and instruction pointer, rather than an offset relative to the current code segment / instruction pointer.
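As a rough illustration of the "segment and offset" operand, here is a small Python sketch that builds the bytes of a direct far jump: opcode `EA`, then the offset, then the 16-bit segment (or selector), all little-endian. The helper names `encode_far_jmp` and `real_mode_linear` and the example values are made up for illustration, and the sketch assumes the offset size matches the current mode's default operand size, so no prefix byte is needed.

```python
import struct

def encode_far_jmp(segment, offset, operand_bits=16):
    """Encode a direct far JMP (opcode EA): offset first, then segment/selector."""
    if operand_bits == 16:
        return b"\xEA" + struct.pack("<HH", offset, segment)   # jmp ptr16:16
    if operand_bits == 32:
        return b"\xEA" + struct.pack("<IH", offset, segment)   # jmp ptr16:32
    raise ValueError("direct far jumps take a 16- or 32-bit offset")

def real_mode_linear(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset

code16 = encode_far_jmp(0x07C0, 0x0005)          # jmp 0x07C0:0x0005 in 16-bit code
code32 = encode_far_jmp(0x0010, 0x1234, 32)      # jmp 0x10:0x1234 in 32-bit code
print(code16.hex(" "))                           # ea 05 00 c0 07
print(code32.hex(" "))                           # ea 34 12 00 00 10 00
print(hex(real_mode_linear(0x07C0, 0x0005)))     # 0x7c05
```

Note that nothing here depends on where the jump instruction itself is located, which is what makes the far form absolute rather than relative.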
To summarise, there are three types of direct jump: short and long, which are both near jumps capable of jumping different relative distances within the same code segment, and far, which can jump to any absolute address (segment and offset).
(Note that it is also possible to perform an indirect absolute jump, where you specify an operand that holds the absolute address that you wish to jump to. In this case the jump can either be near or far - i.e. it can include or not include the required code segment).
If you don't specify the jump 'distance', it is up to the assembler whether you get a short, long or far jump. Most modern assemblers are "two-pass" and will use a short jump if possible, or a long or far jump otherwise - the latter only if required.
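The choice an assembler makes between the short and long forms can be sketched in Python as follows. This is a simplified model for 32-bit mode only, with a hypothetical helper name `encode_jmp`; it assumes both addresses are already known, whereas a real assembler has to iterate over several passes because changing one jump's size moves the labels after it.

```python
import struct

def encode_jmp(addr, target, force=None):
    """Encode a direct near JMP located at `addr` that goes to `target` (32-bit mode).

    force: None (pick the shortest form), "short" (rel8) or "near" (rel32).
    """
    rel8 = target - (addr + 2)     # displacement counts from the end of EB xx
    rel32 = target - (addr + 5)    # ... or from the end of E9 xx xx xx xx
    fits_short = -128 <= rel8 <= 127
    if force == "short" and not fits_short:
        raise ValueError("short jump is out of range")
    if force == "short" or (force is None and fits_short):
        return b"\xEB" + struct.pack("<b", rel8)
    return b"\xE9" + struct.pack("<i", rel32)

print(encode_jmp(0x100, 0x110).hex(" "))                # eb 0e
print(encode_jmp(0x100, 0x300).hex(" "))                # e9 fb 01 00 00
print(encode_jmp(0x100, 0x110, force="near").hex(" "))  # e9 0b 00 00 00
```

Forcing `"short"` for the second target raises an error in this sketch, which mirrors the situation in the question: the original SHORT jump cannot reach the new target.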
See wikipedia's entry on x86 memory segmentation if you need help with understanding what I mean by 'segment'.
See this description of the x86 JMP instruction for full details of the possible JMP instruction addressing modes.
Solution 2:[2]
A SHORT jump:
- If it is a forward jump, the encoding uses a relative offset value from 00h (+0) to 7Fh (+127) which enables program execution to jump to another instruction with a maximum of 127 bytes in-between them.
- If it is a backward jump, the encoding uses a relative offset value from 80h (-128) to FFh (-1) which enables program execution to jump to another instruction with a maximum of 125 bytes in-between them.
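The two ranges above are just the two halves of a signed 8-bit value, counted from the end of the 2-byte jump instruction. A minimal Python sketch (the function names and the address `0x1000` are illustrative assumptions):

```python
def rel8_to_signed(byte):
    """Interpret a rel8 byte as a two's-complement signed displacement."""
    return byte - 0x100 if byte >= 0x80 else byte

def short_jump_target(jmp_addr, rel8_byte):
    """Target of a 2-byte short jump (opcode + rel8) located at jmp_addr."""
    return jmp_addr + 2 + rel8_to_signed(rel8_byte)

jmp_addr = 0x1000
print(rel8_to_signed(0x7F), rel8_to_signed(0x80), rel8_to_signed(0xFF))  # 127 -128 -1
print(hex(short_jump_target(jmp_addr, 0x7F)))   # 0x1081, furthest forward target
print(hex(short_jump_target(jmp_addr, 0x80)))   # 0xf82, furthest backward target
```

The furthest backward target starts 126 bytes before the first byte of the jump. Those 126 bytes include the target instruction itself, so fewer bytes actually lie between the two instructions.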
A LONG jump can use a larger offset.
A FAR jump jumps to another code segment.
Solution 3:[3]
TL;DR: `short` vs. `long`/`near` just forces the instruction length choice. `far` is a totally different beast.
- `jmp short foo` is `jmp rel8`.
- `jmp long foo` or `jmp near foo` is `jmp rel16/rel32` (depending on mode).
- `jmp far [rdi]` is a jmp / call to a new CS:[ER]IP. You rarely want this.
See Intel's manual (or an HTML scrape if you're familiar with the notation used) for the available machine-code forms of `jmp` / `call`, `jcc` (conditionals like `jge`), and `loop` (which is similar in encoding to a `jcc short`).
x86 has two major families of jumps/calls, with some variety within each:
Far: to a new CS:IP (or CS:EIP / CS:RIP depending on mode).
- Almost never used in normal 32 or 64-bit code (one exception being WOW64 32-bit system DLLs that call into 64-bit code that uses the `syscall` instruction), and only in 16-bit code if you can't fit your program into 64K, or in MBR bootloaders to set a known CS in real mode, or switch to 32-bit.
- Direct (except in 64-bit mode) or memory-indirect, but always absolute. Interestingly, x86's only absolute direct jump.
- No conditional far jumps, only `jmp` or `call` / `retf`.
- Syntax details depend on the assembler, but often something like `jmp 0x10:foo` or `jmp far [eax]` works in NASM, the latter loading 6 bytes from [DS:EAX] into CS:EIP if run in 32-bit mode.
Near: a normal jump not changing CS, just setting a new IP/EIP/RIP. The following forms are available:
- Indirect (using absolute targets, e.g. function pointers or jump tables) like `jmp ax` or `jmp qword [rsi]`, or the same with `call`. (There is no conditional indirect `jcc`.) short or not isn't meaningful here because the machine code for the instruction only encodes the place to find a new [ER]IP, not directly how to reach it. `ax` vs. `eax` is a matter of operand-size, and `[esi]` vs. `[rsi]` is a matter of address-size. And `[rsi+0x1230]` is a matter of the disp8 / disp32 used as part of the addressing mode.
- Direct using relative displacements (encoded into the machine code of the instruction) that get added to IP/EIP/RIP. So they're relative to the end of the instruction (see Footnote 1). This is what you get from a normal `jmp foo` or `jle .else`, with the assembler normally picking a length for you.
  - `short` means using an 8-bit (1-byte) relative displacement, aka a `rel8`. Available for `jcc rel8` and `jmp rel8` (and `loop`), not `call`.
  - non-short, using a `rel16` (16-bit mode, see Footnote 2) or `rel32` (other modes). So a 2-byte or 4-byte relative displacement. You can force this encoding with `near` or `long` in the asm source. Available for `jmp rel16/32` and `call rel16/32`, and on 386 and later `jcc rel16/32`. If assembling with instructions restricted to 286 or earlier, an assembler will complain if the distance to the target label is outside the [-128, +127] range of a rel8, or use a fallback like a `jnle` over a `jmp`.

There is no near absolute direct. For that use `mov eax, 0x123456` / `jmp eax` (near register-indirect) if you can't guarantee (or easily get the toolchain to calculate at link time) the distance between this code and the absolute target.
Footnote 1: For example `EB 00` is a slow NOP using a short `jmp`. Or `E8 00 00 00 00` / `5B` is a `call next` (or `call $+5`) / `next: pop ebx`, like you might use to read EIP without actually going anywhere in 32-bit mode, where RIP-relative LEA isn't available.
Footnote 2: Technically you can use `jmp rel16` in 32-bit mode, but it will truncate EIP to 16-bit. (And the encoding uses a `66h` operand-size prefix so it's only 1 byte shorter than a `jmp rel32`.)
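To make the truncation in Footnote 2 concrete, here is a small Python model of what executing `66 E9 rel16` in 32-bit mode does to EIP. The function name and the address `0x00401000` are assumptions picked for illustration.

```python
import struct

def jmp_rel16_in_32bit_mode(eip, rel16):
    """Return (machine code, new EIP) for `66 E9 rel16` located at `eip` in 32-bit mode."""
    code = b"\x66\xE9" + struct.pack("<h", rel16)
    next_eip = eip + len(code)                  # 4 bytes: prefix + opcode + rel16
    return code, (next_eip + rel16) & 0xFFFF    # the result is truncated to 16 bits

code, new_eip = jmp_rel16_in_32bit_mode(0x00401000, 0x10)
print(code.hex(" "), len(code))   # 66 e9 10 00 4
print(hex(new_eip))               # 0x1014, not 0x401014
```

The instruction is 4 bytes instead of the 5 of `E9 rel32`, but execution lands at `0x1014` rather than `0x401014`, which is rarely what you want.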
In 16 and 32-bit mode, `jmp rel16/rel32` can reach any other IP/EIP value, but in 64-bit mode the ±2GiB range is only a small fraction of the virtual address space. Still, it's normal for code for a single executable to assume that it fits in 2GiB so any code can reach any other code in the same library or main executable with a relative near jump/call. A "large" code model would require `mov reg, imm64` / `jmp reg` or something inefficient like that. Or even worse to make it position-independent.
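The ±2GiB limit is simply the range of a signed 32-bit displacement measured from the end of the instruction. A quick Python check, where the function name and the addresses are made up for illustration:

```python
def rel32_can_reach(jmp_addr, target, insn_len=5):
    """True if a jmp/call rel32 located at jmp_addr can reach target (64-bit mode)."""
    disp = target - (jmp_addr + insn_len)
    return -2**31 <= disp <= 2**31 - 1

here = 0x0000_5555_5555_0000                          # made-up address of the jump
print(rel32_can_reach(here, here + 0x1000_0000))      # True: 256 MiB away
print(rel32_can_reach(here, 0x0000_7FFF_F7A0_0000))   # False: far more than 2 GiB away
```

When the check fails, a direct relative jump cannot be encoded at all, and an indirect jump through a register or memory is the usual way out.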
LONG is unusual terminology. In most assemblers, the encoding overrides are `short` (rel8) or `near` (rel16 or rel32 depending on mode) to force the length (and thus how far you can jump) for near jumps (`cs` unchanged, just adding an offset to IP/EIP/RIP).
According to the other answers here, in assemblers where `long` is a thing, it's the same rel16 or rel32 override that you get with NASM `jmp near foo`.
NASM listing (`nasm -felf32 foo.asm -l/dev/stdout`):

    1                          foo:
    2 00000000 E9FBFFFFFF      jmp near foo
    3 00000005 EBF9            jmp foo         ; optimizes to short by default
    4 00000007 EBF7            jmp short foo
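The displacement bytes in that listing can be checked by hand: each one, added to the address of the end of its instruction, should give the address of `foo` (0). A short Python sketch of that check, with a hypothetical helper `jmp_target` that only understands the two opcodes used here:

```python
import struct

def jmp_target(addr, code):
    """Decode EB rel8 / E9 rel32 (32-bit mode) located at `addr`; return the target."""
    if code[0] == 0xEB:
        (disp,) = struct.unpack("<b", code[1:2])
    elif code[0] == 0xE9:
        (disp,) = struct.unpack("<i", code[1:5])
    else:
        raise ValueError("not a direct near jmp")
    return addr + len(code) + disp

listing = [
    (0x00, "E9FBFFFFFF"),   # jmp near foo
    (0x05, "EBF9"),         # jmp foo
    (0x07, "EBF7"),         # jmp short foo
]
for addr, hexbytes in listing:
    target = jmp_target(addr, bytes.fromhex(hexbytes))
    print(hex(addr), hexbytes, "->", hex(target))   # every line ends in 0x0
```

The displacements are -5, -7 and -9, each relative to the end of its own instruction, so all three jumps land on `foo`.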
NASM does multi-pass optimization to find the shortest encoding it can use for each branch. This is usually optimal, but see Why is the "start small" algorithm for branch displacement not optimal? for corner cases where manually forcing one branch's encoding could allow smaller code.
If the branch target is in another file so NASM doesn't know at assemble time how far away it'll be, it assumes `near` (non-short). You could force `short` if you know the files you'll be linking together are small (or the code is in a special section).
Or if you want to leave a full `rel32` for something else to modify this machine code and write a new offset, that would be a use-case for `near`. For example, the PLT used in dynamic linking on Linux used to work that way (I think), rewriting an offset in a `jmp rel32` instead of doing an indirect jmp with a GOT entry.
Historically, some assemblers are not as smart as NASM, and always needed manual hinting if you wanted the short encoding for jumps. Especially for forward jumps, to a label the assembler hasn't seen yet. (If you're working with 16-bit code using old tools from that era, you may run into this.) Even NASM used to default to optimization disabled in old versions, which would make it choose the long encoding.
Also, `jcc near` is only supported on 386 and later, so you may need to be explicit if you want your assembler to actually emit that.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Sep Roland |
Solution 3 | Peter Cordes |