What is the difference, if any, between LONG and FAR jumps in Assembly?

I'm looking at some practice code for assembly, and the assignment is basically to replace one jump point with another.

The original jmp is a SHORT jmp, and the target I need to jump to cannot be reached with this instruction.

I now have three options: remove 'SHORT', insert 'LONG', or insert 'FAR'.

If there's documentation anywhere that indicates the differences between them, I haven't found it yet. Can anyone be of assistance here?



Solution 1:[1]

I'm assuming your question pertains to the x86 architecture; you haven't specified in your question.

A SHORT jump is a jump to a small signed offset (a single byte, so -128 to +127) from the current instruction pointer address. A LONG jump can use a larger offset value, and so can jump further away from the current instruction pointer address. Both of these jump types are usually relative - that is, the operand is an offset from the current instruction pointer (though in assembly source, you normally provide the target label - the assembler or linker then computes the offset). Neither of them jumps to a different code segment, so they are both 'near' jumps.

A FAR jump specifies both a segment and offset, which are both absolute in the sense that they specify the required code segment and instruction pointer, rather than an offset relative to the current code segment / instruction pointer.

To summarise, there are three types of direct jump: short and long, which are both near jumps capable of jumping different relative distances within the same code segment, and far, which can jump to any absolute address (segment and offset).
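
As a rough illustration of that summary, here is a minimal Python sketch (a model, not an emulator) of how each kind of jump updates the CS:IP pair. The names CodePtr, near_relative_jump and far_jump are made up for this example, and a 16-bit IP is assumed by default.

```python
from typing import NamedTuple

class CodePtr(NamedTuple):
    cs: int  # code segment (selector, or segment value in real mode)
    ip: int  # instruction pointer within that segment

def near_relative_jump(cur: CodePtr, insn_len: int, disp: int, ip_bits: int = 16) -> CodePtr:
    """Short and long/near jumps: CS is kept, and a signed displacement is
    added to the address of the next instruction (wrapping at the IP width)."""
    mask = (1 << ip_bits) - 1
    return CodePtr(cur.cs, (cur.ip + insn_len + disp) & mask)

def far_jump(cur: CodePtr, new_cs: int, new_ip: int) -> CodePtr:
    """Far jump: both CS and IP are replaced with absolute values."""
    return CodePtr(new_cs, new_ip)

here = CodePtr(cs=0x1000, ip=0x0100)
print(near_relative_jump(here, insn_len=2, disp=0x10))   # short form: 2-byte instruction
print(near_relative_jump(here, insn_len=3, disp=0x400))  # long/near form in 16-bit mode: 3 bytes
print(far_jump(here, new_cs=0x2000, new_ip=0x0000))      # new segment and offset
```

The only difference between the two near forms in this model is the instruction length and how large disp is allowed to be; the far form ignores the current location entirely.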

(Note that it is also possible to perform an indirect absolute jump, where you specify an operand that holds the absolute address that you wish to jump to. In this case the jump can either be near or far - i.e. it can include or not include the required code segment).

If you don't specify the jump 'distance', it is up to the assembler whether you get a short, long or far jump. Most modern assemblers are "two-pass" and will use a short jump if possible, or a long or far jump otherwise - the latter only if required.

See Wikipedia's entry on x86 memory segmentation if you need help understanding what I mean by 'segment'.

See this description of the x86 JMP instruction for full details of the possible JMP instruction addressing modes.

Solution 2:[2]

A SHORT jump:

  • If it is a forward jump, the encoding uses a relative offset value from 00h (+0) to 7Fh (+127) which enables program execution to jump to another instruction with a maximum of 127 bytes in-between them.
  • If it is a backward jump, the encoding uses a relative offset value from 80h (-128) to FFh (-1). Because the offset is counted from the end of the 2-byte jump instruction, this lets program execution jump back to an instruction that starts at most 126 bytes before the jump itself.
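
To make the two ranges concrete, here is a small Python sketch that decodes the displacement byte and computes where a short jump lands. The helper names decode_rel8 and short_jump_target are hypothetical, and the jump is assumed to be the plain 2-byte form (opcode plus rel8).

```python
def decode_rel8(byte: int) -> int:
    """Interpret one displacement byte (0x00..0xFF) as a signed 8-bit value."""
    return byte - 0x100 if byte >= 0x80 else byte

def short_jump_target(jmp_addr: int, disp_byte: int) -> int:
    """Target of a 2-byte short jump located at jmp_addr.
    The displacement is applied to the address of the NEXT instruction."""
    return jmp_addr + 2 + decode_rel8(disp_byte)

jmp_addr = 0x1000
for disp_byte in (0x00, 0x7F, 0x80, 0xFE, 0xFF):
    target = short_jump_target(jmp_addr, disp_byte)
    print(f"rel8={disp_byte:02X}h  disp={decode_rel8(disp_byte):+5d}  "
          f"target is {target - jmp_addr:+d} from the start of the jmp")
```

Note that a displacement of FEh (-2) targets the jump itself, which is why EB FE is an infinite loop.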

A LONG jump can use a larger offset.

A FAR jump jumps to another code segment.

Solution 3:[3]

TL;DR: short vs. long/near just forces the instruction length choice.
far is a totally different beast.

  • jmp short foo is jmp rel8.
  • jmp long foo or jmp near foo is jmp rel16/rel32 (depending on mode)
  • jmp far [rdi] (or the equivalent call far) transfers control to a new CS:[ER]IP. You rarely want this.
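
For a feel of what "a new CS:[ER]IP" means at the byte level, here is a hedged Python sketch that assumes 32-bit mode with a 32-bit operand size. It builds the direct far jump encoding (opcode EA followed by a 32-bit offset and a 16-bit selector) and splits the 6-byte in-memory far pointer that an indirect far jump reads. The function names are invented for this example.

```python
import struct
from typing import Tuple

def encode_far_jmp32(selector: int, offset: int) -> bytes:
    """Direct far jump (jmp ptr16:32): opcode EA, then the 32-bit offset,
    then the 16-bit selector, all little-endian."""
    return struct.pack("<BIH", 0xEA, offset, selector)

def load_far_pointer32(mem: bytes) -> Tuple[int, int]:
    """Split the 6-byte memory operand of an indirect far jump (m16:32)
    into (selector, offset). The offset is stored first, the selector last."""
    offset, selector = struct.unpack("<IH", mem[:6])
    return selector, offset

# Hypothetical values: selector 0x08, offset 0x00100000.
print(encode_far_jmp32(0x08, 0x0010_0000).hex(" "))
print(load_far_pointer32(bytes([0x00, 0x00, 0x10, 0x00, 0x08, 0x00])))
```

Neither value depends on where the jump instruction itself sits, which is what makes the far forms absolute.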

See Intel's manual (or an HTML scrape if you're familiar with the notation used) for the available machine-code forms of jmp / call, jcc (conditionals like jge), and loop (which is similar in encoding to a jcc short).


x86 has two major families of jumps/calls, with some variety within each:

  • Far to a new CS:IP (or CS:EIP / CS:RIP depending on mode).
    Almost never used in normal 32 or 64-bit code (one exception is WOW64 32-bit system DLLs that call into 64-bit code that uses the syscall instruction). In 16-bit code it is used only if you can't fit your program into 64K, or in MBR bootloaders to set a known CS in real mode or to switch to 32-bit mode.

    Direct (except in 64-bit mode) or memory-indirect, but always absolute. Interestingly, x86's only absolute direct jump.
    There are no conditional far jumps, only jmp or call / retf.
    Syntax details depend on the assembler, but something like jmp 0x10:foo or jmp far [eax] works in NASM, the latter loading 6 bytes from [DS:EAX] into CS:EIP if run in 32-bit mode.

  • Near is a normal jump not changing CS, just setting a new IP/EIP/RIP. The following forms are available:

    • Indirect (using absolute targets, e.g. function pointers or jump tables) like
      jmp ax or jmp qword [rsi], or the same with call. (Not conditional indirect jcc). short or not isn't meaningful because the machine code for the instruction only encodes the place to find a new [ER]IP, not directly how to reach it. ax vs. eax is a matter of operand-size, or [esi] vs. [rsi] is a matter of address-size. And [rsi+0x1230] is a matter of the disp8 / disp32 used as part of the addressing mode.
    • Direct using relative displacements (encoded into the machine code of the instruction) that get added to IP/EIP/RIP. So they're relative to the end of the instruction (see Footnote 1). This is what you get from a normal jmp foo or jle .else, with the assembler normally picking a length for you.
      • short means using an 8-bit (1-byte) relative displacement, aka a rel8.
        Available for jcc rel8 and jmp rel8 (and loop), not call

      • non-short, using a rel16 (16-bit mode; see Footnote 2) or rel32 (other modes). So a 2-byte or 4-byte relative displacement.
        You can force this encoding with near or long in the asm source.
        Available for jmp rel16/32 and call rel16/32, and on 386 and later jcc rel16/32. If assembling with instructions restricted to 286 or earlier, an assembler will complain if the distance to the target label is outside the [-128, +127] range of a rel8, or use a fallback like a jnle over a jmp.

      • There is no near absolute direct. For that use mov eax, 0x123456 / jmp eax (near register-indirect) if you can't guarantee (or easily get the toolchain to calculate at link time) the distance between this code and the absolute target.
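
As an illustration of the "jnle over a jmp" fallback mentioned for pre-386 targets, here is a hedged Python sketch of how an assembler might encode jle in 16-bit code. The function name and the allow_386 flag are invented for this example; the opcodes used are 7E (jle rel8), 7F (jnle rel8), E9 (jmp rel16) and 0F 8E (jle rel16).

```python
import struct

def encode_jle_16bit(addr: int, target: int, allow_386: bool = False) -> bytes:
    """Encode 'jle target' for 16-bit code located at addr."""
    disp8 = target - (addr + 2)               # relative to the end of the 2-byte jcc
    if -128 <= disp8 <= 127:
        return struct.pack("<Bb", 0x7E, disp8)            # jle rel8
    if allow_386:
        rel16 = (target - (addr + 4)) & 0xFFFF            # end of the 4-byte form
        return struct.pack("<BBH", 0x0F, 0x8E, rel16)     # jle rel16 (386 and later)
    # Fallback: inverted condition skips over a 3-byte jmp rel16.
    rel16 = (target - (addr + 5)) & 0xFFFF                # end of the 5-byte sequence
    return struct.pack("<BbBH", 0x7F, 3, 0xE9, rel16)     # jnle $+5 / jmp rel16

print(encode_jle_16bit(0x100, 0x110).hex(" "))                  # in range: short form
print(encode_jle_16bit(0x100, 0x300).hex(" "))                  # out of range, no 386
print(encode_jle_16bit(0x100, 0x300, allow_386=True).hex(" "))  # out of range, 386 allowed
```

The fallback costs one extra byte and an extra taken branch on the far path, which is why the 386 jcc rel16/32 forms are preferable when they are available.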


Footnote 1: For example EB 00 is a slow NOP using a short jmp.
Or E8 00 00 00 00 / 5B is a call next (or call $+5) / next: pop ebx, like you might use to read EIP without actually going anywhere (in 32-bit mode, where RIP-relative LEA isn't available).

Footnote 2: Technically you can use jmp rel16 in 32-bit mode, but it will truncate EIP to 16-bit. (And the encoding uses a 66h operand-size prefix, so it's only 1 byte shorter than a jmp rel32.)

In 16 and 32-bit mode, jmp rel16/rel32 can reach any other IP/EIP value, but in 64-bit mode the ±2GiB range is only a small fraction of the virtual address space. Still, it's normal for the code of a single executable to assume that it fits in 2GiB, so any code can reach any other code in the same library or main executable with a relative near jump/call. A "large" code model would require mov reg, imm64 / jmp reg or something inefficient like that, and making it position-independent would be even worse.
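
A quick Python sketch of that reachability argument, with made-up addresses and helper names: in 64-bit mode the displacement has to fit in a signed 32-bit field, whereas a rel16 in 16-bit mode wraps around and can therefore name any IP.

```python
def fits_rel32(jmp_addr: int, target: int, insn_len: int = 5) -> bool:
    """True if a jmp rel32 (5 bytes: E9 + rel32) at jmp_addr can reach target."""
    disp = target - (jmp_addr + insn_len)
    return -2**31 <= disp <= 2**31 - 1

def rel16_for(jmp_ip: int, target_ip: int, insn_len: int = 3) -> int:
    """rel16 field for a 16-bit-mode jmp (3 bytes: E9 + rel16). IP arithmetic
    wraps at 64K, so some 16-bit displacement always exists."""
    return (target_ip - (jmp_ip + insn_len)) & 0xFFFF

# Hypothetical 64-bit addresses: 1 GiB apart, then far more than 2 GiB apart.
print(fits_rel32(0x0000_5555_0000_0000, 0x0000_5555_4000_0000))
print(fits_rel32(0x0000_5555_0000_0000, 0x0000_7FFF_0000_0000))

# 16-bit mode: jumping from IP 0 to IP 0xFFFF still has an encoding.
print(hex(rel16_for(0x0000, 0xFFFF)))
```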


LONG is unusual terminology. In most assemblers, the encoding overrides are short (rel8) or near (rel16 or rel32 depending on mode) to force the length (and thus how far you can jump) for near jumps (CS unchanged, just adding an offset to IP/EIP/RIP).

According to the other answers here, in assemblers where long is a thing, it's the same rel16 or rel32 override that you get with NASM jmp near foo.

NASM listing (nasm -felf32 foo.asm -l/dev/stdout)

     1                                  foo:
     2 00000000 E9FBFFFFFF              jmp  near foo
     3 00000005 EBF9                    jmp  foo           ; optimizes to short by default
     4 00000007 EBF7                    jmp short foo
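
The bytes in that listing can be reproduced by hand. This is a small Python sketch (helper names are my own) that computes target minus end-of-instruction and packs it after the opcode, EB for the short form and E9 for the 32-bit near form:

```python
import struct

def jmp_short(addr: int, target: int) -> bytes:
    """jmp rel8: opcode EB plus a signed byte, relative to addr + 2.
    struct.pack raises struct.error if the displacement doesn't fit."""
    return struct.pack("<Bb", 0xEB, target - (addr + 2))

def jmp_near32(addr: int, target: int) -> bytes:
    """jmp rel32: opcode E9 plus a signed dword, relative to addr + 5."""
    return struct.pack("<Bi", 0xE9, target - (addr + 5))

foo = 0  # the label is at offset 0 in the listing
for addr, encode in ((0, jmp_near32), (5, jmp_short), (7, jmp_short)):
    print(f"{addr:08X} {encode(addr, foo).hex().upper()}")
```

All three instructions target the same label; only the address of the jump and the width of the displacement field change.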

NASM does multi-pass optimization to find the shortest encoding it can use for each branch. This is usually optimal, but see Why is the "start small" algorithm for branch displacement not optimal? for corner cases where manually forcing one branch's encoding could allow smaller code.
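
To show the general idea of "start small" branch sizing, here is a simplified Python sketch. It is not NASM's actual implementation: the item format, the relax function and the 2-byte / 5-byte sizes (32-bit mode jmp) are assumptions made for this example. Every jump starts short, and any jump whose displacement doesn't fit in a rel8 is grown to near, repeating until the layout stops changing.

```python
def relax(items, short_len=2, near_len=5):
    """items: list of ("bytes", n), ("label", name) or ("jmp", label_name).
    Returns "short" or "near" for each jmp, in source order."""
    is_near = [False] * sum(1 for kind, _ in items if kind == "jmp")
    while True:
        # Lay out the code with the current size choices.
        addr, labels, jumps, j = 0, {}, [], 0
        for kind, arg in items:
            if kind == "label":
                labels[arg] = addr
            elif kind == "bytes":
                addr += arg
            else:
                addr += near_len if is_near[j] else short_len
                jumps.append((j, addr, arg))  # (jump index, end address, target label)
                j += 1
        # Grow every short jump whose displacement no longer fits in a rel8.
        changed = False
        for j, end, label in jumps:
            if not is_near[j] and not -128 <= labels[label] - end <= 127:
                is_near[j] = True
                changed = True
        if not changed:
            return ["near" if n else "short" for n in is_near]

program = [
    ("label", "top"), ("bytes", 100),
    ("jmp", "end"),   ("bytes", 24),
    ("jmp", "top"),   ("bytes", 110),
    ("label", "end"),
]
print(relax(program))
```

In this example the backward jump initially fits in a rel8, but growing the forward jump by 3 bytes pushes it out of range, so a second pass is needed. Jumps only ever grow in this scheme, which guarantees termination.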

If the branch target is in another file, so NASM doesn't know at assemble time how far away it'll be, it assumes near (non-short). You could override that with short if you know the files you'll be linking together are small (or the code is in a special section).

Or if you want to leave a full rel32 for something else to modify this machine code and write a new offset, that would be a use-case for near. For example, the PLT used in dynamic linking on Linux used to work that way (I think), rewriting an offset in a jmp rel32 instead of doing an indirect jmp with a GOT entry.

Historically, some assemblers are not as smart as NASM, and always needed manual hinting if you wanted the short encoding for jumps. Especially for forward jumps, to a label the assembler hasn't seen yet. (If you're working with 16-bit code using old tools from that era, you may run into this.) Even NASM used to default to optimization disabled in old versions, which would make it choose the long encoding.

Also, jcc near is only supported on 386 and later, so you may need to be explicit if you want your assembler to actually emit that.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Sep Roland
Solution 3 Peter Cordes