MCU/embedded: Position independent code, max size for .got section?

I am trying to get my project "position independent", but it won't give...

Some background:

  • nxp imx rt 1024 evk board
  • c++ project
  • compiled both C and C++ files with -fPIC, -msingle-pic-base -mno-pic-data-is-text-relative
  • a working prototype where I can run a small demo c++ program which starts some freertos tasks and creates some static c++ objects (with inheritance, with pure virtual classes, to test)
  • the strong desire to have 1 binary which we can update "over the air" (OTA) by having a custom bootloader which jumps to either app1 or app2.

When I apply my changes to my "real" project, it works all the same as long as I comment out the vast majority of my c++ static constructors.

When I include one (any) more constructor in my main.cpp, the following will happen:

  • My bootloader copies the vector table from flash (either app1 or app2) to sram = OK
  • My bootloader jumps to 0x20200004 (OC sram where reset handler ISR sits) = OK
  • The ResetHandler will start setting up R9 (the register used for the .got) = OK
  • The ResetHandler will jump to Startup = hard fault. Checking the registers in the CPU, I can see that the LR (link register) has a bogus value (0xfffffff9), so clearly something went wrong.

I verified:

  • the vector table from disassembly, matches 1-on-1 with vector table in OC sram
  • the .got section from disassembly, matches 1-on-1 with .got in DTC sram.
  • the address of the Startup function just before the jump is actually done. It matches to an entry in the .got section.

When I REDUCE the amount of code by commenting out stuff, everything behaves EXACTLY the same except for the hard fault and the broken value in LR.

Is there some (official?!) documentation that confirms there is a hard limit to the size of the .got section when cross compiling for ARM (Cortex-M7)?

Can anybody contribute possible hints about what the hell is causing this?

For reference, here is the startup code that bonks out when "some weird threshold" is reached in .got size (my assumption, could be wrong of course).

extern void Startup(unsigned int flash_start, unsigned int flash_end, unsigned int lma_offset);

extern unsigned int __flash_start__;
extern unsigned int __flash_end__;

extern unsigned int __global_offset_table_flash_start__;
extern unsigned int __global_offset_table_sram_start__;
extern unsigned int __global_offset_table_sram_end__;

//*****************************************************************************
// Reset entry point for your code.
// Sets up a simple runtime environment and initializes the C/C++
// library.
//*****************************************************************************
__attribute__ ((naked))
void ResetISR(void)
{
    __asm ("MOV R11, #1");

    // Disable interrupts
    __asm volatile ("cpsid i");

    unsigned int lma_offset;
    unsigned int *global_offset_table_flash_start;

    // Before doing anything else related to variables in sram, setup r9 for position independent code first.
    // And correct the firmware offset which is stored in r10 (add it to r9)
    // Finally grab the updated global offset table address from r9
    __asm volatile ("LDR r9, = __global_offset_table_flash_start__");
    __asm volatile ("ADD r9, r9, r10");

    __asm ("MOV %[result], R9"
        : [result] "=r" (global_offset_table_flash_start) );

    // Grab the lma offset defined in bootloader from r10
    __asm ("MOV %[result], R10"
        : [result] "=r" (lma_offset) );

    unsigned int flash_start = reinterpret_cast<unsigned int>(&__flash_start__);
    unsigned int flash_end = reinterpret_cast<unsigned int>(&__flash_end__);

    unsigned int *flash;
    unsigned int *sram;
    unsigned int *sram_end;

    __asm ("MOV R11, #2");

    //
    // Copy global offset table to sram
    //
    flash = const_cast<unsigned int*>(global_offset_table_flash_start);
    sram = const_cast<unsigned int*>(&__global_offset_table_sram_start__);
    sram_end = const_cast<unsigned int*>(&__global_offset_table_sram_end__);

    for (int i = 0u; i < (sram_end - sram); ++i)
    {
        sram[i] = flash[i];
        if (sram[i] >= flash_start && sram[i] <= flash_end)
        {
            sram[i] += lma_offset;
        }
    }

    // Update R9, as of now, all functions should be resolvable through the got
    __asm volatile ("LDR r9, = __global_offset_table_sram_start__");

    __asm ("MOV R11, #3");


    unsigned int address = reinterpret_cast<unsigned int>(&Startup);

    __asm__ volatile ("MOV R12, %[input]"
        : : [input] "r" (address)
          );

    // Jump to regular startup code
    Startup(flash_start, flash_end, lma_offset);
}
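
The copy loop in ResetISR can be checked on a host machine before it ever runs on the target. The following is a minimal sketch of the same idea, not the original startup code: the function name `relocate_got` and its parameters are made up for illustration, and it assumes 32-bit GOT entries and an inclusive flash range as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy a GOT image and rebase every entry that points into the link-time
// flash range. Names are illustrative and not taken from any vendor SDK.
void relocate_got(const std::uint32_t* flash_got,
                  std::uint32_t* sram_got,
                  std::size_t entry_count,
                  std::uint32_t flash_start,
                  std::uint32_t flash_end,
                  std::uint32_t lma_offset)
{
    assert(flash_start <= flash_end);
    for (std::size_t i = 0; i < entry_count; ++i) {
        std::uint32_t entry = flash_got[i];
        // Only link-time flash addresses move with the image; SRAM and
        // peripheral addresses stay where the linker put them.
        if (entry >= flash_start && entry <= flash_end) {
            entry += lma_offset;
        }
        sram_got[i] = entry;
    }
}
```

Feeding it a handful of hand-written entries (one in flash, one in SRAM, one peripheral address) makes it easy to see which entries get the offset and which do not.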

PS: I know -fPIC is BROADLY used in linux. No such limitation would exist there. Maybe this is something ARM specific, or even CPU (Cortex-M7) specific. Still, maybe some Linux -fPIC guru might have ideas that can help me on my way...

PPS: If I need to share anything else, say the word...



Solution 1:[1]

I will leave it open just as a reference for people struggling with the same thing. There is no dependency. There is no problem, except for the ones introduced by yours truly: myself.

The main problem for me was not being able to debug my app when it is relocated. This can be resolved by issuing the GDB command add-symbol-file <path-to-elf-file> <address-to-text-section>

As example:

  • my app is compiled and linked to 0x60020000
  • my app is uploaded in flash to 0x60030000 (so with an offset of 0x10000)
  • when reading the elf file with arm-none-eabi-readelf -WS myapp.axf I can read that the text section has an offset of 0x2120 in my case.

When I start my bootloader in the debugger, before I jump to the relocated app, I issue the command:

add-symbol-file myapp.axf 0x60032120

This loads the symbols, and gdb relocates every symbol in the .text section so that the section starts at 0x60032120 (the upload address 0x60030000 plus the 0x2120 section offset). That way I am able to debug through.
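
The address arithmetic is easy to get wrong by hand, so here is a small sketch that builds the command from the two numbers in the example above. The helper names `relocated_text_address` and `add_symbol_file_command` are hypothetical, and the sketch assumes the 0x2120 value is the distance of .text from the start of the image.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Address where .text actually sits after upload: the upload base plus the
// distance of .text from the start of the image.
std::uint32_t relocated_text_address(std::uint32_t load_base,
                                     std::uint32_t text_offset)
{
    assert(load_base <= UINT32_MAX - text_offset);  // no 32-bit wrap-around
    return load_base + text_offset;
}

// Build the gdb command line for a given ELF file and .text address.
std::string add_symbol_file_command(const std::string& elf_path,
                                    std::uint32_t text_address)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%08x",
                  static_cast<unsigned>(text_address));
    return "add-symbol-file " + elf_path + " " + buf;
}
```

With the numbers from this answer, `add_symbol_file_command("myapp.axf", relocated_text_address(0x60030000u, 0x2120u))` yields the command shown earlier.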

Once I had my debugger running, I could see several programming errors on my end. The most critical one was reading linker symbols after setting up r9 with the base of the .got section in sram. I still added the LMA offset to those linker symbols, while that already happens 'automagically' behind the scenes. So I was reading garbage memory in some cases, and stored that in the parts that were to be initialized by __libc_init_array.
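
To make that mistake concrete, here is a simplified model of it. It is only a sketch: `GotState` and `runtime_address` are invented names, and it assumes a GOT entry holds the link-time address until the startup code has patched the sram copy.

```cpp
#include <cassert>
#include <cstdint>

// Whether the GOT entry still holds the link-time address or has already
// been patched by the startup code.
enum class GotState { LinkTime, Patched };

// Run-time address of a symbol read through the GOT. The offset is applied
// by hand only while the entry is still unpatched.
std::uint32_t runtime_address(std::uint32_t got_entry,
                              GotState state,
                              std::uint32_t lma_offset)
{
    if (state == GotState::Patched) {
        return got_entry;  // already relocated, use as is
    }
    assert(got_entry <= UINT32_MAX - lma_offset);  // no 32-bit wrap-around
    return got_entry + lma_offset;
}
```

Treating a patched entry as if it were still link-time adds the offset twice, which is exactly the kind of garbage address described above.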

After fixing those, I ran into another strange issue. One nxp driver declared a static const array of all pointers to GPIOs. When I compiled the source file and pulled it through arm-none-eabi-objdump, I could see the array in .text, perfectly set up with the addresses of GPIO1..GPIO5. But after linking, and dumping the contents again through objdump, I noticed that that very same array was altered. The reference to the GPIO5 peripheral somehow was set to 0x0.

Now, I have no idea why that happened, but I thought that if I removed the const part, the array would be mapped to sram, and maybe I would get rid of this issue. I was in luck for once, it solved the issue. Not really a perfect fix, because now I have to be very wary of code that declares static const stuff. I'll investigate it later, but for now, I am mostly thrilled that this story came to an end. I have my c++ app compiled with -fPIC and I am able to run it on any location (4-byte aligned) in flash, and on top of that, I am able to debug through it as well.
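
For anyone hitting the same thing, the workaround looks roughly like the sketch below. The addresses are placeholders, `GPIO_Type` is an opaque stand-in for the vendor type, and `first_null_entry` is a hypothetical sanity check that is not part of the driver; it only shows one way to catch a zeroed entry early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct GPIO_Type;  // opaque stand-in for the vendor's register block type

// Placeholder addresses for illustration; real ones come from the device header.
// Original form (ended up in flash): static GPIO_Type* const gpio_bases[] = {...};
// Workaround form: no `const`, so the array is placed in .data and lives in sram.
static GPIO_Type* gpio_bases[] = {
    reinterpret_cast<GPIO_Type*>(std::uintptr_t{0x40001000u}),
    reinterpret_cast<GPIO_Type*>(std::uintptr_t{0x40002000u}),
    reinterpret_cast<GPIO_Type*>(std::uintptr_t{0x40003000u}),
};

// Returns the index of the first null entry, or -1 if every entry is set.
std::ptrdiff_t first_null_entry(GPIO_Type* const* table, std::size_t count)
{
    assert(table != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i] == nullptr) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Calling such a check once during startup would have flagged the zeroed GPIO5 entry immediately instead of letting it surface later as a fault.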

So for the next guy who's going insane on this "position independent code" journey: don't give up, there is an end to the suffering ;-)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 bas