ARM Cortex-M4: Running code from external flash

Is it possible to separate the base FW and application code on the ARM Cortex-M4 architecture (e.g., STM32 F4)? What I'd like to do is run applications from external flash and the base FW from internal flash. The applications all implement the same "API" (a single header file), but their functionality differs.

The idea is that the base FW provides drivers, an engine and a UI and can work standalone. Applications would provide extra functionality to the base FW when needed. Not all applications can be flashed into internal flash, since the total code size of all applications is too big for it. Another reason is that we'd like to update/add applications on the fly without re-flashing the device.

So far I have a few ideas for how to do this, but are any of them feasible, or are there other options?

1. Load applications on the fly into internal SRAM
   - RAM consumption might be a problem.
   - I'm not sure whether base FW and application code can be executed "at the same time". Can application functions be called from the base FW and vice versa? I have seen this technique used with flash loaders, but once you start running code from RAM, can code in flash no longer be executed?
2. Flash the application from external flash into internal flash
   - I'm not sure how long the internal flash will last. What is the maximum number of write cycles for internal flash? The application needs to be changed 1-20 times per day.
   - Can part of the internal flash be programmed while code is executing from it (an application loader)?
3. Find a Cortex-M4 that supports running code from internal flash and external flash at the same time
   - I haven't found any; is this probably not possible with the Cortex-M4 architecture?

All tips, hints and example code appreciated!

EDIT: Thanks for the answers; I need some time to digest them.

The main reason for this trial is to allow updating the device functionality without flashing the base FW, not so much to save SRAM/internal flash. Think of it as a kind of plugin architecture that offers a simple interface to extend system functionality without altering the underlying system. If I cannot build a system that executes code from external flash (SD card, NAND), I will first try loading applications on the fly into SRAM/internal flash. But I will also dig deeper into the Emcraft solution.

There is no need to stick with ST chips; I just happen to have their devkits on my desk. The final target is to load applications from an SD card or NAND memory, so at this point I don't want to limit the implementation to work only with NOR flash.

I'll start with a minimal implementation using an STM32 F4 devkit. First I need to attach some NAND/SD card to it. I will try both options for loading applications, into SRAM and into internal flash, to see how they work and what the impact on performance is. As Clifford said, the challenge will be more in linking, building and toolset settings. Even though I can force the application to always be at the same place in memory, the functions within it will be at different places; I need to figure out how to handle this. Examples/demos would be helpful.

Spec for my minimal implementation.

Project 1: Base FW
    Driver for accessing applications from external flash
    Minimal filesystem to write and read applications to/from external flash
    UART commands:
        Write applications to external flash
        Load applications from external flash to SRAM/Internal flash
        Execute application and print result to UART
Interface.h
    int functionWrapper(int functionNumber)
    bool initApplication()
    int executeMathOperation1(int a, int b)
    int executeMathOperation2(int a, int b)
Project 2: Application 1
    MathOp1: Sums two values
    MathOp2: Multiplies two values
Project 3: Application 2
    MathOp1: Subtracts two values
    MathOp2: Divides two values
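One way to cope with functions landing at different addresses in each application build is to make the shared interface a single table of function pointers that every application exports at a known place (for example, the first bytes of its image), so the base FW only ever needs that one address. The sketch below assumes that layout; `app_api_t`, `APP_API_MAGIC`, `run_math` and the two tables are hypothetical names, and on a host build the tables are ordinary globals rather than being placed at a fixed address by a linker script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Interface.h: every application exports one of these tables. */
#define APP_API_MAGIC 0x41505031u /* arbitrary marker ("APP1") */

typedef struct {
    uint32_t magic;                               /* lets the base FW reject garbage images */
    bool (*initApplication)(void);
    int  (*executeMathOperation1)(int a, int b);
    int  (*executeMathOperation2)(int a, int b);
} app_api_t;

/* ---- Application 1 (would be a separate project/image) ---- */
static bool app1_init(void) { return true; }
static int  app1_op1(int a, int b) { return a + b; }   /* sum */
static int  app1_op2(int a, int b) { return a * b; }   /* multiply */
const app_api_t app1_api = { APP_API_MAGIC, app1_init, app1_op1, app1_op2 };

/* ---- Application 2 ---- */
static bool app2_init(void) { return true; }
static int  app2_op1(int a, int b) { return a - b; }              /* subtract */
static int  app2_op2(int a, int b) { return b != 0 ? a / b : 0; } /* divide; 0 if b == 0 */
const app_api_t app2_api = { APP_API_MAGIC, app2_init, app2_op1, app2_op2 };

/* ---- Base FW side: only ever sees a pointer to the table ---- */
int run_math(const app_api_t *app, int which, int a, int b, int *result)
{
    if (app == NULL || app->magic != APP_API_MAGIC || !app->initApplication())
        return -1;                                /* not a valid application */
    *result = (which == 1) ? app->executeMathOperation1(a, b)
                           : app->executeMathOperation2(a, b);
    return 0;
}
```

On the target, the base FW would obtain the table with something like `(const app_api_t *)APP_LOAD_ADDRESS` (a hypothetical constant) after loading an image, and the same `run_math` call then works for either application.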

I haven't decided on the final OS yet, but it will most probably be FreeRTOS/OpenRTOS.

c arm embedded cortex-m

Solution 1:^[1]

The problem is not the processor; executing code in different memory spaces is a matter of building, linking and loading your code appropriately, and that is more a toolchain issue than a matter of chip selection.

The first problem is selecting a device that actually has a memory-mapped external memory interface. Serial memory devices such as NAND flash, or mass-storage devices such as SD cards, are not suitable for this. It must be NOR flash on the system address/data bus.

Second, in most cases the external memory interface must be configured for the correct memory type, bus width, timing, etc. before it is addressable. That means you cannot boot directly into software in external memory, because software must run in order to perform the configuration.

Thirdly, your toolchain will typically compile and link your application into a single monolithic application. Separating it into BIOS/OS and application is not trivial, and for bare-metal targets (i.e., not running a full OS such as Linux, with load/execute and dynamic linking built in) there is no standard method; you'll have to cook your own.

For the case of a bootloader that starts up, loads an application and runs it, things are perhaps simple, because once the application is running the bootloader plays no further part; the bootloader only needs to know the start address of the application. In your case, however, you want to compile and link two separate software entities independently and have the application be able to access your BIOS/OS code, so the application needs to know the entry-point addresses of the routines in the independently linked BIOS/OS. One way to do this is to generate a link map of the BIOS/OS (which is toolchain specific), and from that generate an entry-point lookup table (essentially an array of function pointers) that you link with each application. That way your application has the means to call back into the BIOS/OS.
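The entry-point table described above can be sketched as a struct of function pointers that the application calls through, so it depends only on the table's layout and never on where each BIOS/OS routine happens to be linked. The names (`bios_api_t`, `uart_write`, `get_tick_ms`) are hypothetical, the base FW routines are host stand-ins, and the version field is one possible way of catching an application built against an older table layout.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table of BIOS/OS entry points shared with every application. */
typedef struct {
    uint32_t version;                       /* bump whenever the layout changes */
    void     (*uart_write)(const char *s);  /* hypothetical driver call */
    uint32_t (*get_tick_ms)(void);          /* hypothetical timer call */
} bios_api_t;

/* ---- Base FW implementation (host stand-ins) ---- */
static char uart_log[64];
static void bios_uart_write(const char *s)
{
    /* Append to a log buffer instead of driving a real UART. */
    strncat(uart_log, s, sizeof uart_log - strlen(uart_log) - 1);
}
static uint32_t bios_get_tick_ms(void) { return 1234u; }

const bios_api_t bios_api = { 1u, bios_uart_write, bios_get_tick_ms };

/* ---- Application side: every back-call goes through the table ---- */
int app_report(const bios_api_t *bios)
{
    char line[32];
    if (bios->version != 1u)
        return -1;                          /* built against a different layout */
    snprintf(line, sizeof line, "t=%lu\n", (unsigned long)bios->get_tick_ms());
    bios->uart_write(line);
    return 0;
}
```

On the target, the application would reach the table through an address both builds agree on (e.g. a hypothetical `BIOS_TABLE_ADDR` cast to `const bios_api_t *`), or through a pointer the base FW passes to the application's init function.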

You may not need to do that, however: you can link your application into disjoint memory address regions and program the internal and external memory devices separately (sometimes called "scatter-loading"). That way the linker is responsible for resolving the internal and external addresses, and calling in either direction is possible. You need to ensure that the start-up code that configures the external memory is in internal memory, of course, but it is possible to instruct the linker to place specific code in specific memory, or to let it decide when it does not matter.
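With GCC-style tools, this kind of placement is usually done by tagging code with a section attribute and mapping that section to a memory region in the linker script (ARM's own toolchain uses scatter files for the same purpose). The section name `.ext_flash_text`, the `EXT_NOR` region and its addresses are assumptions for illustration; the attribute is only applied on ARM GCC builds so that the sketch also compiles and runs on a host.

```c
/* Matching (hypothetical) GNU ld fragment:
 *   MEMORY {
 *     FLASH   (rx) : ORIGIN = 0x08000000, LENGTH = 1M
 *     EXT_NOR (rx) : ORIGIN = 0x60000000, LENGTH = 16M
 *   }
 *   SECTIONS { .ext_flash_text : { *(.ext_flash_text*) } > EXT_NOR }
 */
#if defined(__arm__) && defined(__GNUC__)
#define EXT_FLASH __attribute__((section(".ext_flash_text")))
#else
#define EXT_FLASH /* host build: ordinary .text */
#endif

/* Must stay in internal flash: it runs before the external bus is usable. */
static int ext_bus_ready = 0;
void configure_external_memory(void)
{
    /* On the target: enable the FSMC/FMC clock, set bus width and timings. */
    ext_bus_ready = 1;
}

/* Can live in external flash: only called after configuration. */
EXT_FLASH int filter_sample(int x) { return (x * 3) / 4; }

/* Internal code calling external code (calls in the other direction work too). */
int process(int x)
{
    if (!ext_bus_ready)
        return -1;  /* calling filter_sample now would fault on the target */
    return filter_sample(x);
}
```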

The need to re-flash the code does not in itself require the architecture you describe. You could simply implement a bootloader (occupying reserved pages of internal flash) that loads data from some source such as an SD card, USB, serial or NAND flash, and writes it to the appropriate internal or external flash pages. If the loaded data is, for example, an Intel HEX file, it contains address information that tells the bootloader where to write the data. In this approach the only permanent code is the bootloader (which must configure the external memory), and your BIOS/OS and application can be monolithic and "scatter-loaded" across internal and external flash.
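The address information in an Intel HEX file is straightforward to decode. The sketch below parses one record (`:LLAAAATT<data>CC`), verifies the checksum (all bytes, including the checksum byte, must sum to zero modulo 256), and combines the 16-bit record address with the upper 16 bits set by the last extended linear address record (type 04). `hex_parse_line` and `hex_record_t` are hypothetical names; erasing and programming the target flash pages is not shown, and the line is expected without its trailing CR/LF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  type;      /* 0x00 data, 0x01 EOF, 0x04 extended linear address */
    uint8_t  len;       /* number of data bytes */
    uint32_t address;   /* full 32-bit address (meaningful for data records) */
    uint8_t  data[255];
} hex_record_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

/* Parses one record. *upper holds the upper 16 address bits from the last
 * type-04 record and is updated when such a record is seen.
 * Returns false on malformed input or checksum mismatch. */
bool hex_parse_line(const char *line, uint32_t *upper, hex_record_t *rec)
{
    uint8_t raw[255 + 5];
    size_t n = strlen(line), count;
    uint8_t sum = 0;

    if (n < 11 || line[0] != ':' || (n - 1) % 2 != 0)
        return false;
    count = (n - 1) / 2;
    if (count > sizeof raw)
        return false;
    for (size_t i = 0; i < count; i++) {
        int b = hex_byte(&line[1 + 2 * i]);
        if (b < 0)
            return false;
        raw[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    if (sum != 0 || count != (size_t)raw[0] + 5u)
        return false;

    rec->len = raw[0];
    rec->type = raw[3];
    rec->address = (*upper << 16) | ((uint32_t)raw[1] << 8) | raw[2];
    memcpy(rec->data, &raw[4], rec->len);
    if (rec->type == 0x04 && rec->len == 2)
        *upper = ((uint32_t)raw[4] << 8) | raw[5];
    return true;
}
```

For example, the record `:020000040800F2` sets the upper bits to 0x0800, so a following data record at offset 0x0010 resolves to 0x08000010, the start of STM32 internal flash plus 0x10.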

A word of warning, however: Cortex-M devices are optimised to fetch instructions and load data over separate buses. On STM32, the internal flash and internal SRAM are on separate buses, allowing instruction and data fetches to occur in parallel. When running code from an external memory, you have to realise that not only is that bus likely to be slower, but instruction and data fetches from external memory will also be serialised. So performance may take a significant hit.

Update April 2022: MCUs with a QSPI memory controller can execute code from QSPI flash (albeit somewhat slowly compared to normal parallel memory on an address/data bus). Unlike normal serial (SPI or I²C) or NAND flash memory, QSPI memory is memory-mapped for read access, so it can contain directly executed code.

Solution 2:^[2]

EDIT: Actually, your chip has a memory-mapped external memory controller (FMC/FSMC). This would allow direct execution, so it's probably the easiest way if you can make it work.

If you stick with this chip, loading the whole application into SRAM is probably the easiest option. Jumping between RAM and flash execution is not a problem; RAM and flash are mapped at different addresses. To keep it simple, always load applications at the same RAM address, and build your applications to run at that particular address (usually done with linker settings). You could try to implement a partial load system that loads parts of the application when needed, but this would require some effort, both in designing the host FW and in building the applications.
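A minimal sketch of loading an application into SRAM at a fixed address, assuming each image starts with a small header written by the application's build. `app_header_t`, `APP_MAGIC` and `APP_LOAD_ADDR` (0x20010000) are hypothetical choices, not values from ST documentation, and `app_ram` stands in for the real SRAM region. The returned entry address has bit 0 set because Cortex-M cores only execute Thumb code, and branching to an address with bit 0 clear faults.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header that each application build prepends to its image. */
typedef struct {
    uint32_t magic;        /* identifies an application image */
    uint32_t load_addr;    /* address the app was linked to run at */
    uint32_t size;         /* bytes of code/data following the header */
    uint32_t entry_offset; /* offset of the entry point from load_addr */
} app_header_t;

#define APP_MAGIC     0xA5A5F00Du
#define APP_LOAD_ADDR 0x20010000u  /* example fixed SRAM address */
#define APP_RAM_SIZE  4096u

static uint8_t app_ram[APP_RAM_SIZE];  /* host stand-in for the SRAM region */

/* Copies an image into the load region and returns the entry address to
 * branch to (Thumb bit set), or 0 if the image is invalid, was linked for
 * a different address, or does not fit.
 * On the target: ((int (*)(void))(uintptr_t)entry)(); */
uint32_t app_load(const uint8_t *image, size_t image_len)
{
    app_header_t h;
    if (image_len < sizeof h)
        return 0;
    memcpy(&h, image, sizeof h);
    if (h.magic != APP_MAGIC || h.load_addr != APP_LOAD_ADDR ||
        h.size > APP_RAM_SIZE || h.size > image_len - sizeof h ||
        h.entry_offset >= h.size)
        return 0;
    memcpy(app_ram, image + sizeof h, h.size);
    return (h.load_addr + h.entry_offset) | 1u;
}
```

Rejecting images linked for a different address, rather than trying to relocate them, keeps the loader trivial; that is the trade-off of always loading at the same RAM address.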

Writing to internal flash can work, and may be preferable if you are short on RAM. Internal flash can usually be written while executing, but that often stalls the CPU (it waits until the flash operation is done), which might not be a problem. According to your chip's datasheet, the flash is characterized for a minimum of 10,000 erase/write cycles. That means that if your chip should last 5 years, you should load fewer than 6 applications per day (you may reasonably double this for typical endurance, maybe more, but there is no guarantee).
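The 6-per-day figure is the rated cycle count spread over the intended lifetime, assuming every application load costs one erase/write cycle of the same sectors: 10,000 / (5 × 365) ≈ 5.5 loads per day. Run the other way, the 20 loads per day mentioned in the question would use up the same 10,000 cycles in 500 days. The helper names below are hypothetical.

```c
/* Rough flash-endurance budget, assuming each application load costs one
 * erase/write cycle of the same flash sector(s). Leap days are ignored. */
double max_loads_per_day(double rated_cycles, double years)
{
    return rated_cycles / (years * 365.0);
}

double lifetime_days(double rated_cycles, double loads_per_day)
{
    return rated_cycles / loads_per_day;
}
```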

Solution 3:^[3]

The Cortex-M4 has nothing to do with what you are talking about. Like all ARM processor cores, the Cortex-M4 is just a processor core, nothing more; ARM makes IP, not chips. You have to look at the chip vendor's implementation: do any of them provide the ability to execute code over an external flash interface? I would assume not, just as it is hard to find a microcontroller that executes from external RAM, though those are more often found than ones that execute from external flash. With almost all of them you can certainly copy-and-run or flash-and-run: code running from an internal chip resource copies code from the external resource to something internal (internal flash or RAM) and then runs it there. That code can in turn copy and jump, or return to the main on-chip firmware to copy and load the next overlay.
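The copy-and-run overlay idea can be sketched as a small manager that tracks which overlay occupies a single internal RAM region and copies a new one in only when it changes. The arrays standing in for external storage, `overlay_use` and the 64-byte region are hypothetical; on a target the `memcpy` would be a read through the external flash driver, and the region would have to be memory the core can execute from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host stand-ins: "external" storage holding two overlays. */
static const uint8_t overlay_a[] = { 1, 2, 3 };
static const uint8_t overlay_b[] = { 9, 8, 7, 6 };
static const struct { const uint8_t *src; size_t len; } overlays[] = {
    { overlay_a, sizeof overlay_a },
    { overlay_b, sizeof overlay_b },
};

static uint8_t overlay_region[64]; /* the one internal region overlays share */
static int loaded = -1;            /* index of the overlay currently resident */
static unsigned copy_count;        /* how many copies actually happened */

/* Makes overlay `id` resident, copying only if it is not already loaded.
 * Returns the region to run from, or NULL for a bad id or oversized overlay. */
uint8_t *overlay_use(int id)
{
    if (id < 0 || (size_t)id >= sizeof overlays / sizeof overlays[0] ||
        overlays[id].len > sizeof overlay_region)
        return NULL;
    if (loaded != id) {
        memcpy(overlay_region, overlays[id].src, overlays[id].len);
        loaded = id;
        copy_count++;
    }
    return overlay_region;
}
```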

The chip vendor should be able to tell you the lifespan of the flash. Flash wears out: code designed to regularly or constantly erase and write the flash will wear out that part in days, weeks or months, depending on how badly you punish it. Saving state to EEPROM or flash when power is lost (say, the odometer reading on a vehicle) is okay, as that is a once- or few-times-a-day thing, or even a once-every-few-years thing, but saving every tenth of a mile will wear out the part before the end of the vehicle's lifetime.
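One way to follow that advice is to keep the live value in RAM and write it to flash only when it has changed and either power is failing or a minimum interval has passed. `persist_policy_t` and `should_persist` are hypothetical, and detecting power failure (e.g. via a supply-monitor interrupt) is assumed rather than shown. With a one-day interval, a location sees at most one periodic write per day plus one per power loss, instead of one per tenth of a mile.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy for persisting an odometer-style value. */
typedef struct {
    uint32_t last_saved_value;
    uint32_t last_save_time_s;
    uint32_t min_interval_s;   /* e.g. 86400 for at most one periodic write per day */
} persist_policy_t;

bool should_persist(const persist_policy_t *p, uint32_t value,
                    uint32_t now_s, bool power_failing)
{
    if (value == p->last_saved_value)
        return false;          /* nothing new to save */
    if (power_failing)
        return true;           /* last chance before RAM contents are lost */
    /* Unsigned subtraction also handles a wrapped seconds counter. */
    return (now_s - p->last_save_time_s) >= p->min_interval_s;
}
```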

The M4s are hard to call microcontrollers given all the features inside (floating point, caches, MMU-like features, etc.), and they tend to have lots of RAM and flash compared to, say, their Cortex-M0 siblings. So your best bet is to just get one with more RAM and copy-and-jump. There should be many parts that will work nicely for you...

Solution 4:^[4]

There are some implementations using Cortex-M3/M4 microcontrollers where the internal firmware is used just to configure an external RAM and map it into the memory map, then load the main program from external media (like a flash chip, SD card, etc.) into this RAM and execute it from there (much like ARM-based Linux boards, where the internal firmware is nothing more than an initial bootloader).

This company does this: http://www.emcraft.com/

Take a look; they even use the STM32F4.
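The boot flow described above (configure external RAM, load the main program into it, then run it) can be sketched as a fixed sequence of hooks run by the internal-flash firmware. This is a generic illustration, not Emcraft's code; `boot_hooks_t`, the stub hooks and the 0xC0000000 example address are assumptions. On a real Cortex-M target, `start_image` would typically load the stack pointer and reset vector from the image's vector table, point VTOR at it and branch, never returning.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BOOT_FAILED_RAM, BOOT_FAILED_LOAD, BOOT_STARTED } boot_result_t;

/* Hypothetical hooks; each stands in for real driver code. */
typedef struct {
    bool (*init_ext_ram)(void);               /* e.g. FSMC/FMC + SDRAM setup */
    bool (*load_image)(uint32_t dest_addr);   /* e.g. read from SD card */
    void (*start_image)(uint32_t image_addr); /* jump; never returns on target */
} boot_hooks_t;

/* The internal-flash firmware enforces the order: nothing touches external
 * RAM before it is configured, and nothing runs before it is loaded. */
boot_result_t boot(const boot_hooks_t *h, uint32_t ext_ram_base)
{
    if (!h->init_ext_ram())
        return BOOT_FAILED_RAM;
    if (!h->load_image(ext_ram_base))
        return BOOT_FAILED_LOAD;
    h->start_image(ext_ram_base);
    return BOOT_STARTED;      /* reached only with the host stand-in */
}

/* Host stand-ins so the sketch runs; they just record what happened. */
static bool ram_ok = true;
static uint32_t started_at;
static bool stub_init(void) { return ram_ok; }
static bool stub_load(uint32_t dest) { return dest != 0; }
static void stub_start(uint32_t image_addr) { started_at = image_addr; }
const boot_hooks_t host_hooks = { stub_init, stub_load, stub_start };
```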

The biggest limitation is that Cortex-M cores do not have an MMU, so you cannot run full-blown Linux on them; you are limited to uClinux.

Edit: this article should be interesting for you too: http://electronicdesign.com/embedded/practical-advice-running-uclinux-cortex-m3m4

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
Solution 3	old_timer
Solution 4
