Monday, January 13, 2014

Developing STM32 microcontroller code on Linux (Part 6 of 8, building and linking STM32 programs)

The first post of this series covered the steps to build and run code for the STM32. The second post covered how to build a cross-compiler for the STM32. The third post covered how to build a debugger for the STM32. The fourth post covered building and configuring OpenOCD for your development environment. The fifth post covered building the device library, libopencm3. This post will cover linker scripts and command-line options necessary for building and linking programs to run on the STM32.

Once we have all of the previous steps done, we are achingly close to being able to build and run code on our target STM32 processor. However, there is one more set of low-level details that we have to understand before we can get there. Those details revolve around how our C code gets turned into machine code, and how that code is laid out in memory.

As you may know, compiling code to run on a target is roughly a two-step process:
  1. Turn C/C++ code into machine code the target processor understands. The output of this step are what are known as object files.
  2. Take the object files and link them together to form a coherent binary. The output of this step is generally an ELF file.
Let's talk about these two steps in more detail.

Compile step

During compilation, the compiler parses the C/C++ code and turns it into an object file. A little more concretely, what we want to have our cross-compiler do is to take our C code, turn it into ARM instructions that can run on the STM32, and then output that into object files.

To do this, we use our cross-compiler. As with any version of gcc, there are many flags that can be passed to our cross-compiler, and they can have many effects on the code that is output. What I'm going to present here is a set of flags that I've found works pretty well. This isn't necessarily optimal in any dimension, but will at least serve as a starting point for our code. I'll also point out that this is where we start to get into the differences between the various STM32F* processors. For instance, the STM32F4 processor has an FPU, while the STM32F3 does not. This will affect the flags that we will pass to the compiler.

For the STM32F3, Cortex-M3 processor that I am using, here are the compiler flags: -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wstrict-prototypes -Wundef -Wshadow -g -fno-common -mcpu=cortex-m3 -mthumb -mfloat-abi=hard -MD
Let's go through each of them. The -W* flags tell the compiler to generate compile-time warnings for several classes of common errors. I find that enabling these warnings and getting rid of them usually makes the code much better. The -g flag tells the compiler to include debugging symbols in the binary; this makes the code easier to debug, at the expense of some code space. The -fno-common flag tells gcc to place uninitialized global variables into the data section of the binary, which improves performance a bit. The -mcpu=cortex-m3 flag tells the compiler that we have a Cortex-M3, and thus to generate code optimized for the Cortex-M3. The -mthumb flag tells gcc to generate ARM thumb code, which is smaller and more compact than full ARM code. The -mfloat-abi=hard flag tells gcc that we want to use a hard float ABI; this doesn't make a huge difference on a processor without an FPU, but is a good habit to get into. Finally, the -MD flag tells gcc to generate dependency files while compiling, which is useful for Makefiles.

Linking step

Once all of the individual files have been compiled, they are put together into the final binary by the linker. This is more complicated when targeting an embedded platform vs. a regular program. In particular, we have to tell the linker not only which files to link together, but also how to lay the resulting binary out on flash and in memory.

We'll first start by talking about the flags that we need to pass to the linker to make this work. Here are the set of flags we are going to start with: --static -lc -lnosys -T tut.ld -nostartfiles -Wl,--gc-sections -mcpu=cortex-m3 -mthumb -mfloat-abi=hard -lm -Wl,
Again, let's go through each of them. The --static flag tells the linker to link a static, not a dynamically linked, binary. This flag probably isn't strictly necessary in this case, but we add it anyway. The -lc flag tells the linker to link this binary against the C library, which is newlib in our case. That gives us access to various convenient functions, such as printf(), scanf(), etc. The -lnosys flag tells the linker to link this binary against the "nosys" library. Several of the convenience functions in the C library require underlying implementations of certain functions to operate, such as _write() for printf(). Since we don't have a POSIX operating system that can provide these for us, the nosys library provides empty stub functions for these. If we want, we can later on define our own versions of these stub functions that will get used instead. The -T tut.ld flag tells the linker to use tut.ld as the linker script; we'll talk more about linker scripts below. The -nostartfiles flag tells the linker not to use standard system startup files. Since we don't have an OS here, we can't rely on the standard OS utilities to start our program up. The -Wl,--gc-sections flag tells the linker to garbage collect unused sections. That is, any sections that are not referenced are removed from the resulting binary, which can shrink the binary. The -mcpu=cortex-m3, -mthumb, and -mfloat-abi=hard flags have the same meaning as for the compile flags. The -lm flag tells the linker to link this binary against the math library. It isn't strictly required for our little programs, but most programs want it sooner or later. Finally, the -Wl, tells the linker to generate a map file and stick it into The map file is helpful for debugging, but is informational only.

Linker script

As mentioned before, the linker script tells the linker how to lay out the resulting binary in memory. This script is highly chip specific. The details have to do with where the processor jumps to on reset, and where it expects certain things to be. Note that most chips are actually configurable (based on some jumper settings), so where it jumps to on reset can change. Luckily, for most off-the-shelf STM32 designs, including the DISCOVERY boards, it is always configured to expect the code to start out in flash. Therefore, the linker script tells the linker to lay out the code in flash, but to put the data and bss in RAM.

With all that said, libopencm3 actually makes this easy on you. They have default linker scripts for each of the chips that are supported. All you really need to do is to fill in a small linker script with the RAM and FLASH size of your chip, include the default libopencm3 one, and away you go.

So we are going to put all of the above together and write a Makefile and a linker script into the project directory we created in the last tutorial. Neither of these are necessarily the best examples of what to do, but they will get the job done. First the Makefile:

$ cd ~/stm32-project
$ cat <<EOF > Makefile
CFLAGS=-Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wstrict-prototypes -Wundef -Wshadow -g -fno-common -mcpu=cortex-m3 -mthumb -mfloat-abi=hard -MD -DSTM32F3
LDFLAGS=--static -lc -lnosys -T tut.ld -nostartfiles -Wl,--gc-sections -mcpu=cortex-m3 -mthumb -mfloat-abi=hard -lm -Wl,

all: tut.bin

tut.bin: tut.elf
$( echo -e "\t" )\$(OBJCOPY) -Obinary tut.elf tut.bin

tut.elf: \$(OBJS)
$( echo -e "\t" )\$(CC) -o tut.elf \$(OBJS) ~/opt/cross/arm-none-eabi/lib/libopencm3_stm32f3.a --static -lc -lnosys -T tut.ld -nostartfiles -Wl,--gc-sections -mcpu=cortex-m3 -mthumb -mfloat-abi=hard -lm -Wl,

flash: tut.bin
$( echo -e "\t" )sudo \$(OPENOCD) -f stm32-openocd.cfg -c "init" -c "reset init" -c "flash write_image erase tut.bin 0x08000000" -c "reset run" -c "shutdown"

$( echo -e "\t")rm -f *.elf *.bin *.list *.map *.o *.d *~
You should notice a couple of things in the Makefile. First, we use all of the compiler and linker flags that we talked about earlier. Second, our objects ($OBJS) are tut.c, which we'll create in the next post. And third, we have a flash target that will build the project and flash it onto the target processor. This requires the OpenOCD configuration file that we created a couple of posts ago.

Now the linker script:

$ cat <<EOF > tut.ld
        rom (rx) : ORIGIN = 0x08000000, LENGTH = 256K
        ram (rwx) : ORIGIN = 0x20000000, LENGTH = 40K

/* Include the common ld script. */
INCLUDE libopencm3_stm32f3.ld
You'll notice that there isn't a lot here. We just have to define the RAM location and size, and the ROM (Flash) location and size, and the default libopencm3 linker script will take care of the rest.

We now have all of the parts in place. The next post will write, compile, and run a simple program on the board.


  1. Hi Chris,

    Just wanted to point out that the STM32F3 is actually a Cortex-M4. It has FPU.

  2. Hello, thanks for your post, it is really good.
    I am wondering if you know how can I discover how much memory I am using in my program when I flash it, because I am debugging and will like to know the amount of memory used or allocated.

    I am using STM32F4 Discover.

    1. It depends on what you mean by "memory". If you are talking about flash size, then you can use the "size" command to tell you the amount of flash your program will use (the .text section). If you are talking about static memory, then you can also use the "size" command to look at that (.bss and .data sections). If you are talking about stack usage, then you can pass -fstack-usage to the compiler to get some information about stack usage. If you are talking about dynamic memory allocation (malloc and friends), then there aren't a lot of great tools to track it and you'll have to do it by hand.