Name
Floating Point — Overview of use of floating point
Overview
The Cortex-M architectural HAL provides support for a hardware Floating Point Unit (FPU) if one is present, to provide accelerated floating point math operations.
Support is currently provided for the FPU designs found on the Cortex-M4 and Cortex-M7 architectural variants. However, even with these variants the FPU is an optional feature, and a more specific classification such as Cortex-M4F indicates the presence of the FPU.
Furthermore, even if an FPU is present, as indicated by a platform HAL package, its use is not required. The default is not to use it (so software FP is the default), and the developer must explicitly enable hardware FPU support if it is desired. Equally, the developer may keep using software floating point, which in some cases may be simpler and may reduce code size and stack use (due to the larger register contexts required for FP). Software floating point is provided by the compiler (GCC) runtime, based on the compiler flags in use.
Configuration
As described earlier, in order to enable hardware floating point support in this HAL package you must enable the configuration option CYGHWR_HAL_CORTEXM_FPU (Use hardware FPU), which can be found within the CYGPKG_HAL_CORTEXM_FPU (Floating Point Support) CDL component.
Configuring the FPU support is an important step. Use of the FPU not only affects code generation and requires some initialization, it also requires an understanding of whether multiple kernel threads in the application may be using FP operations, in which case the method of saving/restoring the FPU register bank on context switches must be set appropriately.
Compile and link flags
Both the application and eCos must be built and linked with matching compiler/linker flags appropriate to the configuration selected for FPU support. It is usually easiest to examine the CYGBLD_GLOBAL_CFLAGS configuration option, or simply the build output, to see the relevant flags in use. These are the flags to look for, and a brief summary of their purpose:
- -mcpu=cortex-m4
No Cortex-M3 core supports FPU operations, so -mcpu=cortex-m4 is required to allow the correct instructions to be generated. For the moment, use of this option also applies when using the Cortex-M7, although this will likely change in a future compiler update.
- -mfloat-abi=hard
This directs the compiler to generate FPU instructions for floating point operations. If this option is absent, the default of -mfloat-abi=soft, i.e. software FP, is used.
- -mfpu=…
This option indicates which hardware FPU is present, covering the number of registers, their sizes, and so on. For the moment, only -mfpu=fpv4-sp-d16, as used on the Cortex-M4F and M7F, is supported. This corresponds to the VFPv4 specification with 32 single precision registers, also usable as 16 double precision registers.
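Because mismatched flags between eCos and the application can be hard to diagnose, it may help to have the application check its own build. The following is a minimal sketch, not part of eCos: it relies on the GCC/ACLE predefined macros __ARM_PCS_VFP, __SOFTFP__ and __ARM_FP, and it assumes that the Cortex-M HAL configuration header is named pkgconf/hal_cortexm.h. The function names fp_abi_name() and has_hw_single_precision() are hypothetical.

```c
/* Assumption: the eCos Cortex-M HAL configuration header has this name.
 * It is only included when the compiler can find it, so this file also
 * builds on a host machine where no eCos install tree is present. */
#if defined(__has_include)
# if __has_include(<pkgconf/hal_cortexm.h>)
#  include <pkgconf/hal_cortexm.h>
# endif
#endif

/* Fail the build if eCos was configured to use the hardware FPU but this
 * file is not being compiled for the hard-float calling convention. */
#if defined(CYGHWR_HAL_CORTEXM_FPU) && !defined(__ARM_PCS_VFP)
# error "eCos is configured for the hardware FPU: build with -mfloat-abi=hard"
#endif

/* Hypothetical helper: name the floating point ABI this file was built for. */
const char *fp_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP)
    return "hard";                  /* -mfloat-abi=hard */
#elif defined(__SOFTFP__)
    return "soft";                  /* -mfloat-abi=soft */
#else
    return "softfp";                /* FPU instructions, soft calling convention */
#endif
}

/* Hypothetical helper: non-zero if the compiler is targeting hardware
 * single precision support (bit 2 of the ACLE __ARM_FP macro). */
int has_hw_single_precision(void)
{
#if defined(__ARM_FP)
    return (__ARM_FP & 0x4) != 0;
#else
    return 0;
#endif
}
```

Printing fp_abi_name() early in application startup gives a quick confirmation that the application objects were built with the flags expected for the selected configuration.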
Threads and context switching with FP
With the hardware FPU support enabled, it is then possible to configure the CYGHWR_HAL_CORTEXM_FPU_SWITCH (FPU context switch) configuration option in order to control how FPU registers are saved/restored in context switches. There are three settings: ALL, LAZY, and NONE.
- ALL
This mode is the most straightforward, and means that on every context switch, all FPU registers are saved and restored between threads.
This mode makes the most sense if you need determinism and/or most or all of your threads will use FP. However if few threads use FP, it can result in a lot of overhead due to saves and restores of unchanged registers.
Enabling the ALL mode also takes advantage of the Cortex-M lazy exception stacking feature in order to reduce interrupt/exception latency. This means that after an exception or interrupt, the core reserves space for the FPU register context on the stack, but does not actually save the FPU register contents onto the stack unless needed.
- LAZY
In this mode, if a thread has not used the FPU, the FPU context will not be saved or restored for it. The HAL installs exception handlers on the Cortex-M UsageFault and HardFault exceptions in order to detect the first time the FPU is accessed by that thread. Once the FPU is accessed, the fault handler enables the FPU for that thread, and from then on, the FPU context will be saved and restored when switching from or to that thread.
In a system where some or many threads do not use the FPU, this can greatly improve context switch time. However, if the system spends most of its time switching between two or more threads which all use the FPU, then there may be additional overhead compared to the ALL mode (due to the need to check if the FPU was enabled for a particular thread on switch). This means the worst case context switch time is longer than with ALL mode. It also reduces determinism, as there is an unavoidable latency at the point the thread first accesses the FPU while the fault handler executes to enable the FPU; determinism is further affected because context switch time depends on whether threads use the FPU.
The LAZY mode does not save on stack usage, as the number of registers which might need to be saved remains the same.
Unlike the ALL mode, there is not yet support for lazy exception stacking for those threads which have the FPU enabled, which means that if the FPU is enabled at the time of an interrupt or exception, much of the FPU register context (the FPSCR, and half the data registers) will be saved. Please contact eCosCentric if it would be of interest to enhance eCos by adding lazy exception stacking to the LAZY context switching mode.
- NONE
In this mode, the FPU is enabled, but no floating point context is stored at any point, which naturally means there is no overhead on context switch. However this means that only one thread or context may use the FPU at a time.
If using this mode, either all FP operations must be constrained to a single thread, or there must be locking to ensure that multiple threads do not access the FPU registers simultaneously. If you rely on locking, great care must be taken, as the compiler has the potential to reorder floating point accesses outside of the critical region if it is still in the same function. The use of the HAL_REORDER_BARRIER() call from the <cyg/hal/hal_arch.h> HAL header can be useful to prevent reordering across a particular point in the code.
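The locking approach for the NONE mode can be sketched as follows. This is an illustration under stated assumptions rather than a recommended implementation: BUILDING_FOR_ECOS is a hypothetical macro set by the application's own build, fpu_mutex, fpu_guard_init() and scaled_sum() are hypothetical names, and the host stand-ins exist only so that the sketch compiles outside eCos. A kernel mutex is used here as one possible form of locking. Note that the inputs and the result are passed through memory rather than as float arguments or return values, since with -mfloat-abi=hard those would travel in FPU registers outside the locked region.

```c
#ifdef BUILDING_FOR_ECOS              /* hypothetical macro from your own build */
# include <cyg/kernel/kapi.h>         /* cyg_mutex_t, cyg_mutex_lock(), ... */
# include <cyg/hal/hal_arch.h>        /* HAL_REORDER_BARRIER() */
#else
/* Host stand-ins so that the sketch compiles outside eCos. */
typedef int cyg_mutex_t;
static void cyg_mutex_init(cyg_mutex_t *m)   { *m = 0; }
static int  cyg_mutex_lock(cyg_mutex_t *m)   { *m = 1; return 1; }
static void cyg_mutex_unlock(cyg_mutex_t *m) { *m = 0; }
# define HAL_REORDER_BARRIER() __asm__ volatile ("" : : : "memory")
#endif

static cyg_mutex_t fpu_mutex;         /* hypothetical application-wide FPU lock */

/* Call once before any thread uses scaled_sum(). */
void fpu_guard_init(void)
{
    cyg_mutex_init(&fpu_mutex);
}

/* Computes *out = *scale * (v[0] + ... + v[n-1]).
 * All floating point values enter and leave through memory, so no float
 * is passed or returned in an FPU register outside the locked region. */
void scaled_sum(const float *v, int n, const float *scale, float *out)
{
    float acc;
    int i;

    cyg_mutex_lock(&fpu_mutex);
    HAL_REORDER_BARRIER();            /* loads below may not move above the lock */

    acc = 0.0f;
    for (i = 0; i < n; i++)
        acc += v[i];
    *out = acc * *scale;

    HAL_REORDER_BARRIER();            /* the store to *out may not move below here */
    cyg_mutex_unlock(&fpu_mutex);
}
```

The barrier constrains memory accesses, so the arithmetic is held inside the region only because it depends on loads and feeds a store that are themselves inside it. It is still worth inspecting the generated code for any function written this way.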
Floating point specific tests
The kernel package has a number of tests to exercise floating point operations, especially when switching threads. Some of these tests take particular account of the Cortex-M features in determining what to test. The relevant test names in the kernel package all have the prefix "fp".
FP in exception contexts
Floating point must not be used in an ISR or from kernel exception handlers. If used, FPU registers will not be restored correctly on the return from the ISR/exception.
However, if the ALL context switch mode is in use, it is permitted to use floating point in DSR routines, including kernel alarm functions. Floating point may also be used in DSRs in NONE mode, but as expected, only if no thread is using floating point at the time. Threads can ensure this by using the kernel scheduler lock to temporarily prevent DSRs from running, although clearly that has an impact on real-time behaviour. As mentioned earlier, it would also be advised to combine the lock with use of HAL_REORDER_BARRIER().
Do not use floating point operations in DSRs when LAZY context switch mode is used. There is no guarantee of which thread context will be current when the DSR is run, meaning that if the interrupt occurred while a thread that does not use FP was running, the DSR would cause the FPU to be enabled for that thread from then on.
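One way to respect the restriction on floating point in ISRs is to keep the ISR to integer arithmetic and defer any conversion to thread context. The following is a minimal sketch under that assumption. The names adc_isr_body() and adc_mean_volts() and the ADC scenario are hypothetical, and the function shown is only the body that a real eCos ISR would call, not a complete ISR.

```c
#include <stdint.h>

/* Running totals shared between the ISR and thread context. */
static volatile uint32_t adc_sum;
static volatile uint32_t adc_count;

/* Hypothetical ISR body: integer arithmetic only, so no FPU register
 * is touched in interrupt context. */
void adc_isr_body(uint16_t raw)
{
    adc_sum += raw;
    adc_count++;
}

/* Called from thread context, where floating point is permitted.
 * A real system would also need to protect the two reads below against
 * the ISR running in between them. */
float adc_mean_volts(float volts_per_count)
{
    uint32_t sum = adc_sum;
    uint32_t count = adc_count;

    if (count == 0)
        return 0.0f;
    return ((float)sum / (float)count) * volts_per_count;
}
```

The same split applies to DSRs when the LAZY mode is selected: the DSR can wake a thread, and the thread performs the floating point work.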
2024-03-18 | eCosPro Non-Commercial Public License |