Name
Kernel — Overview of the eCos Kernel
Description
The kernel is one of the key packages in all of eCos. It provides the core functionality needed for developing multi-threaded applications:
- The ability to create new threads in the system, either during startup or when the system is already running.
- Control over the various threads in the system, for example manipulating their priorities.
- A choice of schedulers, determining which thread should currently be running.
- A range of synchronization primitives, allowing threads to interact and share data safely.
- Integration with the system's support for interrupts and exceptions.
In some other operating systems the kernel provides additional functionality. For example the kernel may also provide memory allocation functionality, and device drivers may be part of the kernel as well. This is not the case for eCos. Memory allocation is handled by a separate package. Similarly each device driver will typically be a separate package. Various packages are combined and configured using the eCos configuration technology to meet the requirements of the application.
The eCos kernel package is optional. It is possible to write single-threaded applications which do not use any kernel functionality, for example RedBoot. Typically such applications are based around a central polling loop, continually checking all devices and taking appropriate action when I/O occurs. A small amount of calculation is possible every iteration, at the cost of an increased delay between an I/O event occurring and the polling loop detecting the event. When the requirements are straightforward it may well be easier to develop the application using a polling loop, avoiding the complexities of multiple threads and synchronization between threads. As requirements get more complicated a multi-threaded solution becomes more appropriate, requiring the use of the kernel. In fact some of the more advanced packages in eCos, for example the TCP/IP stack, use multi-threading internally. Therefore if the application uses any of those packages then the kernel becomes a required package, not an optional one.
The kernel functionality can be used in one of two ways. The kernel
provides its own C API, with functions like
cyg_thread_create
and
cyg_mutex_lock
. These can be called directly from
application code or from other packages. Alternatively there are a
number of packages which provide compatibility with existing API's,
for example POSIX threads or µITRON. These allow application
code to call standard functions such as
pthread_create
, and those functions are
implemented using the basic functionality provided by the eCos kernel.
Using compatibility packages in an eCos application can make it much
easier to reuse code developed in other environments, and to share
code.
Although the different compatibility packages have similar
requirements on the underlying kernel, for example the ability to
create a new thread, there are differences in the exact semantics. For
example, strict µITRON compliance requires that kernel
timeslicing is disabled. This is achieved largely through the
configuration technology. The kernel provides a number of
configuration options that control the exact semantics that are
provided, and the various compatibility packages require particular
settings for those options. This has two important consequences.
First, it is not usually possible to have two different compatibility
packages in one eCos configuration because they will have conflicting
requirements on the underlying kernel. Second, the semantics of the
kernel's own API are only loosely defined because of the many
configuration options. For example cyg_mutex_lock
will always attempt to lock a mutex, but various configuration options
determine the behaviour when the mutex is already locked and there is
a possibility of priority inversion.
The optional nature of the kernel package presents some complications
for other code, especially device drivers. Wherever possible a device
driver should work whether or not the kernel is present. However there
are some parts of the system, especially those related to interrupt
handling, which should be implemented differently in multi-threaded
environments containing the eCos kernel and in single-threaded
environments without the kernel. To cope with both scenarios the
common HAL package provides a driver API, with functions such as
cyg_drv_interrupt_attach
. When the kernel package
is present these driver API functions map directly on to the
equivalent kernel functions such as
cyg_interrupt_attach
, using macros to avoid any
overheads. When the kernel is absent the common HAL package implements
the driver API directly, but this implementation is simpler than the
one in the kernel because it can assume a single-threaded environment.
Schedulers
When a system involves multiple threads, a scheduler is needed to determine which thread should currently be running. The eCos kernel can be configured with one of three schedulers, the bitmap scheduler, the multi-level queue (MLQ) scheduler and the SMP scheduler (MLQSMP). The bitmap scheduler is somewhat more efficient, but has a number of limitations. Most systems will instead use the MLQ scheduler, and MLQSMP in SMP configurations. Other schedulers may be added in the future, either as extensions to the kernel package or in separate packages.
Both the bitmap and the MLQ schedulers use a simple numerical priority
to determine which thread should be running. The number of priority
levels is configurable via the option
CYGNUM_KERNEL_SCHED_PRIORITIES
, but a typical
system will have up to 32 priority levels. Therefore thread priorities
will be in the range 0 to 31, with 0 being the highest priority and 31
the lowest. Usually only the system's idle thread will run at the
lowest priority. Thread priorities are absolute, so the kernel will
only run a lower-priority thread if all higher-priority threads are
currently blocked.
The bitmap scheduler only allows one thread per priority level, so if the system is configured with 32 priority levels then it is limited to only 32 threads — still enough for many applications. A simple bitmap can be used to keep track of which threads are currently runnable. Bitmaps can also be used to keep track of threads waiting on a mutex or other synchronization primitive. Identifying the highest-priority runnable or waiting thread involves a simple operation on the bitmap, and an array index operation can then be used to get hold of the thread data structure itself. This makes the bitmap scheduler fast and totally deterministic.
The MLQ schedulers (MLQ and MLQSMP) allows multiple threads to run at the same priority. This means that there is no limit on the number of threads in the system, other than the amount of memory available. However operations such as finding the highest priority runnable thread are a little bit more expensive than for the bitmap scheduler.
Optionally the MLQ schedulers support timeslicing, where the scheduler
automatically switches from one runnable thread to another when some
number of clock ticks have occurred. Timeslicing only comes into play
when there are two runnable threads at the same priority and no higher
priority runnable threads. If timeslicing is disabled then a thread
will not be preempted by another thread of the same priority, and will
continue running until either it explicitly yields the processor or
until it blocks by, for example, waiting on a synchronization
primitive. The configuration options
CYGSEM_KERNEL_SCHED_TIMESLICE
and
CYGNUM_KERNEL_SCHED_TIMESLICE_TICKS
control
timeslicing. The bitmap scheduler does not provide timeslicing
support. It only allows one thread per priority level, so it is not
possible to preempt the current thread in favour of another one with
the same priority.
An experimental timeslicing feature is also available in eCosPro,
which is "fair" timeslicing, and can be enabled with the
CYGSEM_KERNEL_SCHED_TIMESLICE_FAIR
configuration option.
By default, normal timeslicing does not guarantee that different
threads get a similar amount of CPU time. In fact, a thread could use
most of one of its timeslice ticks allocated to it, but then block
shortly before the tick occurs, at which point as far as the kernel is
concerned, it effectively used no time at all from that tick. In some
applications where some threads may often be operating on small parts of work
that take less than a tick, or where there is a periodic event such as
an external interrupt that regularly causes a thread to be pre-empted
and this can occasionally happen roughly synchronised to the kernel clock,
then this can result in other threads at the same priority being starved.
The fair timeslicing option seeks to prevent this by using information from
the underlying HAL clock to determine a more accurate view of how much
CPU time a thread has used. This is naturally at the expense of a slightly
greater context switch time. With this option enabled, threads should
become more fairly timesliced, although due to the granularity of the
kernel clock, there will always be a small error margin of roughly
half a kernel clock tick on average. This feature can be tested with the
timeslice_fair
kernel test.
Another important configuration option that affects the MLQ schedulers
is CYGIMP_KERNEL_SCHED_SORTED_QUEUES
. This
determines what happens when a thread blocks, for example by waiting
on a semaphore which has no pending events. The default behaviour of
the system is last-in-first-out queuing. For example if several
threads are waiting on a semaphore and an event is posted, the thread
that gets woken up is the last one that called
cyg_semaphore_wait
. This allows for a simple and
fast implementation of both the queue and dequeue operations. However
if there are several queued threads with different priorities, it may
not be the highest priority one that gets woken up. In practice this
is rarely a problem: usually there will be at most one thread waiting
on a queue, or when there are several threads they will be of the same
priority. However if the application does require strict priority
queueing then the option
CYGIMP_KERNEL_SCHED_SORTED_QUEUES
should be
enabled. There are disadvantages: more work is needed whenever a
thread is queued, and the scheduler needs to be locked for this
operation so the system's dispatch latency is worse. If the bitmap
scheduler is used then priority queueing is automatic and does not
involve any penalties.
Some kernel functionality is currently only supported with the MLQ schedulers, not the bitmap scheduler. This includes support for SMP systems, and protection against priority inversion using either mutex priority ceilings or priority inheritance.
The MLQSMP scheduler is a derivative of the MLQ scheduler that has some additional features for controlling thread affinity and CPU activation. The MLQ scheduler can support SMP operation, but does not support the additional features. By default, eCosPro uses the MLQSMP scheduler when configured for SMP operation.
Synchronization Primitives
The eCos kernel provides a number of different synchronization primitives: mutexes, condition variables, counting semaphores, mail boxes and event flags.
Mutexes serve a very different purpose from the other primitives. A mutex allows multiple threads to share a resource safely: a thread locks a mutex, manipulates the shared resource, and then unlocks the mutex again. The other primitives are used to communicate information between threads, or alternatively from a DSR associated with an interrupt handler to a thread.
When a thread that has locked a mutex needs to wait for some condition to become true, it should use a condition variable. A condition variable is essentially just a place for a thread to wait, and which another thread, or DSR, can use to wake it up. When a thread waits on a condition variable it releases the mutex before waiting, and when it wakes up it reacquires it before proceeding. These operations are atomic so that synchronization race conditions cannot be introduced.
A counting semaphore is used to indicate that a particular event has occurred. A consumer thread can wait for this event to occur, and a producer thread or a DSR can post the event. There is a count associated with the semaphore so if the event occurs multiple times in quick succession this information is not lost, and the appropriate number of semaphore wait operations will succeed.
Mail boxes are also used to indicate that a particular event has occurred, and allows for one item of data to be exchanged per event. Typically this item of data would be a pointer to some data structure. Because of the need to store this extra data, mail boxes have a finite capacity. If a producer thread generates mail box events faster than they can be consumed then, to avoid overflow, it will be blocked until space is again available in the mail box. This means that mail boxes usually cannot be used by a DSR to wake up a thread. Instead mail boxes are typically only used between threads.
Event flags can be used to wait on some number of different events, and to signal that one or several of these events have occurred. This is achieved by associating bits in a bit mask with the different events. Unlike a counting semaphore no attempt is made to keep track of the number of events that have occurred, only the fact that an event has occurred at least once. Unlike a mail box it is not possible to send additional data with the event, but this does mean that there is no possibility of an overflow and hence event flags can be used between a DSR and a thread as well as between threads.
The eCos common HAL package provides its own device driver API which contains some of the above synchronization primitives. These allow the DSR for an interrupt handler to signal events to higher-level code. If the configuration includes the eCos kernel package then the driver API routines map directly on to the equivalent kernel routines, allowing interrupt handlers to interact with threads. If the kernel package is not included and the application consists of just a single thread running in polled mode then the driver API is implemented entirely within the common HAL, and with no need to worry about multiple threads the implementation can obviously be rather simpler.
Threads and Interrupt Handling
During normal operation the processor will be running one of the threads in the system. This may be an application thread, a system thread running inside say the TCP/IP stack, or the idle thread. From time to time a hardware interrupt will occur, causing control to be transferred briefly to an interrupt handler. When the interrupt has been completed the system's scheduler will decide whether to return control to the interrupted thread or to some other runnable thread.
Threads and interrupt handlers must be able to interact. If a thread is waiting for some I/O operation to complete, the interrupt handler associated with that I/O must be able to inform the thread that the operation has completed. This can be achieved in a number of ways. One very simple approach is for the interrupt handler to set a volatile variable. A thread can then poll continuously until this flag is set, possibly sleeping for a clock tick in between. Polling continuously means that the CPU time is not available for other activities, which may be acceptable for some but not all applications. Polling once every clock tick imposes much less overhead, but means that the thread may not detect that the I/O event has occurred until an entire clock tick has elapsed. In typical systems this could be as long as 10 milliseconds. Such a delay might be acceptable for some applications, but not all.
A better solution would be to use one of the synchronization primitives. The interrupt handler could signal a condition variable, post to a semaphore, or use one of the other primitives. The thread would perform a wait operation on the same primitive. It would not consume any CPU cycles until the I/O event had occurred, and when the event does occur the thread can start running again immediately (subject to any higher priority threads that might also be runnable).
Synchronization primitives constitute shared data, so care must be taken to avoid problems with concurrent access. If the thread that was interrupted was just performing some calculations then the interrupt handler could manipulate the synchronization primitive quite safely. However if the interrupted thread happened to be inside some kernel call then there is a real possibility that some kernel data structure will be corrupted.
One way of avoiding such problems would be for the kernel functions to disable interrupts when executing any critical region. On most architectures this would be simple to implement and very fast, but it would mean that interrupts would be disabled often and for quite a long time. For some applications that might not matter, but many embedded applications require that the interrupt handler run as soon as possible after the hardware interrupt has occurred. If the kernel relied on disabling interrupts then it would not be able to support such applications.
Instead the kernel uses a two-level approach to interrupt handling. Associated with every interrupt vector is an Interrupt Service Routine or ISR, which will run as quickly as possible so that it can service the hardware. However an ISR can make only a small number of kernel calls, mostly related to the interrupt subsystem, and it cannot make any call that would cause a thread to wake up. If an ISR detects that an I/O operation has completed and hence that a thread should be woken up, it can cause the associated Deferred Service Routine or DSR to run. A DSR is allowed to make more kernel calls, for example it can signal a condition variable or post to a semaphore.
Disabling interrupts prevents ISRs from running, but very few parts of the system disable interrupts and then only for short periods of time. The main reason for a thread to disable interrupts is to manipulate some state that is shared with an ISR. For example if a thread needs to add another buffer to a linked list of free buffers and the ISR may remove a buffer from this list at any time, the thread would need to disable interrupts for the few instructions needed to manipulate the list. If the hardware raises an interrupt at this time, it remains pending until interrupts are reenabled.
Analogous to interrupts being disabled or enabled, the kernel has a
scheduler lock. The various kernel functions such as
cyg_mutex_lock
and
cyg_semaphore_post
will claim the scheduler lock,
manipulate the kernel data structures, and then release the scheduler
lock. If an interrupt results in a DSR being requested and the
scheduler is currently locked, the DSR remains pending. When the
scheduler lock is released any pending DSRs will run. These may post
events to synchronization primitives, causing other higher priority
threads to be woken up.
For an example, consider the following scenario. The system has a high priority thread A, responsible for processing some data coming from an external device. This device will raise an interrupt when data is available. There are two other threads B and C which spend their time performing calculations and occasionally writing results to a display of some sort. This display is a shared resource so a mutex is used to control access.
At a particular moment in time thread A is likely to be blocked,
waiting on a semaphore or another synchronization primitive until data
is available. Thread B might be running performing some calculations,
and thread C is runnable waiting for its next timeslice. Interrupts
are enabled, and the scheduler is unlocked because none of the threads
are in the middle of a kernel operation. At this point the device
raises an interrupt. The hardware transfers control to a low-level
interrupt handler provided by eCos which works out exactly which
interrupt occurs, and then the corresponding ISR is run. This ISR
manipulates the hardware as appropriate, determines that there is now
data available, and wants to wake up thread A by posting to the
semaphore. However ISR's are not allowed to call
cyg_semaphore_post
directly, so instead the ISR
requests that its associated DSR be run and returns. There are no more
interrupts to be processed, so the kernel next checks for DSR's. One
DSR is pending and the scheduler is currently unlocked, so the DSR can
run immediately and post the semaphore. This will have the effect of
making thread A runnable again, so the scheduler's data structures are
adjusted accordingly. When the DSR returns thread B is no longer the
highest priority runnable thread so it will be suspended, and instead
thread A gains control over the CPU.
In the above example no kernel data structures were being manipulated
at the exact moment that the interrupt happened. However that cannot
be assumed. Suppose that thread B had finished its current set of
calculations and wanted to write the results to the display. It would
claim the appropriate mutex and manipulate the display. Now suppose
that thread B was timesliced in favour of thread C, and that thread C
also finished its calculations and wanted to write the results to the
display. It would call cyg_mutex_lock
. This
kernel call locks the scheduler, examines the current state of the
mutex, discovers that the mutex is already owned by another thread,
suspends the current thread, and switches control to another runnable
thread. Another interrupt happens in the middle of this
cyg_mutex_lock
call, causing the ISR to run
immediately. The ISR decides that thread A should be woken up so it
requests that its DSR be run and returns back to the kernel. At this
point there is a pending DSR, but the scheduler is still locked by the
call to cyg_mutex_lock
so the DSR cannot run
immediately. Instead the call to cyg_mutex_lock
is allowed to continue, which at some point involves unlocking the
scheduler. The pending DSR can now run, safely post the semaphore, and
thus wake up thread A.
If the ISR had called cyg_semaphore_post
directly
rather than leaving it to a DSR, it is likely that there would have
been some sort of corruption of a kernel data structure. For example
the kernel might have completely lost track of one of the threads, and
that thread would never have run again. The two-level approach to
interrupt handling, ISR's and DSR's, prevents such problems with no
need to disable interrupts.
Calling Contexts
eCos defines a number of contexts. Only certain calls are allowed from inside each context, for example most operations on threads or synchronization primitives are not allowed from ISR context. The different contexts are initialization, thread, ISR and DSR.
When eCos starts up it goes through a number of phases, including
setting up the hardware and invoking C++ static constructors. During
this time interrupts are disabled and the scheduler is locked. When a
configuration includes the kernel package the final operation is a
call to cyg_scheduler_start
.
At this point interrupts are enabled, the scheduler is unlocked, and
control is transferred to the highest priority runnable thread. If the
configuration also includes the C library package then usually the C
library startup package will have created a thread which will call the
application's main
entry point.
Some application code can also run before the scheduler is started,
and this code runs in initialization context. If the application is
written partly or completely in C++ then the constructors for any
static objects will be run. Alternatively application code can define
a function cyg_user_start
which gets called after
any C++ static constructors. This allows applications to be written
entirely in C.
void cyg_user_start(void) { /* Perform application-specific initialization here */ }
It is not necessary for applications to provide a
cyg_user_start
function since the system will
provide a default implementation which does nothing.
Typical operations that are performed from inside static constructors
or cyg_user_start
include creating threads,
synchronization primitives, setting up alarms, and registering
application-specific interrupt handlers. In fact for many applications
all such creation operations happen at this time, using statically
allocated data, avoiding any need for dynamic memory allocation or
other overheads.
Code running in initialization context runs with interrupts disabled
and the scheduler locked. It is not permitted to reenable interrupts
or unlock the scheduler because the system is not guaranteed to be in
a totally consistent state at this point. A consequence is that
initialization code cannot use synchronization primitives such as
cyg_semaphore_wait
to wait for an external event.
It is permitted to lock and unlock a mutex: there are no other threads
running so it is guaranteed that the mutex is not yet locked, and
therefore the lock operation will never block; this is useful when
making library calls that may use a mutex internally.
At the end of the startup sequence the system will call
cyg_scheduler_start
and the various threads will
start running. In thread context nearly all of the kernel functions
are available. There may be some restrictions on interrupt-related
operations, depending on the target hardware. For example the hardware
may require that interrupts be acknowledged in the ISR or DSR before
control returns to thread context, in which case
cyg_interrupt_acknowledge
should not be called
by a thread.
At any time the processor may receive an external interrupt, causing control to be transferred from the current thread. Typically a VSR provided by eCos will run and determine exactly which interrupt occurred. Then the VSR will switch to the appropriate ISR, which can be provided by a HAL package, a device driver, or by the application. During this time the system is running at ISR context, and most of the kernel function calls are disallowed. This includes the various synchronization primitives, so for example an ISR is not allowed to post to a semaphore to indicate that an event has happened. Usually the only operations that should be performed from inside an ISR are ones related to the interrupt subsystem itself, for example masking an interrupt or acknowledging that an interrupt has been processed. On SMP systems it is also possible to use spinlocks from ISR context.
When an ISR returns it can request that the corresponding DSR be run
as soon as it is safe to do so, and that will run in DSR context. This
context is also used for running alarm functions, and threads can
switch temporarily to DSR context by locking the scheduler. Only
certain kernel functions can be called from DSR context, although more
than in ISR context. In particular it is possible to use any
synchronization primitives which cannot block. These include
cyg_semaphore_post
,
cyg_cond_signal
,
cyg_cond_broadcast
,
cyg_flag_setbits
, and
cyg_mbox_tryput
. It is not possible to use any
primitives that may block such as
cyg_semaphore_wait
,
cyg_mutex_lock
, or
cyg_mbox_put
. Calling such functions from inside
a DSR may cause the system to hang.
The specific documentation for the various kernel functions gives more details about valid contexts.
Error Handling and Assertions
In many APIs each function is expected to perform some validation of
its parameters and possibly of the current state of the system. This
is supposed to ensure that each function is used correctly, and that
application code is not attempting to perform a semaphore operation on
a mutex or anything like that. If an error is detected then a suitable
error code is returned, for example the POSIX function
pthread_mutex_lock
can return various error codes
including EINVAL
and EDEADLK
.
There are a number of problems with this approach, especially in the
context of deeply embedded systems:
- Performing these checks inside the mutex lock and all the other functions requires extra CPU cycles and adds significantly to the code size. Even if the application is written correctly and only makes system function calls with sensible arguments and under the right conditions, these overheads still exist.
- Returning an error code is only useful if the calling code detects these error codes and takes appropriate action. In practice the calling code will often ignore any errors because the programmer “knows” that the function is being used correctly. If the programmer is mistaken then an error condition may be detected and reported, but the application continues running anyway and is likely to fail some time later in mysterious ways.
- If the calling code does always check for error codes, that adds yet more CPU cycles and code size overhead.
-
Usually there will be no way to recover from certain errors, so if the
application code detected an error such as
EINVAL
then all it could do is abort the application somehow.
The approach taken within the eCos kernel is different. Functions such
as cyg_mutex_lock
will not return an error code.
Instead they contain various assertions, which can be enabled or
disabled. During the development process assertions are normally left
enabled, and the various kernel functions will perform parameter
checks and other system consistency checks. If a problem is detected
then an assertion failure will be reported and the application will be
terminated. In a typical debug session a suitable breakpoint will have
been installed and the developer can now examine the state of the
system and work out exactly what is going on. Towards the end of the
development cycle assertions will be disabled by manipulating
configuration options within the eCos infrastructure package, and all
assertions will be eliminated at compile-time. The assumption is that
by this time the application code has been mostly debugged: the
initial version of the code might have tried to perform a semaphore
operation on a mutex, but any problems like that will have been fixed
some time ago. This approach has a number of advantages:
- In the final application there will be no overheads for checking parameters and other conditions. All that code will have been eliminated at compile-time.
- Because the final application will not suffer any overheads, it is reasonable for the system to do more work during the development process. In particular the various assertions can test for more error conditions and more complicated errors. When an error is detected it is possible to give a text message describing the error rather than just return an error code.
- There is no need for application programmers to handle error codes returned by various kernel function calls. This simplifies the application code.
- If an error is detected then an assertion failure will be reported immediately and the application will be halted. There is no possibility of an error condition being ignored because application code did not check for an error code.
Although none of the kernel functions return an error code, many of
them do return a status condition. For example the function
cyg_semaphore_timed_wait
waits until either an
event has been posted to a semaphore, or until a certain number of
clock ticks have occurred. Usually the calling code will need to know
whether the wait operation succeeded or whether a timeout occurred.
cyg_semaphore_timed_wait
returns a boolean: a
return value of zero or false indicates a timeout, a non-zero return
value indicates that the wait succeeded.
In conventional APIs one common error conditions is lack of memory.
For example the POSIX function pthread_create
usually has to allocate some memory dynamically for the thread stack
and other per-thread data. If the target hardware does not have enough
memory to meet all demands, or more commonly if the application
contains a memory leak, then there may not be enough memory available
and the function call would fail. The eCos kernel avoids such problems
by never performing any dynamic memory allocation. Instead it is the
responsibility of the application code to provide all the memory
required for kernel data structures and other needs. In the case of
cyg_thread_create
this means a
cyg_thread data structure to hold the thread
details, and a char array for the thread stack.
In many applications this approach results in all data structures
being allocated statically rather than dynamically. This has several
advantages. If the application is in fact too large for the target
hardware's memory then there will be an error at link-time rather than
at run-time, making the problem much easier to diagnose. Static
allocation does not involve any of the usual overheads associated with
dynamic allocation, for example there is no need to keep track of the
various free blocks in the system, and it may be possible to eliminate
malloc
from the system completely. Problems such
as fragmentation and memory leaks cannot occur if all data is
allocated statically. However, some applications are sufficiently
complicated that dynamic memory allocation is required, and the
various kernel functions do not distinguish between statically and
dynamically allocated memory. It still remains the responsibility of
the calling code to ensure that sufficient memory is available, and
passing null pointers to the kernel will result in assertions or
system failure.
2025-01-10 | Open Publication License |