Name
HAL Port — Implementation Details
Description
This section of the documentation gives an outline description of the most important parts of the architectural HAL package, and especially how the eCos HAL specification has been mapped on to the TILE-Gx hardware.
The TILE-Gx HAL is organized somewhat differently from HALs for other targets. Typically an eCos port involves three or four separate HAL packages: the architectural HAL handles features that are common to every chip within an architecture; a variant HAL copes with different families within an architecture, for example one family may support only on-chip memory and no MMU while another family is designed for use with external memory; within a family there may be different processors supporting different sets of peripherals; finally a platform HAL handles anything specific to the circuit board rather than to the chip, for example the amount of external memory. The TILE-Gx port does not require these complications. Most of the hardware variations will be handled by the hypervisor or within the Linux partition and do not affect the eCos side of things.
Data Types
The eCos port to the TILE-Gx architecture only supports 32-bit mode. int, long and pointers are all 32 bits, and a long long is 64 bits. The chip always runs in little-endian mode.
Header Files
The architectural HAL package provides the standard HAL header files cyg/hal/hal_arch.h, cyg/hal/hal_intr.h, cyg/hal/hal_io.h, and so on. However there is one important difference between the TILE-Gx versions of these headers and their equivalents for other architectures. Typically hal_intr.h provides definitions of all the interrupt and exception vectors, directly or indirectly. Similarly hal_io.h provides definitions for some or all of the on-chip peripherals. These definitions are required by some parts of eCos, for example the kernel needs to know which interrupt corresponds to the system clock, but they are specific to individual processors or variants. The gcc toolchain supports the architecture as a whole so cannot supply these definitions.
For TILE-Gx the situation is different. The multicore development environment comes with a full set of definitions in various headers below the arch subdirectory, for example arch/interrupts.h defines all the interrupt vectors, and arch/spr_def.h defines the special purpose registers. tile-gcc will find these header files automatically. The headers are intended for use by the Linux kernel but are mostly usable by eCos and eCos applications. There may be some instances where these header files reference Linux-specific functionality, or where they assume a 64-bit build. Duplicating all this information in the eCos header files would serve no purpose.
Memory Layout and the Linker Script
The memory layout for a ROM startup application or for gdbstubs is as follows:
Location | Purpose |
0x00010000 | Code (.text section, read-only data) |
0x10000000 | Data (.data and .bss sections, eCos heap) |
0x6C000000 | Startup stack and Hypervisor Data |
0xFFFFFFFFFE000000 | Interrupt vectors |
The amount of memory allocated for the ROM region is just large enough to hold the application's code and read-only data, plus enough for a shadow copy of the .data initialized static data section. That shadow copy is needed to allow for platform restarts. The hypervisor will round up this amount to a suitable page boundary.
eCos memory accesses go through the MMU so the above memory locations are translated to physical addresses. Therefore one eCos tile's location 0x10000000 will correspond to a different physical memory address from another tile's location 0x10000000, and these memory regions are not shared between tiles.
The amount of memory allocated for the RAM region is determined by the CYGNUM_HAL_TILEGX_RAM_SIZE configuration option, but can be overridden by passing a suitable -m <size> option to tile-ecos-32to64 when the executable is converted into a format suitable for inclusion in a hypervisor boot image. The first 4K are reserved for a system data structure hal_tilegx_global_state which contains information such as the virtual vector table, interrupt-related data, and MMU settings. In a debug system this data structure must be shared between the ROM-startup gdbstubs and the RAM-startup application, which is most conveniently done by placing it at a well-known location.
In a debug system the next 60K of the RAM region, up to location 0x10010000, is reserved for use by gdbstubs. The remaining memory holds the application code, data, and heap.
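The arithmetic behind this layout can be sketched in a few lines of C. This is only an illustration of the numbers quoted above: the macro and function names are hypothetical and do not come from the HAL headers, and the non-debug case assumes that only the 4K for hal_tilegx_global_state is reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the text above. The names are hypothetical. */
#define RAM_BASE           0x10000000u  /* start of the RAM region            */
#define GLOBAL_STATE_SIZE  0x1000u      /* 4K for hal_tilegx_global_state     */
#define GDBSTUBS_SIZE      0xF000u      /* 60K for gdbstubs, debug systems    */

/* First address available to the application within the RAM region.
 * Assumption: without gdbstubs only the 4K global state is reserved. */
static uint32_t app_region_start(int debug_system)
{
    uint32_t start = RAM_BASE + GLOBAL_STATE_SIZE;
    if (debug_system) {
        start += GDBSTUBS_SIZE;
    }
    return start;
}

/* Bytes left for application code, data and heap, given the RAM size
 * (for example the value of CYGNUM_HAL_TILEGX_RAM_SIZE). */
static uint32_t app_region_size(uint32_t ram_size, int debug_system)
{
    uint32_t reserved = app_region_start(debug_system) - RAM_BASE;
    assert(ram_size >= reserved);
    return ram_size - reserved;
}
```

For a debug system app_region_start(1) yields 0x10010000, matching the address given above.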
The hypervisor allocates some additional memory at location 0x6C000000 (strictly, at the pertile va location specified in the hypervisor configuration file, but that location should be 0x6C000000). The hypervisor assumes that the application is an ordinary BME application which will need memory for its stack and heap, and which will also need information from the hypervisor such as the tile's CPU speed. The eCos requirements are different but there is no easy way to return this memory to the hypervisor. Instead eCos can still make good use of it for the startup and interrupt stack, and it does need some of the same information from the hypervisor. If an application wishes to access this hypervisor information it can do so via hal_tilegx_global_state.hv_global_info, as defined in cyg/hal/hal_tilegx.h.
When an interrupt occurs the CPU branches to a location determined by the current CPU protection level and the interrupt vector number. eCos always runs at protection level 2, which means that the relevant locations occupy approximately 16K starting at location 0xFFFF_FFFF_FE00_0000. eCos needs to provide these interrupt vectors, so an executable contains a section for them. Strictly the location can be changed by manipulating the INTERRUPT_VECTOR_BASE_2 special purpose register but there is no good reason for doing so.
The linker script src/tilegx.ld defines all the above, in conjunction with pkgconf/hal_tilegx.h and pkgconf/mlt_tilegx.h.
For a RAM startup application running on top of gdbstubs, the application's code will be placed at location 0x10010000 onwards, immediately after the memory reserved for hal_tilegx_global_state and gdbstubs. The application's static data will follow immediately after the code. The rest of the memory will be allocated to the system heap for dynamic memory allocation. A RAM startup application will run in the memory map set up by the hypervisor, so the RAM size is determined by the CYGNUM_HAL_TILEGX_RAM_SIZE option used when gdbstubs was configured, or alternatively by the memory size passed to tile-ecos-32to64. The RAM startup initialization code will determine the actual amount of RAM and size the system heap accordingly.
Startup
During bootstrap the hypervisor will initialize all BME tiles as per the configuration file, allocating memory as per the executable's memory map, then jumping to the executable's entry point. The hypervisor runs at protection level 2, and starts the executable with the same protection level. There is never any need for eCos to change this protection level, and doing so would introduce various complications especially in the interrupt handling code.
For ROM startup the application entry point is hal_tilegx_start in src/vectors.S. If the application has been started by the hypervisor then that will have already taken care of much of the low-level initialization, for example zeroing the .bss uninitialized static data region. However it is also possible for an application to perform a restart. This happens most commonly when a maintenance packet r command is issued from inside tile-gdb and the debug session is then terminated, but a restart can also be caused by a double fault exception or by using the HAL_PLATFORM_RESET() macro defined in cyg/hal/hal_intr.h. After a restart the assembler initialization code needs to do rather more work, including restoring all initialized static data to their original values, zeroing all uninitialized static data, and switching to an appropriate stack.
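The data-related part of that restart work can be illustrated in C. The real code is assembler in src/vectors.S and is not reproduced here; the structure and function below are hypothetical stand-ins that only show the two steps of copying the shadow copy back over .data and clearing .bss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the static data regions. In a real build
 * these addresses would come from linker-script symbols. */
struct static_regions {
    void       *data_start;   /* live .data in RAM                    */
    const void *data_shadow;  /* pristine copy of .data kept in ROM   */
    size_t      data_size;
    void       *bss_start;    /* .bss in RAM                          */
    size_t      bss_size;
};

/* Restore initialized data from the shadow copy, then zero the
 * uninitialized data, as described for a restart. */
static void reinit_static_data(const struct static_regions *r)
{
    assert(r != NULL);
    memcpy(r->data_start, r->data_shadow, r->data_size);
    memset(r->bss_start, 0, r->bss_size);
}
```

Switching to an appropriate stack cannot be expressed in portable C, which is one reason the real initialization is written in assembler.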
If eCos has been built with configuration option CYGHWR_HAL_TILEGX_SIMULATOR enabled and if it is actually running inside the simulator then the application will halt at this point, allowing the user to attach tile-gdb.
Once the assembler initialization code has finished it jumps to the C function hal_tilegx_c_startup(), defined in src/tilegx.c. This performs initialization or reinitialization of various other subsystems including the memory management unit's translation lookaside buffers (TLBs), interrupt handling, virtual vectors, and gdbstubs as appropriate. Finally it runs through any C++ static constructors, including those for other eCos packages like the eCos kernel, and calls the generic cyg_start() routine.
For RAM startup the application entry point is again hal_tilegx_start in src/vectors.S, but the code executed is somewhat different from that for ROM startup. Again there is a jump to hal_tilegx_c_startup() in src/tilegx.c, and from there to cyg_start().
Thread Contexts
The HAL_SavedRegisters structure defined in cyg/hal/hal_arch.h defines the storage needed for saving and restoring a thread context during context switches and interrupt handling. Mostly it consists of the registers r0-r53, but there is some additional state which overlaps the stack frames defined by the TILE-Gx ABI. The details are generally of no interest to application developers.
There is one piece of system state which is not held in the saved context structure and which arguably should be: the special purpose register SPR_CMPEXCH_VALUE. This register is not used in ordinary code. It serves only to help implement shared memory spinlocks:
    int result;

    __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
    result = __insn_cmpexch4(&spinlock, newval);
It is possible for an interrupt to occur between setting SPR_CMPEXCH_VALUE and executing the cmpexch4 or cmpexch instruction, and the register may get overwritten before the code resumes. The Linux kernel saves and restores this special purpose register during interrupt handling, adding several cycles to the interrupt latency. The eCos HAL does not save this register, on the assumption that spinlocks are rarely needed. Instead the spinlock code has to disable interrupts around the above pair of instructions:
    CYG_INTERRUPT_STATE ints_state;
    int result;

    HAL_DISABLE_INTERRUPTS(ints_state);
    __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
    result = __insn_cmpexch4(&spinlock, newval);
    HAL_RESTORE_INTERRUPTS(ints_state);
This makes interrupt handling more efficient but spinlocks more expensive. Since eCos does not support SMP operation, spinlocks are unlikely to be used often, and it is expected that this approach will be a net performance gain.
Interrupts
Interrupt management requires several pieces of functionality. First it must be possible to disable and reenable interrupts, so that critical code sections can run atomically. Second it must be possible to mask and unmask individual interrupt sources. Third, if support for nested interrupts is enabled via the configuration option CYGSEM_HAL_COMMON_INTERRUPTS_ALLOW_NESTING then it should be possible to assign priorities to the various interrupts, such that inside an interrupt handler lower priority interrupts are masked and higher priority interrupts are unmasked. Finally other eCos code including the kernel and any device drivers must be able to register their own interrupt handling functions. There are two versions of such handlers: a low-level VSR must be written in assembler, but is called very early after an interrupt triggers; a higher-level ISR can be written in C, but the system needs to do more work before the ISR can be called.
Each TILE-Gx tile has two special purpose registers or SPRs which control how interrupts are handled. INTERRUPT_MASK_2 can be used to mask or unmask the various interrupt sources (there are other registers for protection levels 0, 1, and 3 but those are irrelevant to the eCos port). INTERRUPT_CRITICAL_SECTION can be used to block all maskable interrupts. At first glance this second register could be used to implement the disable/reenable functionality. Unfortunately that does not quite work. If a CPU exception occurs while INTERRUPT_CRITICAL_SECTION is set then that is treated as a non-recoverable double fault. Since gdbstubs depends on CPU exceptions for some of the debug functionality, the implementation takes a different approach.
During normal execution INTERRUPT_MASK_2 holds the set of all interrupts that are currently masked, as expected. A shadow copy of this set is held in the global hal_tilegx_global_state.global_interrupt_mask. Disabling interrupts involves setting INTERRUPT_MASK_2 to 0xFFFF_FFFF_FFFF_FFFF, and reenabling interrupts involves restoring INTERRUPT_MASK_2 as per the shadow copy.
The above explanation is actually oversimplified. Implementing prioritized nested interrupts requires some additional complications. Associated with each interrupt source is an interrupt mask holding the set of all interrupts with equal or lower priorities. There are also two pseudo-interrupt sources, none and disabled, with associated masks 0 and 0xFFFF_FFFF_FFFF_FFFF. At any time the value of the INTERRUPT_MASK_2 SPR is the union of the global interrupt mask and the current interrupt's mask. During normal execution the current interrupt is none so INTERRUPT_MASK_2 holds the same value as the global interrupt mask. When interrupts are disabled the current interrupt is disabled so INTERRUPT_MASK_2 holds 0xFFFF_FFFF_FFFF_FFFF. While processing an interrupt INTERRUPT_MASK_2 holds all globally masked interrupts and all interrupts masked for the current interrupt. Keeping everything up to date in the right order requires considerable care, but achieves the desired functionality.
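The mask bookkeeping described above can be modelled on a host machine as follows. This is a sketch only: the structure, the function names, the mapping of interrupt number to bit position, and the convention that a larger number means a higher priority are all assumptions made for illustration. The real state lives in hal_tilegx_global_state and in the INTERRUPT_MASK_2 SPR.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOURCES    64                     /* assumed: one bit per source */
#define MASK_NONE      0x0ULL                 /* pseudo-source "none"        */
#define MASK_DISABLED  0xFFFFFFFFFFFFFFFFULL  /* pseudo-source "disabled"    */

/* Hypothetical host-side model of the interrupt mask state. */
struct intr_model {
    uint64_t global_mask;   /* individually masked sources             */
    uint64_t current_mask;  /* mask of the current (pseudo-)interrupt  */
    uint64_t spr_mask_2;    /* stands in for the INTERRUPT_MASK_2 SPR  */
};

/* The SPR always holds the union of the two masks. */
static void model_update(struct intr_model *m)
{
    m->spr_mask_2 = m->global_mask | m->current_mask;
}

/* Mask for one source: every source of equal or lower priority.
 * Assumption: a larger prio[] value means a higher priority. */
static uint64_t priority_mask(const uint8_t prio[NUM_SOURCES], unsigned source)
{
    uint64_t mask = 0;
    assert(source < NUM_SOURCES);
    for (unsigned i = 0; i < NUM_SOURCES; i++) {
        if (prio[i] <= prio[source]) {
            mask |= 1ULL << i;
        }
    }
    return mask;
}

/* Disable: the current interrupt becomes "disabled". Returns the
 * previous current mask so that it can be restored later. */
static uint64_t model_disable(struct intr_model *m)
{
    uint64_t old = m->current_mask;
    m->current_mask = MASK_DISABLED;
    model_update(m);
    return old;
}

/* Restore: put back the previous current mask. */
static void model_restore(struct intr_model *m, uint64_t old)
{
    m->current_mask = old;
    model_update(m);
}
```

Entering a handler for source n would correspond to setting current_mask to priority_mask(prio, n) and calling model_update(), so that only higher-priority sources that are not globally masked remain enabled.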
Assuming an unmasked interrupt triggers, the hardware jumps to location 0xFFFF_FFFF_FE00_0000 + (0x100 * interrupt_number). The ROM startup executable or gdbstubs provides the code that resides at that location, as per the macro intvec in src/vectors.S. This initial code allocates space for a HAL_SavedRegisters structure on the stack, saves a small number of registers, loads a per-interrupt VSR function pointer from hal_tilegx_global_state, and jumps to that VSR. Usually that VSR will be hal_default_interrupt_vsr, again in src/vectors.S, but applications can install their own VSR functions if interrupt latency is particularly critical for an interrupt source. Any such VSR is likely to be based at least in part on the default one.
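The vector address calculation is simple enough to express directly. The sketch below uses the base address and stride quoted above; the names are hypothetical, and the limit of 64 vectors is an assumption derived from the roughly 16K vector area divided by 0x100 bytes per slot.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_BASE_PL2  0xFFFFFFFFFE000000ULL  /* base for protection level 2 */
#define VECTOR_STRIDE    0x100ULL               /* bytes per vector slot       */
#define VECTOR_COUNT     64u                    /* assumed: 16K / 0x100        */

/* Address the hardware jumps to for a given interrupt number. */
static uint64_t vector_address(unsigned interrupt_number)
{
    assert(interrupt_number < VECTOR_COUNT);
    return VECTOR_BASE_PL2 + VECTOR_STRIDE * interrupt_number;
}
```

Each slot therefore has room for only 0x100 bytes of code, which is consistent with the initial code doing very little before jumping to the VSR.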
The default VSR saves additional registers, updates the current interrupt field in hal_tilegx_global_state and the INTERRUPT_MASK_2 SPR, synchronizes with the kernel, and enables nested interrupts. It then calls the ISR associated with the current interrupt. ISRs can be written in C but there are constraints on what they are allowed to do. More information on this is provided in the kernel documentation. When the ISR returns the VSR performs additional processing, possibly including a context switch to a higher-priority thread that is now runnable, before eventually returning to the interrupted code.
CPU Exceptions
On TILE-Gx, exceptions such as SIGILL (an illegal instruction exception) are implemented in much the same way as interrupts. However exceptions cannot be masked. The CPU jumps to a location near 0xFFFF_FFFF_FE00_0000, where the ROM startup executable or gdbstubs will have placed suitable code. That code jumps to a VSR, which this time will usually be hal_default_exception_vsr instead of hal_default_interrupt_vsr. This in turn calls hal_tilegx_exception_handler() in src/tilegx.c which will usually deliver the exception to the kernel. There are various special cases, for example a SIGILL exception may be the result of hitting a tile-gdb breakpoint.
One of the exceptions is special: double fault. This occurs when a CPU exception occurs while the INTERRUPT_CRITICAL_SECTION SPR is set. Double faults are not recoverable: critical information held in other SPRs will have been overwritten. If running in the simulator and CYGHWR_HAL_TILEGX_SIMULATOR is set then the simulation will be halted. Otherwise, in the absence of a better solution, an attempt will be made to restart the system. The structure field hal_tilegx_global_state.started_by will be set to hal_tilegx_started_by_double_fault, allowing application code to detect this after the restart and take any action that might be appropriate. Double faults should be rare, but application developers should be aware of the possibility.
The Idle Thread
The kernel's idle thread will execute the nap instruction, causing the tile to sleep until the next interrupt occurs.
The System Clock
The kernel clock has been implemented using the TILE_TIMER_CONTROL special purpose register, so that hardware is not available for use by application code. The auxiliary tile timer is available for use by the application, but note that the simulator does not implement that timer.
The counter value programmed into the TILE_TIMER_CONTROL register is determined from the system clock frequency, which is information provided by the hypervisor, and from the configuration options CYGNUM_HAL_RTC_NUMERATOR and CYGNUM_HAL_RTC_DENOMINATOR. The hardware does not support automatic reloading of the counter when a clock interrupt occurs, so the interrupt handler has to reload it explicitly. That code attempts to compensate for the time taken from the interrupt triggering to the counter being reloaded. The accuracy cannot be completely guaranteed, especially in a debug system, so a small amount of clock drift may occur.
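As a rough illustration of that calculation, the sketch below assumes the usual eCos convention that CYGNUM_HAL_RTC_NUMERATOR divided by CYGNUM_HAL_RTC_DENOMINATOR gives the tick period in nanoseconds. The function names are hypothetical, and the compensation shown (subtracting the cycles already elapsed) is one plausible approach rather than a description of the actual HAL code.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SECOND 1000000000ULL

/* Timer cycles per kernel clock tick.
 * Assumption: numerator / denominator is the tick length in nanoseconds.
 * The product cpu_hz * rtc_numerator must fit in 64 bits, which holds
 * for a numerator of 1e9 and clock speeds below about 18GHz. */
static uint64_t cycles_per_tick(uint64_t cpu_hz,
                                uint64_t rtc_numerator,
                                uint64_t rtc_denominator)
{
    assert(rtc_denominator != 0);
    return (cpu_hz * rtc_numerator) / (rtc_denominator * NS_PER_SECOND);
}

/* Reload value that compensates for the cycles that elapsed between the
 * interrupt triggering and the reload. If the handler is later than a
 * whole period, fall back to the smallest non-zero count. */
static uint64_t reload_value(uint64_t period, uint64_t elapsed)
{
    return (elapsed < period) ? (period - elapsed) : 1;
}
```

With a 1GHz clock, a numerator of 1000000000 and a denominator of 100, cycles_per_tick() gives 10000000 cycles, that is a 10ms tick.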
The Cache
The TILE-Gx architecture has a complicated caching system including per-tile primary and secondary caches and a distributed tertiary cache. The hardware maintains data cache coherency so there is very rarely any need for code to exercise fine-grained control over the cache such as flushing cachelines. Therefore the various cache-related eCos macros like HAL_DCACHE_SYNC() are defined as no-ops. The hardware does not maintain coherency between the instruction and data cache so eCos does define a number of instruction cache macros like HAL_ICACHE_INVALIDATE(). These are needed by the gdbstubs code to implement breakpoints, and are unlikely to be of any interest to application developers.
Diagnostics
The port supports two destinations for diagnostic output. Applications built for ROM startup will send their diagnostic output to the hypervisor over the IDN bus, and the hypervisor will output the text on the system console. For a RAM startup application diagnostic output will normally be sent to tile-gdb via LittleBoPeep, but the output can be redirected to the hypervisor if desired. This behaviour is controlled by the CYGHWR_HAL_TILEGX_DIAGNOSTICS_DESTINATION configuration option.
Other Functionality
The TILE-Gx architectural HAL provides two non-standard functions which can be used to manage the MMU settings:
    #include <cyg/hal/hal_tilegx.h>

    int  hal_tilegx_mmap(unsigned long long virtual_address,
                         unsigned long long physical_address,
                         unsigned long long dtlb_attributes);
    void hal_tilegx_munmap(unsigned long long virtual_address);
hal_tilegx_mmap() can be used to map a physical address into the tile's virtual address space with the specified attributes. The physical address can correspond to real memory. Typically this will be allocated by a process running in the Linux partition, and the details can then be forwarded to an eCos application which will map it into its address space. Alternatively the physical address can correspond to a memory-mapped device.
The virtual address can be anywhere in the address space that is not already used, but preferably in the range 0x0000_0000 to 0x7FFF_FFFF to avoid problems with 32-bit pointers. Low memory is normally used for the system's ROM and RAM regions and 0x6C00_0000 is used for the hypervisor data, but anywhere from 0x4000_0000 to 0x6800_0000 or from 0x7000_0000 to 0x7FFF_FFFF is normally fine. The address should be aligned to a boundary suitable for the block size. The final argument will be written to the DTLB_CURRENT_ATTR SPR and consists of numerous fields. The Tilera documentation should be consulted for more information. hal_tilegx_mmap() returns 1 on success, or 0 if the operation fails, typically because all of the data TLBs are already in use.
hal_tilegx_munmap() can be used to undo a previous hal_tilegx_mmap() call.
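Before calling hal_tilegx_mmap() an application may want to sanity-check its chosen virtual address against the guidance above. The helper below is a hypothetical sketch, not part of the HAL: it assumes the block size is a power of two and treats the two suggested ranges as 0x4000_0000 up to 0x6800_0000 and 0x7000_0000 up to 0x8000_0000, both with exclusive upper bounds.

```c
#include <assert.h>
#include <stdint.h>

/* True if [va, va + size) lies entirely within [lo, hi). Written to
 * avoid overflow when va is close to the top of the address space. */
static int fits(uint64_t va, uint64_t size, uint64_t lo, uint64_t hi)
{
    return va >= lo && va < hi && size <= hi - va;
}

/* Hypothetical check: is va aligned to block_size and inside one of the
 * ranges the text describes as normally fine? */
static int va_is_suitable(uint64_t va, uint64_t block_size)
{
    /* Assumption: block sizes are non-zero powers of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    if (va & (block_size - 1)) {
        return 0;                      /* not aligned to the block size */
    }
    return fits(va, block_size, 0x40000000ULL, 0x68000000ULL)
        || fits(va, block_size, 0x70000000ULL, 0x80000000ULL);
}
```

An address that passes this check can then be handed to hal_tilegx_mmap(), whose return value still needs to be tested since the call returns 0 when no data TLB entry is available.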
2024-03-18 | eCosPro Non-Commercial Public License |