HAL Port — Implementation Details


This section of the documentation gives an outline description of the most important parts of the architectural HAL package, and especially how the eCos HAL specification has been mapped on to the TILE-Gx hardware.

The TILE-Gx HAL is organized somewhat differently from HALs for other targets. Typically an eCos port involves three or four separate HAL packages: the architectural HAL handles features that are common to every chip within an architecture; a variant HAL copes with different families within an architecture, for example one family may support only on-chip memory and no MMU while another family is designed for use with external memory; within a family there may be different processors supporting different sets of peripherals; finally a platform HAL handles anything specific to the circuit board rather than to the chip, for example the amount of external memory. The TILE-Gx port does not require these complications. Most of the hardware variations will be handled by the hypervisor or within the Linux partition and do not affect the eCos side of things.

Data Types

The eCos port to the TILE-Gx architecture only supports 32-bit mode. int, long and pointers are all 32 bits, and a long long is 64 bits. The chip always runs in little-endian mode.

Header Files

The architectural HAL package provides the standard HAL header files cyg/hal/hal_arch.h, cyg/hal/hal_intr.h, cyg/hal/hal_io.h, and so on. However there is one important difference between the TILE-Gx versions of these headers and their equivalents for other architectures. Typically hal_intr.h provides definitions of all the interrupt and exception vectors, directly or indirectly. Similarly hal_io.h provides definitions for some or all of the on-chip peripherals. These definitions are required by some parts of eCos, for example the kernel needs to know which interrupt corresponds to the system clock, but they are specific to individual processors or variants. The gcc toolchain supports the architecture as a whole so cannot supply these definitions.

For TILE-Gx the situation is different. The multicore development environment comes with a full set of definitions in various headers below the arch subdirectory, for example arch/interrupts.h defines all the interrupt vectors, and arch/spr_def.h defines the special purpose registers. tile-gcc will find these header files automatically. The headers are intended for use by the Linux kernel but are mostly usable by eCos and eCos applications. There may be some instances where these header files reference Linux-specific functionality, or where they assume a 64-bit build. Duplicating all this information in the eCos header files would serve no purpose.

Memory Layout and the Linker Script

The memory layout for a ROM startup application or for gdbstubs is as follows:

0x00010000          Code (.text section, read-only data)
0x10000000          Data (.data and .bss sections, eCos heap)
0x6C000000          Startup stack and hypervisor data
0xFFFFFFFFFE000000  Interrupt vectors

The amount of memory allocated for the ROM region is just large enough to hold the application's code and read-only data, plus enough for a shadow copy of the .data initialized static data section. That shadow copy is needed to allow for platform restarts. The hypervisor will round up this amount to a suitable page boundary.

eCos memory accesses go through the MMU so the above memory locations are translated to physical addresses. Therefore one eCos tile's location 0x10000000 will correspond to a different physical memory address from another tile's location 0x10000000, and these memory regions are not shared between tiles.

The amount of memory allocated for the RAM region is determined by the CYGNUM_HAL_TILEGX_RAM_SIZE configuration option, but can be overridden by passing a suitable -m <size> option to tile-ecos-32to64 when the executable is converted into a format suitable for including into a hypervisor boot image. The first 4K are reserved for a system data structure hal_tilegx_global_state which contains information such as the virtual vector table, interrupt-related data, and MMU settings. In a debug system this data structure must be shared between the ROM-startup gdbstubs and the RAM-startup application, which is most conveniently done by placing it at a well-known location.

In a debug system the next 60K of the RAM region, up to location 0x10010000, is reserved for use by gdbstubs. The remaining memory holds the application code, data, and heap.
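The RAM region layout described above can be sketched as a few constants and helper functions. This is a minimal host-side illustration, not code from the HAL: the macro and function names are hypothetical, and only the base address and the 4K and 60K sizes come from the text.

```c
#include <stdint.h>

/* Values taken from the text above; all names here are hypothetical. */
#define RAM_BASE           0x10000000u  /* start of the RAM region            */
#define GLOBAL_STATE_SIZE  0x00001000u  /* first 4K: hal_tilegx_global_state  */
#define GDBSTUBS_SIZE      0x0000F000u  /* next 60K: reserved for gdbstubs    */

/* First address available to the application in a debug system. */
static uint32_t app_region_start(void)
{
    return RAM_BASE + GLOBAL_STATE_SIZE + GDBSTUBS_SIZE;
}

/* Bytes left for application code, data and heap for a given RAM size. */
static uint32_t app_region_size(uint32_t ram_size)
{
    uint32_t reserved = GLOBAL_STATE_SIZE + GDBSTUBS_SIZE;
    return ram_size > reserved ? ram_size - reserved : 0u;
}
```

With these values app_region_start() evaluates to 0x10010000, matching the boundary quoted above.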

The hypervisor allocates some additional memory at location 0x6c000000 (strictly, at the pertile va location specified in the hypervisor configuration file, but that location should be 0x6c000000). The hypervisor assumes that the application is an ordinary BME application which will need memory for its stack and heap, and which will also need information from the hypervisor such as the tile's CPU speed. The eCos requirements are different but there is no easy way to return this memory to the hypervisor. Instead eCos can still make good use of it for the startup and interrupt stack, and it does need some of the same information from the hypervisor. If an application wishes to access this hypervisor information it can do so via hal_tilegx_global_state.hv_global_info, as defined in cyg/hal/hal_tilegx.h.

When an interrupt occurs the CPU branches to a location determined by the current CPU protection level and the interrupt vector number. eCos always runs at protection level 2, which means that the relevant locations occupy approximately 16K starting at location 0xFFFF_FFFF_FE00_0000. eCos needs to provide these interrupt vectors so an executable contains a section for this. Strictly the location can be changed by manipulating the INTERRUPT_VECTOR_BASE_2 special purpose register but there is no good reason for doing so.

The linker script src/tilegx.ld defines all the above, in conjunction with pkgconf/hal_tilegx.h and pkgconf/mlt_tilegx.h.

For a RAM startup application running on top of gdbstubs, the application's code will be placed at location 0x10010000 onwards, immediately after the memory reserved for hal_tilegx_global_state and gdbstubs. The application's static data will follow immediately after the code. The rest of the memory will be allocated to the system heap for dynamic memory allocation. A RAM startup application will run in the memory map set up by the hypervisor, so the RAM size is determined by the CYGNUM_HAL_TILEGX_RAM_SIZE option used when gdbstubs was configured, or alternatively by the memory size passed to tile-ecos-32to64. The RAM startup initialization code will determine the actual amount of RAM and size the system heap accordingly.


Startup

During bootstrap the hypervisor will initialize all BME tiles as per the configuration file, allocating memory as per the executable's memory map, then jumping to the executable's entry point. The hypervisor runs at protection level 2, and starts the executable with the same protection level. There is never any need for eCos to change this protection level, and doing so would introduce various complications especially in the interrupt handling code.

For ROM startup the application entry point is hal_tilegx_start in src/vectors.S. If the application has been started by the hypervisor then that will have already taken care of much of the low-level initialization, for example zeroing the .bss uninitialized static data region. However it is also possible for an application to perform a restart. This happens most commonly when a maintenance packet r command is issued from inside tile-gdb and the debug session is then terminated, but a restart can also be caused by a double fault exception or by using the HAL_PLATFORM_RESET() macro defined in cyg/hal/hal_intr.h. After a restart the assembler initialization code needs to do rather more work, including restoring all initialized static data to their original values, zeroing all uninitialized static data, and switching to an appropriate stack.

If eCos has been built with configuration option CYGHWR_HAL_TILEGX_SIMULATOR enabled and if it is actually running inside the simulator then the application will halt at this point, allowing the user to attach tile-gdb.

Once the assembler initialization code has finished it jumps to the C function hal_tilegx_c_startup(), defined in src/tilegx.c. This performs initialization or reinitialization of various other subsystems including the memory management unit's translation lookaside buffers (TLBs), interrupt handling, virtual vectors, and gdbstubs as appropriate. Finally it runs through any C++ static constructors, including those for other eCos packages like the eCos kernel, and calls the generic cyg_start() routine.

For RAM startup the application entry point is again hal_tilegx_start in src/vectors.S, but the code executed is somewhat different from that for ROM startup. Again there is a jump to hal_tilegx_c_startup() in src/tilegx.c, and from there to cyg_start().

Thread Contexts

The HAL_SavedRegisters structure defined in cyg/hal/hal_arch.h defines the storage needed for saving and restoring a thread context during context switches and interrupt handling. Mostly it consists of the registers r0-r53, but there is some additional state which overlaps the stack frames defined by the TILE-Gx ABI. The details are generally of no interest to application developers.

There is one piece of system state which is not held in the saved context structure and which arguably should be: the special purpose register SPR_CMPEXCH_VALUE. This register is not used in ordinary code. It serves only to help implement shared memory spinlocks:

      int result;
      __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
      result = __insn_cmpexch4(&spinlock, newval);

It is possible for an interrupt to occur between setting SPR_CMPEXCH_VALUE and applying the cmpexch4 or cmpexch instructions, and the register may get overwritten before the code resumes. The Linux kernel saves and restores this special purpose register during interrupt handling, adding several cycles to the interrupt latency. The eCos HAL does not save this register, on the assumption that spinlocks are rarely used, and instead the spinlock code has to disable interrupts around the above pair of instructions:

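      CYG_INTERRUPT_STATE ints_state;
      int result;
      HAL_DISABLE_INTERRUPTS(ints_state);   /* protect SPR_CMPEXCH_VALUE */
      __insn_mtspr(SPR_CMPEXCH_VALUE, oldval);
      result = __insn_cmpexch4(&spinlock, newval);
      HAL_RESTORE_INTERRUPTS(ints_state);   /* back to the previous state */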

This makes interrupt handling more efficient but spinlocks more expensive. Since eCos does not support SMP operations, spinlocks are unlikely to be used often, and it is expected that this approach will be a net performance gain.


Interrupts

Interrupt management requires several pieces of functionality. First it must be possible to disable and reenable interrupts, so that critical code sections can run atomically. Second it must be possible to mask and unmask individual interrupt sources. Third, if support for nested interrupts is enabled via the configuration option CYGSEM_HAL_COMMON_INTERRUPTS_ALLOW_NESTING then it should be possible to assign priorities to the various interrupts, such that inside an interrupt handler lower priority interrupts are masked and higher priority interrupts are unmasked. Finally other eCos code including the kernel and any device drivers must be able to register their own interrupt handling functions. There are two versions of such handlers: a low-level VSR must be written in assembler, but is called very early after an interrupt triggers; a higher-level ISR can be written in C, but the system needs to do more work before the ISR can be called.

Each TILE-Gx tile has two special purpose registers or SPRs which control how interrupts are handled. INTERRUPT_MASK_2 can be used to mask or unmask the various interrupt sources (there are other registers for protection levels 0, 1, and 3 but those are irrelevant to the eCos port). INTERRUPT_CRITICAL_SECTION can be used to block all maskable interrupts. At first glance this second register could be used to implement the disable/reenable functionality. Unfortunately that does not quite work. If a CPU exception occurs while INTERRUPT_CRITICAL_SECTION is set then that is treated as a non-recoverable double fault. Since gdbstubs depends on CPU exceptions for some of the debug functionality, the implementation takes a different approach.

During normal execution INTERRUPT_MASK_2 holds the set of all interrupts that are currently masked, as expected. A shadow copy of this set is held in the global hal_tilegx_global_state.global_interrupt_mask. Disabling interrupts involves setting INTERRUPT_MASK_2 to 0xFFFF_FFFF_FFFF_FFFF, and reenabling interrupts involves restoring INTERRUPT_MASK_2 as per the shadow copy.

The above explanation is actually oversimplified. Implementing prioritized nested interrupts requires some additional complications. Associated with each interrupt source is an interrupt mask holding the set of all interrupts with equal or lower priorities. There are also two pseudo-interrupt sources, none and disabled, with associated masks 0 and 0xFFFF_FFFF_FFFF_FFFF. At any time the value of the INTERRUPT_MASK_2 SPR is the union of the global interrupt mask and the current interrupt's mask. During normal execution the current interrupt is none so INTERRUPT_MASK_2 holds the same value as the global interrupt mask. When interrupts are disabled the current interrupt is disabled so INTERRUPT_MASK_2 holds 0xFFFF_FFFF_FFFF_FFFF. While processing an interrupt INTERRUPT_MASK_2 holds all globally masked interrupts and all interrupts masked for the current interrupt. Keeping everything up to date in the right order requires considerable care, but achieves the desired functionality.
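The mask arithmetic can be modelled on a host machine with plain 64-bit integers. The following is only an illustrative sketch: the structure, macros and function names are hypothetical and do not correspond to the real HAL code, which manipulates the INTERRUPT_MASK_2 SPR and hal_tilegx_global_state directly.

```c
#include <stdint.h>

/* Masks for the two pseudo-interrupt sources described above. */
#define MASK_NONE      UINT64_C(0)
#define MASK_DISABLED  UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Hypothetical model of the state that determines INTERRUPT_MASK_2. */
struct intr_model {
    uint64_t global_mask;   /* interrupts masked by the application    */
    uint64_t current_mask;  /* mask of the "current interrupt" source  */
};

/* The value the SPR would hold: union of the two masks. */
static uint64_t effective_mask(const struct intr_model *m)
{
    return m->global_mask | m->current_mask;
}

/* Disable interrupts: the current source becomes "disabled". */
static uint64_t model_disable(struct intr_model *m)
{
    uint64_t old = m->current_mask;
    m->current_mask = MASK_DISABLED;
    return old;
}

/* Reenable interrupts by restoring the previous current source. */
static void model_restore(struct intr_model *m, uint64_t old)
{
    m->current_mask = old;
}
```

In this model, entering an interrupt handler corresponds to setting current_mask to that interrupt's mask of equal and lower priority sources, and leaving the handler restores the previous value.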

Assuming an unmasked interrupt triggers, the hardware jumps to location 0xFFFF_FFFF_FE00_0000 + (0x100 * interrupt_number). The ROM startup executable or gdbstubs provides the code that resides at that location, as per the macro intvec in src/vectors.S. This initial code allocates space for a HAL_SavedRegisters structure on the stack, saves a small number of registers, loads a per-interrupt VSR function pointer from hal_tilegx_global_state, and jumps to that VSR. Usually that VSR will be hal_default_interrupt_vsr, again in src/vectors.S, but applications can install their own VSR functions if interrupt latency is particularly critical for an interrupt source. Any such VSR is likely to be based at least in part on the default one.
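The vector address calculation is simple enough to express directly. This is a small illustrative helper with hypothetical names; the base address and the 0x100 stride are the values given above for protection level 2.

```c
#include <stdint.h>

/* Values from the text; the macro and function names are hypothetical. */
#define VECTOR_BASE    UINT64_C(0xFFFFFFFFFE000000)  /* protection level 2 */
#define VECTOR_STRIDE  UINT64_C(0x100)               /* bytes per vector   */

/* Address the hardware jumps to for a given interrupt number. */
static uint64_t vector_address(unsigned interrupt_number)
{
    return VECTOR_BASE + VECTOR_STRIDE * interrupt_number;
}
```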

The default VSR saves additional registers, updates the current interrupt field in hal_tilegx_global_state and the INTERRUPT_MASK_2 SPR, synchronizes with the kernel, and enables nested interrupts. It then calls the ISR associated with the current interrupt. ISRs can be written in C but there are constraints on what they are allowed to do. More information on this is provided in the kernel documentation. When the ISR returns the VSR performs additional processing, possibly including a context switch to a higher-priority thread that is now runnable, before eventually returning to the interrupted code.

CPU Exceptions

On TILE-Gx exceptions like SIGILL, an illegal instruction exception, are implemented in much the same way as interrupts. However exceptions cannot be masked. The CPU jumps to a location near 0xFFFF_FFFF_FE00_0000, where the ROM startup executable or gdbstubs will have placed suitable code. That code jumps to a VSR, which this time will usually be hal_default_exception_vsr instead of hal_default_interrupt_vsr. This in turn calls hal_tilegx_exception_handler() in src/tilegx.c which will usually deliver the exception to the kernel. There are various special cases, for example a SIGILL exception may be the result of hitting a tile-gdb breakpoint.

One of the exceptions is special: double fault. This occurs when a CPU exception occurs while the INTERRUPT_CRITICAL_SECTION SPR is set. Double faults are not recoverable: critical information held in other SPRs will have been overwritten. If running in the simulator and CYGHWR_HAL_TILEGX_SIMULATOR is set then the simulation will be halted. Otherwise, in the absence of a better solution, an attempt will be made to restart the system. The structure field hal_tilegx_global_state.started_by will be set to hal_tilegx_started_by_double_fault, allowing application code to detect this after the restart and take any action that might be appropriate. Double faults should be rare, but application developers should be aware of the possibility.

The Idle Thread

The kernel's idle thread will execute the nap instruction, causing the tile to sleep until the next interrupt occurs.

The System Clock

The kernel clock has been implemented using the TILE_TIMER_CONTROL special purpose register, so that hardware is not available for use by application code. The auxiliary tile timer is available for use by the application, but note that the simulator does not implement that timer.

The counter value programmed into the TILE_TIMER_CONTROL register is determined from the system clock frequency, which is information provided by the hypervisor, and from the configuration options CYGNUM_HAL_RTC_NUMERATOR and CYGNUM_HAL_RTC_DENOMINATOR. The hardware does not support automatic reloading of the counter when a clock interrupt occurs, so the interrupt handler has to reload it explicitly. That code attempts to compensate for the time taken from the interrupt triggering to the counter being reloaded. The accuracy cannot be completely guaranteed, especially in a debug system, so a small amount of clock drift may occur.
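As a rough illustration of the arithmetic involved, the sketch below derives a counter value from a CPU frequency and the two configuration options, which in eCos together give the tick period in nanoseconds (numerator divided by denominator). The function names are hypothetical, and the exact formula and compensation strategy used by the HAL may differ.

```c
#include <stdint.h>

#define NS_PER_SECOND UINT64_C(1000000000)

/*
 * CPU cycles per kernel clock tick. The tick period in nanoseconds is
 * rtc_numerator / rtc_denominator. The intermediate product is assumed
 * to fit in 64 bits, which holds for realistic clock frequencies.
 */
static uint64_t cycles_per_tick(uint64_t cpu_hz,
                                uint64_t rtc_numerator,
                                uint64_t rtc_denominator)
{
    return (cpu_hz * rtc_numerator) / (rtc_denominator * NS_PER_SECOND);
}

/*
 * Value to reload after an interrupt, shortened by the cycles that have
 * already elapsed since the interrupt triggered. Falling back to one
 * cycle when the handler is very late is an arbitrary choice made for
 * this sketch.
 */
static uint64_t reload_value(uint64_t period, uint64_t elapsed)
{
    return elapsed < period ? period - elapsed : UINT64_C(1);
}
```

For example, a 1GHz tile with a numerator of 1000000000 and a denominator of 100 (a 10ms tick) gives 10000000 cycles per tick.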

The Cache

The TILE-Gx architecture has a complicated caching system including per-tile primary and secondary caches and a distributed tertiary cache. The hardware maintains data cache coherency so there is very rarely any need for code to exercise fine-grained control over the cache such as flushing cachelines. Therefore the various cache-related eCos macros like HAL_DCACHE_SYNC() are defined as no-ops. The hardware does not maintain coherency between the instruction and data cache so eCos does define a number of instruction cache macros like HAL_ICACHE_INVALIDATE(). These are needed by the gdbstubs code to implement breakpoints, and are unlikely to be of any interest to application developers.


Diagnostics

The port supports two destinations for diagnostic output. Applications built for ROM startup will send their diagnostic output to the hypervisor over the IDN bus, and the hypervisor will output the text on the system console. For a RAM startup application diagnostic output will normally be sent to tile-gdb via LittleBoPeep, but the output can be redirected to the hypervisor if desired. This behaviour is controlled by the CYGHWR_HAL_TILEGX_DIAGNOSTICS_DESTINATION configuration option.

Other Functionality

The TILE-Gx architectural HAL provides two non-standard functions which can be used to manage the MMU settings:

#include <cyg/hal/hal_tilegx.h>

int hal_tilegx_mmap(unsigned long long virtual_address,
                    unsigned long long physical_address,
                    unsigned long long dtlb_attributes);

void hal_tilegx_munmap(unsigned long long virtual_address);

hal_tilegx_mmap() can be used to map a physical address into the tile's virtual address space with the specified attributes. The physical address can correspond to real memory. Typically this will be allocated by a process running in the Linux partition, and the details can then be forwarded to an eCos application which will map it into its address space. Alternatively the physical address can correspond to a memory-mapped device. The virtual address can be anywhere in the address space that is not already used, but preferably in the range 0x0000_0000 to 0x7FFF_FFFF to avoid problems with 32-bit pointers. Low memory is normally used for the system's ROM and RAM regions and 0x6C00_0000 is used for the hypervisor data, but anywhere from 0x4000_0000 to 0x6800_0000 or from 0x7000_0000 to 0x7FFF_FFFF is normally fine. The address should be aligned to a boundary suitable for the block size. The final argument will be written to the DTLB_CURRENT_ATTR SPR and consists of numerous fields. The Tilera documentation should be consulted for more information. hal_tilegx_mmap() returns 1 on success, or 0 if the operation fails, typically because all of the data TLBs are already in use.

hal_tilegx_munmap() can be used to undo a previous hal_tilegx_mmap() call.
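Before calling hal_tilegx_mmap() an application may want to sanity-check its chosen virtual address against the guidelines above. The helper below is a hypothetical host-side sketch, not part of the HAL. It assumes block sizes are powers of two and treats the two recommended ranges as half-open intervals ending at 0x6800_0000 and 0x8000_0000.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if [va, va + block_size) is aligned to block_size and
 * lies entirely inside one of the two ranges recommended in the text.
 * Hypothetical helper; block_size is assumed to be a power of two.
 */
static bool va_block_is_suitable(uint64_t va, uint64_t block_size)
{
    uint64_t end;

    if (block_size == 0 || (block_size & (block_size - 1)) != 0)
        return false;                     /* not a power of two      */
    if ((va & (block_size - 1)) != 0)
        return false;                     /* misaligned for the size */

    end = va + block_size;
    if (end < va)
        return false;                     /* arithmetic wrapped      */

    return (va >= UINT64_C(0x40000000) && end <= UINT64_C(0x68000000)) ||
           (va >= UINT64_C(0x70000000) && end <= UINT64_C(0x80000000));
}
```

A check like this only validates the address choice. The call to hal_tilegx_mmap() can still fail, for example when no data TLB entries are free, so its return value must still be tested.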