Chapter 55. Writing NAND device drivers

55.1. Planning a port

Before you start, you will need the spec sheets for both the NAND chip and the board to which it is connected, and you need to know how the chip is to be partitioned.

55.1.1. Driver structure and layout

A typical NAND device driver falls into two parts:

  • high-level operations specific to the NAND chip (page reads and writes); and
  • board-specific plumbing (sending commands and data to the chip; reading data back from the chip).

This distinction is important in the interests of code reuse; the same part may appear on different boards, or indeed multiple times, but connected differently. It need not be maintained if there are good reasons not to.

The NAND library device interface consists of a C struct, cyg_nand_device, comprising a number of data fields and function pointers. Each NAND chip to be made available to the library requires exactly one instance of this struct.


The cyg_nand_device structure includes a void* priv member which is treated as opaque. The driver may use this member as it sees fit; it is intended to provide an easy means to identify the NAND array, MMIO addresses or function pointers to use and so on. Typically this is used by the chip driver for its own purposes, and includes a further opaque member for the use of the HAL port.

The function pointers in the struct form the driver's high-level functions; they make use of the low-level functions to talk to the chip. We present the high-level functions first, although there is no intrinsic reason to prefer either ordering during driver development.

The high-level chip-specific functions are traditionally laid out as an inline file in an appropriate package in devs/nand/CHIP. The board-specific functions should normally appear in the platform HAL and #include the inline.

55.1.2. Chip partitions

Before embarking on the port, you should determine how the NAND array will be partitioned. This is necessarily a board-specific question, and your layout must accommodate any other software users of the array. You will need to know either the fixed layout - converted to eraseblock addresses - or how to determine the layout at initialisation time.


It may be worthwhile to set up partitioning by way of some parameters in your platform's CDL, with sensible defaults, instead of outright hard-coding the partition layout.

55.1.3. Locking against concurrent access

The eCos NAND library provides per-device locking, to guard against concurrent access during high-level operations. This support is fully automatic; drivers need take no action to make use of it.

This strategy may not be sufficient on all target boards: sometimes, accessing a NAND chip requires mediation by a CPLD or other device, which must be shared with other NAND chips or even other peripherals. If this applies, it is the responsibility of the driver and platform port to provide further locking as appropriate.


When using mutexes in a driver, one should use the driver API as defined in <cyg/hal/drv_api.h> instead of the full kernel API. This has the useful property that mutex operations are very cheaply implemented when the eCos kernel is not present, such as when operating in RedBoot.

55.1.4. Required CDL declarations

An individual NAND chip driver must declare the largest page size it supports by means of CDL. This is done with a statement like the following in its cdl_package stanza:

requires ( CYGNUM_NAND_PAGEBUFFER >= 2048 )

This requirement is due to the internal workings of the eCos NAND library: a buffer is required for certain operations which manipulate up to a NAND page worth of data, internally to the library. This is declared once as a global buffer for safety under low-memory conditions; a page may be too big to use temporary storage on the C stack, and the NAND library deliberately avoids the use of malloc.

By convention, a driver package would declare CYGPKG_IO_NAND as its parent and use cyg/devs/nand as its include_dir, but there is no intrinsic reason why this should be so.

55.2. High-level (chip) functions

The high-level functions provided by the chip driver are typically created as an inline file providing a fully-populated cyg_nand_dev_fns_v1 struct, instantiated by the CYG_NAND_FUNS_V2 macro. The high-level driver should not read or write the hardware directly, but should instead call into functions in the low-level driver.

The form the low-level functions should take is not prescribed; typically functions will be required to write commands to the device, to read and write data, and to query any status line which may be present. The high-level driver should normally provide a header file containing prototypes for the functions it requires from the low-level driver. (The low-level source file would provide the low-level functions required, include the high-level include, then instantiate the combined driver using the CYG_NAND_DEVICE macro.)

This source code layout is not intended as a prescription. It would for example be entirely in order to store pointers to the low-level functions in a struct and set priv to point to that struct, which could be useful in some cases.


The device driver must not call malloc or otherwise allocate memory; all data should be on the stack or set as globals. This is because the driver may be required to run within a minimal eCos configuration.

These functions should all return 0 on success, or a negative eCos error code. In the event of an error, do not call back into the NAND library; use the NAND_CHATTER macro to report, in case a human is watching, and return an error code. The library will take care of ensuring the correct response to the application and updating the BBT as necessary.

55.2.1. Device initialisation

static int my_devinit (cyg_nand_device *dev);

The devinit function is the most complex, and logically the one to write first. It is responsible for:

  • initialising the device, typically by sending a reset command;
  • interrogating the device to confirm its presence and properties;
  • setting up the partition table list (see "Planning a port" above);
  • setting up mutexes as necessary (see "Locking against concurrent access" above);
  • populating the other members of the cyg_nand_device struct (see below).

Interrogating the device is normally performed by sending a Read ID command and examining the result, which typically encodes some or all of the chip parameters.
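As an illustration of how Read ID data can encode chip parameters, the sketch below decodes the fourth ID byte using a layout found on many large-page parts: bits 1:0 give the page size, bit 2 the spare bytes per 512 data bytes, bits 5:4 the eraseblock size and bit 6 the bus width. That layout is an assumption made for this example, and the type and function names are hypothetical; the spec sheet for the actual part remains the authority.

```c
#include <stdint.h>

/* Illustrative result type; these names are not part of the eCos NAND API. */
typedef struct demo_id_geometry {
    uint32_t page_size;   /* bytes per page, excluding spare */
    uint32_t spare_size;  /* spare bytes per page */
    uint32_t block_size;  /* bytes per eraseblock, excluding spare */
    unsigned bus_width;   /* 8 or 16 */
} demo_id_geometry;

/* Decode the fourth Read ID byte using the ASSUMED layout described above. */
static void demo_decode_id4(uint8_t id4, demo_id_geometry *g)
{
    g->page_size  = 1024u << (id4 & 0x03);                     /* 1K, 2K, 4K, 8K */
    g->spare_size = (8u << ((id4 >> 2) & 0x01))                /* 8 or 16 bytes per 512 */
                    * (g->page_size / 512u);
    g->block_size = (64u * 1024u) << ((id4 >> 4) & 0x03);      /* 64K .. 512K */
    g->bus_width  = ((id4 >> 6) & 0x01) ? 16u : 8u;
}
```

A devinit function written along these lines would then convert the decoded sizes to base-2 logarithms before storing them in the device struct.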

Given the similarity between many NAND parts, it may be possible to write a generic driver to cover all of one or more manufacturers' parts, or indeed all ONFI-compliant parts. At the time of writing, this has not yet been attempted.

The devinit function must set up the following struct members:

  • The size of the regular (non-spare) part of a page, expressed as the logarithm in base 2 of the number of bytes. For example, if pages are 2048 bytes long, page_bits would be 11. The size of a page must therefore be an exact power of two.
  • The number of bytes of spare area available in each page.
  • The base-2 log of the number of pages per eraseblock.
  • The total number of eraseblocks in the device (blockcount_bits), expressed as a base-2 log.
  • The total size of the chip, not counting the spare areas. This is required so that the library can double-check that the given parameters make sense by comparing with the preceding fields. Again, this field is itself a base-2 logarithm.
  • Space for the in-memory Bad Block Table for this device.
  • The size of that Bad Block Table space, in bytes. At present, this should be two bits times the number of blocks in the device; in other words, 1<<(blockcount_bits-2) bytes.
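The arithmetic behind these fields can be sketched as follows. Apart from page_bits and blockcount_bits, the member and function names here are hypothetical stand-ins chosen for the example, not the names used by the NAND library. The consistency check mirrors the comparison described above: the logs of page size, pages per eraseblock and eraseblock count must add up to the log of the chip size.

```c
#include <stddef.h>

/* Base-2 logarithm of v, or -1 if v is not an exact power of two. */
static int demo_log2_exact(unsigned long v)
{
    int bits = 0;
    if (v == 0 || (v & (v - 1)) != 0)
        return -1;
    while (v > 1) {
        v >>= 1;
        bits++;
    }
    return bits;
}

/* Hypothetical container for the logarithmic geometry fields. */
typedef struct demo_geometry {
    int page_bits;             /* log2(bytes per page, excluding spare) */
    int pages_per_block_bits;  /* log2(pages per eraseblock) */
    int blockcount_bits;       /* log2(eraseblocks in the device) */
    int chip_bits;             /* log2(total bytes, excluding spare) */
} demo_geometry;

/* Returns 1 if the chip size equals page size * pages per block * blocks. */
static int demo_geometry_consistent(const demo_geometry *g)
{
    return g->page_bits + g->pages_per_block_bits + g->blockcount_bits
           == g->chip_bits;
}

/* Bytes needed for a Bad Block Table holding two bits per eraseblock. */
static size_t demo_bbt_bytes(int blockcount_bits)
{
    if (blockcount_bits < 2)
        return 1;              /* up to four blocks still need one byte */
    return (size_t)1 << (blockcount_bits - 2);
}
```

For example, a part with 2048-byte pages, 64 pages per eraseblock and 1024 eraseblocks gives logs of 11, 6 and 10, a chip size log of 27 (128 MiB) and a 256-byte table.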

The cyg_nand_device struct has two further members, ecc and oob, which must be set up to point to the ECC and OOB descriptors to use for the device. This is normally done by the CYG_NAND_DEVICE low-level instantiation macro, so will be better described in that section, but at this level you should be aware that it is also safe to set up these descriptors during devinit, for example if more than one set of semantics might apply and you have included logic to detect which to use.

The Bad Block Table itself is implemented in a way which intends to be compatible with the Linux MTD layer. A full parameter struct is not currently provided, though one may be in future.

55.2.2. Reading, writing and erasing data

The read and write operations are divided into three phases, with the following flow:

  • Begin. This is called once; the driver should lock any platform-level mutex and send the command and address.
  • Stride. This is called one or more times to read or write the page data.

    The transfer is split into strides for the benefit of platforms whose NAND controller provides hardware ECC: it is often necessary to read out the ECC registers every so often.

  • Finish. This is called once; it should read or write the spare area, (on programming) send a "program confirm" command and check its status, and unlock any platform-level mutex.

Erasing is a single-shot call which should lock any platform-specific mutex, send the command, check its status and unlock the mutex.

static int my_read_begin(   cyg_nand_device *dev, cyg_nand_page_addr   page);
static int my_read_stride(  cyg_nand_device *dev, void * dest,  size_t size);
static int my_read_finish(  cyg_nand_device *dev, void * spare, size_t spare_size);

static int my_write_begin(  cyg_nand_device *dev, cyg_nand_page_addr  page);
static int my_write_stride( cyg_nand_device *dev, const void * src,   size_t size);
static int my_write_finish( cyg_nand_device *dev, const void * spare, size_t spare_size);

static int my_erase_block(  cyg_nand_device *dev, cyg_nand_block_addr blk);
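The begin, stride and finish flow for a page read can be sketched against a simulated chip held in memory. Everything here is a stand-in for illustration: the sim_ types replace cyg_nand_device and cyg_nand_page_addr, the locked flag stands in for a platform-level mutex, and -1 stands in for a negative eCos error code. A real driver would send commands to the hardware where this sketch only records a position.

```c
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE  2048u
#define SIM_SPARE_SIZE 64u
#define SIM_PAGES      4u

typedef unsigned int sim_page_addr;

/* Simulated chip state, reached through the device's priv pointer. */
typedef struct sim_chip {
    unsigned char data[SIM_PAGES][SIM_PAGE_SIZE];
    unsigned char spare[SIM_PAGES][SIM_SPARE_SIZE];
    sim_page_addr cur_page;
    size_t        cur_offset;
    int           locked;      /* stands in for a platform mutex */
} sim_chip;

typedef struct sim_device { void *priv; } sim_device;

/* Begin: take the lock and "send" the read command and address. */
static int sim_read_begin(sim_device *dev, sim_page_addr page)
{
    sim_chip *chip = dev->priv;
    if (page >= SIM_PAGES)
        return -1;
    chip->locked = 1;
    chip->cur_page = page;
    chip->cur_offset = 0;
    return 0;
}

/* Stride: transfer the next run of page data; may be called repeatedly. */
static int sim_read_stride(sim_device *dev, void *dest, size_t size)
{
    sim_chip *chip = dev->priv;
    if (chip->cur_offset + size > SIM_PAGE_SIZE)
        return -1;
    memcpy(dest, &chip->data[chip->cur_page][chip->cur_offset], size);
    chip->cur_offset += size;
    return 0;
}

/* Finish: read the spare area if requested, then release the lock. */
static int sim_read_finish(sim_device *dev, void *spare, size_t spare_size)
{
    sim_chip *chip = dev->priv;
    int rv = 0;
    if (spare_size > SIM_SPARE_SIZE)
        rv = -1;
    else if (spare != NULL && spare_size > 0)
        memcpy(spare, chip->spare[chip->cur_page], spare_size);
    chip->locked = 0;          /* always unlock, even on error */
    return rv;
}
```

Note that the finish step releases the lock on every path, so an error partway through a read cannot leave the device locked.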

55.2.3. Searching for factory-bad blocks

static int my_is_factory_bad(cyg_nand_device *dev, cyg_nand_block_addr blk);

The very first time a NAND chip is used, the library has to scan it to check for factory-bad eraseblocks and build up the Bad Block Table. This function is called repeatedly to do so, one block at a time; it should return 1 if the block is marked bad, or 0 if the block appears to be OK.

Typically this function will invoke read_page; blocks are usually marked factory-bad by the presence of a particular signature in the out-of-band area of the first or second page of that block.
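A minimal sketch of such a check follows, run against a simulated chip. It assumes one common convention: a block is factory-bad if the first spare byte of its first or second page is anything other than 0xFF. That convention, the 64 pages per eraseblock and all demo_ names are assumptions for illustration only; the marker location and value must come from the spec sheet of the part in use.

```c
#include <stddef.h>

#define DEMO_PAGES_PER_BLOCK 64u
#define DEMO_BLOCKS          4u

/* Simulated chip: only the first spare byte of each page is modelled. */
typedef struct demo_chip {
    unsigned char marker[DEMO_BLOCKS * DEMO_PAGES_PER_BLOCK];
} demo_chip;

/* Stand-in for a spare-area read; returns 0, or -1 on a bad address. */
static int demo_read_marker(const demo_chip *chip, unsigned page,
                            unsigned char *out)
{
    if (page >= DEMO_BLOCKS * DEMO_PAGES_PER_BLOCK)
        return -1;
    *out = chip->marker[page];
    return 0;
}

/* Returns 1 if the block looks factory-bad, 0 if it looks OK,
 * or a negative value if the spare area could not be read. */
static int demo_is_factory_bad(const demo_chip *chip, unsigned blk)
{
    unsigned first_page = blk * DEMO_PAGES_PER_BLOCK;
    unsigned i;

    for (i = 0; i < 2; i++) {          /* first and second page of the block */
        unsigned char marker;
        int rv = demo_read_marker(chip, first_page + i, &marker);
        if (rv < 0)
            return rv;
        if (marker != 0xFF)            /* ASSUMED bad-block signature */
            return 1;
    }
    return 0;
}
```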


It is extremely important that you get this function right; after an eraseblock has been written to, it is no longer possible to reliably determine whether the block was factory-bad. It is never safe to assume that the factory-bad signature for a chip is the same as that of a similarly-sized chip or another by the same manufacturer; always check the correct spec sheet for the actual part or part-family in use!


Because this function is critical and a subtle error could cripple your application some time later in the field when it runs across undetected factory-bad blocks, you might find it handy to have a double-check before proceeding. If you enable CYGSEM_IO_NAND_READONLY in your eCos configuration during early development, you can safely fire up a test application (which calls cyg_nand_lookup) whilst watching the chatter output: the scan will be performed, but no BBT will be written. You can then compare the number of bad blocks reported against the manufacturer's specification of the maximum. Double-check that your is_factory_bad function is correct before enabling read-write mode!

55.2.4. Declaring the function set

CYG_NAND_FUNS_V2(mydev_funs, my_devinit,
        my_read_begin, my_read_stride, my_read_finish,
        my_write_begin, my_write_stride, my_write_finish,
        my_erase_block, my_is_factory_bad);

This macro ties the above functions together into a struct whose name is given as its first argument. The name of the resulting struct must be quoted when the driver is formally instantiated, which is normally done by the low-level functions.


Earlier versions of this library used a slightly different device interface, keyed off the macro CYG_NAND_FUNS. This interface has been retired.

55.3. Low-level (board) functions

The set and prototypes of the functions required here will necessarily depend on the board and to a lesser extent on the NAND part itself. The following functionality is typically required:

  • Very low-level hardware initialisation - for example, GPIO pin direction and interrupt config - if this has not already been done by the platform HAL
  • Set up the chip partition table (see below)
  • Runtime hardware config as required, such as commanding an FPGA or CPLD to route lines to the NAND part
  • Write a command (byte)
  • Write an address (handful of bytes)
  • Write data, usually at the chip's full bus width (typically 8 or 16 bits)
  • Read data at full bus width
  • Read data at 8-bit width (if the chip has a 16 bit data bus, some commands - commonly ReadID - may return 8-bit data)
  • Poll any status lines required or - if supported - set them up as interrupts to allow sleeping-wait

55.3.1. Talking to the chip

It is impossible to prescribe how to achieve this, as it depends entirely on how the NAND part is wired up on the board.

The ideal situation is that the NAND part is wired in via the CPU's memory controller and that the controller is set up to do most of the hard work for you. In that case, reading and writing the device is as simple as accessing the correct memory-mapped I/O address; usually different address ranges connect to the device's command, address and data registers respectively.


The HAL provides a number of macros in <cyg/hal/hal_io.h> to read and write memory-mapped I/O.


On platforms with an MMU, MMIO may be rerouted to different addresses to those on the board spec sheet. Check the MMU setup in the platform HAL.

On some platforms, you may have to invoke an FPGA or CPLD to be able to talk to the NAND chip. This might typically take the form of a handful of MMIO accesses, but should hopefully be fairly straightforward once you've figured out how the components interrelate.

The worst case is where you have no support from any sort of controller hardware and have to bit-bang GPIO lines to talk to the chip. This is a much more involved process; you have to take great care to get the timings right with carefully tuned delays. The result is usually quite CPU intensive, and could be clock speed sensitive too; you should check for and take account of any CDL settings in the architecture and variant HAL which allow the CPU clock frequency to be changed.


If your low-level functions take a cyg_nand_device pointer as an argument, you can use its priv member to hold or point to some relevant data like the MMIO addresses to use, which is preferable to hard-coding them. Indeed, if you wish your board port to support more than one chip, you should use the priv member to distinguish between them.
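One way this might look is sketched below. The demo_ structs are hypothetical stand-ins (a real port would hang its private struct off the priv member of cyg_nand_device), and plain volatile pointer accesses are used so the example stands alone; an eCos port would more likely go through the HAL I/O macros in <cyg/hal/hal_io.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board-private data: where this chip's registers live. */
typedef struct demo_board_priv {
    volatile uint8_t *cmd_reg;    /* writes here latch a command byte */
    volatile uint8_t *addr_reg;   /* writes here latch an address byte */
    volatile uint8_t *data_reg;   /* reads and writes here move data */
} demo_board_priv;

/* Stand-in for the device struct; only the opaque priv member matters. */
typedef struct demo_device { void *priv; } demo_device;

static void demo_write_cmd(demo_device *dev, uint8_t cmd)
{
    demo_board_priv *bp = dev->priv;
    *bp->cmd_reg = cmd;
}

static void demo_write_addr(demo_device *dev, const uint8_t *bytes, size_t n)
{
    demo_board_priv *bp = dev->priv;
    size_t i;
    for (i = 0; i < n; i++)
        *bp->addr_reg = bytes[i];     /* one address cycle per byte */
}

static uint8_t demo_read_data8(demo_device *dev)
{
    demo_board_priv *bp = dev->priv;
    return *bp->data_reg;
}
```

Because the addresses come from the private struct and not from constants, a board with two chips can instantiate two private structs and share the same functions.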

55.3.2. Setting up the chip partition table

It is the responsibility of the high-level devinit function to set up the device's partition table. (It may be appropriate for it to invoke a low-level function to do this.)

The partition definition is an array of cyg_nand_partition entries in the cyg_nand_device.

struct _cyg_nand_partition_t {
    cyg_nand_device *dev;
    cyg_nand_block_addr first;
    cyg_nand_block_addr last;
};
typedef struct _cyg_nand_partition_t cyg_nand_partition;

struct _cyg_nand_device_t {
    /* ... other members ... */
    cyg_nand_partition partition[CYGNUM_NAND_MAX_PARTITIONS];
    /* ... other members ... */
};

Application-visible partition numbers are simply indexes into this array.

  • On a live partition, dev must point back to the cyg_nand_device containing it. If NULL, the partition is inactive.
  • first is the number of the first block of the partition.
  • last is the number of the last block of the partition (not the number of blocks, unless the partition starts at block 0).
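Filling in the table might look like the sketch below. The demo_ types only mirror the three partition fields described above and are not the library's own definitions; the helper names, the limit of four partitions and the -1 error value are likewise assumptions for illustration.

```c
#include <stddef.h>

#define DEMO_MAX_PARTITIONS 4u

typedef unsigned int demo_block_addr;

struct demo_device;

typedef struct demo_partition {
    struct demo_device *dev;     /* NULL marks the partition inactive */
    demo_block_addr     first;   /* first block of the partition */
    demo_block_addr     last;    /* last block, inclusive */
} demo_partition;

typedef struct demo_device {
    unsigned       blockcount_bits;   /* log2(number of eraseblocks) */
    demo_partition partition[DEMO_MAX_PARTITIONS];
} demo_device;

/* Mark every partition inactive before any are configured. */
static void demo_clear_partitions(demo_device *dev)
{
    size_t i;
    for (i = 0; i < DEMO_MAX_PARTITIONS; i++)
        dev->partition[i].dev = NULL;
}

/* Activate one partition; returns 0, or -1 if the range is invalid. */
static int demo_set_partition(demo_device *dev, unsigned idx,
                              demo_block_addr first, demo_block_addr last)
{
    demo_block_addr nblocks = 1u << dev->blockcount_bits;

    if (idx >= DEMO_MAX_PARTITIONS || first > last || last >= nblocks)
        return -1;
    dev->partition[idx].dev   = dev;  /* point back to the owning device */
    dev->partition[idx].first = first;
    dev->partition[idx].last  = last;
    return 0;
}
```

On a 1024-block device, for example, a boot partition might span blocks 0 to 255 and a filesystem partition blocks 256 to 1023, leaving the remaining table entries inactive.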

55.3.3. Putting it all together…

Finally, with everything else in place, we turn to the CYG_NAND_DEVICE macro to instantiate the device.

CYG_NAND_DEVICE(my_nand, "onboard", &mydev_funs, &my_priv_struct, &linux_mtd_ecc, &nand_mtd_oob_64);

In order, the arguments to this macro are:

  • the name to give the resultant cyg_nand_device struct;
  • the device identifier string, which is application-visible and is used in cyg_nand_lookup();
  • a pointer to the device high-level function set to use, normally set up by the CYG_NAND_FUNS_V2 macro;
  • the priv member to include in the struct;
  • a pointer to the ECC semantics block to use. linux_mtd_ecc provides software ECC compatible with the Linux MTD layer, but it is strongly recommended to use onboard hardware ECC support if this is present, as it gives a huge speed boost. See Section 55.4, “ECC implementation” for more details.
  • a pointer to the OOB-area layout descriptor to use (see nand_oob.h: nand_mtd_oob_16 and nand_mtd_oob_64 are Linux-compatible layouts for devices with 16 and 64 bytes of spare area per page respectively).

The macro invokes the appropriate linker magic to pull all the compiled NAND device structs into one section so the NAND library can find them.

55.4. ECC implementation

The use of ECC is strongly recommended with NAND flash parts owing to their tendency to occasionally bit-flip. This is usually done with a variant of a Hamming code which calculates column and line parity. The computed ECC is stored in the spare area of the page to which it relates.

The NAND library automatically computes and stores the ECC of data as it is written to the chip. On read, the code is calculated for the data actually read; this is compared with the stored code and the data repaired if necessary.

The NAND library comes with a software ECC implementation named linux_mtd_ecc. This is compatible with the ECC used in the Linux MTD layer, hence its name. It calculates a 3-byte ECC on a 256-byte data block. This algorithm is adequate for most circumstances, but it is strongly recommended to use any hardware ECC support which may be available because of the performance gains it yields. (In testing, we observed that up to two thirds of the time taken by every page read and program call was used in computing ECC in software.)

55.4.1. The ECC interface

This library draws a semantic distinction between hardware and software ECC implementations.

  • A software ECC implementation will typically not require an initialisation step. The calculation function will always be called with a pointer to the data bytes to compute.
  • A hardware implementation is assumed to read and act upon the data as it goes past. Therefore, it will not be passed a pointer to the data when its calculate step is invoked.

An ECC is defined by the following parameters:

  • The size of data block it handles, in bytes.
  • The size of ECC it calculates on those blocks, in bytes.
  • Whether the algorithm is hardware or software.

An ECC algorithm must provide the following functions:

/* Initialises an ECC computation. May be NULL if not required. */
void my_ecc_init(struct _cyg_nand_device_t *dev);

/* Returns the ECC for the given data block.
 * For a hardware ECC:
 *  - dat and nbytes are ignored
 * For a software ECC:
 *  - dat and nbytes are required
 *  - if nbytes is less than the chunk size, the remainder are
 *    assumed to be 0xff.
 */
void my_ecc_calc(struct _cyg_nand_device_t *dev,
                 const CYG_BYTE *dat, size_t nbytes, CYG_BYTE *ecc);

/* Repairs the ECC for the given data block, if needed.
 * Call this if your read-from-chip ECC doesn't match what you computed
 * over the data block. Both *dat and *ecc_read may be corrected.
 * `nbytes' is the number of bytes we're interested in; if a correction
 * is indicated outside of that range, it will be ignored.
 * Returns:
 *       0 for no errors
 *       1 for a corrected single bit error in the data
 *       2 for a corrected single bit error in the ECC
 *      -1 for an uncorrectable error (more than one bit)
 */
int my_ecc_repair(struct _cyg_nand_device_t *dev,
                  CYG_BYTE *dat, size_t nbytes,
                  CYG_BYTE *ecc_read, const CYG_BYTE *ecc_calc);
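To make the calculate and repair contract concrete, here is a small software sketch of a Hamming-style code giving a 3-byte ECC over a 256-byte block. It is deliberately simplified: the bit layout is invented for this example and is not compatible with linux_mtd_ecc, the demo_ functions omit the device argument, and all names are hypothetical. The code stores the XOR of the positions of all set bits together with the XOR of the complemented positions, which lets a single flipped data bit be located and distinguished from a flipped ECC bit.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK   256u     /* data bytes covered by one ECC */
#define DEMO_POSMASK 0x7FFu   /* 11 bits address 2048 bit positions */

/* Compute a 22-bit code, packed little-endian into 3 bytes. */
static void demo_ecc_calc(const uint8_t *dat, size_t nbytes, uint8_t ecc[3])
{
    uint32_t p = 0, q = 0, code;
    size_t i;
    unsigned bit;

    for (i = 0; i < DEMO_CHUNK; i++) {
        uint8_t b = (i < nbytes) ? dat[i] : 0xFF;  /* pad short blocks */
        for (bit = 0; bit < 8; bit++) {
            if (b & (1u << bit)) {
                uint32_t pos = (uint32_t)(i * 8 + bit);
                p ^= pos;                   /* XOR of set-bit positions */
                q ^= ~pos & DEMO_POSMASK;   /* XOR of their complements */
            }
        }
    }
    code = p | (q << 11);
    ecc[0] = (uint8_t)(code & 0xFF);
    ecc[1] = (uint8_t)((code >> 8) & 0xFF);
    ecc[2] = (uint8_t)((code >> 16) & 0xFF);
}

/* Returns 0, 1, 2 or -1 with the meanings listed above. */
static int demo_ecc_repair(uint8_t *dat, size_t nbytes,
                           uint8_t ecc_read[3], const uint8_t ecc_calc[3])
{
    uint32_t diff = 0, dp, dq;
    int i;

    for (i = 0; i < 3; i++)
        diff |= (uint32_t)(ecc_read[i] ^ ecc_calc[i]) << (8 * i);
    if (diff == 0)
        return 0;                           /* codes agree */

    dp = diff & DEMO_POSMASK;
    dq = (diff >> 11) & DEMO_POSMASK;
    if ((diff >> 22) == 0 && (dp ^ dq) == DEMO_POSMASK) {
        size_t byte = dp >> 3;              /* dp is the flipped bit position */
        if (byte < nbytes)
            dat[byte] ^= (uint8_t)(1u << (dp & 7));
        return 1;
    }
    if ((diff & (diff - 1)) == 0) {         /* exactly one ECC bit differs */
        for (i = 0; i < 3; i++)
            ecc_read[i] = ecc_calc[i];
        return 2;
    }
    return -1;                              /* more than one bit in error */
}
```

A single data-bit error changes the two halves of the code by complementary amounts, which is what the first test in the repair function looks for; two data-bit errors change both halves identically and so fall through to the uncorrectable case.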

In some cases - particularly where hardware assistance is in use - it is necessary to specify different functions for calculating the ECC depending on whether the operation at hand is a page read or a page write. In that case, two init and calc functions may be supplied, each taking the same prototype.

The algorithm parameters and functions are then tied together with one of the following macros:

CYG_NAND_ECC_ALG_SW(my_ecc,  _datasize, _eccsize, my_ecc_init, my_ecc_calc, my_ecc_repair);

CYG_NAND_ECC_ALG_HW(my_ecc,  _datasize, _eccsize, my_ecc_init, my_ecc_calc, my_ecc_repair);

CYG_NAND_ECC_ALG_HW2(my_ecc, _datasize, _eccsize, my_ecc_init, my_ecc_calc_read,
                     my_ecc_calc_write, my_ecc_repair);

CYG_NAND_ECC_ALG_HW3(my_ecc, _datasize, _eccsize, my_ecc_init_read, my_ecc_init_write,
                     my_ecc_calc_read,  my_ecc_calc_write,    my_ecc_repair);

It's OK to use software ECC while getting things going, but if you then switch to a hardware implementation, you will probably need to erase your entire NAND chip, including its Bad Block Table. (The nanderase utility may come in handy for this.)


You must be sure that your ECC repair algorithm is correct. This can be quite tricky to test. However, it is often possible to hoodwink the controller into computing ECCs for you without affecting the data stored on the NAND chip, for example if you send it data but haven't told it to program a page. A variant of the sweccwalk test may come in handy for this purpose.

An example implementation, including an ECC calculation and repair test named eccwalk, may be found in the STM3210E evaluation board platform HAL, packages/hal/cortexm/stm32/stm3210e_eval. The chip NAND controller has on-board ECC calculation, but does not undertake to repair data; a repair function was written specially.