Synthetic Target NAND Flash Device

Overview

The device driver CYGPKG_DEVS_NAND_SYNTH emulates NAND flash hardware inside the eCos synthetic target. In addition it provides a number of debug facilities which cannot readily be implemented on real embedded hardware, including:

The emulated NAND contents are held in a file on the Linux host. This makes it easy to archive and restore NAND images, allowing test runs to be repeated with the exact same state each time.
The device driver can log details of all NAND I/O to a separate logfile on the Linux host. This makes it easier to work out exactly what is happening in the application, and more importantly it can help with figuring out what went wrong and when. For extended runs it is possible to limit the disk space used for logging. It is also possible to generate checkpoints, where the current NAND image is saved to a separate file.
It is possible to inject bad blocks at run-time, to check how the application would cope on real hardware if and when a NAND erase block developed a fault. These can be made to affect random blocks or specific blocks, for example ones holding filesystem metadata.

Some of the functionality is always available and uses compile-time configuration via CDL. This allows applications to be run stand-alone. The more advanced functionality such as logging and bad block injection is only available when running in conjunction with the synthetic target I/O auxiliary, when --io is used on the command line. The settings for logging and bad block injection usually come from the default.tdf target definition file. These can be changed on a per-run basis by adding --nanddebug to the command line, which will cause a suitable dialog box to pop up during NAND driver initialization.

Compile-time Configuration

This package CYGPKG_DEVS_NAND_SYNTH will automatically be loaded when creating a new eCos configuration for the Linux synthetic target. However the package will be inactive until the generic NAND support is added to the configuration.

The synthetic target NAND driver has been designed to be functional both when running stand-alone and when used with the I/O auxiliary. Hence some of the basic parameters of the emulated NAND device must be specified at compile-time, and this is handled via CDL configuration options.

CYGDAT_NAND_SYNTH_FILENAME specifies the host-side file that will be used to hold the NAND data. The default is synth_nand.dat in the current directory. If the file does not exist then the driver will create it during initialization. All data in a newly-created image file will be set to 0xFF, corresponding to an erased device. Hence deleting the current image file makes it possible to start a test run with a blank NAND device.

A NAND device consists of some number of erase blocks: erase operations affect all data in an erase block. Erase blocks are made up of some number of pages, and write operations typically affect a page at a time. Each page consists of a main data block plus some spare bytes, also known as out of band or OOB data. There are four CDL configuration options controlling the size and layout of the emulated flash device: CYGNUM_NAND_SYNTH_BLOCK_COUNT, CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK, CYGNUM_NAND_SYNTH_PAGESIZE, and CYGNUM_NAND_SYNTH_SPARE_PER_PAGE. The default settings are 1024 erase blocks, 32 pages per block, 2K of data per page, and 64 bytes of OOB data per page. This gives an emulated device size of 64MB of main data plus 2MB of OOB data.
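The size arithmetic can be checked with a few lines of host-side Python. This is only an illustrative sketch: the function name device_sizes is hypothetical and not part of the driver, and the constants simply repeat the default CDL values quoted above.

```python
# Default geometry, taken from the CDL defaults described in the text.
BLOCK_COUNT = 1024       # CYGNUM_NAND_SYNTH_BLOCK_COUNT
PAGES_PER_BLOCK = 32     # CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK
PAGESIZE = 2048          # CYGNUM_NAND_SYNTH_PAGESIZE
SPARE_PER_PAGE = 64      # CYGNUM_NAND_SYNTH_SPARE_PER_PAGE


def device_sizes(blocks, pages_per_block, page_size, spare_size):
    """Return (main data bytes, OOB bytes) for the given geometry.

    Hypothetical helper for illustration only.
    """
    pages = blocks * pages_per_block
    return pages * page_size, pages * spare_size


data_bytes, oob_bytes = device_sizes(
    BLOCK_COUNT, PAGES_PER_BLOCK, PAGESIZE, SPARE_PER_PAGE
)
MB = 1024 * 1024
print(data_bytes // MB, "MB main data,", oob_bytes // MB, "MB OOB")
```

Changing any of the four values changes the expected image size, which is one reason existing image files become incompatible when the configuration is altered.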

The size and layout parameters are encoded in each NAND image file. If these configuration options are changed then existing image files will be incompatible and the device driver will report a fatal error at run-time. This avoids compatibility problems with higher-level code: if a file system has formatted the NAND device for a 2K page size then it is likely to get very confused if the page size suddenly changes to 512 bytes.

The NAND device can be partitioned manually by enabling the component CYGSEM_DEVS_NAND_SYNTH_PARTITION_MANUAL_CONFIG and manipulating the options below this. The default is for a single partition occupying the entire NAND device.

Option CYGSEM_NAND_SYNTH_RANDOMLY_LOSE activates code in the driver which triggers frequent bit errors during read operations. These should be handled by error correcting codes within the generic NAND layer so should be transparent to higher-level code. The option exists mainly as an easy way of testing the automatic error correction support.

Run-time Customization

Logging and bad block injection are controlled by run-time customization via the synthetic target I/O auxiliary, not by compile-time CDL options. This allows the same test executable to be run with and without logging or with different sequences of injected bad blocks. If the executable is run without the I/O auxiliary, without the --io command line option, then both logging and bad block injection will be disabled.

The main way of customizing both logging and bad block injection is via the target definition file, usually default.tdf. The driver comes with a file nand.tdf holding the various options and explanatory text. This file should be incorporated into default.tdf and edited as appropriate. Note that all NAND-related settings should be inside a synth_device nand section.

Logging

The target definition file settings related to logging are as follows:

logfile            "/tmp/synthnand.log"
log                read write erase error
max_logfile_size   16M
number_of_logfiles 4
generate_checkpoint_images

The logfile setting controls the location of the logfile. The default is to add a suffix .log to the CYGDAT_NAND_SYNTH_FILENAME setting, thus creating logfiles in the same directory as the NAND image file.

The log setting specifies which events should be logged. It is followed by a list of some or all of the following: read, READ, write, WRITE, erase, and error. read logs all calls to the driver's page read function, but not the data actually read. READ is like read but logs the actual data as well as the event. Similarly write and WRITE log calls to the driver's page write function, without or with the data being written. erase logs calls to the driver's block erase function. error logs any bad block injection events.

Logging can quickly generate very large files, especially when READ or WRITE debugging is enabled. This can have unfortunate side effects, for example an overnight stress test can fail because the logfile has filled all available disk space. To avoid this it is possible to limit the size of each logfile using the max_logfile_size setting. This is simply a number followed by a unit K, M or G.
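As an illustration of the size syntax, the following Python sketch converts a max_logfile_size value into a byte count. It is not the I/O auxiliary's own parser: the function name parse_logfile_size is hypothetical, and the use of binary multiples (K = 1024 bytes) is an assumption that the text does not confirm.

```python
import re

# Assumption: units are binary multiples. The documentation only says
# "a number followed by a unit K, M or G".
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_logfile_size(text):
    """Convert a string such as '16M' into a number of bytes.

    Hypothetical helper; raises ValueError for anything that is not
    a decimal number immediately followed by K, M or G.
    """
    match = re.fullmatch(r"(\d+)([KMG])", text.strip())
    if match is None:
        raise ValueError(f"invalid logfile size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


print(parse_logfile_size("16M"))
```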

When max_logfile_size is exceeded the NAND driver takes appropriate action. The default behaviour is just to delete the current logfile and create a new one, with the same name as before. This can be unfortunate if some particularly interesting event happened just before the maximum logfile size was exceeded because all logging information related to that event will have been lost. To avoid this it is possible to have multiple logfiles, limited by the number_of_logfiles setting. Assume a logfile name of /tmp/synthnand.log, a maximum logfile size of 16M and four logfiles. After the first 16MB of logging data has been written to /tmp/synthnand.log that file will be renamed to /tmp/synthnand.log.0 and a new current logfile /tmp/synthnand.log will be created. After another 16MB the current logfile will be renamed to /tmp/synthnand.log.1, and so on. After 64MB the maximum allowed number of logfiles has been created, so /tmp/synthnand.log.0 will be deleted and then /tmp/synthnand.log will be renamed to /tmp/synthnand.log.3. Hence the maximum disk space used will be between 48MB and 64MB, plus a small amount of overflow for each logfile. All logfiles are in plain text, one event per line, and a single event will not be spread over multiple logfiles.
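The rotation scheme can be modelled with a small Python function that lists which logfiles should exist after a given number of size overflows. This is a sketch derived only from the worked example above: the function name logfiles_after is hypothetical, and the assumption that the numeric suffix keeps increasing after the fourth rotation (.4, .5 and so on) is an extrapolation rather than documented behaviour.

```python
def logfiles_after(rotations, base="/tmp/synthnand.log", number_of_logfiles=4):
    """Return the logfile names expected on disk after `rotations` overflows.

    Hypothetical model: each overflow renames the current logfile to
    base.<n> with n counting up from 0, and the oldest renamed file is
    deleted once more than number_of_logfiles - 1 of them would exist.
    """
    kept = number_of_logfiles - 1            # renamed files kept besides the current one
    first = max(0, rotations - kept)         # lowest suffix that has not been deleted
    renamed = [f"{base}.{n}" for n in range(first, rotations)]
    return renamed + [base]                  # the current logfile always exists


for count in (0, 3, 4):
    print(count, logfiles_after(count))
```

With four logfiles the model never reports more than four files, which corresponds to the 48MB to 64MB range given above.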

In addition to the logfiles the NAND driver will generate image checkpoint files if generate_checkpoint_images is enabled. At the start of the test run the driver will copy the current NAND image to a new file /tmp/synthnand.log.checkpoint, using the same base filename as for logfiles. If support for multiple logfiles is enabled then the current checkpoint file will be renamed in the same way and at the same time as the current logfile, and a new checkpoint file will be created using the current image data. Hence for any logfile it is possible to examine both the starting image and the final image (which may be the current one).

Bad Block Injection

There are two settings related to bad block injection: factory_bad and inject. The former is straightforward:

factory_bad 17 42 256 1019

This setting is used only when the NAND image file does not yet exist and the driver has to create a new one. The specified erase blocks are marked as factory-bad and hence unusable. The setting consists simply of one or more numbers, each in the range 0 to CYGNUM_NAND_SYNTH_BLOCK_COUNT-1. A maximum of 32 blocks can be marked factory bad.

Bad block injection is rather more complicated, in an attempt to make it sufficiently flexible for a variety of uses. The target definition file can contain one or more inject settings. There is a limit of eight erase definitions and eight write definitions, giving a maximum of sixteen bad block injection definitions. However some of these can use repeat, so the number of bad blocks injected during a test run is limited only by the size of the emulated device. Example settings are:

inject erase current after rand% 1024 erases
inject write current after rand% 100000 calls repeat
inject erase block 1 after 3 block_erases
inject write page 9860 after 1000 writes

The inject keyword should be followed by either erase or write. If erase then the bad block injection happens during a call to the driver's erase function, otherwise it happens during a call to write. This is followed by a block or page definition, the keyword after, an event counter, and a couple of optional flags.

The simplest block or page definition is the keyword current. This simply means that whichever block is being erased, or whichever block contains the page being written, will be marked bad. Typically this will be used for injecting random faults. If a given block is the subject of an above-average number of erase or write operations then it is more likely to be the current block in one of these definitions, so heavily-used blocks are more likely to fail.

The alternative to current is to list a specific block for an erase definition, or a specific page for a write definition. Blocks go from 0 to CYGNUM_NAND_SYNTH_BLOCK_COUNT - 1. Pages go from 0 to (CYGNUM_NAND_SYNTH_BLOCK_COUNT * CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK) - 1. This can be particularly useful when testing higher-level code that uses certain blocks specially, for example for storing filesystem metadata. It can also be useful when attempting to repeat a test run using information from a logfile.
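When picking a page number for a write definition it helps to know which erase block that page belongs to. The Python sketch below assumes that pages are numbered consecutively across blocks, so that block 0 holds pages 0 to PAGES_PER_BLOCK - 1; this matches the logfile example later in this document, where page 3189 is reported as part of erase block 99. The helper names are hypothetical.

```python
PAGES_PER_BLOCK = 32   # default CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK


def page_to_block(page, pages_per_block=PAGES_PER_BLOCK):
    """Return the erase block containing the given page number."""
    return page // pages_per_block


def block_pages(block, pages_per_block=PAGES_PER_BLOCK):
    """Return the range of page numbers held in the given erase block."""
    first = block * pages_per_block
    return range(first, first + pages_per_block)


# Page 9860 is the page used in the example inject settings.
print(page_to_block(9860), list(block_pages(page_to_block(9860)))[:3])
```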

The fields immediately following the after keyword specify when the bad block injection should trigger. They consist of the optional keyword rand%, a count, and an event identifier. If rand% is not specified then exactly the specified number of events must occur before the bad block injection triggers. Otherwise some number between 0 and count-1 events must occur. The event identifier can be one of erases, writes, calls, block_erases, or page_writes. erases means the number of calls to the driver's block erase function. writes means the number of calls to the page write function. calls means the total number of read, write or erase calls. block_erases means erase calls for a specific block. Similarly page_writes means write calls for a specific page. block_erases and page_writes cannot be used together with current, only with a specific block or page.

So, consider the first example again:

inject erase current after rand% 1024 erases

The driver will calculate a random number between 0 and 1023, say 427. Once there have been at least 427 erase calls the bad block injection will trigger. Since the definition is for erase current, the injection can happen immediately. Hence whichever block is specified during the 427th erase call will be marked bad and that erase call will fail with error code EIO.

Suppose that the definition used write current instead of erase current. The definition will still trigger during erase call 427, but it will not take effect immediately. Instead the block containing whichever page is the subject of the next write operation will be marked bad. Alternatively, suppose that the definition was for erase block 42 but call 427 was for block 512 instead. If some time after call 427 there was another erase call for block 42, that later erase call will fail and cause the block to be marked bad.
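The trigger behaviour for erase definitions counted in erases can be sketched in Python as follows. This is a simplified model for reasoning about the examples, not the driver's implementation: the class name EraseInjection and its interface are hypothetical, it ignores repeat and disabled, and it covers neither write definitions nor the other event types.

```python
import random


class EraseInjection:
    """Hypothetical model of 'inject erase <target> after [rand%] N erases'."""

    def __init__(self, count, block=None, use_rand=False, rng=random):
        self.block = block                 # None stands for the keyword current
        # With rand% the threshold is a random number between 0 and count - 1.
        self.threshold = rng.randrange(count) if use_rand else count
        self.erases = 0                    # erase calls seen so far
        self.fired = False                 # without repeat, a definition fires once

    def on_erase(self, block):
        """Return True if this erase call should fail and mark `block` bad."""
        self.erases += 1
        if self.fired or self.erases < self.threshold:
            return False                   # not yet triggered, or already used up
        if self.block is not None and self.block != block:
            return False                   # triggered, waiting for the target block
        self.fired = True
        return True


# Fixed threshold of 427 standing in for the random value in the example.
injection = EraseInjection(427, block=42)
for _ in range(427):
    injection.on_erase(512)                # erases of other blocks still succeed
print(injection.on_erase(42))              # the next erase of block 42 fails
```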

Now consider the next definition:

inject write current after rand% 100000 calls repeat

This event will trigger after between 0 and 99999 calls into the driver. These calls can be reads, writes, or erases. If the triggering call is a write then the block affected will be the one containing the page being written. Otherwise whichever page is the subject of the next write operation will be the one affected.

inject erase block 1 after 3 block_erases

This definition will only ever affect block 1. The first two calls to erase block 1 will succeed. The third call will fail. Erase calls for any other block have no effect on this definition.

inject write page 9860 after 1000 writes

This definition will only ever affect page 9860. The definition will trigger after 1000 writes to any page. If the thousandth write happens to be for page 9860, it will fail. Otherwise the next write for page 9860 will fail, whenever that happens. If the definition specified event type page_writes instead of writes it would trigger only after 1000 writes to page 9860 instead of 1000 writes to any page.

Trigger definitions can be followed by two optional keywords. The first is repeat and indicates that the definition should trigger multiple times, not just once. repeat can only be used for block or page current, not for a specific block or page, since any given block can only fail once. The second keyword is disabled. This can be used to create a bad block injection definition which is not active by default but which can be enabled in the GUI.

All bad block injection definitions operate in parallel, not sequentially. It is possible for multiple definitions to trigger during a single call but a given block can only fail once.

Interactive Dialog

Although it is possible to change the logging and other settings between test runs by editing the target definition file, the package also provides a way of changing these during driver initialization time. If the command line option --nanddebug is used in addition to --io then the I/O auxiliary will pop up a dialog box allowing the various settings to be edited.

Figure 2. I/O auxiliary Dialog, Files

The dialog consists of a tabbed notebook with separate tabs for Files, Logging, and Errors. The default tab is Files. This shows the NAND device settings as configured via CDL options. If a current NAND image file exists then it can be deleted with a single click, allowing the test run to proceed with a new blank NAND device. The current image file can be saved away to an archive, either using a default name based on the current image name or by letting the user select a file. It is also possible to restore a previously-saved archive, again using the default name if that archive exists, or by selecting an alternative file. If any of the relevant files are changed outside this dialog then the refresh button forces the dialog to check the filesystem again. The various operations will be reported at the bottom of the dialog.

If the Logging tab is selected then the dialog changes to the following:

Figure 3. I/O auxiliary Dialog, Logging

There are separate checkbuttons for the six types of events that can be logged. It is also possible to change the current logfile and to edit the other settings related to logfiles.

The Errors tab is not yet implemented.

The dialog can be dismissed using the OK button or by hitting the ESC key. At that point the eCos application will resume running with the selected settings.

File Formats

A NAND image is a binary file with six sections: a 64-byte header block; an array of erase counts; an array of write counts; an array of the factory-bad blocks; a bitmap of the good and bad blocks; and the actual data. The header consists of the following data structure:

struct header {
  cyg_uint32 magic;           // 0xEC05A11F
  cyg_uint32 page_size;       // CYGNUM_DEVS_NAND_SYNTH_PAGESIZE
  cyg_uint32 spare_size;      // CYGNUM_DEVS_NAND_SYNTH_SPARE_PER_PAGE
  cyg_uint32 pages_per_block; // CYGNUM_DEVS_NAND_SYNTH_PAGES_PER_BLOCK
  cyg_uint32 number_of_blocks;// CYGNUM_DEVS_NAND_SYNTH_BLOCK_COUNT
  cyg_uint32 tv_sec;
  cyg_uint32 tv_usec;
  cyg_uint32 spare[9];
};

All integers in the header and in the following three sections are stored in big-endian format. The magic field is used to check that a file really holds a NAND image. The next four fields specify the emulated device size and layout, as per the corresponding CDL options. The tv_sec and tv_usec fields are filled in with the result of a gettimeofday() system call during driver initialization. These fields also appear in logfiles, so code can check that a logfile and an image file correspond to the same test run.
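A host-side script can decode the header with Python's struct module. The sketch below follows the structure shown above (sixteen big-endian 32-bit integers, 64 bytes in total); the function name parse_header and the dictionary keys are hypothetical choices for illustration.

```python
import struct

HEADER_FORMAT = ">16I"      # 16 big-endian unsigned 32-bit integers
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 64 bytes
NAND_MAGIC = 0xEC05A11F


def parse_header(raw):
    """Decode the 64-byte image header into a dictionary.

    Hypothetical helper; raises ValueError if the data is too short
    or the magic number does not match.
    """
    if len(raw) < HEADER_SIZE:
        raise ValueError("file too short to hold a NAND image header")
    fields = struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE])
    if fields[0] != NAND_MAGIC:
        raise ValueError("not a synthetic NAND image (bad magic)")
    names = ("magic", "page_size", "spare_size", "pages_per_block",
             "number_of_blocks", "tv_sec", "tv_usec")
    return dict(zip(names, fields))        # the nine spare words are ignored


# Build a header in memory rather than reading a real image file.
example = struct.pack(HEADER_FORMAT, NAND_MAGIC, 2048, 64, 32, 1024,
                      1247514233, 562000, *([0] * 9))
print(parse_header(example))
```

For a real image the first 64 bytes of the file named by CYGDAT_NAND_SYNTH_FILENAME would be passed in instead of the in-memory example.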

Following the header is an array of BLOCK_COUNT integers holding the number of erase calls for each block. Next is an array of (BLOCK_COUNT * PAGES_PER_BLOCK) integers holding the number of write calls for each page. These arrays can be used to check that higher-level code is performing wear levelling.

The array of factory-bad blocks consists of 32 integers holding the value of the target definition file's factory_bad setting at the time that the image file was created.

The bitmap following the factory-bad block array holds the current state of each erase block. Bit 0 of byte 0 corresponds to erase block 0, bit 0 of byte 1 corresponds to erase block 8, and so on. If a bit is set then the erase block is ok. If a bit is clear then the erase block is bad, either factory-bad or because it has failed subsequently due to a bad block injection.
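The bitmap lookup can be expressed in a few lines of Python. This sketch assumes that bit 0 means the least significant bit of each byte, which the text does not state explicitly; the helper names are hypothetical.

```python
def block_is_ok(bitmap, block):
    """Return True if the bitmap marks the erase block as good (bit set).

    Assumption: bit 0 is the least significant bit of each byte.
    """
    return bool(bitmap[block // 8] & (1 << (block % 8)))


def bad_blocks(bitmap, block_count):
    """Return the list of erase blocks whose bit is clear."""
    return [b for b in range(block_count) if not block_is_ok(bitmap, b)]


# 1024 blocks need 128 bytes; start with every block good, then clear block 8.
bitmap = bytearray([0xFF] * 128)
bitmap[1] &= 0xFE                          # bit 0 of byte 1 is erase block 8
print(bad_blocks(bitmap, 1024))
```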

The bad block bitmap is followed by the actual data. This consists of all erase blocks concatenated without padding, starting with erase block 0. Each erase block is stored starting from page 0 within that block, and again all the pages are concatenated without padding. For each page the actual data is stored first, followed by the spare (OOB) data.
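Putting the six sections together, the file offset of each section and of any page can be computed as in the Python sketch below. Several details are assumptions rather than documented facts: the count arrays are taken to use 32-bit integers like the header fields, and the bitmap is taken to occupy one bit per block rounded up to a whole byte with no padding afterwards. The function names are hypothetical.

```python
HEADER_SIZE = 64
FACTORY_BAD_ENTRIES = 32
INT_SIZE = 4        # assumption: counts are 32-bit, like the header fields


def section_offsets(blocks, pages_per_block):
    """Return the byte offset of each section after the header."""
    erase_counts = HEADER_SIZE
    write_counts = erase_counts + INT_SIZE * blocks
    factory_bad = write_counts + INT_SIZE * blocks * pages_per_block
    bitmap = factory_bad + INT_SIZE * FACTORY_BAD_ENTRIES
    data = bitmap + (blocks + 7) // 8      # assumption: bitmap rounded up to bytes
    return {"erase_counts": erase_counts, "write_counts": write_counts,
            "factory_bad": factory_bad, "bitmap": bitmap, "data": data}


def page_offset(page, blocks, pages_per_block, page_size, spare_size):
    """Return the offset of a page's main data; its OOB data follows directly."""
    data = section_offsets(blocks, pages_per_block)["data"]
    return data + page * (page_size + spare_size)


print(section_offsets(1024, 32))
print(page_offset(1, 1024, 32, 2048, 64))
```

Because blocks and pages are concatenated without padding, a single global page number is enough to locate a page; no separate block offset is needed.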

A logfile is a plain text file, not a binary file. It holds one log event per line. Some of these lines can be rather long if READ or WRITE logging is enabled. The fields within each line are separated by a single space. The first field indicates the type of record. The next two fields are call counts, one for the event in question and one for the total number of calls into the driver. The remaining fields depend on the record type.

I 0 0 1247514233 562000 synth_nand.dat 2048 64 32 1024

This is an initialization record when the logfile is created. There have been no calls to the driver yet so the two counts are 0. The next two fields are the tv_sec and tv_usec timestamp values which are also written into the NAND image file. This allows logfiles and image files to be matched up. The remaining fields identify the page size, the spare size, the number of pages per erase block, and the number of erase blocks.

F 2 4 43 0

This is a call into the driver's factorybad function. It is the second such call, and the fourth call into the driver. The query is for erase block 43, and that block has not been marked as factory-bad.

r 5 12 32736 0x0200f0c0 2048 0x0200eeb0 64

This is a read event. It is the fifth page read into the driver and the 12th call. The request is for page number 32736. 2048 bytes of data should be read into a buffer at location 0x0200f0c0, and 64 bytes of OOB data should be read into 0x0200eeb0.

Rd 5 12 32736 0x0200f0c0 2048 FFFFFFFFFFFF…
Ro 5 12 32736 0x0200eeb0 64   FFFFFFFFFFFF…

These are two READ log events corresponding to the previous read. The first line Rd is for the data part, and the final field consists of 4096 hexadecimal characters encoding the 2048 bytes of data. The second line Ro is for the OOB part and the final field consists of 128 hexadecimal characters encoding the 64 bytes of OOB data.

w 1 1030 32736 0x0200f0c0 2048 0x0200eed0 64
Wd 1 1030 32736 0x0200f0c0 2048 FFFFFFFFF3FFFFFFFFFFCF…
Wo 1 1030 32736 0x0200eed0 64 FFFFFFFFFFFFFFFF426274…

These lines show a write log event and a WRITE log event for the same call into the driver. The fields are the same as for read and READ.

E 2 1031 1022

This logs an erase call into the driver for block 1022. It is the second erase call, and by this time there have been 1031 calls into the driver.

Bb 1 2857 631
…
Bp 2 4012 3189 99

These lines show bad block injections. The first is for an erase operation for block 631. That erase call is about to fail with EIO and the block will be marked bad. The second is for a write operation for page 3189, which is part of erase block 99. That write operation is about to fail and all of erase block 99 will be marked bad.
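Since every record shares the same three leading fields, a logfile can be processed with a very small parser. The Python sketch below is one possible approach, not a tool shipped with the package: the function name parse_log_line and the dictionary layout are hypothetical. It splits on any run of whitespace, and it decodes the trailing hexadecimal field of Rd, Ro, Wd and Wo records only when that field is complete, which is not the case for the abbreviated examples above.

```python
DATA_RECORDS = {"Rd", "Ro", "Wd", "Wo"}    # records whose last field is hex data


def parse_log_line(line):
    """Split one logfile line into its common and type-specific fields."""
    fields = line.split()
    record = {
        "type": fields[0],
        "event_count": int(fields[1]),     # calls of this kind so far
        "total_calls": int(fields[2]),     # all calls into the driver so far
        "args": fields[3:],                # meaning depends on the record type
    }
    if record["type"] in DATA_RECORDS:
        record["data"] = bytes.fromhex(fields[-1])
    return record


erase = parse_log_line("E 2 1031 1022")
print(erase["type"], erase["event_count"], erase["total_calls"], erase["args"])

# A shortened, made-up OOB record with a complete 4-byte payload.
oob = parse_log_line("Wo 1 1030 32736 0x0200eed0 4 FFFF4262")
print(oob["data"])
```

An initialization record holding an image filename that contains spaces would need special handling, which this sketch does not attempt.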

Installation

Before a synthetic target eCos application can use a NAND device it is necessary to build and install host-side support. The relevant code resides in the host subdirectory of the synthetic target NAND package and building it involves the standard configure, make and make install steps. The implementation of the NAND support does not require any executables, just a Tcl script nand.tcl and some support files, so the make step is a no-op.

There are two main ways of building the host-side software. It is possible to build both the generic host-side software and all package-specific host-side software, including the NAND support, in a single build tree. This involves using the configure script at the toplevel of the eCos repository. For more information on this, see the README.host file at the top of the repository. Note that if you have an existing build tree which does not include the synthetic target NAND support then it will be necessary to rerun the toplevel configure script: the search for appropriate packages happens at configure time.

The alternative is to build just the host-side support for this package. This requires a separate build directory: building directly in the source tree is disallowed. The configure options are much the same as for a build from the toplevel, and the README.host file can be consulted for more details. It is essential that the NAND support be configured with the same --prefix option as other eCos host-side software, especially the I/O auxiliary provided by the synthetic target architectural HAL package, otherwise the I/O auxiliary will be unable to locate the NAND support.

Test programs

bbt: Bad Block Table unit test. Finds a readable block, then fiddles with its status in the BBT confirming expected behaviour. Requires the synthetic NAND device.
multipagebbt: As for bbt but insists that the device parameters mean that the BBT spans multiple pages on-chip. (This is perhaps a contrived case, but might crop up in future with larger devices, so needed to be tested.)
eccdamage: An ECC error fuzzing exercise. Requires CYGSEM_NAND_SYNTH_RANDOMLY_LOSE, which induces pseudo-random bit errors; after 1,000 runs, the number of errors corrected is reported.

