Name
Synthetic Target NAND Flash Device — Emulate NAND flash hardware in the synthetic target
Overview
The device driver CYGPKG_DEVS_NAND_SYNTH emulates NAND flash hardware
inside the eCos synthetic target. In addition it provides a number of
debug facilities which cannot readily be implemented on real embedded
hardware, including:
- The emulated NAND contents are held in a file on the Linux host. This makes it easy to archive and restore NAND images, allowing test runs to be repeated with the exact same state each time.
- The device driver can log details of all NAND I/O to a separate logfile on the Linux host. This makes it easier to work out exactly what is happening in the application, and more importantly it can help with figuring out what went wrong and when. For extended runs it is possible to limit the disk space used for logging. It is also possible to generate checkpoints, where the current NAND image is saved to a separate file.
- It is possible to inject bad blocks at run-time, to check how the application would cope on real hardware if and when a NAND erase block developed a fault. These can be made to affect random blocks or specific blocks, for example ones holding filesystem metadata.
Some of the functionality is always available and uses compile-time
configuration via CDL. This allows applications to be run stand-alone.
The more advanced functionality such as logging and bad block
injection is only available when running in conjunction with the
synthetic target I/O auxiliary, when --io is used on the command line.
The settings for logging and bad block injection usually come from the
default.tdf target definition file. These can be changed on a per-run
basis by adding --nanddebug to the command line, which will cause a
suitable dialog box to pop up during NAND driver initialization.
Compile-time Configuration
This package, CYGPKG_DEVS_NAND_SYNTH, will automatically be loaded
when creating a new eCos configuration for the Linux synthetic target.
However the package will be inactive until the generic NAND support is
added to the configuration.
The synthetic target NAND driver has been designed to be functional both when running stand-alone and when used with the I/O auxiliary. Hence some of the basic parameters of the emulated NAND device must be specified at compile-time, and this is handled via CDL configuration options.
CYGDAT_NAND_SYNTH_FILENAME specifies the host-side file that will be
used to hold the NAND data. The default is synth_nand.dat in the
current directory. If the file does not exist then the driver will
create it during initialization. All data in a newly-created image
file will be set to 0xFF, corresponding to an erased device. Hence
deleting the current image file makes it possible to start a test run
with a blank NAND device.
A NAND device consists of some number of erase blocks: erase
operations affect all data in an erase block. Erase blocks are made up
of some number of pages, and write operations typically affect a page
at a time. Each page consists of a main data block plus some spare
bytes, also known as out of band or OOB data. There are four CDL
configuration options controlling the size and layout of the emulated
flash device: CYGNUM_NAND_SYNTH_BLOCK_COUNT,
CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK, CYGNUM_NAND_SYNTH_PAGESIZE, and
CYGNUM_NAND_SYNTH_SPARE_PER_PAGE. The default settings are 1024 erase
blocks, 32 pages per block, 2K of data per page, and 64 bytes of OOB
data per page. This gives an emulated device size of 64MB plus 2MB of
OOB data.
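The size arithmetic can be checked with a few lines of host-side C. This is only a sketch: the struct and function names are hypothetical and not part of the driver, and the fields simply mirror the four CDL options.

```c
#include <stdint.h>

/* Hypothetical helper type: one field per CDL layout option. */
struct nand_geometry {
    uint32_t block_count;      /* CYGNUM_NAND_SYNTH_BLOCK_COUNT */
    uint32_t pages_per_block;  /* CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK */
    uint32_t page_size;        /* CYGNUM_NAND_SYNTH_PAGESIZE, in bytes */
    uint32_t spare_per_page;   /* CYGNUM_NAND_SYNTH_SPARE_PER_PAGE, in bytes */
};

/* Total main data capacity in bytes. */
uint64_t nand_data_bytes(const struct nand_geometry *g)
{
    return (uint64_t)g->block_count * g->pages_per_block * g->page_size;
}

/* Total spare (OOB) capacity in bytes. */
uint64_t nand_oob_bytes(const struct nand_geometry *g)
{
    return (uint64_t)g->block_count * g->pages_per_block * g->spare_per_page;
}
```

With the default settings (1024, 32, 2048, 64) the two functions return 64MB and 2MB respectively.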
The size and layout parameters are encoded in each NAND image file. If these configuration options are changed then existing image files will be incompatible and the device driver will report a fatal error at run-time. This avoids compatibility problems with higher-level code: if a file system has formatted the NAND device for a 2K page size then it is likely to get very confused if the page size suddenly changes to 512 bytes.
The NAND device can be partitioned manually by enabling the component
CYGSEM_DEVS_NAND_SYNTH_PARTITION_MANUAL_CONFIG and manipulating the
options below this. The default is for a single partition occupying
the entire NAND device.
Option CYGSEM_NAND_SYNTH_RANDOMLY_LOSE activates code in the driver
which triggers frequent bit errors during read operations. These
should be handled by error correcting codes within the generic NAND
layer, so should be transparent to higher-level code. The option
exists mainly as an easy way of testing the automatic error correction
support.
Run-time Customization
Logging and bad block injection are controlled by run-time
customization via the synthetic target I/O auxiliary, not by
compile-time CDL options. This allows the same test executable to be
run with and without logging or with different sequences of injected
bad blocks. If the executable is run without the I/O auxiliary, that
is without the --io command line option, then both logging and bad
block injection will be disabled.
The main way of customizing both logging and bad block injection is
via the target definition file, usually default.tdf. The driver comes
with a file nand.tdf holding the various options and explanatory text.
This file should be incorporated into default.tdf and edited as
appropriate. Note that all NAND-related settings should be inside a
synth_device nand section.
Logging
The target definition file settings related to logging are as follows:
logfile "/tmp/synthnand.log"
log read write erase error
max_logfile_size 16M
number_of_logfiles 4
generate_checkpoint_images
The logfile setting controls the location of the logfile. The default
is to add a suffix .log to the CYGDAT_NAND_SYNTH_FILENAME setting,
thus creating logfiles in the same directory as the NAND image file.
The log setting specifies which events should be logged. It is
followed by a list of some or all of the following: read, READ, write,
WRITE, erase, and error. read logs all calls to the driver's page read
function, but not the data actually read. READ is like read but logs
the actual data as well as the event. Similarly write and WRITE log
calls to the driver's page write function, without or with the data
being written. erase logs calls to the driver's block erase function.
error logs any bad block injection events.
Logging can quickly generate very large files, especially when READ or
WRITE debugging is enabled. This can have unfortunate side effects,
for example an overnight stress test can fail because the logfile has
filled all available disk space. To avoid this it is possible to limit
the size of each logfile using the max_logfile_size setting. This is
simply a number followed by a unit K, M or G.
When max_logfile_size is exceeded the NAND driver takes appropriate
action. The default behaviour is just to delete the current logfile
and create a new one, with the same name as before. This can be
unfortunate if some particularly interesting event happened just
before the maximum logfile size was exceeded because all logging
information related to that event will have been lost. To avoid this
it is possible to have multiple logfiles, limited by the
number_of_logfiles setting. Assume a logfile name of
/tmp/synthnand.log, a maximum logfile size of 16M and four logfiles.
After the first 16MB of logging data has been written to
/tmp/synthnand.log that file will be renamed to /tmp/synthnand.log.0
and a new current logfile /tmp/synthnand.log will be created. After
another 16MB the current logfile will be renamed to
/tmp/synthnand.log.1, and so on. After 64MB the maximum allowed number
of logfiles has been created, so /tmp/synthnand.log.0 will be deleted
and then /tmp/synthnand.log will be renamed to /tmp/synthnand.log.3.
Hence the maximum disk space used will be between 48MB and 64MB, plus
a small amount of overflow for each logfile. All logfiles are in plain
text, one event per line, and a single event will not be spread over
multiple logfiles.
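The rotation scheme can be modelled with a little arithmetic. The following C sketch is inferred from the description above and is not the auxiliary's actual implementation. All function names are hypothetical. It assumes the current logfile counts towards number_of_logfiles, as in the four-logfile example.

```c
#include <stdint.h>

/* Number of renamed logfiles (.0, .1, ...) on disk after a given number
 * of rotations. Assumes the current logfile takes one of the
 * number_of_logfiles slots, so at most number_of_logfiles - 1 remain. */
unsigned rotated_files_kept(unsigned number_of_logfiles, unsigned rotations)
{
    unsigned limit = number_of_logfiles > 0 ? number_of_logfiles - 1 : 0;
    return rotations < limit ? rotations : limit;
}

/* Numeric suffix of the oldest renamed logfile still on disk. Only
 * meaningful when rotated_files_kept() is non-zero. */
unsigned oldest_suffix(unsigned number_of_logfiles, unsigned rotations)
{
    return rotations - rotated_files_kept(number_of_logfiles, rotations);
}

/* Upper bound on disk usage in bytes, ignoring the small per-file
 * overflow mentioned in the text. */
uint64_t max_log_bytes(uint64_t max_logfile_size, unsigned number_of_logfiles)
{
    return max_logfile_size * number_of_logfiles;
}
```

For four logfiles of 16MB each, the oldest file after three rotations is .0 and after four rotations it is .1, with an upper bound of 64MB.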
In addition to the logfiles the NAND driver will generate image
checkpoint files if generate_checkpoint_images is enabled. At the
start of the test run the driver will copy the current NAND image to a
new file /tmp/synthnand.log.checkpoint, using the same base filename
as for logfiles. If support for multiple logfiles is enabled then the
current checkpoint file will be renamed in the same way and at the
same time as the current logfile, and a new checkpoint file will be
created using the current image data. Hence for any logfile it is
possible to examine both the starting image and the final image (which
may be the current one).
Bad Block Injection
There are two settings related to bad block injection: factory_bad and
inject. The former is straightforward:
factory_bad 17 42 256 1019
This setting is used only when the NAND image file does not yet exist
and the driver has to create a new one. The specified erase blocks are
marked as factory-bad and hence unusable. The setting consists simply
of one or more numbers, each in the range 0 to
CYGNUM_NAND_SYNTH_BLOCK_COUNT-1. A maximum of 32 blocks can be marked
factory bad.
Bad block injection is rather more complicated, in an attempt to make
it sufficiently flexible for a variety of uses. The target definition
file can contain one or more inject settings. There is a limit of
eight erase definitions and eight write definitions, giving a maximum
of sixteen bad block injection definitions. However some of these can
use repeat, so the number of bad blocks injected during a test run is
limited only by the size of the emulated device.
Example settings are:
inject erase current after rand% 1024 erases
inject write current after rand% 100000 calls repeat
inject erase block 1 after 3 block_erases
inject write page 9860 after 1000 writes
The inject keyword should be followed by either erase or write. If
erase then the bad block injection happens during a call to the
driver's erase function, otherwise it happens during a call to write.
This is followed by a block or page definition, the keyword after, an
event counter, and a couple of optional flags.
The simplest block or page definition is the keyword current. This
simply means that whichever block is being erased, or whichever block
contains the page being written, will be marked bad. Typically this
will be used for injecting random faults. If a given block is the
subject of an above-average number of erase or write operations then
it is more likely to be the current block in one of these definitions,
so heavily-used blocks are more likely to fail.
The alternative to current is to list a specific block for an erase
definition, or a specific page for a write definition. Blocks go from
0 to CYGNUM_NAND_SYNTH_BLOCK_COUNT - 1. Pages go from 0 to
(CYGNUM_NAND_SYNTH_BLOCK_COUNT * CYGNUM_NAND_SYNTH_PAGES_PER_BLOCK) - 1.
This can be particularly useful when testing higher-level code that
uses certain blocks specially, for example for storing filesystem
metadata. It can also be useful when attempting to repeat a test run
using information from a logfile.
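When choosing a page number for a write definition it helps to convert between page and block numbers. A minimal sketch, assuming pages are numbered consecutively across erase blocks as the ranges above imply. The function names are hypothetical.

```c
#include <stdint.h>

/* Erase block containing a given page (both zero-based). */
uint32_t page_to_block(uint32_t page, uint32_t pages_per_block)
{
    return page / pages_per_block;
}

/* Index of a page within its own erase block. */
uint32_t page_within_block(uint32_t page, uint32_t pages_per_block)
{
    return page % pages_per_block;
}

/* Device-wide page number of the first page in an erase block. */
uint32_t block_first_page(uint32_t block, uint32_t pages_per_block)
{
    return block * pages_per_block;
}
```

With the default 32 pages per block, page 9860 is page 4 of erase block 308, and erase block 42 starts at page 1344.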
The fields immediately following the after keyword specify when the
bad block injection should trigger. They consist of the optional
keyword rand%, a count, and an event identifier. If rand% is not
specified then exactly the specified number of events must occur
before the bad block injection triggers. Otherwise some number between
0 and count-1 events must occur. The event identifier can be one of
erases, writes, calls, block_erases, or page_writes. erases means the
number of calls to the driver's block erase function. writes means the
number of calls to the page write function. calls means the total
number of read, write or erase calls. block_erases means erase calls
for a specific block. Similarly page_writes means write calls for a
specific page. block_erases and page_writes cannot be used together
with current, only with a specific block or page.
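The counting behaviour can be modelled in C as below. This is an illustrative sketch only, not the I/O auxiliary's code. The struct and function names are hypothetical, and the C library rand() stands in for whatever random source the driver really uses.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one "after [rand%] <count> <event>" clause. */
struct inject_trigger {
    uint32_t threshold;  /* matching events needed before triggering */
    uint32_t seen;       /* matching events counted so far */
    int      fired;      /* non-zero once the trigger has fired */
};

/* Without rand% the threshold is exactly count. With rand% it is drawn
 * from the range 0 .. count-1. */
void trigger_init(struct inject_trigger *t, uint32_t count, int use_rand)
{
    t->threshold = (use_rand && count > 0) ? (uint32_t)rand() % count : count;
    t->seen = 0;
    t->fired = 0;
}

/* Call once per matching event. Returns 1 on the event that fires the
 * trigger and 0 otherwise; a non-repeating trigger fires only once. */
int trigger_event(struct inject_trigger *t)
{
    if (t->fired)
        return 0;
    t->seen++;
    if (t->seen >= t->threshold) {
        t->fired = 1;
        return 1;
    }
    return 0;
}
```

Initialised with a count of 3 and no rand%, the first two events return 0 and the third returns 1. This models only when a definition triggers; which block is then marked bad depends on the block or page part of the definition.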
So, consider the first example again:
inject erase current after rand% 1024 erases
The driver will calculate a random number between 0 and 1023, say 427.
Once there have been at least 427 erase calls the bad block injection
will trigger. Since the definition is for erase current, the injection
can happen immediately. Hence whichever block is specified during the
427th erase call will be marked bad and that erase call will fail with
error code EIO.
Suppose that the definition used write current instead of erase
current. The definition will still trigger during erase call 427, but
it will not take effect immediately. Instead the block containing
whichever page is the subject of the next write operation will be
marked bad. Alternatively, suppose that the definition was for erase
block 42 but call 427 was for block 512 instead. If some time after
call 427 there was another erase call for block 42, that later erase
call will fail and cause the block to be marked bad.
Now consider the next definition:
inject write current after rand% 100000 calls repeat
This event will trigger after between 0 and 99999 calls into the driver. These calls can be reads, writes, or erases. If the triggering call is a write then the block affected will be the one containing the page being written. Otherwise whichever page is the subject of the next write operation will be the one affected.
inject erase block 1 after 3 block_erases
This definition will only ever affect block 1. The first two calls to erase block 1 will succeed. The third call will fail. Erase calls for any other block have no effect on this definition.
inject write page 9860 after 1000 writes
This definition will only ever affect page 9860. The definition will
trigger after 1000 writes to any page. If the thousandth write happens
to be for page 9860, it will fail. Otherwise the next write for page
9860 will fail, whenever that happens. If the definition specified
event type page_writes instead of writes it would trigger only after
1000 writes to page 9860 instead of 1000 writes to any page.
Trigger definitions can be followed by two optional keywords. The
first is repeat and indicates that the definition should trigger
multiple times, not just once. repeat can only be used for block or
page current, not for a specific block or page, since any given block
can only fail once. The second keyword is disabled. This can be used
to create a bad block injection definition which is not active by
default but which can be enabled in the GUI.
All bad block injection definitions operate in parallel, not sequentially. It is possible for multiple definitions to trigger during a single call but a given block can only fail once.
Interactive Dialog
Although it is possible to change the logging and other settings
between test runs by editing the target definition file, the package
also provides a way of changing these during driver initialization. If
the command line option --nanddebug is used in addition to --io then
the I/O auxiliary will pop up a dialog box allowing the various
settings to be edited.
Figure 2. I/O auxiliary Dialog, Files
The dialog consists of a tabbed notebook with separate tabs for Files, Logging, and Errors. The default tab is Files. This shows the NAND device settings as configured via CDL options. If a current NAND image file exists then it can be deleted with a single click, allowing the test run to proceed with a new blank NAND device. The current image file can be saved away to an archive, either using a default name based on the current image name or by letting the user select a file. It is also possible to restore a previously-saved archive, again using the default name if that archive exists, or by selecting an alternative file. If any of the relevant files are changed outside this dialog then the refresh button forces the dialog to check the filesystem again. The various operations will be reported at the bottom of the dialog.
If the Logging tab is selected then the dialog changes to the following:
Figure 3. I/O auxiliary Dialog, Logging
There are separate checkbuttons for the six types of events that can be logged. It is also possible to change the current logfile and to edit the other settings related to logfiles.
The Errors tab is not yet implemented.
The dialog can be dismissed using the OK button or by hitting the ESC key. At that point the eCos application will resume running with the selected settings.
File Formats
A NAND image is a binary file with six sections: a 64-byte header block; an array of erase counts; an array of write counts; an array of the factory-bad blocks; a bitmap of the good and bad blocks; and the actual data. The header consists of the following data structure:
struct header {
    cyg_uint32 magic;            // 0xEC05A11F
    cyg_uint32 page_size;        // CYGNUM_DEVS_NAND_SYNTH_PAGESIZE
    cyg_uint32 spare_size;       // CYGNUM_DEVS_NAND_SYNTH_SPARE_PER_PAGE
    cyg_uint32 pages_per_block;  // CYGNUM_DEVS_NAND_SYNTH_PAGES_PER_BLOCK
    cyg_uint32 number_of_blocks; // CYGNUM_DEVS_NAND_SYNTH_BLOCK_COUNT
    cyg_uint32 tv_sec;
    cyg_uint32 tv_usec;
    cyg_uint32 spare[9];
};
All integers in the header and in the following three sections are
stored in big-endian format. The magic field is used to check that a
file really holds a NAND image. The next four fields specify the
emulated device size and layout, as per the corresponding CDL options.
The tv_sec and tv_usec fields are filled in with the result of a
gettimeofday() system call during driver initialization. These fields
also appear in logfiles, so code can check that a logfile and an image
file correspond to the same test run.
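A host-side tool can read and validate the header along the following lines. This is a sketch rather than code shipped with the package: the struct and function names are hypothetical, and only the field order, the magic value and the big-endian encoding come from the description above.

```c
#include <stdint.h>
#include <stdio.h>

#define NAND_IMAGE_MAGIC 0xEC05A11Fu
#define NAND_HEADER_SIZE 64

/* Host-side copy of the header fields; the nine spare words are skipped. */
struct nand_image_header {
    uint32_t magic;
    uint32_t page_size;
    uint32_t spare_size;
    uint32_t pages_per_block;
    uint32_t number_of_blocks;
    uint32_t tv_sec;
    uint32_t tv_usec;
};

/* Decode one big-endian 32-bit integer from four bytes. */
uint32_t nand_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse a raw 64-byte header. Returns 0 on success, -1 on a bad magic. */
int nand_parse_header(const unsigned char *buf, struct nand_image_header *h)
{
    h->magic            = nand_be32(buf + 0);
    h->page_size        = nand_be32(buf + 4);
    h->spare_size       = nand_be32(buf + 8);
    h->pages_per_block  = nand_be32(buf + 12);
    h->number_of_blocks = nand_be32(buf + 16);
    h->tv_sec           = nand_be32(buf + 20);
    h->tv_usec          = nand_be32(buf + 24);
    return (h->magic == NAND_IMAGE_MAGIC) ? 0 : -1;
}

/* Read and parse the header of an image file. Returns 0 on success. */
int nand_read_header(const char *path, struct nand_image_header *h)
{
    unsigned char buf[NAND_HEADER_SIZE];
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (got != sizeof buf)
        return -1;
    return nand_parse_header(buf, h);
}
```

Decoding byte by byte keeps the code independent of the host's own byte order. The tv_sec and tv_usec values can then be compared with those in a logfile's initialization record.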
Following the header is an array of BLOCK_COUNT integers holding the number of erase calls for each block. Next is an array of (BLOCK_COUNT * PAGES_PER_BLOCK) integers holding the number of write calls for each page. These arrays can be used to check that higher-level code is performing wear levelling.
The array of factory-bad blocks consists of 32 integers holding the
value of the target definition file's factory_bad setting at the time
that the image file was created.
The bitmap following the factory-bad block array holds the current state of each erase block. Bit 0 of byte 0 corresponds to erase block 0, bit 0 of byte 1 corresponds to erase block 8, and so on. If a bit is set then the erase block is ok. If a bit is clear then the erase block is bad, either factory-bad or because it has failed subsequently due to a bad block injection.
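The bit numbering can be expressed as two small C helpers. This is a sketch with hypothetical function names, operating on a copy of the bitmap section already read into memory.

```c
#include <stdint.h>

/* Returns 1 if the erase block is ok, 0 if it is marked bad.
 * Bit (block % 8) of byte (block / 8) holds the state of each block. */
int nand_block_is_good(const unsigned char *bitmap, uint32_t block)
{
    return (bitmap[block / 8] >> (block % 8)) & 1;
}

/* Clear the bit for an erase block, marking it bad. */
void nand_block_mark_bad(unsigned char *bitmap, uint32_t block)
{
    bitmap[block / 8] &= (unsigned char)~(1u << (block % 8));
}
```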
The bad block bitmap is followed by the actual data. This consists of all erase blocks concatenated without padding, starting with erase block 0. Each erase block is stored starting from page 0 within that block, and again all the pages are concatenated without padding. For each page the actual data is stored first, followed by the spare (OOB) data.
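Putting the six sections together, the file offset of each one can be computed as sketched below. Several points are assumptions rather than documented facts: that the count and factory-bad arrays use 32-bit integers like the header, that the bitmap is rounded up to a whole number of bytes, and that there is no padding between sections. All names are hypothetical.

```c
#include <stdint.h>

#define NAND_HEADER_SIZE     64u
#define NAND_FACTORY_BAD_MAX 32u

/* Byte offsets of each section within the image file. */
struct nand_layout {
    uint64_t erase_counts;
    uint64_t write_counts;
    uint64_t factory_bad;
    uint64_t bitmap;
    uint64_t data;
};

/* Assumes 4-byte integers in the three arrays and no padding. */
void nand_layout_compute(uint32_t blocks, uint32_t pages_per_block,
                         struct nand_layout *l)
{
    l->erase_counts = NAND_HEADER_SIZE;
    l->write_counts = l->erase_counts + 4ull * blocks;
    l->factory_bad  = l->write_counts + 4ull * blocks * pages_per_block;
    l->bitmap       = l->factory_bad + 4ull * NAND_FACTORY_BAD_MAX;
    l->data         = l->bitmap + (blocks + 7u) / 8u;  /* one bit per block */
}

/* Offset of a page's main data; its OOB bytes follow at + page_size. */
uint64_t nand_page_offset(const struct nand_layout *l, uint32_t page,
                          uint32_t page_size, uint32_t spare_size)
{
    return l->data + (uint64_t)page * (page_size + spare_size);
}
```

Under these assumptions the default configuration puts the data section at byte offset 135488, with each page occupying 2112 bytes.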
A logfile is a plain text file, not a binary file. It holds one log event per line. Some of these lines can be rather long if READ or WRITE logging is enabled. The fields within each line are separated by a single space. The first field indicates the type of record. The next two fields are call counts, one for the event in question and one for the total number of calls into the driver. The remaining fields depend on the record type.
I 0 0 1247514233 562000 synth_nand.dat 2048 64 32 1024
This is an initialization record, written when the logfile is created.
There have been no calls to the driver yet so the two counts are 0.
The next two fields are the tv_sec and tv_usec timestamp values which
are also written into the NAND image file. This allows logfiles and
image files to be matched up. These are followed by the name of the
NAND image file. The remaining fields identify the page size, the
spare size, the number of pages per erase block, and the number of
erase blocks.
F 2 4 42 0
This is a call into the driver's factorybad function. It is the second
such call, and the fourth call into the driver. The query is for erase
block 42, and that block has not been marked as factory-bad.
r 5 12 32736 0x0200f0c0 2048 0x0200eeb0 64
This is a read event. It is the fifth page read call into the driver
and the 12th call overall. The request is for page number 32736. 2048
bytes of data should be read into a buffer at location 0x0200f0c0, and
64 bytes of OOB data should be read into 0x0200eeb0.
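Because the fields are separated by single spaces, a record like this can be picked apart with sscanf(). The sketch below handles only the r record. The struct and function names are hypothetical and not part of the package.

```c
#include <stdio.h>

/* Fields of an "r" (page read) log record, in file order. */
struct nand_read_event {
    unsigned long read_count;  /* number of page read calls so far */
    unsigned long call_count;  /* total calls into the driver */
    unsigned long page;
    unsigned long data_addr;   /* target-side buffer address */
    unsigned long data_len;
    unsigned long oob_addr;
    unsigned long oob_len;
};

/* Parse one logfile line. Returns 0 for a well-formed "r" record and
 * -1 for anything else, including Rd and Ro lines. */
int nand_parse_read(const char *line, struct nand_read_event *ev)
{
    char type[4];
    int n = sscanf(line, "%3s %lu %lu %lu %lx %lu %lx %lu",
                   type, &ev->read_count, &ev->call_count, &ev->page,
                   &ev->data_addr, &ev->data_len,
                   &ev->oob_addr, &ev->oob_len);

    if (n != 8 || type[0] != 'r' || type[1] != '\0')
        return -1;
    return 0;
}
```

The %lx conversion accepts the 0x prefix used for the buffer addresses. The same approach extends to the other record types by switching on the first field.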
Rd 5 12 32736 0x0200f0c0 2048 FFFFFFFFFFFF…
Ro 5 12 32736 0x0200eeb0 64 FFFFFFFFFFFF…
These are two READ log events corresponding to the previous read. The
first line Rd is for the data part, and the final field consists of
4096 hexadecimal characters encoding the 2048 bytes of data. The
second line Ro is for the OOB part and the final field consists of 128
hexadecimal characters encoding the 64 bytes of OOB data.
w 1 1030 32736 0x0200f0c0 2048 0x0200eed0 64
Wd 1 1030 32736 0x0200f0c0 2048 FFFFFFFFF3FFFFFFFFFFCF…
Wo 1 1030 32736 0x0200eed0 64 FFFFFFFFFFFFFFFF426274…
These lines show a write log event and a WRITE log event for the same
call into the driver. The fields are the same as for read and READ.
E 2 1031 1022
This logs an erase call into the driver for block 1022. It is the second erase call, and by this time there have been 1031 calls into the driver.
Bb 1 2857 631
…
Bp 2 4012 3189 99
These lines show bad block injections. The first is for an erase operation for block 631. That erase call is about to fail with EIO and the block will be marked bad. The second is for a write operation for page 3189, which is part of erase block 99. That write operation is about to fail and all of erase block 99 will be marked bad.
Installation
Before a synthetic target eCos application can use a NAND device it is
necessary to build and install host-side support. The relevant code
resides in the host subdirectory of the synthetic target NAND package
and building it involves the standard configure, make and make install
steps. The implementation of the NAND support does not require any
executables, just a Tcl script nand.tcl and some support files, so the
make step is a no-op.
There are two main ways of building the host-side software. It is
possible to build both the generic host-side software and all
package-specific host-side software, including the NAND support, in a
single build tree. This involves using the configure script at the
toplevel of the eCos repository. For more information on this, see the
README.host file at the top of the repository.
Note that if you have an existing build tree which does not include
the synthetic target NAND support then it will be necessary to
rerun the toplevel configure script: the search for appropriate
packages happens at configure time.
The alternative is to build just the host-side support for this
package. This requires a separate build directory: building directly
in the source tree is disallowed. The configure options are much the
same as for a build from the toplevel, and the README.host file can be
consulted for more details. It is essential that the NAND support be
configured with the same --prefix option as other eCos host-side
software, especially the I/O auxiliary provided by the synthetic
target architectural HAL package, otherwise the I/O auxiliary will be
unable to locate the NAND support.
Test programs
- bbt
Bad Block Table unit test. Finds a readable block, then fiddles with its status in the BBT confirming expected behaviour. Requires the synthetic NAND device.
- multipagebbt
As for bbt but insists that the device parameters mean that the BBT spans multiple pages on-chip. (This is perhaps a contrived case, but might crop up in future with larger devices, so needed to be tested.)
- eccdamage
An ECC error fuzzing exercise. Requires CYGSEM_NAND_SYNTH_RANDOMLY_LOSE, which induces pseudo-random bit errors; after 1,000 runs, the number of errors corrected is reported.
2024-12-10 | Open Publication License |