Chapter 36. Non-ASCII Character Set Support
If long filename support is enabled (by setting CYGCFG_FS_FAT_LONG_FILE_NAMES), then all strings passed to and from the filesystem may be encoded using UTF-8. This allows files to be named using characters beyond the basic ASCII set.
If long filename support is disabled, file names are limited to the standard 8.3 format. However, if these names contain non-ASCII characters, they are stored 8-bit clean, so that any multibyte encodings are preserved.
Filesystems created by devices that do not support long filenames may contain 8.3 names that use non-ASCII characters from a non-Unicode character set, typically one of the Microsoft code pages. When long filename support is enabled, these names must be translated into Unicode so that they can pass through the rest of the filesystem and compare correctly during file searches. Since the filesystem has no built-in internationalization support beyond Unicode, it is the responsibility of the application or middleware layers to supply the translation of these values to and from Unicode. The FILEIO package defines callbacks that may be used to do this:
```c
/* Short-name bytes -> UTF-16LE; returns number of 16-bit values stored */
typedef int cyg_fs_mbcs_to_utf16le( CYG_ADDRWORD data, const cyg_uint8 *mbcs,
                                    int size, cyg_uint16 *utf16le);

/* UTF-16LE -> multibyte; returns number of bytes stored */
typedef int cyg_fs_utf16le_to_mbcs( CYG_ADDRWORD data, const cyg_uint16 *utf16le,
                                    int size, cyg_uint8 *mbcs);

struct cyg_fs_mbcs_translate
{
    cyg_fs_mbcs_to_utf16le *mbcs_to_utf16le;
    cyg_fs_utf16le_to_mbcs *utf16le_to_mbcs;
    CYG_ADDRWORD            data;   /* passed to both callbacks */
};
```
These callback functions may be registered after a filesystem has been mounted by using cyg_fs_setinfo() as follows:
```c
struct cyg_fs_mbcs_translate translate;
…
translate.mbcs_to_utf16le = my_mbcs_to_utf16le;
translate.utf16le_to_mbcs = my_utf16le_to_mbcs;
translate.data            = (CYG_ADDRWORD)my_data;

err = cyg_fs_setinfo("/disk0", FS_INFO_MBCS_TRANSLATE,
                     &translate, sizeof(translate));
```
Following this, whenever the filesystem encounters a short file name that contains non-ASCII characters, the registered mbcs_to_utf16le() function will be called to translate it. In the call, the data argument will be a copy of the data field of the cyg_fs_mbcs_translate structure. The mbcs argument points to the sequence of size bytes to be translated. The function should store the resulting translation in utf16le and return the number of 16-bit values stored.
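As an illustration of the mbcs_to_utf16le() contract, the following is a minimal sketch for a single-byte code page. It assumes that the data field carries a pointer to a caller-supplied table giving the Unicode value for each byte from 0x80 to 0xFF (registered as (CYG_ADDRWORD)my_data, as in the fragment above). The my_codepage_table type, my_store_le16() and MY_REPLACEMENT_CHAR are hypothetical names, and the typedefs at the top are host stand-ins for the eCos types. Because a single-byte code page yields exactly one 16-bit value per byte, the sketch also assumes that utf16le has room for size values. Multibyte code pages such as Shift-JIS (code page 932) would need a real decoder instead of a table lookup.

```c
#include <stdint.h>

/* Host stand-ins for the eCos types (an eCos build gets these from
   <cyg/infra/cyg_type.h> instead). */
typedef uint8_t   cyg_uint8;
typedef uint16_t  cyg_uint16;
typedef uintptr_t CYG_ADDRWORD;

/* Hypothetical table: Unicode value for each byte 0x80..0xFF of a
   single-byte code page; 0 marks a byte with no mapping. */
typedef struct {
    cyg_uint16 high[128];
} my_codepage_table;

#define MY_REPLACEMENT_CHAR 0xFFFD   /* U+FFFD REPLACEMENT CHARACTER */

/* Store one 16-bit value in little-endian byte order, whatever the
   host byte order, as the UTF-16LE format requires. */
static void my_store_le16(cyg_uint16 *dst, cyg_uint16 v)
{
    cyg_uint8 *p = (cyg_uint8 *)dst;
    p[0] = (cyg_uint8)(v & 0xFF);   /* low byte first */
    p[1] = (cyg_uint8)(v >> 8);
}

/* Translate 'size' bytes of a short name into UTF-16LE.
   Returns the number of 16-bit values stored (one per input byte). */
int my_mbcs_to_utf16le(CYG_ADDRWORD data, const cyg_uint8 *mbcs,
                       int size, cyg_uint16 *utf16le)
{
    const my_codepage_table *table = (const my_codepage_table *)data;
    int i;

    for (i = 0; i < size; i++) {
        cyg_uint8  b = mbcs[i];
        cyg_uint16 u;

        if (b < 0x80)
            u = b;                          /* ASCII maps to itself */
        else if (table->high[b - 0x80] != 0)
            u = table->high[b - 0x80];      /* code page lookup */
        else
            u = MY_REPLACEMENT_CHAR;        /* unmapped byte */

        my_store_le16(&utf16le[i], u);
    }
    return size;
}
```

For example, a deliberately partial table with high[0x81 - 0x80] set to 0x00FC (ü in code page 437) translates the byte 0x81 to U+00FC, while any byte whose table entry is left at 0 becomes U+FFFD.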
When the filesystem needs to encode a string into the multibyte character set, it will call the utf16le_to_mbcs() function. In the call, the data argument will be a copy of the data field of the cyg_fs_mbcs_translate structure. The utf16le argument points to the sequence of size 16-bit values to be translated. The function should store the resulting translation in mbcs and return the number of bytes stored.
It is important to note that translation is to and from UTF-16LE: all 16-bit values are stored in little-endian byte order, and Unicode code points outside the Basic Multilingual Plane are encoded as surrogate pairs. This is the format mandated by Microsoft for long file names in the FAT filesystem. See IETF RFC 2781 for details of the encoding.
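The surrogate-pair and byte-order rules can be made concrete with a short encoder. This is a generic sketch of UTF-16LE encoding following RFC 2781, not part of the FILEIO API, and utf16le_encode() and put_le16() are hypothetical names. A code point in the Basic Multilingual Plane becomes a single 16-bit unit; a code point above U+FFFF has 0x10000 subtracted, and the remaining 20 bits are split between a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF). Each unit is then written low byte first.

```c
#include <stdint.h>

/* Write one 16-bit unit in little-endian byte order. */
static void put_le16(uint8_t *dst, uint16_t unit)
{
    dst[0] = (uint8_t)(unit & 0xFF);   /* low byte first */
    dst[1] = (uint8_t)(unit >> 8);
}

/* Encode one Unicode code point as UTF-16LE into 'out'.
   Returns the number of bytes written (2 or 4), or 0 if 'cp' is
   beyond U+10FFFF or is itself a surrogate value. */
int utf16le_encode(uint32_t cp, uint8_t out[4])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {                /* BMP: a single unit */
        put_le16(out, (uint16_t)cp);
        return 2;
    }

    cp -= 0x10000;                     /* 20 bits remain */
    put_le16(out,     (uint16_t)(0xD800 | (cp >> 10)));    /* high 10 bits */
    put_le16(out + 2, (uint16_t)(0xDC00 | (cp & 0x3FF)));  /* low 10 bits */
    return 4;
}
```

For example, U+1F600 becomes 0xF600 after subtracting 0x10000, giving the surrogate pair 0xD83D 0xDE00, which is stored as the bytes 3D D8 00 DE.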
In the current implementation, the utf16le_to_mbcs() function is never called. If long filename support is disabled, the filesystem stores multibyte characters exactly as they are supplied. If long filename support is enabled, new files are created with long names whenever their names contain non-ASCII characters, and renamed files are converted to the long name form automatically. The function is present in case future enhancements require it; for now, applications should install a function that simply returns zero.
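Since utf16le_to_mbcs() is currently never called, a placeholder only needs to match the cyg_fs_utf16le_to_mbcs signature and return zero. The sketch below reuses the my_utf16le_to_mbcs name from the registration fragment; the typedefs at the top are host stand-ins for the eCos types, so that the function can be compiled and checked outside an eCos build.

```c
#include <stdint.h>

/* Host stand-ins for the eCos types (an eCos build gets these from
   <cyg/infra/cyg_type.h> instead). */
typedef uint8_t   cyg_uint8;
typedef uint16_t  cyg_uint16;
typedef uintptr_t CYG_ADDRWORD;

/* Placeholder reverse translation: the current implementation never
   calls it, so it converts nothing and reports zero bytes stored. */
int my_utf16le_to_mbcs(CYG_ADDRWORD data, const cyg_uint16 *utf16le,
                       int size, cyg_uint8 *mbcs)
{
    (void)data;       /* unused until a real translation is needed */
    (void)utf16le;
    (void)size;
    (void)mbcs;       /* output buffer is left untouched */
    return 0;
}
```

It is installed through the utf16le_to_mbcs field of struct cyg_fs_mbcs_translate in the same way as in the registration fragment.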
2025-01-10 | eCosPro Non-Commercial Public License |