Loading Character Data

Character Set

The type of a data file must be compatible with the database character set. For example, if the data file is type ASCII or UTF-8, you can load it into a UTF-8 database. However, if the data file is type ISO 8859-1 (Latin1), which is not compatible with UTF-8, column values that contain multi-byte characters cannot be displayed in the result set.

  1. Use the file command to check the type of a data file. For example:

    $ file Date_Dimension.tbl

    Date_Dimension.tbl: ASCII text

    The file command may report ASCII text even though the file contains multi-byte characters.

  2. Use the wc command to check for this problem. For example:

    $ wc Date_Dimension.tbl

    1828 5484 221822 Date_Dimension.tbl

    If the wc command returns an error such as Invalid or incomplete multibyte or wide character, the data file is using an incompatible character set.
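
As a complementary check, the following minimal Python sketch tries to decode the whole file as UTF-8 and reports where decoding first fails. It is an illustration only, not part of the product: the function name first_invalid_utf8_offset is hypothetical, and the sketch assumes the target database uses UTF-8 and that the file is small enough to read into memory. A pure ASCII file is also valid UTF-8, so it passes this check.

```python
from pathlib import Path


def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence,
    or None if the whole file decodes cleanly as UTF-8."""
    data = Path(path).read_bytes()  # assumption: the file fits in memory
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None
```

For example, first_invalid_utf8_offset("Date_Dimension.tbl") returns None for a file that is safe to load into a UTF-8 database, and a byte offset for a file that contains, say, Latin1-encoded accented characters.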

Using Quoted Characters as Literals

You can use the backslash character (\) to quote data characters that would otherwise be taken as special characters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value:

Examples

In these examples, the DELIMITER is a comma, for visibility.

,1,2,3,

,1,2,3

1,2,3,

Leading and trailing delimiters are ignored. Thus, the rows all have three columns.
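
The rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the loader's actual implementation: split_row is a hypothetical helper, it assumes that exactly one leading and one trailing delimiter are dropped, and it does not handle backslash-quoted delimiters.

```python
def split_row(line, delimiter=","):
    """Split a row into column values, ignoring one leading and
    one trailing delimiter (assumption: at most one of each)."""
    if line.startswith(delimiter):
        line = line[len(delimiter):]
    if line.endswith(delimiter):
        line = line[:-len(delimiter)]
    return line.split(delimiter)


for row in [",1,2,3,", ",1,2,3", "1,2,3,"]:
    print(split_row(row))  # each row prints ['1', '2', '3']
```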

123,\n,\\n,456

Using the default null string (\n), the row would be interpreted as:

123

NULL

\n

456

Using a non-default null string, the row would be interpreted as:

123

newline

\n

456
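
The two interpretations can be reproduced with a small Python sketch. It is a simplified model rather than the loader's real logic: decode_field is a hypothetical helper, the non-default null string "NULL" is chosen only for illustration, and the sketch assumes that \n stands for a newline while any other backslash-quoted character stands for itself.

```python
def decode_field(raw, null_string=r"\n"):
    """Return None if the raw field equals the null string,
    otherwise return the field with backslash quoting removed."""
    if raw == null_string:
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Assumption: \n means newline; any other quoted
            # character (including a backslash) stands for itself.
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


raw_fields = ["123", r"\n", r"\\n", "456"]

# Default null string (\n): the second field is NULL (None).
default_result = [decode_field(f) for f in raw_fields]

# Hypothetical non-default null string: the second field is a newline.
custom_result = [decode_field(f, null_string="NULL") for f in raw_fields]
```

In both cases the third field decodes to a literal backslash followed by n, because the first backslash quotes the second.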

123,this\, that\, or the other,something else,456

The row would be interpreted as:

123

this, that, or the other

something else

456
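
A minimal Python sketch of splitting a row on unquoted delimiters follows. It illustrates the idea only and is not the loader's implementation: split_unescaped is a hypothetical helper, it assumes a single-character delimiter, and it leaves any other backslash sequence untouched so that a later step can interpret it.

```python
def split_unescaped(line, delimiter=","):
    """Split a row on delimiters that are not preceded by a backslash.
    A backslash-quoted delimiter becomes part of the column value."""
    fields = []
    current = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt == delimiter:
                current.append(delimiter)  # quoted delimiter is data
            else:
                current.append(ch + nxt)   # keep other sequences as-is
            i += 2
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


row = r"123,this\, that\, or the other,something else,456"
print(split_unescaped(row))
# prints ['123', 'this, that, or the other', 'something else', '456']
```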