COPY

The COPY command is designed for bulk loading data from a file on a cluster host into a Vertica database. (See LCOPY to load from a data file on a client system.) COPY reads data from a delimited text file and inserts tuples either into the WOS (memory) or directly into the ROS (disk).

You must connect as the database superuser in order to COPY from a file.

Syntax

COPY table [ ( column [ ,...] ) ]

FROM { 'file' | STDIN }

[ WITH ]

[ DELIMITER [ AS ] 'char' ]

[ NULL [ AS ] 'string' ]

[ RECORD TERMINATOR 'string' ]

[ EXCEPTIONS 'pathname' ]

[ REJECTED DATA 'pathname' ]

[ ABORT ON ERROR ]

[ DIRECT ]

Semantics

table

specifies the name of a schema table (not a projection). Vertica loads the data into all projections that include columns from the schema table. It does not delete or overwrite any existing data.

column

restricts the load to one or more specific columns in the table (all columns are loaded by default). Table columns that are not in the column list are given their default values. If no default value is defined for a column, COPY inserts NULL.

The data file must contain the same number of columns as the COPY command's column list. For example, in a table T1 with nine columns (C1 through C9), COPY T1 (C1, C6, C9) would load the three columns of data in each record to columns C1, C6, and C9 respectively.
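The column-list rule above can be sketched as follows. The file path, delimiter choice, and sample record are assumptions made for illustration; only the table and column names come from the example in the text.

```sql
-- Hypothetical data file /data/t1_partial.txt, one record per line,
-- three comma-delimited values per record:
--   value_for_c1,value_for_c6,value_for_c9
--
-- The three values in each record go to C1, C6, and C9 in that order.
-- C2-C5, C7, and C8 receive their default values, or NULL if no
-- default is defined.
COPY T1 (C1, C6, C9)
FROM '/data/t1_partial.txt'
DELIMITER ',';
```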

FROM 'file'

specifies the absolute pathname of the text file containing the data. The file must be accessible to the host on which the COPY statement runs. (You can use variables to construct the pathname as described in Using Load Scripts.)

STDIN

reads from the standard input instead of a file.

WITH, AS

are optional keywords for readability and have no effect.

DELIMITER 'char'

specifies the single-character column delimiter in the text file. For example, comma ',' is the delimiter commonly used in textual (CSV) data files. In data files, the number of delimited column values is significant; rows can begin and/or end with a delimiter or a column value.

The default delimiter is the tab character. The example database data files use a different delimiter: vertical bar (|).

Use the backslash character (\) to specify special (non-printing, control) characters as the delimiter. For example:

'\t' = tab character (the default)

'\\' = backslash character (not a good choice)

If the delimiter character appears in string data values, you can use the backslash character to indicate that it is a literal (see Loading Character Data).
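As an illustration of the delimiter options described above, the two statements below load a comma-delimited file and a tab-delimited file. The table name and file paths are hypothetical.

```sql
-- Comma-delimited (CSV-style) input; Sales_Staging and the path are
-- placeholder names, not objects defined in this manual.
COPY Sales_Staging
FROM '/data/sales.csv'
DELIMITER ',';

-- Tab-delimited input. '\t' is also the default, so the DELIMITER
-- clause could be omitted here; it is spelled out for clarity.
COPY Sales_Staging
FROM '/data/sales.tsv'
DELIMITER '\t';
```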

NULL 'string'

specifies the string (for example, 'NULL') that represents a null value in the data file. The null string is case-insensitive and must be the only value between the delimiters. For example, if the null string is NULL and the delimiter is the vertical bar (|):

|nuLL| indicates a null value

| nuLL | does not indicate a null value

The default null string is \N or \n (a backslash followed by an uppercase or lowercase N). The example database data files use the default null string.

When you use the COPY command in a script, you must use a double-backslash in a null string that includes a backslash.

For example, the scripts used to load the example databases contain:

COPY ... NULL '\\n' ...

The example scripts specify the null string to demonstrate this requirement, even though it is the default null string.

The null string that you specify for the Database Designer does not require a double-backslash.
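The following sketch shows a custom null string in use. The table name, file path, and sample records are assumptions; the matching behavior in the comments restates the rules given above.

```sql
-- Hypothetical records in /data/customers.txt:
--   1|Boston|NULL      third value is loaded as null
--   2|null|MA          second value is loaded as null (case-insensitive)
--   3| NULL |MA        second value is NOT null: the null string is not
--                      the only text between the delimiters
COPY Customer_Staging
FROM '/data/customers.txt'
DELIMITER '|'
NULL 'NULL';
```

Because this null string contains no backslash, no double-backslash is needed when the statement runs from a script.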

RECORD TERMINATOR 'string'

specifies the literal character string that indicates the end of a data file record. You can include non-printing characters and backslash characters in the string according to the following convention:

Sequence	Description	Abbreviation	ASCII Decimal
\0	Null character	NUL	0
\a	Bell	BEL	7
\b	Backspace	BS	8
\t	Horizontal Tab	HT	9
\n	Linefeed	LF	10
\v	Vertical Tab	VT	11
\f	Formfeed	FF	12
\r	Carriage Return	CR	13
\\	Backslash		92
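For example, a file whose records end with a carriage return followed by a linefeed could be loaded with a two-character terminator built from the sequences in the table above. The table name and path are hypothetical, and this sketch assumes the file consistently uses that line ending.

```sql
-- Each record in the hypothetical file ends with CR followed by LF
-- (ASCII 13, then ASCII 10).
COPY Customer_Staging
FROM '/data/customers_crlf.txt'
DELIMITER '|'
RECORD TERMINATOR '\r\n';
```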

EXCEPTIONS 'pathname'

specifies the filename or absolute pathname in which to write messages indicating the input line number and the reason for each rejected data record. The default pathname is:

catalog-dir/CopyErrorLog/input-filename-copy-from-exceptions

where catalog-dir represents the directory in which the database catalog files are stored, and input-filename is the name of the data file. If copying from STDIN, the input-filename is STDIN.

REJECTED DATA 'pathname'

specifies the filename or absolute pathname in which to write rejected rows. This file can then be edited to resolve problems and reloaded.

The default pathname is:

catalog-dir/CopyErrorLog/input-filename-copy-from-rejected-data
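A minimal sketch of overriding both default locations follows. The table name and all three paths are hypothetical; choose directories appropriate to your installation.

```sql
-- Reasons for rejection (with input line numbers) go to the EXCEPTIONS
-- file; the rejected rows themselves go to the REJECTED DATA file.
COPY Customer_Staging
FROM '/data/customers.txt'
DELIMITER '|'
EXCEPTIONS '/home/dbadmin/load/customers.exceptions'
REJECTED DATA '/home/dbadmin/load/customers.rejected';

-- After correcting the rows in the rejected-data file, it can be used
-- as the input of a second load.
COPY Customer_Staging
FROM '/home/dbadmin/load/customers.rejected'
DELIMITER '|';
```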

ABORT ON ERROR

stops the COPY command if a row is rejected and rolls back the command. No data is loaded.
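When a partial load is worse than no load, ABORT ON ERROR can make the statement all-or-nothing, as in this sketch (the table name and path are hypothetical):

```sql
-- If any row in the file is rejected, the whole COPY is rolled back
-- and no rows from this file are loaded.
COPY Customer_Staging
FROM '/data/customers.txt'
DELIMITER '|'
ABORT ON ERROR;
```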

DIRECT

specifies that the data should go directly to the ROS (Read Optimized Store). By default, data goes to the WOS (Write Optimized Store).

Although they both specify the same things, the syntax of the COPY command is different from the Database Designer input parameter syntax.

Notes

The COPY command automatically commits itself and any current transaction. Vertica recommends that you COMMIT or ROLLBACK the current transaction before using COPY.
You cannot use the same character in both the DELIMITER and NULL strings.
NULL values are not allowed for columns with primary key or foreign key referential integrity constraints.
Referential integrity in Vertica consists of a set of constraints (logical schema objects) that define primary key and foreign key columns. In a star schema or snowflake schema:
- Each dimension table must have a PRIMARY KEY constraint.
- The fact table must contain columns that can be used to join the fact table to dimension tables.
- Fact table join columns must have FOREIGN KEY constraints in order to participate in pre-join projections.
- Outer join queries produce expected results only when the fact table join column used in the query does not have a FOREIGN KEY constraint.
String data in load files consists of all characters between the specified delimiters. Do not enclose character strings in quotes; quote characters are treated as ordinary data.
Invalid input is defined as:
- Missing columns (too few columns in an input line).
- Extra columns (too many columns in an input line).
- Empty columns for INTEGER or date/time data types. COPY does not use the default data values defined by the CREATE TABLE command.
- Incorrect representation of data type. For example, non-numeric data in an integer column is invalid.
Empty values (two consecutive delimiters) are accepted as valid input data for CHAR and VARCHAR data types. Empty columns are stored as an empty string (''), which is not equivalent to a null value.
Cancelling a COPY statement rolls back all rows loaded by that statement.
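The first note recommends ending the current transaction before loading. One way to follow that advice is sketched below; the file path is a hypothetical placeholder, and Store_Dimension is the table used in the example in this section.

```sql
-- End the open transaction explicitly (use ROLLBACK instead to discard
-- pending work) so that the automatic commit performed by COPY does not
-- also commit unrelated changes.
COMMIT;

COPY Store_Dimension
FROM '/data/store_dimension.tbl'
DELIMITER '|';
```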

Examples

COPY Store_Dimension

FROM :input_file

DELIMITER '|'

NULL '\\n'

RECORD TERMINATOR '\f'

DIRECT;