The COPY command is designed for bulk loading data from a file on a cluster host into a Vertica database. (See LCOPY to load from a data file on a client system.) COPY reads data from a delimited text file and inserts tuples either into the
WOS (memory) or directly into the
ROS (disk).
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of
compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The WOS (Write Optimized Store) is a memory-resident data structure into which INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions are recorded. Like the
ROS, the WOS is arranged by projection but it stores tuples without sorting,
compression, or indexing and thus supports very fast load speeds. The WOS organizes data by epoch and holds uncommitted transaction data.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of
compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The database superuser is the automatically-created database user who has the same name as the Linux database administrator account and who can bypass all GRANT/REVOKE authorization. Superuser status cannot be granted to another user. (The concept of a database superuser should not be confused with Linux superuser (root) privilege. In fact, a database superuser cannot have Linux superuser privilege.)
table
|
specifies the name of a schema table (not a projection). Vertica loads the data into all projections that include columns from the schema table. It does not delete or overwrite any existing data.
|
column
|
restricts the load to one or more specific columns in the table (all columns are loaded by default). Table columns that are not in the column list are given their default values. If no default value is defined for a column, COPY inserts NULL.
The data file must contain the same number of columns as the COPY command's column list. For example, in a table T1 with nine columns (C1 through C9), COPY T1 (C1, C6, C9) would load the three columns of data in each record to columns C1, C6, and C9 respectively.
|
FROM 'file'
|
specifies the absolute pathname of the text file containing the data. The file must be accessible to the host on which the COPY statement runs. (You can use variables to construct the pathname as described in Using Load Scripts.)
|
STDIN
|
reads from the standard input instead of a file.
|
WITH
AS
|
are for readability and have no effect.
|
DELIMITER 'char'
|
specifies the single-character column delimiter in the text file. For example, comma ',' is the delimiter commonly used in textual (CSV) data files. In data files, the number of delimited column values is significant; rows can begin and/or end with a delimiter or a column value.
The default delimiter is the tab character. The example database data files use a different delimiter: vertical bar (|).
Use the backslash character (\) to specify special (non-printing, control) characters as the delimiter. For example:
'\t' = tab character (the default)
'\\' = backslash character (not a good choice)
If the delimiter character appears in string data values, you can use the backslash character to indicate that it is a literal (see Loading Character Data).
|
NULL 'string'
|
specifies the multi-character string that represents a null value such as 'NULL'. The null string is case-insensitive and must be the only value between the delimiters. For example, if the null string is NULL and the delimiter is the vertical bar (|):
|nuLL| indicates a null value
| nuLL | does not indicate a null value
The default null string is \N and \n (backslash uppercase en or backslash lowercase en). The example database data files use the default null string.
When you use the COPY command in a script, you must use a double-backslash in a null string that includes a backslash.
For example, the scripts used to load the example databases contain:
COPY ... NULL '\\n' ...
The example scripts specify the null string to demonstrate this requirement, in spite of the fact that it is the default null string.
The null string that you specify for the Database Designer does not require a double-backslash.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
|
RECORD TERMINATOR 'string'
|
specifies the literal character string that indicates the end of a data file record. You can include non-printing characters and backslash characters in the string according to the following convention:
Sequence
|
Description
|
Abbreviation
|
ASCII Decimal
|
\0
|
Null character
|
NUL
|
0
|
\a
|
Bell
|
BEL
|
7
|
\b
|
Backspace
|
BS
|
8
|
\t
|
Horizontal Tab
|
HT
|
9
|
\n
|
Linefeed
|
LF
|
10
|
\v
|
Vertical Tab
|
VT
|
11
|
\f
|
Formfeed
|
FF
|
12
|
\r
|
Carriage Return
|
CR
|
13
|
\\
|
Backslash
|
|
92
|
|
EXCEPTIONS 'pathname'
|
specifies the filename or absolute pathname in which to write messages indicating the input line number and the reason for each rejected data record. The default pathname is:
catalog-dir/CopyErrorLog/input-filename-copy-from-exceptions
where catalog-dir represents the directory in which the database catalog files are stored, and input-filename is the name of the data file. If copying from STDIN , the input-filename is STDIN .
|
REJECTED DATA 'pathname'
|
specifies the filename or absolute pathname in which to write rejected rows. This file can then be edited to resolve problems and reloaded.
The default pathname is:
catalog-dir/CopyErrorLog/input-filename-copy-from-rejected-data
where catalog-dir represents the directory in which the database catalog files are stored, and input-filename is the name of the data file. If copying from STDIN , the input-filename is STDIN .
|
ABORT ON ERROR
|
stops the COPY command if a row is rejected and rolls back the command. No data is loaded.
|
DIRECT
|
specifies that the data should go directly to the ROS (Read Optimized Store. By default, data goes to the WOS (Write Optimized Store).
The WOS (Write Optimized Store) is a memory-resident data structure into which INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions are recorded. Like the ROS, the WOS is arranged by projection but it stores tuples without sorting, compression, or indexing and thus supports very fast load speeds. The WOS organizes data by epoch and holds uncommitted transaction data.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
|
Although they both specify the same things, the syntax of the COPY command is different from the
Database Designer input parameter syntax.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
Note to author: list exactly which parameters this refers to.