CREATE PROJECTION

The CREATE PROJECTION command creates metadata for a projection in the Vertica catalog. It does not load data into physical storage unless the tables(s) over which the projection is defined already contain data. In that case, the new projection becomes available as soon as the database has automatically refreshed it. This process may take a long time, depending on how much data is in the table(s).

Use CREATE PROJECTION only when advised to do so by Technical Support. Improper use can corrupt data and/or damage a database.

Syntax

CREATE PROJECTION projection-name

( projection-column [ ENCODING encoding-type ] [ , ... ] )

AS SELECT table-column [ , ... ]

FROM table-reference [ , ... ]

[ WHERE join-predicate [ AND join-predicate ] ...

[ ORDER BY table-column [ , ... ] ]

[ hash-segmentation-clause | range-segmentation-clause

Range segmentation allows you to segment a projection based on a known range of values stored in a specific column chosen to provide even distribution of data across a set of nodes, resulting in optimal query execution.

Vertica Systems, Inc. recommends that you use Hash Segmentation instead of range segmentation.

Syntax

SEGMENTED BY expression

NODE node VALUES LESS THAN value

NODE node VALUES LESS THAN MAXVALUE

Semantics (Range Segmentation)

SEGMENTED BY expression

is a single column reference to a column in the SELECT list of the CREATE PROJECTION command. Choose a column that has:

INTEGER or FLOAT data type
a known range of data values
an even distribution of data values
a large number of unique data values
Avoid columns that:
are foreign keys
are used in query predicates
have a date/time data type
have correlations with other columns due to functional dependencies.
Segmenting on date/time data types is valid but guaranteed to produce temporal skew in the data distribution and is not recommended. If you choose this option, do not use TIME or TIMETZ because their range is only 24 hours.

NODE node

a symbolic name for a node. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the Administration Tools.

VALUES LESS THAN value

specifies that this segment can contain a range of data values less than the specified value, except that segments cannot overlap. In other words, the minimum value of the range is determined by the value of the previous segment (if any).

MAXVALUE

specifies a sub-range with no upper limit. In other words, it represents a value greater than the maximum value that can exist in the data. The maximum value depends on the data type of the segmentation column.

Notes

The SEGMENTED BY expression syntax allows a general SQL expression but there is no reason to use anything other than a single

column reference for range segmentation. If you wish to use a different expression, the following restrictions apply:

All leaf expressions must be either

constants or

column-references toa column in the SELECT list of the CREATE PROJECTION command

Aggregate functions are not allowed
The expression must return the same value over the life of the database.

During INSERT or COPY to a segmented projection, if expression produces a value outside the expected range (a negative value for example), no error occurs, and the row is added to a segment of the projection.
CREATE PROJECTION with range segmentation accepts the deprecated syntax SITE node for compatibility with previous releases.
CREATE PROJECTION with range segmentation allows the SEGMENTED BY expression to be a single column-reference to a column in the projection-column list for compatibility with previous releases. This syntax is considered to be a deprecated feature and causes a warning message. See DEPRECATED syntax in the Troubleshooting Guide.

Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution.

Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later.

Syntax

SEGMENTED BY expression

[ ALL NODES [ OFFSET offset ] | NODES node [ ,... ] ]

Semantics

SEGMENTED BY expression	can be a general SQL expression but there is no reason to use anything other than the built-in HASH function with one or more NOT NULL columns as arguments. Choose columns that have a large number of unique data values and acceptable skew in the data distribution. Primary key columns that meet the criteria may be an excellent choice for hash segmentation.
ALL NODES	automatically distributes the data evenly across all nodes at the time the `CREATE PROJECTION` statement is executed. The ordering of the nodes is fixed.
OFFSET offset	is an integer that specifies the node within the ordered sequence on which to start the segmentation distribution, relative to 0. See example below.
NODES node [ ,... ]	specifies a subset of the nodes in the cluster over which to distribute the data. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the Administration Tools. The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using vsql. Always run the Administration Tools on the Administration Host if possible. vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks. $ /opt/vertica/bin/adminTools See the Administration Tools Reference for a complete description.

Notes

Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later. The Database Designer uses hash segmentation by default.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.

A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
During INSERT or COPY to a segmented projection, if expression produces a value outside the expected range (a negative value for example), no error occurs, and the row is added to a segment of the projection.
No OFFSET clause is equivalent to OFFSET 0.
CREATE PROJECTION accepts the deprecated syntax SITES node for compatibility with previous releases.

If you wish to use a different SEGMENTED BY expression, the following restrictions apply:

All leaf expressions must be either

constants or

column-references to a column in the SELECT list of the CREATE PROJECTION command

Aggregate functions are not allowed
The expression must return the same value over the life of the database.
The expression must return non-negative INTEGER values in the range 0 to 263.

Examples

CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES;

CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1;

The example produces two hash-segmented buddy projections that form part of a K-Safe design. The projections can use different sort orders.

| UNSEGMENTED { NODE node | ALL NODES } ]

Semantics

projection-name

specifies the name of the projection (see note below).

projection-column

specifies the name of a column in the projection. The data type is inferred from the corresponding column in the schema table (based on ordinal position). Different projection-column names can be used to distinguish multiple columns of the same name from different tables so that no aliases are needed.

encoding-type

specifies the type of encoding to use on the column. The Database Designer automatically chooses an appropriate encoding for each projection column.

SELECT table-column

specifies a list of schema table columns corresponding (in ordinal position) to the projection columns.

table-reference

specifies a list of schema tables containing the columns to include in the projection in the form:

table-name [ AS ] alias [ ( column-alias [ , ...] ) ] [ , ...] ]

WHERE join-predicate

specifies foreign-key = primary-key equijoins between the fact table and dimension tables. Foreign key columns must be NOT NULL. No other predicates are allowed.

ORDER BY table-column

specifies which columns to sort. Because all projection columns are sorted in ascending order in physical storage, CREATE PROJECTION does not allow you to specify ascending or descending.

hash-segmentation-clause

allows you to segment a projection based on a built-in hash function that provides even distribution of data across nodes, resulting in optimal query execution. See hash-segmentation-clause.

Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later.

Syntax

SEGMENTED BY expression

[ ALL NODES [ OFFSET offset ] | NODES node [ ,... ] ]

Semantics

SEGMENTED BY expression	can be a general SQL expression but there is no reason to use anything other than the built-in HASH function with one or more NOT NULL columns as arguments. Choose columns that have a large number of unique data values and acceptable skew in the data distribution. Primary key columns that meet the criteria may be an excellent choice for hash segmentation.
ALL NODES	automatically distributes the data evenly across all nodes at the time the `CREATE PROJECTION` statement is executed. The ordering of the nodes is fixed.
OFFSET offset	is an integer that specifies the node within the ordered sequence on which to start the segmentation distribution, relative to 0. See example below.
NODES node [ ,... ]	specifies a subset of the nodes in the cluster over which to distribute the data. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the Administration Tools. The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using vsql. Always run the Administration Tools on the Administration Host if possible. vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks. $ /opt/vertica/bin/adminTools See the Administration Tools Reference for a complete description.

Notes

Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later. The Database Designer uses hash segmentation by default.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.

A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
During INSERT or COPY to a segmented projection, if expression produces a value outside the expected range (a negative value for example), no error occurs, and the row is added to a segment of the projection.
No OFFSET clause is equivalent to OFFSET 0.
CREATE PROJECTION accepts the deprecated syntax SITES node for compatibility with previous releases.

If you wish to use a different SEGMENTED BY expression, the following restrictions apply:

All leaf expressions must be either

constants or

column-references to a column in the SELECT list of the CREATE PROJECTION command

Aggregate functions are not allowed
The expression must return the same value over the life of the database.
The expression must return non-negative INTEGER values in the range 0 to 263.

Examples

CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES;

CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1;

The example produces two hash-segmented buddy projections that form part of a K-Safe design. The projections can use different sort orders.

range-segmentation-clause

allows you to segment a projection based on a known range of values stored in a specific column chosen to provide even distribution of data across a set of nodes, resulting in optimal query execution. See range-segmentation-clause.

Vertica Systems, Inc. recommends that you use Hash Segmentation instead of range segmentation.

Syntax

SEGMENTED BY expression

NODE node VALUES LESS THAN value

NODE node VALUES LESS THAN MAXVALUE

Semantics (Range Segmentation)

SEGMENTED BY expression

is a single column reference to a column in the SELECT list of the CREATE PROJECTION command. Choose a column that has:

INTEGER or FLOAT data type
a known range of data values
an even distribution of data values
a large number of unique data values
Avoid columns that:
are foreign keys
are used in query predicates
have a date/time data type
have correlations with other columns due to functional dependencies.
Segmenting on date/time data types is valid but guaranteed to produce temporal skew in the data distribution and is not recommended. If you choose this option, do not use TIME or TIMETZ because their range is only 24 hours.

NODE node

a symbolic name for a node. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the Administration Tools.

VALUES LESS THAN value

MAXVALUE

Notes

The SEGMENTED BY expression syntax allows a general SQL expression but there is no reason to use anything other than a single

column reference for range segmentation. If you wish to use a different expression, the following restrictions apply:

All leaf expressions must be either

constants or

column-references toa column in the SELECT list of the CREATE PROJECTION command

Aggregate functions are not allowed
The expression must return the same value over the life of the database.

During INSERT or COPY to a segmented projection, if expression produces a value outside the expected range (a negative value for example), no error occurs, and the row is added to a segment of the projection.
CREATE PROJECTION with range segmentation accepts the deprecated syntax SITE node for compatibility with previous releases.
CREATE PROJECTION with range segmentation allows the SEGMENTED BY expression to be a single column-reference to a column in the projection-column list for compatibility with previous releases. This syntax is considered to be a deprecated feature and causes a warning message. See DEPRECATED syntax in the Troubleshooting Guide.

NODE node

creates an unsegmented projection on the specified node only. Dimension table projections must be UNSEGMENTED.

ALL NODES

creates a separate unsegmented projection on each node at the time the CREATE PROJECTION statement is executed (automatic replication). In order to do distributed query execution, Vertica requires an exact, unsegmented copy of each dimension table superprojection on each node. See projection naming note below.

Unsegmented Projection Naming

CREATE PROJECTION ... UNSEGMENTED takes a snapshot of the nodes defined at execution time to generate a node list in a predictable order. Thus, replicated projections have the name:

projection-name_node-name

For example, if the nodes are named NODE01, NODE02, and NODE03 then:

CREATE PROJECTION ABC ... UNSEGMENTED ALL NODES

creates projections named ABC_NODE01, ABC_NODE02, and ABC_NODE03.

This naming convention may impact functions that provide information about projections, for example, GET_TABLE_PROJECTIONS or GET_PROJECTION_STATUS where you will need to provide the name ABC_NODE01 instead of just ABC. To view a list of the nodes in a database, use the View Database command in the Administration Tools.

Notes

If no segmentation is specified, the default is UNSEGMENTED on the node where the CREATE PROJECTION was executed.

In This Section

encoding-type

hash-segmentation-clause

range-segmentation-clause

Type	Description	Use For
`RLE`	RLE (Run Length Encoding) replaces sequences of the same data values within a column by a single value and a count number.	sorted columns in which the average number of repeated rows exceeds ten.
`DELTAVAL`	DELTAVAL (Delta Encoding) stores only the differences between sequential data values rather than the values themselves.	columns that store sorted integer or date data that has a narrow range of values.
`NONE`	automatic encoding (default)	all columns not suitable for RLE or DELTAVAL.