The CREATE PROJECTION command creates metadata for a
projection in the Vertica catalog. It does not load data into physical storage unless the tables(s) over which the projection is defined already contain data. In that case, the new projection becomes available as soon as the database has automatically
refreshed it. This process may take a long time, depending on how much data is in the table(s).
A refresh operation ensures that all projections on a node are up-to-date (can participate in query execution). This process may take a long time, depending on how much data is in the table(s).
A projection is a special case of a
materialized view that provides physical storage for data. A projection can contain some or all of the columns of one or more tables. A projection that contains all of the columns of a table is called a
superprojection. A projection that joins one or more tables is called a pre-join projection. Most projections are used for ad-hoc query processing and K-safety but it is possible to have query-specific projections.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
A materialized view is similar to a standard SQL view with one major exception: the data is actually stored on disk rather than computed each time the view is used in a query. A materialized view, then, must be refreshed whenever the data in the underlying tables is changed. A projection is a special case of a materialized view.
Use CREATE PROJECTION only when advised to do so by
Technical Support. Improper use can corrupt data and/or damage a database.
To submit problem reports, questions, comments, and suggestions, please use the Technical Support page on the Vertica Systems, Inc. Web site:
http://www.vertica.com/support
You must be a registered user in order to access the page.
Before reporting a problem, please run the Diagnostics Utility described in the Troubleshooting Guide and attach the resulting .zip file.
Syntax
CREATE PROJECTION projection-name
( projection-column [ ENCODING
encoding-type ] [ , ... ] )
Type |
Description |
Use For |
|
RLE (Run Length Encoding) replaces sequences of the same data values within a column by a single value and a count number. |
sorted columns in which the average number of repeated rows exceeds ten. |
|
DELTAVAL (Delta Encoding) stores only the differences between sequential data values rather than the values themselves. |
columns that store sorted integer or date data that has a narrow range of values. |
|
automatic encoding (default) |
all columns not suitable for RLE or DELTAVAL. |
AS SELECT table-column [ , ... ]
FROM table-reference [ , ... ]
[ WHERE
join-predicate [ AND
join-predicate ] ...
Vertica supports only equi-joins based on a primary key-foreign key relationship between the joined tables. See Adding Primary Key and Foreign Key Constraints in the Database Administrator's Guide for more information.
Syntax
column-reference =
column-reference
Syntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
Syntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
Semantics
column-reference |
refers to a column of one the tables specified in the FROM clause. |
Notes
Vertica supports only equi-joins based on a primary key-foreign key relationship between the joined tables. See Adding Primary Key and Foreign Key Constraints in the Database Administrator's Guide for more information.
Syntax
column-reference =
column-reference
Syntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
Syntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
Semantics
column-reference |
refers to a column of one the tables specified in the FROM clause. |
Notes
[ ORDER BY table-column [ , ... ] ]
[
hash-segmentation-clause |
range-segmentation-clause
Range segmentation allows you to segment a projection based on a known range of values stored in a specific column chosen to provide even distribution of data across a set of nodes, resulting in optimal query execution.
Vertica Systems, Inc. recommends that you use
Hash Segmentation instead of range segmentation.
Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution.
Syntax
SEGMENTED BY expression
NODE node VALUES LESS THAN value

NODE node VALUES LESS THAN MAXVALUE
Semantics (Range Segmentation)
SEGMENTED BY expression |
is a single Syntax [ tablename. ] columnname Semantics
Notes
|
||||
NODE node |
a symbolic name for a node. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks. $ /opt/vertica/bin/adminTools See the Administration Tools Reference for a complete description. |
||||
VALUES LESS THAN value |
specifies that this segment can contain a range of data values less than the specified value, except that segments cannot overlap. In other words, the minimum value of the range is determined by the value of the previous segment (if any). |
||||
MAXVALUE |
specifies a sub-range with no upper limit. In other words, it represents a value greater than the maximum value that can exist in the data. The maximum value depends on the data type of the segmentation column. |
Notes
SEGMENTED BY expression syntax allows a general SQL expression but there is no reason to use anything other than a single
column reference for range segmentation. If you wish to use a different expression, the following restrictions apply:Syntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
constants or
column-references toa column in the SELECT list of the CREATE PROJECTION commandSyntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
SITE node for compatibility with previous releases.SEGMENTED BY expression to be a single column-reference to a column in the projection-column list for compatibility with previous releases. This syntax is considered to be a deprecated feature and causes a warning message. See DEPRECATED syntax in the Troubleshooting Guide.Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution.
Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later.
Syntax
SEGMENTED BY expression
[ ALL NODES [ OFFSET offset ] | NODES node [ ,... ] ]
Semantics
SEGMENTED BY expression |
can be a general SQL expression but there is no reason to use anything other than the built-in HASH function with one or more NOT NULL columns as arguments. Choose columns that have a large number of unique data values and acceptable skew in the data distribution. Primary key columns that meet the criteria may be an excellent choice for hash segmentation. |
ALL NODES |
automatically distributes the data evenly across all nodes at the time the |
OFFSET offset |
is an integer that specifies the node within the ordered sequence on which to start the segmentation distribution, relative to 0. See example below. |
NODES node [ ,... ] |
specifies a subset of the nodes in the cluster over which to distribute the data. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks. $ /opt/vertica/bin/adminTools See the Administration Tools Reference for a complete description. |
Notes
Database Designer uses hash segmentation by default.The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
SITES node for compatibility with previous releases.SEGMENTED BY expression, the following restrictions apply:
constants or
column-references to a column in the SELECT list of the CREATE PROJECTION commandSyntax
[ tablename. ] columnname
Semantics
tablename |
is one of:
|
columnname |
is the name of a column that must be unique across all the tables being used in a query |
Notes
Examples
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES;
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1;
The example produces two hash-segmented
buddy projections that form part of a
K-Safe design. The projections can use different sort orders.
K represents the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica V2.1, the value of K can be zero (0) or one (1). The value of K can be one (1) only when the
Physical Schema design meets certain requirements. The designs generated by the Database Designer are K-Safe.
A physical schema consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the Logical Schema.
Buddy projections are required for K-Safety. Two projections are considered to be buddies if they contain the same columns and have the same range segmentation or hash segmentation using different node ordering. Buddy projections usually have different sort orders for query performance purposes.
| UNSEGMENTED { NODE node | ALL NODES } ]
Semantics
projection-name |
specifies the name of the projection (see note below). |
||||||||||||||||||||
projection-column |
specifies the name of a column in the projection. The data type is inferred from the corresponding column in the schema table (based on ordinal position). Different projection-column names can be used to distinguish multiple columns of the same name from different tables so that no aliases are needed. |
||||||||||||||||||||
encoding-type |
specifies the The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
|
||||||||||||||||||||
SELECT table-column |
specifies a list of schema table columns corresponding (in ordinal position) to the projection columns. |
||||||||||||||||||||
table-reference |
specifies a list of schema tables containing the columns to include in the projection in the form: table-name [ AS ] alias [ ( column-alias [ , ...] ) ] [ , ...] ] |
||||||||||||||||||||
WHERE join-predicate |
specifies foreign-key = primary-key equijoins between the fact table and dimension tables. Foreign key columns must be NOT NULL. No other predicates are allowed. |
||||||||||||||||||||
ORDER BY table-column |
specifies which columns to sort. Because all projection columns are sorted in ascending order in physical storage, CREATE PROJECTION does not allow you to specify ascending or descending. |
||||||||||||||||||||
hash-segmentation-clause |
allows you to segment a projection based on a built-in hash function that provides even distribution of data across nodes, resulting in optimal query execution. See Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution. Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later. Syntax SEGMENTED BY expression [ ALL NODES [ OFFSET offset ] | NODES node [ ,... ] ] Semantics
Notes
Examples
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES; CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1; The example produces two hash-segmented K represents the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica V2.1, the value of K can be zero (0) or one (1). The value of K can be one (1) only when the A physical schema consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the Logical Schema. Buddy projections are required for K-Safety. Two projections are considered to be buddies if they contain the same columns and have the same range segmentation or hash segmentation using different node ordering. Buddy projections usually have different sort orders for query performance purposes. |
||||||||||||||||||||
range-segmentation-clause |
allows you to segment a projection based on a known range of values stored in a specific column chosen to provide even distribution of data across a set of nodes, resulting in optimal query execution. See Range segmentation allows you to segment a projection based on a known range of values stored in a specific column chosen to provide even distribution of data across a set of nodes, resulting in optimal query execution. Vertica Systems, Inc. recommends that you use Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution. Syntax SEGMENTED BY expression NODE node VALUES LESS THAN value NODE node VALUES LESS THAN MAXVALUE Semantics (Range Segmentation)
Notes
|
||||||||||||||||||||
NODE node |
creates an unsegmented projection on the specified node only. Dimension table projections must be UNSEGMENTED. |
||||||||||||||||||||
ALL NODES |
creates a separate unsegmented projection on each node at the time the |
Unsegmented Projection Naming
CREATE PROJECTION ... UNSEGMENTED takes a snapshot of the nodes defined at execution time to generate a node list in a predictable order. Thus, replicated projections have the name:
projection-name_node-name
For example, if the nodes are named NODE01, NODE02, and NODE03 then:
CREATE PROJECTION ABC ... UNSEGMENTED ALL NODES
creates projections named ABC_NODE01, ABC_NODE02, and ABC_NODE03.
This naming convention may impact functions that provide information about projections, for example, GET_TABLE_PROJECTIONS or GET_PROJECTION_STATUS where you will need to provide the name ABC_NODE01 instead of just ABC. To view a list of the nodes in a database, use the View Database command in the
Administration Tools.
The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using
vsql. Always run the Administration Tools on the Administration Host if possible.
vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
$ /opt/vertica/bin/adminTools
See the Administration Tools Reference for a complete description.
Notes
|
In This Section |


