hash-segmentation-clause
Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution.
Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later.
Syntax
SEGMENTED BY expression
[ ALL NODES [ OFFSET offset ] | NODES node [ ,... ] ]
Semantics
SEGMENTED BY expression
|
can be a general SQL expression but there is no reason to use anything other than the built-in HASH function with one or more NOT NULL columns as arguments. Choose columns that have a large number of unique data values and acceptable skew in the data distribution. Primary key columns that meet the criteria may be an excellent choice for hash segmentation.
|
ALL NODES
|
automatically distributes the data evenly across all nodes at the time the CREATE PROJECTION statement is executed. The ordering of the nodes is fixed.
|
OFFSET offset
|
is an integer that specifies the node within the ordered sequence on which to start the segmentation distribution, relative to 0. See example below.
|
NODES node [ ,... ]
|
specifies a subset of the nodes in the cluster over which to distribute the data. You can use a specific node only once in any projection. For a list of the nodes in a database, use the View Database command in the Administration Tools.
The tools needed for administering a Vertica database are provided in the form of a graphical user interface that allows you to perform various tasks quickly and easily. The tools also provide a convenient way to connect to a database using vsql. Always run the Administration Tools on the Administration Host if possible.
vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
$ /opt/vertica/bin/adminTools
See the Administration Tools Reference for a complete description.
|
Notes
- Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later. The
Database Designer uses hash segmentation by default.The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
- During INSERT or COPY to a segmented projection, if expression produces a value outside the expected range (a negative value for example), no error occurs, and the row is added to a segment of the projection.
- No OFFSET clause is equivalent to OFFSET 0.
- CREATE PROJECTION accepts the deprecated syntax
SITES
node for compatibility with previous releases. - If you wish to use a different
SEGMENTED BY
expression, the following restrictions apply:- All leaf expressions must be either
constants or
column-references to a column in the SELECT list of the CREATE PROJECTION commandSyntax
[ tablename. ] columnname
Semantics
tablename
|
is one of:
- the name of a table
- an alias for a table defined by means of a FROM clause in a query
|
columnname
|
is the name of a column that must be unique across all the tables being used in a query
|
Notes
- There are no space characters in a column reference.
- Aggregate functions are not allowed
- The expression must return the same value over the life of the database.
- The expression must return non-negative INTEGER values in the range 0 to 263.
Examples
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES;
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1;
The example produces two hash-segmented
buddy projections that form part of a
K-Safe design. The projections can use different sort orders.
K represents the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica V2.1, the value of K can be zero (0) or one (1). The value of K can be one (1) only when the
Physical Schema design meets certain requirements. The designs generated by the Database Designer are K-Safe.
A physical schema consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the Logical Schema.
Buddy projections are required for K-Safety. Two projections are considered to be buddies if they contain the same columns and have the same range segmentation or hash segmentation using different node ordering. Buddy projections usually have different sort orders for query performance purposes.