Hash Segmentation
Hash segmentation is the preferred method of segmentation in Vertica 2.0 and later. Refer to the CREATE PROJECTION command in the SQL Reference Manual for detailed information about using hash segmentation in a projection.
Hash segmentation allows you to segment a projection based on a built-in hash function that provides even distribution of data across some or all of the nodes in a cluster, resulting in optimal query execution. To use hash segmentation, simply choose one or more column values to use as input parameters to the built-in
Hash Function (see HASH in the SQL Reference Manual for more information).
A hash function is a reproducible method of converting data into a number that can serve as a digital "fingerprint" of the data. In Vertica, the built-in HASH function is used in segmentation to evenly distribute data over a cluster of nodes.
Hash segmentation's ALL NODES
... OFFSET
syntax provides an easy way to create the
buddy projections that form part of a
K-Safe design. For example:
K represents the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica V2.1, the value of K can be zero (0) or one (1). The value of K can be one (1) only when the
Physical Schema design meets certain requirements. The designs generated by the Database Designer are K-Safe.
A physical schema consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the Logical Schema.
Buddy projections are required for K-Safety. Two projections are considered to be buddies if they contain the same columns and have the same range segmentation or hash segmentation using different node ordering. Buddy projections usually have different sort orders for query performance purposes.
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES;
CREATE PROJECTION ... SEGMENTED BY HASH(C1,C2,...) ALL NODES OFFSET 1;
produces two hash-segmented buddy projections. The projections can use different sort orders.