Book Contents

Book Index

Next Topic

Home

Projection Segmentation

To optimize query performance, projections can be segmented (distributed evenly across multiple database nodes). The term segment refers to the portion of a projection stored at a particular node. A node can store a collection of segments of various projections.

Dimension tables are assumed to be much smaller than the fact table and are not segmented; they are replicated on multiple nodes. The system catalogs are also stored redundantly at all nodes; they are small and are updated infrequently.

Segmentation is done using the following methods:

Hash Segmentation

Hash segmentation is the preferred method of segmentation. It allows you to segment a projection based on a built-in hash function that provides even distribution of data across multiple nodes, resulting in optimal query execution. A hash function is a reproducible method of converting data into a number that can serve as a digital "fingerprint" of the data. In a projection, the data to be hashed consists of one or more column values, each having a large number of unique values and an acceptable amount of skew in the value distribution. Primary key columns that meet the criteria may be an excellent choice for hash segmentation.

Hash segmentation is used in segmented projections produced by the Database Designer.

Range Segmentation

Range segmentation consists of designating one column of a projection as the segmentation column. The range of possible values of a segmentation column is decomposed into disjoint sub-ranges which are allocated to specific nodes. 

Range segmentation based on dates and/or timestamps is not recommended because it results in a skewed data distribution. For example, suppose you segment by year using a DATE column. All rows inserted within the current year would loaded into the same segments on specific nodes. Thus, a query asking for rows inserted during any specific year would be executed only on those nodes. The rest of the nodes would be idle.