Understanding the Automatic Tuple Mover

The tuple mover is the component of Vertica that moves the contents of the Write Optimized Store (WOS) into the Read Optimized Store (ROS). This data movement is known as a moveout. Normally, the tuple mover runs automatically in the background at preset intervals. In this mode it is referred to as the Automatic Tuple Mover (ATM).

The tuple mover performs three different operations across all nodes: advance epoch, moveout, and mergeout.

Each of these operations runs at a different interval. Advance epoch runs most frequently, followed by moveout. Mergeout runs least often.

Advance Epoch

The logical structure of the WOS is a series of epochs. An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.

Advancing the epoch closes the current epoch and opens a new current epoch. The closed epoch contains only committed data; uncommitted data moves into the new current epoch. Epoch numbers start at zero and increase by one with every advance epoch. They are never reset during the life of the database, so the epoch number can become quite large.

The automatic tuple mover advances the epoch periodically.
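The closing and opening of epochs can be illustrated with a toy Python model. This is a minimal sketch of the behavior described above, not Vertica's implementation. The names Change, EpochLog, record, and advance_epoch are hypothetical. The model tracks only whether each change is committed.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One hypothetical DML change (COPY, INSERT, UPDATE or DELETE)."""
    description: str
    committed: bool = False


@dataclass
class EpochLog:
    """Toy model of the WOS as a series of epochs, keyed by epoch number."""
    current: int = 0  # epoch numbering starts at zero
    epochs: dict = field(default_factory=lambda: {0: []})

    def record(self, change: Change) -> None:
        # New changes always land in the current (open) epoch.
        self.epochs[self.current].append(change)

    def advance_epoch(self) -> int:
        """Close the current epoch, open a new one, and return the closed epoch number."""
        closed = self.current
        pending = self.epochs[closed]
        # The closed epoch keeps only committed changes...
        self.epochs[closed] = [c for c in pending if c.committed]
        # ...and uncommitted changes move into the new current epoch.
        self.current += 1
        self.epochs[self.current] = [c for c in pending if not c.committed]
        return closed


log = EpochLog()
log.record(Change("INSERT row 1", committed=True))
log.record(Change("UPDATE row 2"))  # not yet committed
closed = log.advance_epoch()
print(closed, log.current)                     # 0 1
print([c.description for c in log.epochs[0]])  # ['INSERT row 1']
print([c.description for c in log.epochs[1]])  # ['UPDATE row 2']
```

The uncommitted UPDATE is carried forward into epoch 1, so the closed epoch 0 holds only committed work.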

Moveout

Moveout moves all epochs other than the current epoch from the WOS into a new ROS container. It can be thought of as "flushing" all historical data from the WOS to the ROS. The illustration below shows the effect of a moveout of a projection on a single node:

[Figure: Moveout]
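The "flush" semantics of moveout can be sketched for a single node in Python. The Projection class and its fields are hypothetical. Rows are plain integers, and a simple sort stands in for the projection's sort order, which is an assumption of this toy rather than a statement about Vertica internals.

```python
from dataclasses import dataclass, field


@dataclass
class Projection:
    """Toy single-node projection: WOS epochs plus a list of ROS containers."""
    current_epoch: int
    wos: dict = field(default_factory=dict)  # epoch number -> list of rows
    ros: list = field(default_factory=list)  # each container is a sorted list of rows

    def moveout(self) -> None:
        """Flush every epoch except the current one from the WOS into one new ROS container."""
        historical = [e for e in self.wos if e != self.current_epoch]
        if not historical:
            return  # nothing to move, so this toy creates no empty container
        rows = []
        for epoch in sorted(historical):
            rows.extend(self.wos.pop(epoch))
        # Assumption: sorting stands in for the projection's sort order.
        self.ros.append(sorted(rows))


p = Projection(current_epoch=7, wos={5: [3, 1], 6: [2], 7: [9]})
p.moveout()
print(p.wos)  # {7: [9]}
print(p.ros)  # [[1, 2, 3]]
```

Epochs 5 and 6 end up together in a single new container, while the current epoch 7 stays in the WOS.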

ROS Containers

ROS containers are subsets of the Read Optimized Store (ROS). They are created when bulk loads and DML change the data stored in a projection. The tuple mover periodically merges ROS containers to maximize performance. At any given moment, a segmented projection can be stored in several ROS containers on any node, but never in fewer than one.

There is not necessarily a one-to-one correspondence between ROS containers and projection segments. For example, consider this projection:

CREATE PROJECTION P1 (A, B, C, D) AS
SELECT A, B, C, D
FROM T1
SEGMENTED BY D
NODE S1 VALUES LESS THAN 5
NODE S2 VALUES LESS THAN MAXVALUE;

Because 9 is not less than 5, a tuple whose segmentation column (D) value is 9 falls in the range assigned to node S2. Inserting it creates a new ROS container on node S2 but not on node S1.
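The range routing in the P1 example can be modeled in Python. In this sketch, math.inf stands in for MAXVALUE. The helper names node_for and load are hypothetical, and so is the simplifying assumption that each load writes exactly one new container on every node that receives at least one row.

```python
import math
from bisect import bisect_right

# Segmentation ranges from the P1 example: (node, exclusive upper bound), ascending.
# math.inf stands in for MAXVALUE.
SEGMENTS = [("S1", 5), ("S2", math.inf)]


def node_for(value: float) -> str:
    """Return the node whose 'VALUES LESS THAN bound' range contains value."""
    bounds = [upper for _, upper in SEGMENTS]
    # bisect_right gives the index of the first bound strictly greater than
    # value, i.e. the first range whose LESS THAN condition holds.
    return SEGMENTS[bisect_right(bounds, value)][0]


def load(values):
    """Group one load's rows by node; each key represents one new ROS container."""
    containers = {}
    for v in values:
        containers.setdefault(node_for(v), []).append(v)
    return containers


print(load([9]))        # {'S2': [9]}
print(load([2, 9, 5]))  # {'S1': [2], 'S2': [9, 5]}
```

A value of exactly 5 goes to S2 because the S1 range is strictly less than 5.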

Mergeout

Mergeout is the process of consolidating ROS containers. Over time, the number of ROS containers increases until some of them must be merged to avoid performance degradation. At that point, the tuple mover performs an automatic mergeout, which combines two or more ROS containers into a single container. It can be thought of as "defragmenting" the ROS.

The illustration below shows the effect of a mergeout of a projection on a single node:

[Figure: Mergeout]
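A mergeout can be sketched in Python as a k-way merge of already-sorted containers. This section does not say when the tuple mover decides to merge or which containers it picks. The threshold MAX_CONTAINERS is therefore hypothetical, and merging all containers at once is a simplification of "two or more".

```python
import heapq

# Hypothetical threshold; the real trigger condition is not described here.
MAX_CONTAINERS = 4


def mergeout(containers, max_containers=MAX_CONTAINERS):
    """Combine sorted ROS containers into one once there are too many.

    Each container is a sorted list of rows. If the count exceeds
    max_containers, all containers are merged into a single sorted one;
    otherwise they are returned unchanged.
    """
    if len(containers) <= max_containers:
        return containers
    # heapq.merge performs a k-way merge of already-sorted inputs.
    return [list(heapq.merge(*containers))]


print(mergeout([[1, 4], [2, 5], [3], [0, 6], [7]]))  # [[0, 1, 2, 3, 4, 5, 6, 7]]
print(mergeout([[1], [2]]))                          # [[1], [2]]
```

Because each input is already sorted, the merge never has to re-sort the data, which is why consolidating containers is cheaper than rebuilding them.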

Analyze Statistics

The tuple mover collects and aggregates data samples and storage information from all nodes on which a projection is stored, then writes statistics into the catalog so that they can be used by the query optimizer. Without these statistics, the query optimizer would assume uniform distribution of data values and equal storage usage for all projections.
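The following Python sketch illustrates why aggregated statistics matter. It combines per-node samples and storage sizes into one catalog-style entry. It then compares a selectivity estimate from those samples with the uniform-distribution guess an optimizer might make without statistics. All function names are hypothetical, and the value range [1, 10) used for the uniform guess is an assumption of the example.

```python
from collections import Counter


def analyze_statistics(node_samples, node_bytes):
    """Aggregate per-node value samples and storage sizes into one toy catalog entry."""
    histogram = Counter()
    for sample in node_samples.values():
        histogram.update(sample)
    return {
        "histogram": histogram,
        "sample_size": sum(histogram.values()),
        "total_bytes": sum(node_bytes.values()),
    }


def selectivity(stats, predicate):
    """Estimated fraction of rows satisfying predicate, based on the samples."""
    hits = sum(n for v, n in stats["histogram"].items() if predicate(v))
    return hits / stats["sample_size"]


def uniform_selectivity(lo, hi, bound):
    """Statistics-free guess for 'value < bound', assuming values uniform over [lo, hi)."""
    return (bound - lo) / (hi - lo)


samples = {"S1": [1, 1, 1, 2], "S2": [1, 9]}
sizes = {"S1": 4096, "S2": 1024}
stats = analyze_statistics(samples, sizes)
print(stats["sample_size"], stats["total_bytes"])           # 6 5120
print(round(selectivity(stats, lambda v: v < 5), 3))        # 0.833
print(round(uniform_selectivity(1, 10, 5), 3))              # 0.444
```

With skewed data, the sample-based estimate (about 0.83) differs sharply from the uniform guess (about 0.44). That gap is the kind of error the collected statistics help the optimizer avoid.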