Understanding the Automatic Tuple Mover

The tuple mover is the component of Vertica that moves the contents of the Write Optimized Store (WOS) into the Read Optimized Store (ROS). This data movement is known as a moveout. Normally, the tuple mover runs automatically in the background at preset intervals and is referred to as the ATM.

The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.

The WOS (Write Optimized Store) is a memory-resident data structure into which INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions are recorded. Like the ROS, the WOS is arranged by projection but it stores tuples without sorting, compression, or indexing and thus supports very fast load speeds. The WOS organizes data by epoch and holds uncommitted transaction data.

The tuple mover's operations each occur at a different interval. The most frequent is advance epoch, followed by moveout, and lastly mergeout.

The logical structure of the WOS is a series of epochs. An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.

Advancing the epoch closes the current epoch and opens a new current epoch. The closed epoch contains only committed data; uncommitted data moves into the new current epoch. Epoch numbering begins at zero and increments by one every time an advance epoch occurs. The numbering continues throughout the life of the database. Thus, the epoch number can become quite large.
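The epoch rules above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the class ToyWOS and its methods write, commit, and advance_epoch are hypothetical names, not part of any Vertica API, and the model ignores concurrency and storage details.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWOS:
    """Toy model of epochs in the WOS; not Vertica's actual data structure."""
    current_epoch: int = 0                          # numbering begins at zero
    committed: dict = field(default_factory=dict)   # epoch -> committed rows
    uncommitted: list = field(default_factory=list)

    def write(self, row):
        # A write stays uncommitted until its transaction commits.
        self.uncommitted.append(row)

    def commit(self):
        # Committed changes are recorded under the current epoch.
        self.committed.setdefault(self.current_epoch, []).extend(self.uncommitted)
        self.uncommitted = []

    def advance_epoch(self):
        # Close the current epoch and open a new one. Uncommitted rows are
        # not part of the closed epoch; they remain pending in the new one.
        self.current_epoch += 1
        return self.current_epoch


wos = ToyWOS()
wos.write(("a", 1))
wos.commit()          # ("a", 1) is committed in epoch 0
wos.write(("b", 2))   # still uncommitted
wos.advance_epoch()   # epoch 0 is closed, epoch 1 is now current
wos.commit()          # ("b", 2) is committed in epoch 1
```

In this sketch the closed epoch 0 holds only the row that was committed before the advance, while the row that was still pending ends up in epoch 1.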

Moveout moves all epochs other than the current epoch from the WOS into a new ROS container. It can be thought of as "flushing" all historical data from the WOS to the ROS. The illustration below shows the effect of a moveout of a projection on a single node:
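Separately from the illustration, the moveout step can be sketched in Python. The function name moveout and the dictionary layout (epoch number mapped to a list of rows) are assumptions made for this example, and sorting stands in for whatever ordering, compression, and indexing the real ROS applies.

```python
def moveout(wos, current_epoch):
    """Toy moveout: return (new_ros_container, remaining_wos).

    `wos` maps an epoch number to a list of committed rows; this layout
    is a hypothetical simplification, not Vertica's internal format.
    """
    historical = [epoch for epoch in wos if epoch != current_epoch]
    # Gather every row from the closed epochs and sort it, because the WOS
    # keeps rows unsorted while the ROS is assumed here to be sorted.
    container = sorted(row for epoch in historical for row in wos[epoch])
    # Only the current epoch stays behind in the WOS.
    remaining = {e: rows for e, rows in wos.items() if e == current_epoch}
    return container, remaining


wos = {0: [(7, "g"), (2, "b")], 1: [(5, "e")], 2: [(9, "i")]}
container, wos_after = moveout(wos, current_epoch=2)
```

Here epochs 0 and 1 are flushed into a single new container, and the WOS is left holding only the rows of the current epoch 2.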

ROS containers are subsets of the Read Optimized Store (ROS) that are created when bulk loads and DML change the data stored within a projection. The Tuple Mover periodically merges ROS containers in order to maximize performance. At any moment, a segmented projection can be stored within several ROS containers on any node, but never fewer than one.

Segmentation is the horizontal partitioning of a projection so that it can be stored on multiple nodes. The goal is to distribute physical data storage evenly across a database so that all nodes can participate in query execution.
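As a rough illustration of spreading rows across nodes, the sketch below assigns each row to a node by hashing a segmentation key. The CRC32-modulo rule, the function name node_for, and the node names are assumptions chosen for the example; Vertica's own segmentation expressions and hash function are not described in this section and will differ.

```python
import zlib
from collections import Counter


def node_for(key, nodes):
    """Pick a node for a segmentation key (toy rule: CRC32 modulo node count)."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


nodes = ["node01", "node02", "node03"]   # hypothetical node names
# Count how many of 3000 sample keys land on each node.
placement = Counter(node_for(key, nodes) for key in range(3000))
```

Inspecting placement shows how many rows each node would hold under this rule. The same key always maps to the same node, which is what lets every node work on its own share of the data during a query.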

There is not necessarily a one-to-one correspondence between ROS containers and projection segments. For example, consider this projection:

Inserting a tuple with a segmentation column value of 9 creates a new ROS container on node S2 but not on node S1.

Mergeout is the process of consolidating ROS containers. Over time, the number of ROS containers will increase to the point at which it becomes necessary to merge some of them in order to avoid performance degradation. At that point, the tuple mover performs an automatic mergeout, which combines two or more ROS containers into a single container. It can be thought of as "defragmenting" the ROS.
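The consolidation step can be sketched in Python as a merge of already-sorted containers. This is a minimal model, assuming each container is a sorted list of rows; the function name mergeout is hypothetical, and the policy that decides when and which containers to merge is not modeled.

```python
import heapq


def mergeout(containers):
    """Toy mergeout: combine several sorted containers into one sorted container."""
    # heapq.merge walks the sorted inputs in step, so the result is sorted
    # without re-sorting all rows from scratch.
    return list(heapq.merge(*containers))


ros = [
    [(1, "a"), (4, "d")],
    [(2, "b"), (6, "f")],
    [(3, "c"), (5, "e")],
]
merged = mergeout(ros)
ros = [merged]   # three containers have been replaced by one
```

After the call, a read of this projection touches one container instead of three, which is the sense in which mergeout "defragments" the ROS.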

The illustration below shows the effect of a mergeout of a projection on a single node:

The tuple mover collects and aggregates data samples and storage information from all nodes on which a projection is stored, then writes statistics into the catalog so that they can be used by the query optimizer. Without these statistics, the query optimizer would assume uniform distribution of data values and equal storage usage for all projections.