Purge Deleted Data
Vertica 2.1.GA provides the ability to control how much historical data is retained in the physical storage used by your database.
By default, Vertica never purges historical data. This is because, unlike in most databases, the DELETE command in Vertica does not actually delete data from disk storage; it simply marks tuples as deleted so that they remain available to historical queries. The same applies to the UPDATE command, which is actually a combined INSERT and DELETE.
Vertica can execute a query from a snapshot of the database taken at a specific date and time. The syntax is:
AT TIME 'timestamp' SELECT...
The command queries all data in the database up to and including the epoch representing the specified date and time, without holding a lock or blocking write operations.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
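The behavior described above (DELETE only marks tuples, UPDATE is a combined DELETE and INSERT, and a historical query sees data up to and including a given epoch) can be illustrated with a small toy model. This is only a sketch, not Vertica's storage format: the Row class, the integer epoch numbers, and the visible_at and update helpers are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    value: str
    insert_epoch: int                    # epoch in which the row was committed
    delete_epoch: Optional[int] = None   # epoch in which a DELETE marked it, if any


def visible_at(rows: List[Row], epoch: int) -> List[str]:
    """Return the values a historical query at `epoch` would see (toy model)."""
    return [
        r.value
        for r in rows
        if r.insert_epoch <= epoch
        and (r.delete_epoch is None or r.delete_epoch > epoch)
    ]


def update(rows: List[Row], old_value: str, new_value: str, epoch: int) -> None:
    """Model UPDATE as a delete marker on the old row plus an insert of a new row."""
    for r in rows:
        if r.value == old_value and r.delete_epoch is None:
            r.delete_epoch = epoch       # the old row is marked, not removed
    rows.append(Row(new_value, epoch))


rows = [Row("a", 1), Row("b", 2)]
update(rows, "a", "a2", 3)

print(visible_at(rows, 2))  # ['a', 'b']
print(visible_at(rows, 3))  # ['b', 'a2']
print(len(rows))            # 3
```

Note that the old row is still physically present after the update, which is why the query at epoch 2 can still return it.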
In Vertica 2.1.GA, you can purge historical data that was deleted prior to a specific point in time called the Ancient History Mark (AHM). The AHM is the epoch prior to which historical data can be purged from physical storage.
When does purge happen?
There are two ways to control how much historical data is stored on disk.
This part is just a placeholder. It will probably have different syntax:
- DELETE /*+direct*/ FROM...
operates directly on the ROS in the same manner as COPY and INSERT.
- DELETE /*+purge*/ FROM...
purges specific historical data from disk.
Are there symmetric UPDATEs as well?
- SET_AHM_TIME(timestamp)
specifies a point in time prior to which history need not be retained on disk (using the timestamp format set by ???)
Check format (TZ variable,
- SET_AHM_EPOCH(epoch)
specifies an epoch prior to which history need not be retained on disk.
- SET_HISTORY_RETENTION_INTERVAL(interval)
specifies how much history to retain on disk using interval format ???; the oldest history is purged continuously as time moves on.
Check format.
None of these commands initiates disk activity in real time; the purge itself occurs during mergeouts.
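To make the timing concrete, the following toy model shows a mergeout that consolidates several containers into one and, while doing so, drops rows whose delete marker falls before the AHM. It is a sketch under stated assumptions: the tuple layout (value, insert_epoch, delete_epoch), the mergeout function, and the strict "less than" comparison at the AHM boundary are all hypothetical and chosen only to match the wording "prior to" used above.

```python
def mergeout(containers, ahm_epoch):
    """Merge containers into one, dropping rows deleted before `ahm_epoch`.

    Each row is a (value, insert_epoch, delete_epoch) tuple; delete_epoch is
    None for rows that have not been marked as deleted.
    """
    merged = []
    for container in containers:
        for value, insert_epoch, delete_epoch in container:
            # Assumption: a row is purgeable only if it was deleted strictly
            # before the AHM epoch.
            purgeable = delete_epoch is not None and delete_epoch < ahm_epoch
            if not purgeable:
                merged.append((value, insert_epoch, delete_epoch))
    # Keep the consolidated container ordered by value.
    return sorted(merged, key=lambda row: row[0])


containers = [
    [("a", 1, 2), ("b", 1, None)],
    [("c", 3, 4), ("d", 3, None)],
]

# Moving the AHM changes nothing by itself; rows disappear only at mergeout.
result = mergeout(containers, ahm_epoch=3)
print(result)  # [('b', 1, None), ('c', 3, 4), ('d', 3, None)]
```

Row "a" was deleted in epoch 2, before the AHM of 3, so it is dropped during the merge. Row "c" was deleted in epoch 4 and is retained for historical queries.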
Mergeout is the process of consolidating ROS containers.
ROS containers are subsets of the Read Optimized Store (ROS) that are created when bulk loads and DML change the data stored within a projection. The Tuple Mover periodically merges ROS containers in order to maximize performance. A segmented projection can be temporarily stored within several ROS containers on any node at any moment but never fewer than one.
Segmentation is the horizontal partitioning of a projection so that it can be stored on multiple nodes. The goal is to distribute physical data storage evenly across a database so that all nodes can participate in query execution.
The tuple mover is the component of Vertica that moves the contents of the Write Optimized Store (WOS) into the Read Optimized Store (ROS). This data movement is known as a moveout. Normally, the tuple mover runs automatically in the background at preset intervals and is referred to as the ATM.
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The WOS (Write Optimized Store) is a memory-resident data structure into which INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions are recorded. Like the ROS, the WOS is arranged by projection but it stores tuples without sorting, compression, or indexing and thus supports very fast load speeds. The WOS organizes data by epoch and holds uncommitted transaction data.
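The relationship between the two stores can be sketched as a moveout step that takes unsorted tuples from the WOS and writes them as a new sorted ROS container. This is a toy model only: the dictionary layout, the moveout function, and the assumption that uncommitted tuples stay behind in the WOS are hypothetical and are not taken from Vertica's implementation.

```python
def moveout(wos, ros_containers):
    """Move committed tuples from the WOS into a new sorted ROS container.

    `wos` is a list of {"value": ..., "committed": bool} dicts in arrival
    order; `ros_containers` is a list of sorted lists. Both are modified
    in place.
    """
    committed = [t["value"] for t in wos if t["committed"]]
    if committed:
        # The WOS stores tuples unsorted; the ROS container is sorted.
        ros_containers.append(sorted(committed))
    # Assumption: uncommitted transaction data remains in the WOS.
    wos[:] = [t for t in wos if not t["committed"]]


wos = [
    {"value": 30, "committed": True},
    {"value": 10, "committed": True},
    {"value": 20, "committed": False},
]
ros_containers = []

moveout(wos, ros_containers)
print(ros_containers)  # [[10, 30]]
print(wos)             # [{'value': 20, 'committed': False}]
```

Each moveout adds one more container, which is why a later mergeout is needed to consolidate them.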
For More Information