Projections
Vertica's physical storage consists of collections of table columns called projections. A projection is a special case of a materialized view. A materialized view is similar to a standard SQL view with one major exception: the result set is stored on disk rather than computed each time the view is used in a query. The result set is automatically refreshed whenever data values are changed.
Query Performance
Because projections are stored in sort order, a single disk read can fetch many rows at once. Disks read data extremely quickly once positioned, but can take many milliseconds to seek and fetch a single row. For example, if the WHERE clause of a query is (X=1 AND Y=2) and a projection is already sorted on (X,Y), the query runs almost instantaneously.
Another important performance factor is data
compression. Sorted data contains repeated sequences that compress very well using, for example, run-length encoding. Compression also makes multiple projections fit within your disk space budget, and thus more extremely fast queries.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
Creating Projections
Vertica's
Database Designer tool (discussed in a section to follow):
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
- analyzes your
logical schema, sample data, and sample queriesA logical schema consists of a set of tables and referential integrity constraints in a Vertica database. The objects in the logical schema are visible to SQL users. The logical schema does not include projections, which make up the physical schema.
- creates a
physical schema (projections) in the form of an SQL script that you executeA physical schema consists of a set of projections used to store data on disk. The projections in the physical schema are based on the objects in the Logical Schema.
The query performance of the physical schema produced by the Database Designer depends on the items you provide. For example, if your logical schema includes
dimension tables, your sample queries must include at least a minimal set of joins.
A dimension table (sometimes called a lookup or reference table) is one of a set of companion tables to a fact table in a star schema. It contains the PRIMARY KEY column corresponding to the join columns in fact tables. For example, a business might use a dimension table to contain item codes and descriptions.
Dimension tables can be connected to other dimension tables to form a hierarchy of dimensions in a snowflake schema.
Alternatively, you can write your own custom projections as described in the Database Administrator's Guide (Advanced). This option is intended for advanced users only and should be used as advised by
Technical Support.
To submit problem reports, questions, comments, and suggestions, please use the Technical Support page on the Vertica Systems, Inc. Web site:
http://www.vertica.com/support
You must be a registered user in order to access the page.
Before reporting a problem, please run the Diagnostics Utility described in the Troubleshooting Guide and attach the resulting .zip file.
Superprojections
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
Pre-Join Projections
A pre-join projection stores the result set obtained by joining a single fact table in the logical schema (the anchor table) with one or more dimension tables. The result set is typically sorted for optimal query execution performance. Most query-specific projections are pre-joins.
The anchor table of a join can be:
The diagram below shows a pre-join projection consisting of four columns, one from the fact table and three from dimension tables.
.png)