SQL Overview
SQL (Structured Query Language) is a widely-used, industry standard data definition and data manipulation language for relational databases.
Vertica Support for ANSI SQL Standards
Vertica SQL supports a subset of ANSI SQL 99. Over time, Vertica SQL will enlarge and eventually converge with ANSI SQL 99. For information about ANSI SQL 99 see:
Vertica Use of PostgreSQL
In addition to using
vsql as an interactive front-end, Vertica SQL supports a subset of the PostgreSQL language definition. For information about PostgreSQL see:
vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
Vertica Major Extensions to SQL
Vertica provides several extensions to SQL that allow you to use the unique aspects of its column store architecture:
- AT EPOCH LATEST SELECT...
runs a SQL query in
snapshot isolation mode in which it does not hold locks or block other processes such as data loads.
Vertica can run any SQL query in snapshot isolation mode in order to obtain the fastest possible execution. To be precise, snapshot isolation mode is actually a form of
historical query. The syntax is:
Vertica can execute a query from a snapshot of the database taken at a specific date and time. The syntax is:
AT TIME 'timestamp' SELECT...
The command queries all data in the database up to and including the
epoch representing the specified date and time without holding a lock or blocking write operations.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
AT EPOCH LATEST SELECT...
The command queries all data in the database up to but not including the current
epoch without holding a lock or blocking write operations. This may cause the query to miss tuples loaded by other users up to (but no more than) a specific number of minutes before execution.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
- AT TIME 'timestamp' SELECT...
runs
historical queries against a snapshot of the database a specific date and time.
Vertica can execute a query from a snapshot of the database taken at a specific date and time. The syntax is:
AT TIME 'timestamp' SELECT...
The command queries all data in the database up to and including the
epoch representing the specified date and time without holding a lock or blocking write operations.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
- CONSTRAINT ... CORRELATION (column) REFERENCES (column)
captures
Functional Dependencies that can be used by the
Database Designer to produce more efficient projections.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
A functional dependency is a relationship between two sets of column values in a table where one column value can determine the other. It represents knowledge of the data that cannot otherwise be expressed in the logical schema. In Vertica functional dependencies are expressed as CORRELATION constraints and are used by the Database Designer to produce more efficient physical schema designs.
- COPY
is used for bulk loading data. It reads data from a text file and inserts
tuples into the
WOS (Write Optimized Store) or directly into the
ROS (Read Optimized Store).
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of
compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The WOS (Write Optimized Store) is a memory-resident data structure into which INSERT, UPDATE, DELETE, and COPY (without DIRECT hint) actions are recorded. Like the
ROS, the WOS is arranged by projection but it stores tuples without sorting,
compression, or indexing and thus supports very fast load speeds. The WOS organizes data by epoch and holds uncommitted transaction data.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
The ROS (Read Optimized Store) is a highly optimized, read-oriented, physical storage structure that is organized by projection and that makes heavy use of
compression and indexing. You can use the COPY...DIRECT and INSERT (with direct hint) statements to load data directly into the ROS.
Compression is the process of transforming data into a more compact format. Compressed data cannot be directly processed; it must first be decompressed. Vertica uses integer packing for unencoded integers and
LZO for compressible data. Although compression is generally considered to be a form of encoding, the terms have different meanings in Vertica.
LZO is an abbreviation for Lempel-Ziv-Oberhumer. It is a data compression algorithm that is focused on decompression speed. The algorithm is lossless and the reference implementation is thread safe.
A tuple is a collection of data values corresponding conceptually to a row of a table, projection, or result set. A tuple in Vertica does not have the same physical representation as in a traditional RDBMS; it is a virtual entity that may have entirely different storage representations.
- CREATE/DROP/ALTER PROJECTION
are used for manipulating
projections as described in the Database Administrator's Guide (Advanced). CREATE PROJECTION commands are generated for you by the
Database Designer as described in the Database Administrator's Guide.
The Database Designer is a tool that analyzes a logical schema definition, sample queries, and sample data and generates a set of projections in the form of an SQL script to be executed after you create the tables but before you load any data. The script creates a minimal set of
superprojections to ensure K-Safety, and optionally pre-join projections. In most cases, the projections created by the Database Designer provide excellent query performance within physical constraints. You can, however, write a custom projection script should the Database Designer not meet your needs.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
A projection is a special case of a
materialized view that provides physical storage for data. A projection can contain some or all of the columns of one or more tables. A projection that contains all of the columns of a table is called a
superprojection. A projection that joins one or more tables is called a pre-join projection. Most projections are used for ad-hoc query processing and K-safety but it is possible to have query-specific projections.
A superprojection is a projection that contains every column of a table in the Logical Schema. A table can have multiple superprojections with different sort orders.
A materialized view is similar to a standard SQL view with one major exception: the data is actually stored on disk rather than computed each time the view is used in a query. A materialized view, then, must be refreshed whenever the data in the underlying tables is changed. A projection is a special case of a materialized view.
- SELECT Vertica Function
executes special Vertica functions.
- SET DATESTYLE
chooses the format in which date/time values are displayed.
- SET SEARCH_PATH
specifies the order in which Vertica searches through multiple
schemas when a SQL statement contains an unqualified table name.
Schema has several related meanings in Vertica:
- SET TIMEZONE
specifies the TIMEZONE run-time parameter for the current
session.
A SQL session is an occurrence of a user interacting with a database through the use of SQL statements. A session can be invoked using
vsql or a
JDBC application. In Vertica, the scope of a session is the same as that of a connection.
JDBC (Java Database Connectivity) is a call-level API that provides connectivity between Java programs and data sources (SQL databases and other non-relational data sources, such as spreadsheets or flat files). JDBC is included in the Java 2 standard and enterprise editions.
vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
- SHOW
displays run-time parameters for the current session.
Support for Historical Queries
Unlike most databases, the DELETE command in Vertica does not actually delete data; it simply marks tuples as deleted. The UPDATE command actually does an INSERT and a DELETE. This behavior is necessary for
historical queries. You can control how much historical data is stored on disk.
Vertica can execute a query from a snapshot of the database taken at a specific date and time. The syntax is:
AT TIME 'timestamp' SELECT...
The command queries all data in the database up to and including the
epoch representing the specified date and time without holding a lock or blocking write operations.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
Non-Standard Syntax and Semantics
In case of non-standard SQL syntax or semantics, Vertica SQL follows Oracle whenever possible. For Oracle SQL documentation, you will need a web account to access the library:
Joins
Vertica supports:
- only star and snowflake joins based on fact and dimension tables
- standard inner equi-joins in the WHERE clause
- standard INNER JOIN and OUTER JOIN syntax (with certain restrictions)
Vertica does not support:
- self-joins (use temporary tables for this purpose)
Transactions
Vertica supports conventional SQL transactions with standard ACID properties. Specifically:
- Vertica supports ANSI SQL 92 style implicit transactions. You do not need to execute a BEGIN or START TRANSACTION command.
- Vertica does not use a redo/undo log or two-phase commit.
- The COPY command automatically commits itself and any current transaction. Vertica recommends that you COMMIT or ROLLBACK the current transaction before using COPY.
Session-Scoped Isolation Levels
Vertica supports a subset of the standard SQL isolation levels and access modes for a user
session as described in SERIALIZABLE Isolation and READ COMMITTED Isolation. These modes determine what data a transaction can access when other transactions are running concurrently. You can change the default isolation level for a user session using the SET SESSION CHARACTERISTICS command.
A SQL session is an occurrence of a user interacting with a database through the use of SQL statements. A session can be invoked using
vsql or a
JDBC application. In Vertica, the scope of a session is the same as that of a connection.
JDBC (Java Database Connectivity) is a call-level API that provides connectivity between Java programs and data sources (SQL databases and other non-relational data sources, such as spreadsheets or flat files). JDBC is included in the Java 2 standard and enterprise editions.
vsql is the Vertica implementation of psql, a character-based, interactive, front-end that is part of PostgreSQL and used by other database management systems. It allows you to type in SQL statements and see the results. It also provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.
Session-scoped isolation levels do not apply to temporary tables, which are scoped to the current transaction. They do not take table locks, are only visible to one user, and their contents are always visible regardless of advancing epochs and COMMITs.
Query-Scoped Isolation Levels
Snapshot Isolation (
historical queries) are part of the Vertica query syntax:
Vertica can execute a query from a snapshot of the database taken at a specific date and time. The syntax is:
AT TIME 'timestamp' SELECT...
The command queries all data in the database up to and including the
epoch representing the specified date and time without holding a lock or blocking write operations.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
[ AT EPOCH LATEST ] | [ AT TIME 'timestamp' ] SELECT ...
AT EPOCH LATEST allows queries to access all historical data up to but not including the current
epoch . It does not hold locks and does not block write operations. This provides a profound query performance advantage. It does not apply to temporary tables.
An epoch represents committed changes to the data stored in a database between two specific points in time. In other words, an epoch contains all COPY, INSERT, UPDATE, and DELETE operations that have been executed and committed since the end of the previous epoch.
Automatic Rollback
When an error occurs or a user session is disconnected, the current transaction automatically rolls back. This behavior may be different from other databases.