Failure Recovery

Recovery is the process of restoring the database to a fully functional state after one or more nodes in the system have experienced a software- or hardware-related failure. Vertica recovers a node by querying replicas of its data that are stored on other nodes. For example, a hardware failure may cause a node to lose database objects or to miss changes made to the database (INSERTs, UPDATEs, etc.) while it is offline. When the node comes back online, it recovers lost objects and catches up with changes by querying the other nodes.

K is the maximum number of nodes in a database that can fail and recover with no loss of data. In Vertica V2.1, K can be zero (0) or one (1), and it can be one (1) only when the Physical Schema design meets certain requirements. The designs generated by the Database Designer are K-Safe.

For a database to be safe, no more than K nodes can be down, as shown below.

Nodes    Nodes DOWN    State of Database
4        0             Safe.
4        1             Precarious. Data loss can occur.
4        2             Inoperative. Automatic shutdown.
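
The rule behind the table can be written as a small classification function. This is only an illustrative sketch: the name database_state and its parameters are hypothetical and are not part of any Vertica interface. It assumes the table describes a database with K=1. Applying the same rule to other values of K is an extrapolation from the table, not something this topic states.

```python
def database_state(down_nodes: int, k: int = 1) -> str:
    """Classify the database state from the number of nodes that are DOWN.

    Assumption: the table describes a K=1 database. The handling of
    other K values is an extrapolation for illustration only.
    """
    if down_nodes < 0 or k < 0:
        raise ValueError("down_nodes and k must be non-negative")
    if down_nodes == 0:
        return "Safe"          # every node is UP
    if down_nodes <= k:
        return "Precarious"    # still running, but a further failure can lose data
    return "Inoperative"       # more than K nodes DOWN: automatic shutdown


# Reproduce the three rows of the table for a four-node database.
for down in range(3):
    print(4, down, database_state(down))
```

With k=0, the same rule reports Inoperative as soon as a single node is down, which matches the definition of K as the number of node failures the database can tolerate.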

Automatic Recovery

Automatic recovery handles startup after a single-node failure without intervention. For example, when a failed node comes back online and rejoins the database, it recovers its lost data by querying the other nodes, as long as there are enough active nodes to ensure K-Safety. Transactions can continue to commit during the recovery process, except for a short period at the end of the recovery.

In the case of multiple node failures, automatic recovery shuts down the database. When you restart the database, automatic recovery attempts to start all the nodes and to recover objects on any node that lost objects as a result of a failure. If successful, View Database Cluster State shows all nodes UP. Otherwise, automatic recovery writes out information that can be used for manual recovery and then shuts down again.

Manual Recovery

When automatic recovery fails to restart the database, use the Start Database command and follow the built-in manual recovery steps. The database is not available for connections until manual recovery is complete.
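
The escalation from automatic to manual recovery described above can be summarized as an ordered list of steps. The sketch below is a reading aid only, not Vertica code: the function recovery_path, its parameters, and the step wording are hypothetical, and it assumes a K=1 database in which a single failed node can always rejoin.

```python
from typing import List


def recovery_path(failed_nodes: int, auto_restart_ok: bool = True) -> List[str]:
    """Return the recovery steps this topic describes, in order.

    Assumptions: K=1, and auto_restart_ok stands for whether automatic
    recovery succeeds after the administrator restarts the database.
    """
    if failed_nodes <= 0:
        return []  # nothing to recover
    if failed_nodes == 1:
        # Single-node failure: handled without intervention.
        return ["node rejoins and recovers by querying the other nodes"]
    # Multiple node failures: the database shuts down and must be restarted.
    steps = [
        "automatic shutdown",
        "administrator restarts the database",
        "automatic recovery attempts to start all nodes",
    ]
    if auto_restart_ok:
        steps.append("all nodes UP")
    else:
        steps += [
            "recovery information written out",
            "database shuts down again",
            "manual recovery through the Start Database command",
        ]
    return steps


for step in recovery_path(2, auto_restart_ok=False):
    print(step)
```

Situations that the manual recovery steps do not cover fall outside this sketch and belong to administrator-specified recovery.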

Administrator Specified Recovery

Situations not handled by the built-in manual recovery steps are documented in both the Database Administrator's Guide and the Troubleshooting Guide for convenience. These are classified as:

Tools for correcting these situations are described in the Advanced section of the Administration Tools.

In This Section

Shutdown Problems

Startup Problems