Using Outer Joins
INFORMATION FROM RELEASE NOTES
Vertica now supports queries that use the ANSI standard INNER JOIN and OUTER JOIN syntax. A query that uses INNER JOIN syntax in the FROM clause produces the same result set as a query that uses the WHERE clause to state the join-predicate. For example, these queries are semantically identical:
SELECT COUNT(*)
FROM Fact_Table INNER JOIN Dim_Table
ON Fact_Table.Product_Key = Dim_Table.Product_Key;
produces the same result set as:
SELECT COUNT(*)
FROM Fact_Table, Dim_Table
WHERE Fact_Table.Product_Key = Dim_Table.Product_Key;
Question for reviewers: does the INNER JOIN syntax require a FOREIGN KEY constraint in the fact table? If not, then it's different from a WHERE clause join and this section should probably be retitled to something like "ANSI Standard Joined Tables."
However, a query that uses OUTER JOIN syntax in the FROM clause returns all rows for which the join-predicate is true, plus one copy of each row in the outer table for which the join-predicate is false. Thus, the result set contains all outer table rows.
When the join-predicate is false, the inner table columns named in the select list are given null values. This includes inner table columns whose constraints would otherwise prevent them from containing null values.
Vertica Restrictions on Outer Joins
- In Vertica the anchor table must be the outer table in an outer join. For example, both of the following queries are valid and produce identical result sets:
Question for reviewers: Is "anchor table" or "fact table" the right term to use? In other words, is it necessary to generalize the rule about which table is the outer table in order to cover snowflake joins in which a dimension (unsegmented) table functions as an anchor table?
SELECT COUNT(*)
FROM Fact_Table LEFT OUTER JOIN Dim_Table
ON Fact_Table.Product_Key = Dim_Table.Product_Key
SELECT COUNT(*)
FROM Dim_Table RIGHT OUTER JOIN Fact_Table
ON Dim_Table.Product_Key = Fact_Table.Product_Key
- In Vertica outer join queries require a
logical schema definition in which there exists:A logical schema consists of a set of tables and referential integrity constraints in a Vertica database. The objects in the logical schema are visible to SQL users. The logical schema does not include projections, which make up the physical schema.
- a PRIMARY KEY constraint on the dimension table column used in the join predicate
- no FOREIGN KEY constraint on the fact table column used in the join predicate
If there is a FOREIGN key constraint on the fact table column, an outer join query produces the same result set as an inner join query. In other words, the result set contains no null-extended rows.
Notes
- Queries can have any number of INNER JOIN or OUTER JOIN clauses. For example:
SELECT ...
FROM T1 INNER JOIN (T2 LEFT OUTER JOIN (T3 LEFT OUTER JOIN ...
- If an INNER JOIN or OUTER JOIN query contains a WHERE clause, the WHERE clause cannot contain a join-predicate.
Question for reviewers: what, if anything, should be said about INNER JOIN and OUTER JOIN queries, pre-join projections, and query performance?
Example
By adding an IS NULL predicate to a dimension table's PRIMARY KEY column in an outer join, it is possible to find data that represents problems. For example, this query finds telephone calls that are not associated with a carrier:
SELECT Call_date,
Call_duration,
Carrier_name
FROM Call_data_records Cdr
LEFT OUTER JOIN Carriers
ON Cdr.Carrier_id = Carriers.Carrier_id
WHERE Carriers.Carrier_id IS NULL;
For More Information
Using Outer Joins on Existing Databases
The ALTER TABLE statement allows you to drop a FOREIGN KEY
table-constraint from a table that does not have any associated projections. You can then insert rows that would otherwise violate the constraint and use outer join queries to find them.
Adds a join constraint or a constraint describing a
functional dependency to the metadata of a table. See Adding Constraints in the Database Administrator's Guide.
A functional dependency is a relationship between two sets of column values in a table where one column value can determine the other. It represents knowledge of the data that cannot otherwise be expressed in the logical schema. In Vertica functional dependencies are expressed as CORRELATION constraints and are used by the Database Designer to produce more efficient physical schema designs.
Syntax
[ CONSTRAINT constraint_name ]
{ PRIMARY KEY ( column [ , ... ] )
| FOREIGN KEY ( column [ , ... ] )
REFERENCES table [ ( column [ , ... ] ) ]
| CORRELATION ( column1 ) DETERMINES ( column2 )
}
Semantics
CONSTRAINT constraint-name
|
optionally assigns a name to the constraint. Vertica recommends that you name all constraints.
|
PRIMARY KEY
( column [ , ... ] )
|
adds a referential integrity constraint defining one or more NOT NULL numeric columns as the primary key.
Referential integrity in Vertica consists of a set of constraints (logical schema objects) that define primary key and foreign key columns. In a star schema or snowflake schema:
- Each dimension table must have a PRIMARY KEY constraint.
- The fact table must contain columns that can be used to join the fact table to dimension tables.
- Fact table join columns must have FOREIGN KEY constraints in order to participate in pre-join projections.
- Outer join queries produce expected results only when the fact table join column used in the query does not have a FOREIGN KEY constraint.
|
FOREIGN KEY
( column [ , ... ] )
|
adds a referential integrity constraint defining one or more NOT NULL numeric columns as a foreign key.
Referential integrity in Vertica consists of a set of constraints (logical schema objects) that define primary key and foreign key columns. In a star schema or snowflake schema:
- Each dimension table must have a PRIMARY KEY constraint.
- The fact table must contain columns that can be used to join the fact table to dimension tables.
- Fact table join columns must have FOREIGN KEY constraints in order to participate in pre-join projections.
- Outer join queries produce expected results only when the fact table join column used in the query does not have a FOREIGN KEY constraint.
|
REFERENCES
table [ ( column [ , ... ] )
|
specifies the table to which the FOREIGN KEY constraint applies. If column is omitted, the default is the primary key of table.
|
CORRELATION
|
describes a functional dependency. Given a tuple and the set of values in column1, one can determine the corresponding value of column2.
A functional dependency is a relationship between two sets of column values in a table where one column value can determine the other. It represents knowledge of the data that cannot otherwise be expressed in the logical schema. In Vertica functional dependencies are expressed as CORRELATION constraints and are used by the Database Designer to produce more efficient physical schema designs.
|
Notes
- Use the ALTER TABLE command to add a table constraint. The CREATE TABLE statement does not allow table constraints.
- You must define primary key and foreign key constraints in all tables that participate in inner joins. See Adding Join Constraints.
Examples
CORRELATION (Product_Description) DETERMINES (Category_Description)
The Retail Sales Example Database described in the Quick Start contains a table
Product_Dimension in which products have descriptions and categories. For example, the description "Seafood Product 1" exists only in the "Seafood" category. You can define several similar correlations between columns in the Product Dimension table.
The Product Dimension table describes all products sold by the grocery chain since its beginning. Typically, individual stores only carry a subset of the products. The generated data file contains data for 60,000 products by default.
Column Name
|
Data Type
|
Example
|
Product_Key
|
integer
|
1
|
Product_Description
|
varchar(128)
|
Seafood Product 1
|
SKU_Number
|
char(32)
|
SKU-#1
|
Category_Description
|
char(32)
|
Food
|
Department_Description
|
char(32)
|
Seafood
|
Package_Type_Description
|
char(32)
|
Box
|
Package_Size
|
char(32)
|
18 ox
|
Fat_Content
|
integer
|
89
|
Diet_Type
|
char(32)
|
South Beach
|
Weight
|
integer
|
50
|
Weight_Units_of_Measure
|
char(32)
|
gram
|
Shelf_Width
|
integer
|
2
|
Shelf_Height
|
integer
|
4
|
Shelf_Depth
|
integer
|
4
|