Back to Research Page

Pankaj Mehta




Join Strategies

Teradata supports a number of Join strategies. The join strategy for a query is chosen by the optimizer at compile time. The optimizer chooses by evaluating the relative cost of each possible strategy and choosing the best. Most Teradata joins operate on two tables at a time. The optimizer builds a query plan out of successive two table joins until the result relation has been built. All joins are fully parallelized.

The optimizer has three major joins to choose from:

  1. Hash Merge Join
    The Hash Merge join operates on two relations sorted on the hashed join fields and matches hashes to produce the result. If one or both relations participating in the Hash Merge join are not already hashed, distributed and sorted on the join fields, t hey will be prepared such that they are and that cost will be taken into account by the optimizer.

    Any time rows of data need to be brought together from different VPROCs and nodes in the system, hashing is the tool relied upon to divide up and to balance the work. The join column values are run through the hashing algorithm and the value of the has h bucket that comes out of that process tells the system which VPROC will manage the actual join of those particular rows. That way all the VPROCs in the system help perform these joins, each doing a subset of the work.

  2. Nested Message Join
    The Nested Message join retrieves a row form one table and sends it to the node where the row it is to be joined to exists. The join is completed there and the result returned.

  3. The Standard Product Join.

There are many variants available to the Teradata optimizer. It may join directly to an index sub-table rather than to the base table if all the fields required are present in the index. It may join to an index and extract rowids from the index table, placing them into a spool table to later be sorted and joined back to the base table to extract other rows.

Different Joins Available in Teradata

  • A set of join operators specific to OUTER JOIN exists, one for each type of OUTER JOIN. Each of these operators can operate in the major join modes listed above.

  • A set of exclusion join operators exist to implement the INTERSECT and MINUS set operations as well as certain subquery operations.

  • A special join operator exists to perform Star Joins (see below). Called Hash Star Join, it performs cross product joins in a single execution step (patent pending). The optimizer identifies Star and Snowflake joins, generat ing this special high performance Star Join step in the right place in the overall plan to maximize the performance of the query.

Teradata’s Star Join

Teradata implements a star join with minimal data movement between VPROCs by only moving rows from the smaller dimension tables. The rows from the much larger fact table remain in place, local to their VPROC. For this to work, the primary index of the large fact table (the BIG table in the diagram below) must be specified as a composite of the primary index columns of all of the dimension tables (Weeks, Stores, and Items). In the example, the primary index of the BIG table would be (B_Itemkey,B_StoreID,B-WeekID).

Following steps are carried out in Star Joins:

  • Duplication (broadcasting) all the rows from one of the smaller dimension tables (Stores, for example) to all VPROCs in the system, after selection has been applied.

  • A product join is then perfomed between these Store rows and all of the selected Weeks rows in parallel across all VPROCs in the system.

  • The result of the Weeks-Stores join is then itself duplicated.

  • A second product join is then done between selected rows from the Item table and the temporary file containing Weeks-Stores rows.

  • Once all the selected rows from the dimension tables are joined, the combination of their three primary index columns is hashed as if it were one value. Each row of Weeks-Stores-Items hash to the VPROC that owns the matching BIG table row.

  • Finally, a merge join is then performed without having to relocate any of the millions or billions of rows from the BIG table.Each row of Weeks-Stores-Items hash to the VPROC that owns the matching BIG table row.

The Teradata optimizer will automatically consider a star join if the physical database has been designed based on a star schema.