Pankaj Mehta |
Query PerformanceHash algorithms used in teradata for hash joins have been the key to high performance of the Teradata DBMS. For queries that must join large sets and cannot utilize any indexes however, hash joins will improve the performance by up to 40%. Teradata’s cost based optimizer will automatically include hash joins as an option and will pick the best alternative for the situation. | |
Temporary Tables:In order to perform the complex questions that the users are asking, today's query tools must store temporary results from multiple queries into working tables. Storing these working tables as permanent tables carries overhead that reduces overall performance for the end user. Temporary tables are the answer.For example, consider the following query;
Select e.first_name,d.department_name,ep.phone The above query requires the join of three different tables. A traditional database such as MySQL, Informix, DB2 joins two tables at a time and a temporary table is created. This temporary table looses all the indexing information and hence a join with the third table is costly. Teradata supports two types of temporary tables:
Both types of temporary tables vanish automatically upon session logoff, again with no DDL or cleanup required. DBAs will love these temporary tables since they get their space from the global temporary space, automatically, requiring no space allocations at any level. Both types of temporary tables offer optimization above and beyond the removal of the interaction with the data dictionary. Since temporary tables are local to the user no locking or logging is required, removing all overhead from data manipulation language (DML) operations against the table (e.g. INSERT SELECT and UPDATE/DELETE WHERE/ALL). | |
|
|
Once these temporary tables are created the indexes are lost. For eg, once the join between Employee and Employee_phone is done the indexes employee_number is lost. Teradata creates indexes for temporary according to the information for Global or Volatile temporary tables. |
|
Example 2:
EXPLAIN Select e.first_name,d.department_name,ep.phone | |
*** Help information returned. 31 rows.
*** Total elapsed time was 1 second.
Explanation
---------------------------------------------------------------------------
1) First, we lock a distinct customer_service."pseudo table" for read
on a RowHash to prevent global deadlock for customer_service.ep.
2) Next, we lock a distinct customer_service."pseudo table" for read
on a RowHash to prevent global deadlock for customer_service.e.
3) We lock a distinct customer_service."pseudo table" for read on a
RowHash to prevent global deadlock for customer_service.d.
4) We lock customer_service.ep for read, we lock customer_service.e
for read, and we lock customer_service.d for read.
5) We do an all-AMPs JOIN step from customer_service.ep by way of a
RowHash match scan with no residual conditions, which is joined to
customer_service.e. customer_service.ep and customer_service.e
are joined using a merge join, with a join condition of (
"customer_service.e.employee_number =
customer_service.ep.employee_number"). The result goes into Spool
2, which is redistributed by hash code to all AMPs. Then we do a
SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with low confidence to be 80 rows. The estimated time
for this step is 0.06 seconds.
6) We do an all-AMPs JOIN step from customer_service.d by way of a
RowHash match scan with a condition of (
"customer_service.d.budget_amount > 500000.00"), which is joined
to Spool 2 (Last Use). customer_service.d and Spool 2 are joined
using a merge join, with a join condition of (
"Spool_2.department_number = customer_service.d.department_number").
The result goes into Spool 1, which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be 27 rows.
The estimated time for this step is 0.18 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.24 seconds.
| |
![]() |
Executing the Selection for the Department with the condition d.budget_amount > 500000 before the join, results in a smaller table and hence the join resulting in the second temporary table executes much faster. |
|
Example 3:
EXPLAIN select UNIQUE e.first_name,e.last_name | |
*** Help information returned. 39 rows.
*** Total elapsed time was 1 second.
Explanation
--------------------------------------------------------------------------------
1) First, we lock a distinct customer_service."pseudo table" for read
on a RowHash to prevent global deadlock for customer_service.lp.
2) Next, we lock a distinct customer_service."pseudo table" for read
on a RowHash to prevent global deadlock for customer_service.l.
3) We lock a distinct customer_service."pseudo table" for read on a
RowHash to prevent global deadlock for customer_service.e.
4) We lock customer_service.lp for read, we lock customer_service.l
for read, and we lock customer_service.e for read.
5) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from customer_service.l by
way of an all-rows scan with a condition of (
"customer_service.l.state = 'california'") into Spool 2,
which is redistributed by hash code to all AMPs. Then we do
a SORT to order Spool 2 by row hash. The size of Spool 2 is
estimated with no confidence to be 8 rows. The estimated
time for this step is 0.03 seconds.
2) We do an all-AMPs JOIN step from customer_service.lp by way
of a RowHash match scan with no residual conditions, which is
joined to customer_service.e. customer_service.lp and
customer_service.e are joined using a merge join, with a join
condition of ("customer_service.e.employee_number =
customer_service.lp.employee_number"). The result goes into
Spool 3, which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 3 by row hash. The size of
Spool 3 is estimated with low confidence to be 80 rows. The
estimated time for this step is 0.06 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
RowHash match scan, which is joined to Spool 3 (Last Use). Spool
2 and Spool 3 are joined using a merge join, with a join condition
of ("Spool_3.location_number = Spool_2.location_number"). The
result goes into Spool 1, which is redistributed by hash code to
all AMPs. Then we do a SORT to order Spool 1 by the sort key in
spool field1 eliminating duplicate rows. The size of Spool 1 is
estimated with no confidence to be 80 to 640 rows. The estimated
time for this step is 0.20 to 0.21 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.26 seconds.
| |
![]() |
The Selection on table location with condition location.state='california' is executed in parallel to the join between the employee and location_employee tables. The results are stored in Spool 2 and Spool 3 respectively. Then Spool 2 and Spool 3 and joined using merge join and the results are stored in Spool 1. The result in Spool 1 are sent back to the user. |