VPROC - the Unit of Parallelism
- The Teradata RDBMS is designed for parallelism. This is a great strength of
the Teradata database: work is divided among many units that run at the same
time, resulting in higher performance.
- The virtual processor (VPROC) is the basic unit of parallelism. One
symmetric multiprocessing (SMP) node consists of several VPROCs, as many as
8 or 12.
- Each VPROC controls and manages its system-assigned data, which is
associated with specific disks, for example one rank in a RAID 5 (redundant
array of independent disks) disk array. The figure below shows how one VPROC
works.
Figure: The Parallel Unit (VPROC) owns and manages all database activity
against its data.
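A minimal Python sketch of this ownership model, assuming a toy setup: the `Vproc` class and the `parallel_scan` function are hypothetical names, not Teradata interfaces. Each VPROC scans only the rows it owns, the same request is sent to every VPROC, and the partial answers are merged at the end. Python threads are used only to show how the work is divided, not to demonstrate real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Vproc:
    """Hypothetical model of one VPROC: only it holds and processes its rows."""
    vproc_id: int
    rows: list = field(default_factory=list)

    def scan(self, predicate):
        # A VPROC works only on the rows it owns; it never reads another VPROC's data.
        return [row for row in self.rows if predicate(row)]


def parallel_scan(vprocs, predicate):
    # Fan the same request out to every VPROC, let each one work on its own
    # data at the same time, then merge the partial results into one answer.
    with ThreadPoolExecutor(max_workers=max(1, len(vprocs))) as pool:
        partials = pool.map(lambda v: v.scan(predicate), vprocs)
        return [row for part in partials for row in part]


vprocs = [Vproc(0, [1, 5, 9]), Vproc(1, [2, 6]), Vproc(2, [3, 7, 11])]
print(sorted(parallel_scan(vprocs, lambda r: r > 4)))  # [5, 6, 7, 9, 11]
```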
- Teradata Automatically Spreads the Rows Evenly Across VPROCs. In parallel
processing, skewed data distribution can cause problems because the total
processing time depends on the slowest process. Teradata assigns data rows
evenly to VPROCs using a single partitioning scheme: hash partitioning. The
value in the table's primary index column(s) is put through the hashing
algorithm, which produces two outputs: a hash bucket (which maps to one VPROC)
and a hash ID (the physical identifier of the row on disk).
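The distribution step can be sketched in Python as follows. This is an illustration under stated assumptions, not Teradata's actual algorithm: `hashlib.blake2b` stands in for Teradata's hashing algorithm, taking the high-order 16 bits of a 32-bit hash as the hash bucket is an assumption, and the round-robin `HASH_MAP` is hypothetical. `elapsed_time` models the point above that a parallel step ends only when the slowest VPROC ends.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 65536  # hypothetical: 16-bit bucket numbers, chosen for illustration
NUM_VPROCS = 8       # e.g. one SMP node with 8 VPROCs

# Hypothetical hash map: bucket -> VPROC, assigned round-robin so that each
# VPROC owns the same number of buckets.
HASH_MAP = [b % NUM_VPROCS for b in range(NUM_BUCKETS)]


def hash_value(pi_value) -> int:
    # Stand-in for Teradata's hashing algorithm: a 32-bit hash of the
    # primary index value.
    digest = hashlib.blake2b(repr(pi_value).encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def vproc_for(pi_value) -> int:
    bucket = hash_value(pi_value) >> 16  # high-order 16 bits pick the hash bucket
    return HASH_MAP[bucket]


def rows_per_vproc(pi_values):
    # Count how many rows each VPROC receives.
    counts = Counter(vproc_for(v) for v in pi_values)
    return [counts.get(v, 0) for v in range(NUM_VPROCS)]


def elapsed_time(rows, seconds_per_row=0.001):
    # All VPROCs work in parallel, so the step finishes only when the
    # busiest (slowest) VPROC finishes.
    return max(rows) * seconds_per_row


even = rows_per_vproc(range(100_000))               # unique primary index values
skewed = rows_per_vproc(["A", "B", "C"] * 33_334)  # only 3 distinct values
print(even)
print(skewed, elapsed_time(skewed) / elapsed_time(even))
```

With unique primary index values, each of the 8 VPROCs receives close to 12,500 of the 100,000 rows. A primary index with only three distinct values sends every row to at most three VPROCs, so the slowest VPROC, and therefore the whole step, takes well over twice as long.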
- To retrieve a row, the primary index value is passed to the hashing
algorithm. The algorithm generates a hash bucket, which points to the VPROC,
and a hash ID, which locates the row on that VPROC's disk. There is no space
or processing overhead involved in either building a primary index or
accessing a row through its primary index value, as no special index structure
is built to support the primary index.
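A minimal Python sketch of primary index retrieval, assuming the same kind of toy model: `blake2b` stands in for Teradata's hashing algorithm, the bucket-to-VPROC rule is hypothetical, and a dict keyed by the hash value stands in for each VPROC's storage. The hash value is computed from the row's own primary index value, so no separate index is stored, and a lookup touches exactly one VPROC.

```python
import hashlib
from collections import defaultdict

NUM_VPROCS = 4  # hypothetical system size


def hash_value(pi_value) -> int:
    # Stand-in for Teradata's hashing algorithm (32-bit result used as the hash ID).
    digest = hashlib.blake2b(repr(pi_value).encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def vproc_of(h: int) -> int:
    # Hypothetical hash map: high-order 16 bits = hash bucket, bucket -> VPROC.
    return (h >> 16) % NUM_VPROCS


class Table:
    def __init__(self, pi_column):
        self.pi_column = pi_column
        # One storage area per VPROC; rows are filed under their hash value,
        # which is derived from the row itself, so no extra index is kept.
        self.vprocs = [defaultdict(list) for _ in range(NUM_VPROCS)]

    def insert(self, row: dict):
        h = hash_value(row[self.pi_column])
        self.vprocs[vproc_of(h)][h].append(row)

    def get_by_pi(self, pi_value):
        h = hash_value(pi_value)
        storage = self.vprocs[vproc_of(h)]  # only one VPROC is consulted
        # Different values can share a hash value, so compare the actual PI value.
        return [r for r in storage.get(h, []) if r[self.pi_column] == pi_value]


t = Table("cust_id")
for i in range(1000):
    t.insert({"cust_id": i, "name": f"c{i}"})
print(t.get_by_pi(42))  # [{'cust_id': 42, 'name': 'c42'}]
```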