
Pankaj Mehta




Data Partitioning Schemes

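Teradata uses a single partitioning scheme: hash partitioning. Each row's primary index value is run through a hashing algorithm, and the result determines which Virtual AMP receives the row, so data are distributed automatically across all Virtual AMPs. While this limits the DBA’s flexibility, the algorithm has proven extremely effective and largely eliminates data skew, provided the primary index is unique or nearly unique.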

Hash Partitioning
Evenly distributing up to 100 terabytes of data is the key to evenly distributing the application workload. In addition, since data placement is done without Database Administrator intervention or planning, a Teradata system has extremely low support costs.
How Rows are Distributed Across the Disks

In a Teradata database, the rows of every table are distributed evenly, in an effectively random pattern, across all of the VPROCs (the units of parallelism) in the system. The DBA cannot choose to populate only selected VPROCs or nodes. This even, automatic distribution ensures equal processing effort and data balance across the entire system, no matter how large it grows or what type of query activity it faces. Achieving this balance depends on the table’s primary index columns being unique or nearly unique, as discussed later.
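
The effect of primary index uniqueness on balance can be illustrated with a minimal sketch. It is not Teradata's actual hashing algorithm: the SHA-256 hash, the count of 8 units of parallelism, and the names `amp_for` and `rows_per_amp` are all assumptions made for illustration only.

```python
import hashlib
from collections import Counter

NUM_AMPS = 8  # hypothetical number of units of parallelism


def amp_for(primary_index_value, num_amps=NUM_AMPS):
    """Map a primary index value to a unit number with a stable hash.

    SHA-256 stands in for the real hashing algorithm (an assumption).
    """
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(values, num_amps=NUM_AMPS):
    """Count how many rows land on each unit for the given index values."""
    counts = Counter(amp_for(v, num_amps) for v in values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# A unique primary index (for example, an order number).
unique_counts = rows_per_amp(range(100_000))

# A primary index with only three distinct values (for example, a region code).
region_counts = rows_per_amp(i % 3 for i in range(100_000))

print("unique index:    ", unique_counts)
print("3-value index:   ", region_counts)
```

With the unique index, every unit receives close to one eighth of the rows. With only three distinct values, at most three units can hold any rows at all, which is the kind of skew a unique or nearly unique primary index avoids.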

Traditional file systems:

  • Rows are stored either randomly or sequentially within pre-allocated file space, with some space reserved for overflow.
  • Rows for any given table may span one or more disk drives.
  • This file storage technique tends to serialize access.
  • Adding a large number of rows often requires a total reorganization or a migration.

Traditional databases:

  • Use value-based table partitioning, which requires sizing, pre-allocation, and placement.
  • This work is manpower-intensive and complex, because partitions need to be monitored and adjusted over time. As such databases grow, unloading and reloading the data to realign partitions is common.
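
Why value-based partitions need adjusting over time can be shown with a small sketch. The boundary values, the key ranges, and the names `partition_for` and `rows_per_partition` are hypothetical and do not describe any particular database product.

```python
import bisect
from collections import Counter

# Hypothetical boundaries picked when the table was first sized:
# partition 0 holds keys < 1000, partition 1 holds 1000-1999,
# partition 2 holds 2000-2999, partition 3 holds everything else.
BOUNDARIES = [1000, 2000, 3000]


def partition_for(key, boundaries=BOUNDARIES):
    """Return the index of the value range that contains the key."""
    return bisect.bisect_right(boundaries, key)


def rows_per_partition(keys, boundaries=BOUNDARIES):
    """Count how many rows fall into each value-based partition."""
    counts = Counter(partition_for(k, boundaries) for k in keys)
    return [counts.get(p, 0) for p in range(len(boundaries) + 1)]


# At load time the keys match the planned ranges.
initial = rows_per_partition(range(3000))

# After growth, all new keys fall past the last planned boundary.
grown = rows_per_partition(range(10_000))

print("initial:", initial)  # [1000, 1000, 1000, 0]
print("grown:  ", grown)    # [1000, 1000, 1000, 7000]
```

In this sketch the last partition ends up holding 70% of the rows, so the boundaries would have to be redefined and the data moved to restore balance.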

With Teradata, there is never a need for DBA-intensive activities such as database reorganizations.