Tuesday, May 13, 2014

How to choose partitioned primary index (PPI)


What is the difference between NPPI and PPI?

  • NPPI: Non-partitioned primary index
    The good old regular PI. The rows are distributed by HASHAMP(HASHBUCKET(HASHROW(PI))) and ordered by HASHROW(PI), nothing special
  • PPI: Partitioned primary index
    Distribution is the same, but the ordering is different: <PartitionID><HASHROW(PI)>. The <PartitionID> is a value stored in each row, occupying 2 or 8 bytes (see below).
The only difference is the storage order of the records (and the 2/8 bytes of overhead).
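
To make the difference tangible, here is a minimal sketch of the same table defined both ways (table and column names are made up for illustration):

    -- NPPI: plain primary index, rows stored in hash order
    CREATE TABLE sales_nppi (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id);

    -- PPI: same distribution, but rows are ordered by <PartitionID> first
    CREATE TABLE sales_ppi (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' MONTH
    );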

What is PPI good for?

The partitioning feature - like in many other databases - usually solves performance issues by eliminating needless work in specific situations (see the sketch after this list).
  • SELECT
    Eligible "where" conditions result in serious partition elimination, which means that usually only a small fraction of the table has to be scanned instead of the whole thing.
  • INSERT
    Consider the storage order of an NPPI table: the records are in "hash" order, so if I insert a series of records into a Teradata table, they land in widely scattered data blocks. If the table is big enough, my new e.g. 1000 records will hit ~1000 different data blocks, which means 1000 expensive "random writes". However, if my 1000 records go into a PPI table and share the same PartitionID, they will very likely fit into far fewer than 1000 data blocks. In real-life situations we often write to contiguous data blocks with the much cheaper "sequential write".
  • DELETE
    Same as INSERT
  • BACKUP
    Teradata allows archiving only one or more partitions, saving a lot of time and tape. Older data in transaction tables usually does not change, therefore it is unnecessary to back it up every time.
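
A minimal sketch of the SELECT and DELETE cases, using the hypothetical sales_ppi table from above:

    -- Partition elimination: only the affected monthly partitions
    -- are scanned, not the whole table
    SELECT SUM(amount)
    FROM   sales_ppi
    WHERE  trx_date BETWEEN DATE '2014-05-01' AND DATE '2014-05-31';

    -- Deleting whole partitions touches contiguous data blocks
    DELETE FROM sales_ppi
    WHERE  trx_date < DATE '2014-02-01';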

"Good-to-know"s

Costs of partitioning

Like all good things in the world, partitioning has its trade-offs too:
  • Extra 2/8 bytes of allocated storage space per record
    Depending on the maximal number of partitions. See the "Number of partitions" chapter
  • Slower "SAMPLE" scans
    Proper random sampling is more complex, since the physical storage order correlates with the partitioning value
  • Extra sort operations / Sliding-window joins
    Joining to a table that has an NPPI, or a PPI with a not exactly matching definition, either results in a preparatory "sort" step or leads to a "sliding-window merge join", which is technically N x M merge joins between the partitions of TableA and TableB (see the sketch below).
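
A sketch of the join situation, using the hypothetical sales_ppi table from above plus an equally hypothetical NPPI table sharing the same PI:

    -- sales_ppi (PPI) joined to returns_nppi (NPPI) on the common PI:
    -- the optimizer either sorts one side into the other's order in
    -- spool, or runs a sliding-window merge join across the partitions
    SELECT s.trx_id, s.amount, r.return_date
    FROM   sales_ppi    s
    JOIN   returns_nppi r ON r.trx_id = s.trx_id;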

Number of partitions

How many partitions should I have?
How many partitions do I have?
What does an empty partition look like?
These are all interesting questions; let's analyze Teradata's implementation.

A partition is not an object; it is just a calculated (and stored) value in the record, which determines the physical storage order of the record. A partition does not allocate space: an "empty partition" technically means that no record exists with that partition's PartitionID, nothing else.
How many partitions do I have in the table? As many as the distinct PartitionID values that occur in the existing records, which depends on the occurring values of the partitioning column.
How many partitions can I have in the table? It depends on the table definition. One must use the RANGE_N or the CASE_N function to define the PartitionID calculation, and that definition unambiguously determines how many different PartitionID values may occur. In versions up to V13.10, 65535 partitions are allowed (2-byte PartitionID); from V14.00 we can have as many as 9.2 quintillion (8-byte PartitionID). The table definition cannot be altered to switch between the 2- and 8-byte layouts.
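
Counting the non-empty partitions is easy with the system-derived PARTITION column (using the hypothetical sales_ppi table again):

    -- One output row per non-empty partition
    SELECT PARTITION, COUNT(*)
    FROM   sales_ppi
    GROUP  BY 1
    ORDER  BY 1;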

What is the drawback of having many partitions? The sliding-window merge join. Consider including the partitioning column in the PI if possible (otherwise PI-based filtering will cause as many accesses as there are partitions).
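
A sketch of including the partitioning column in the PI (hypothetical names again); note that a PI lookup then needs the partitioning value in the "where" too:

    CREATE TABLE sales_ppi2 (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id, trx_date)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' MONTH
    );

    -- Both PI columns filtered: single-AMP, single-partition access
    SELECT * FROM sales_ppi2
    WHERE  trx_id = 12345 AND trx_date = DATE '2014-05-13';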

What happens with the out-of-range records?

We have the NO RANGE and NO CASE clauses in the PPI definition. They assign a PartitionID to the values that fall outside the defined ranges or cases: all such records go into this single partition. It can be a hidden trap: if you forget to maintain the date partition definition on a transaction table, then from a certain moment on, every new record lands in this partition. The partition keeps fattening, and the queries somehow keep slowing down...
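
A sketch of the trap (hypothetical table whose range definition ends in 2014):

    CREATE TABLE trx (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2013-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' DAY,
        NO RANGE
    );
    -- Every record inserted with trx_date >= 2015-01-01 silently goes
    -- into the single NO RANGE partition - no partition elimination there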

Multi level partitioning

This is a good trick. One can define partitioning "hierarchically", which is simply a "Cartesian product" of the partitions at each level, encoded into a single PartitionID. In the case of 2-byte partitioning, the size of this "Cartesian product" must fall below 65535.

What is sensational in the Teradata implementation of multi-level PPI? You can filter on the lower-level partitioning key(s) alone, and partition elimination will still happen. How? It calculates all the possible combinations and produces the list of PartitionIDs to be scanned. Excellent.
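
A sketch of a two-level definition and a lower-level-only filter (hypothetical names):

    CREATE TABLE sales_ml (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        store_id INTEGER NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY (
        RANGE_N(trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                EACH INTERVAL '1' MONTH),
        CASE_N(store_id < 100, store_id < 200, NO CASE)
    );

    -- Filters only the 2nd partitioning level, yet only the matching
    -- store_id sub-partitions are scanned within each month
    SELECT COUNT(*) FROM sales_ml WHERE store_id = 42;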

Partitioning granularity

The next good question is: how fine should the partitioning be?
It depends. Basically I'd branch into two main cases:
  • "Temporal" (date) partitioning
    The best partition size is the day. Most of the filtering is at day level, and we have ~365 days a year, which is not too many partitions for a table's lifetime. If we partition in monthly units, then the partition elimination ranges are coarser, while we still accumulate 12 partitions a year, which can also be too many in the case of a PI-NPPI join. (See the daily sketch after this list.)
  • All others
    It really depends. It depends on the goal and on the value demographics. It's good to align it with the filtering pattern (the frequent, relevant "where" condition parcels).
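
A daily partitioning sketch spanning several years (hypothetical table, ~2200 partitions in total):

    CREATE TABLE trx_daily (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2010-01-01' AND DATE '2015-12-31'
                 EACH INTERVAL '1' DAY
    );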
Hope it helped; please ask if something is missing or confusing.
