Tuesday, May 13, 2014

How to choose partitioned primary index (PPI)


What is the difference between NPPI and PPI?

  • NPPI: Non-partitioned primary index
    The good old regular PI. The rows are distributed by HASHAMP(HASHBUCKET(HASHROW(PI))) and ordered by HASHROW(PI), nothing special
  • PPI: Partitioned primary index
    Distribution is the same, but the ordering is different: <PartitionID><HASHROW(PI)>. The <PartitionID> is a value stored in each row, occupying 2 or 8 bytes (see below).
The only difference is the storage order of the records (and the 2/8 bytes of overhead).
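
To make the difference tangible, here is a minimal sketch of the same table defined both ways (table and column names are made up for illustration):

    -- NPPI: plain primary index, rows stored in hash order
    CREATE TABLE sales_nppi (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id);

    -- PPI: same distribution, but rows are ordered by <PartitionID> first
    CREATE TABLE sales_ppi (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' MONTH
    );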

What is PPI good for?

The partitioning feature - like in many other databases - usually solves performance issues by eliminating needless work in specific situations (see the sketch after this list).
  • SELECT
    Eligible "where" conditions result in serious partition elimination, which means that usually only a small fraction of the table has to be scanned instead of the whole thing.
  • INSERT
    Consider the storage order of an NPPI table: the records are in "hash" order, so if I insert a series of records into a Teradata table, they land in widely scattered data blocks. If the table is big enough, my new e.g. 1000 records will hit ~1000 different data blocks, which means 1000 expensive "random writes". However, if my 1000 records go into a PPI table and share the same PartitionID, they will very likely fit into far fewer than 1000 data blocks. In real-life situations we often write to contiguous data blocks with the much cheaper "sequential write".
  • DELETE
    Same as INSERT
  • BACKUP
    Teradata allows archiving only one or more partitions, saving a lot of time and tape. Older data in transaction tables usually does not change, therefore it is unnecessary to back it up every time.
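
A minimal sketch of the SELECT and DELETE cases, using the hypothetical sales_ppi table from above:

    -- Partition elimination: only the affected monthly partitions
    -- are scanned, not the whole table
    SELECT SUM(amount)
    FROM   sales_ppi
    WHERE  trx_date BETWEEN DATE '2014-05-01' AND DATE '2014-05-31';

    -- Deleting whole partitions touches contiguous data blocks
    DELETE FROM sales_ppi
    WHERE  trx_date < DATE '2014-02-01';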

"Good-to-know"s

Costs of partitioning

Like all good things in the world, partitioning has its trade-offs too:
  • Extra 2/8 bytes of allocated storage space per record
    Depending on the maximal number of partitions. See the "Number of partitions" chapter
  • Slower "SAMPLE" scans
    Proper random sampling is more complex, since the physical storage order correlates with the partitioning value
  • Extra sort operations / Sliding-window joins
    Joining to a table that has an NPPI, or a PPI with a not exactly matching definition, either results in a preparatory "sort" step or leads to a "sliding-window merge join", which is technically N x M merge joins between the partitions of TableA and TableB (see the sketch below).
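
A sketch of the join situation, using the hypothetical sales_ppi table from above plus an equally hypothetical NPPI table sharing the same PI:

    -- sales_ppi (PPI) joined to returns_nppi (NPPI) on the common PI:
    -- the optimizer either sorts one side into the other's order in
    -- spool, or runs a sliding-window merge join across the partitions
    SELECT s.trx_id, s.amount, r.return_date
    FROM   sales_ppi    s
    JOIN   returns_nppi r ON r.trx_id = s.trx_id;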

Number of partitions

How many partitions should I have?
How many partitions do I have?
What does an empty partition look like?
These are all interesting questions; let's analyze Teradata's implementation.

A partition is not an object; it is just a calculated (and stored) value in the record, which determines the physical storage order of the record. A partition does not allocate space: an "empty partition" technically means that no record exists with that partition's PartitionID, nothing else.
How many partitions do I have in the table? As many as the distinct PartitionID values that occur in the existing records, which depends on the occurring values of the partitioning column.
How many partitions can I have in the table? It depends on the table definition. One must use the RANGE_N or the CASE_N function to define the PartitionID calculation, and that definition unambiguously determines how many different PartitionID values may occur. In versions up to V13.10, 65535 partitions are allowed (2-byte PartitionID); from V14.00 we can have as many as 9.2 quintillion (8-byte PartitionID). The table definition cannot be altered to switch between the 2- and 8-byte layouts.
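
Counting the non-empty partitions is easy with the system-derived PARTITION column (using the hypothetical sales_ppi table again):

    -- One output row per non-empty partition
    SELECT PARTITION, COUNT(*)
    FROM   sales_ppi
    GROUP  BY 1
    ORDER  BY 1;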

What is the drawback of having many partitions? The sliding-window merge join. Consider including the partitioning column in the PI if possible (otherwise PI-based filtering will cause as many accesses as there are partitions).
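
A sketch of including the partitioning column in the PI (hypothetical names again); note that a PI lookup then needs the partitioning value in the "where" too:

    CREATE TABLE sales_ppi2 (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        amount   DECIMAL(12,2)
    ) PRIMARY INDEX (trx_id, trx_date)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' MONTH
    );

    -- Both PI columns filtered: single-AMP, single-partition access
    SELECT * FROM sales_ppi2
    WHERE  trx_id = 12345 AND trx_date = DATE '2014-05-13';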

What happens with the out-of-range records?

We have the NO RANGE and NO CASE clauses in the PPI definition. They assign a PartitionID to the values that fall outside the defined ranges or cases: all such records go into this single partition. It can be a hidden trap: if you forget to maintain the date partition definition on a transaction table, then from a certain moment on, every new record lands in this partition. The partition keeps fattening, and the queries somehow keep slowing down...
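
A sketch of the trap (hypothetical table whose range definition ends in 2014):

    CREATE TABLE trx (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2013-01-01' AND DATE '2014-12-31'
                 EACH INTERVAL '1' DAY,
        NO RANGE
    );
    -- Every record inserted with trx_date >= 2015-01-01 silently goes
    -- into the single NO RANGE partition - no partition elimination there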

Multi level partitioning

This is a good trick. One can define partitioning "hierarchically", which is simply a "Cartesian product" of the partitions at each level, encoded into a single PartitionID. In the case of 2-byte partitioning, the size of this "Cartesian product" must fall below 65535.

What is sensational in the Teradata implementation of multi-level PPI? You can filter on the lower-level partitioning key(s) alone, and partition elimination will still happen. How? It calculates all the possible combinations and produces the list of PartitionIDs to be scanned. Excellent.
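
A sketch of a two-level definition and a lower-level-only filter (hypothetical names):

    CREATE TABLE sales_ml (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL,
        store_id INTEGER NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY (
        RANGE_N(trx_date BETWEEN DATE '2014-01-01' AND DATE '2014-12-31'
                EACH INTERVAL '1' MONTH),
        CASE_N(store_id < 100, store_id < 200, NO CASE)
    );

    -- Filters only the 2nd partitioning level, yet only the matching
    -- store_id sub-partitions are scanned within each month
    SELECT COUNT(*) FROM sales_ml WHERE store_id = 42;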

Partitioning granularity

The next good question is: how fine should the partitioning be?
It depends. Basically I'd branch into two main cases:
  • "Temporal" (date) partitioning
    The best partition size is the day. Most of the filtering is at day level, and we have ~365 days a year, which is not too many partitions for a table's lifetime. If we partition in monthly units, then the partition elimination ranges are coarser, while we still accumulate 12 partitions a year, which can also be too many in the case of a PI-NPPI join. (See the daily sketch after this list.)
  • All others
    It really depends. It depends on the goal and on the value demographics. It's good to align it with the filtering pattern (the frequent, relevant "where" condition parcels).
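
A daily partitioning sketch spanning several years (hypothetical table, ~2200 partitions in total):

    CREATE TABLE trx_daily (
        trx_id   INTEGER NOT NULL,
        trx_date DATE    NOT NULL
    ) PRIMARY INDEX (trx_id)
    PARTITION BY RANGE_N (
        trx_date BETWEEN DATE '2010-01-01' AND DATE '2015-12-31'
                 EACH INTERVAL '1' DAY
    );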
Hope it helped; please ask if something is missing or confusing.
