Friday, September 26, 2025

Handling Teradata SKEW with RANDOM function

Data skew in Teradata occurs when data is unevenly distributed across the AMPs (Access Module Processors), leading to performance issues, particularly during joins or aggregations. The RANDOM() function can be employed as a salting technique to mitigate this skew.

Using RANDOM() to Handle Skew:
  • Identify the Skewed Column: 
    Determine the column in a table that exhibits significant data skew, meaning a few values occur much more frequently than others.
  • Generate Random Values: 
    Create a new column or a temporary column populated with random numbers using the RANDOM() function. The range of random numbers should be chosen based on the desired level of distribution. For example, RANDOM(1, 100) generates random integers between 1 and 100.
  • Combine with Skewed Column: 
    Concatenate the skewed column with the newly generated random value. This creates a composite key.
    SELECT 
        PrimaryKeyColumn,
        SkewedColumn,
        TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))) AS RandomSalt,
        SkewedColumn || RandomSalt AS SaltedJoinColumn  -- reuse the alias so only one salt value is generated per row
    FROM 
        YourSkewedTable;

  • Perform Join/Aggregation on Composite Key: When performing joins or aggregations, use this composite key instead of the original skewed column. The random component gives rows with the same skewed value different hash values, so they are spread across many AMPs instead of piling up on one. For a join to still return correct results, the other table must be expanded with the same salt range (one copy of each row per salt value), as the example below shows.
    CREATE VOLATILE TABLE NotSkewingTable AS
    (
        SELECT 
            PrimaryKey, 
            SkewedColumn,
            SkewedColumn || TRIM(CAST(RANDOM(1,100) AS VARCHAR(3))) AS SaltedJoinColumn
        FROM 
            YourSkewedTable
    ) WITH DATA
    PRIMARY INDEX (SaltedJoinColumn)   -- distribute by the salted key, not the skewed one
    ON COMMIT PRESERVE ROWS;

    SELECT 
        T1.ColumnA, 
        T2.ColumnB
    FROM 
        /* The non-skewed side must be replicated once per salt value (1..100 here,
           generated from Sys_Calendar.CALENDAR) so that every salted key in
           NotSkewingTable can find its matching row. */
        (SELECT A.ColumnA,
                A.JoinColumn || TRIM(CAST(S.day_of_calendar AS VARCHAR(3))) AS SaltedJoinColumn
         FROM AnotherTable A
         CROSS JOIN (SELECT day_of_calendar
                     FROM Sys_Calendar.CALENDAR
                     WHERE day_of_calendar BETWEEN 1 AND 100) S) T1
    JOIN 
        NotSkewingTable T2
    ON 
        T1.SaltedJoinColumn = T2.SaltedJoinColumn;
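
    The same salting idea can be applied to a skewed aggregation. The sketch below is illustrative only: it assumes a staging table (here called SaltedStage) that already carries a RandomSalt column populated with RANDOM(1,100) at load time, plus a hypothetical SalesAmount measure. The inner query pre-aggregates on the salted key so the hot values are spread over many AMPs; the outer query then rolls the partial results up to the original grouping column.
    SELECT 
        SkewedColumn,
        SUM(PartialAmount) AS TotalAmount
    FROM 
        (SELECT 
             SkewedColumn,
             RandomSalt,                        -- populated with RANDOM(1,100) when the stage was loaded
             SUM(SalesAmount) AS PartialAmount  -- partial aggregate per (value, salt) bucket
         FROM SaltedStage
         GROUP BY 1, 2) Partials
    GROUP BY 1;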

Benefits:
  • Improved Parallelism: Distributes the workload more evenly across AMPs, leading to faster query execution.
  • Reduced Spool Space: Minimizes the need for large intermediate spool files, which can consume significant resources.
  • Enhanced Join Performance: Addresses the bottleneck caused by skewed join keys.
Considerations:
  • Overhead: 
    Generating and concatenating random values adds a small overhead, but the performance gains typically outweigh this.
  • Cardinality: 
    The salt range must be large enough to spread the skewed values across many AMPs, but a larger range also means more replicated rows on the non-skewed side of a join.
  • Data Types: 
    Ensure compatible data types when concatenating the skewed column and the random number.
  • Temporary Nature: 
    This technique is often applied in temporary or volatile tables for specific operations to avoid altering the base table structure.

Finding Skewed Tables by Querying the TableSizeV View

Finding SKEW

In this example, the SELECT request looks for poorly distributed tables by displaying the minimum, average, and maximum CurrentPerm allocated on each AMP for every table in the database named USER. Each table is reported separately, ordered by name.

SELECT
  TableName (FORMAT 'X(20)'),
  MIN(CurrentPerm) AS "AMP Minimum",
  AVG(CurrentPerm) AS "AMP Average",
  MAX(CurrentPerm) AS "AMP Maximum"
FROM DBC.TableSizeV
WHERE DatabaseName = 'USER'
GROUP BY TableName
ORDER BY TableName; 

Result:

TableName              AMP Minimum   AMP Average   AMP Maximum
--------------------   -----------   -----------   -----------
employee_nupi_ondept   4,096         15,488        30,208
employee_upi_onempid   18,944        19,200        19,968

The result displays two tables. Notice that:

  • The table Employee_upi_onempid is evenly distributed. CurrentPerm is similar across all vprocs (the minimum and maximum are close to the average). Permanent space is relatively evenly distributed across all AMPs in the system.
  • The table Employee_nupi_ondept is poorly distributed. The CurrentPerm figures range from a minimum of 4,096 bytes to a maximum of 30,208 bytes on different AMPs, indicating a wide variance from the average.
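
A variation of this query (a sketch, not from the Teradata documentation) turns the same CurrentPerm figures into a skew-factor percentage for every table in the system, so the most skewed tables float to the top. NULLIFZERO guards against empty tables, and the 30 percent threshold is arbitrary:

SELECT
  DatabaseName,
  TableName,
  SUM(CurrentPerm) AS TotalPerm,
  CAST(100 * (1 - CAST(AVG(CurrentPerm) AS FLOAT) / NULLIFZERO(MAX(CurrentPerm))) AS DECIMAL(5,2)) AS SkewFactorPct
FROM DBC.TableSizeV
GROUP BY DatabaseName, TableName
HAVING 100 * (1 - CAST(AVG(CurrentPerm) AS FLOAT) / NULLIFZERO(MAX(CurrentPerm))) > 30
ORDER BY SkewFactorPct DESC;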

 

Wednesday, February 22, 2023

Teradata Table Type

1. ANSI Temporal

ANSI-compliant support for temporal tables. Using temporal tables, Teradata Database can process statements and queries that include time-based reasoning. Temporal tables record both system time (the time period when the information was recorded in the database) and valid time (the time period when the information is in effect or true in a real-world application).


2. Derived

A derived table is a type of temporary table obtained from one or more other tables as the result of a subquery. It is specified within an SQL SELECT statement, which avoids the need for CREATE TABLE and DROP TABLE statements to store retrieved results.

Derived tables are useful when you are coding more sophisticated, complex queries.
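
For example, a derived table built from a hypothetical Employee table (the table and column names here are illustrative):

SELECT d.DepartmentNo,
       d.AvgSalary
FROM (SELECT DepartmentNo,
             AVG(Salary) AS AvgSalary
      FROM Employee
      GROUP BY DepartmentNo) AS d   -- the derived table exists only for the duration of this query
WHERE d.AvgSalary > 50000;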


3. Error Logging

Error logging tables store information about errors on an associated permanent table. They log information about insert and update errors.
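
A sketch of how an error table is typically created and used; the table names (Sales, Sales_ERR, Sales_Staging) are illustrative:

CREATE ERROR TABLE Sales_ERR FOR Sales;   -- error table associated with the Sales table

INSERT INTO Sales
SELECT * FROM Sales_Staging
LOGGING ERRORS WITH NO LIMIT;             -- offending rows are logged in Sales_ERR instead of aborting the request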


4. Global Temporary

Global temporary tables are private to the session. They are dropped automatically at the end of a session but have a persistent table definition stored in the Data Dictionary. The saved definition may be shared by multiple users and sessions, with each session getting its own instance of the table.
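
A minimal sketch with illustrative names; the definition below persists in the Data Dictionary, while the rows a session inserts disappear when that session ends:

CREATE GLOBAL TEMPORARY TABLE GT_SessionTotals
    (Region_Cd   INTEGER,
     Total_Sales DECIMAL(18,2))
ON COMMIT PRESERVE ROWS;   -- keep rows for the life of the session, not just the transaction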


5. Global Temporary Trace

Global temporary trace tables store trace output for the length of the session. They have a persistent table definition stored in the Data Dictionary and are useful for debugging SQL stored procedures (via a call to an external stored procedure written to the trace output) and external routines (UDFs, UDMs, and external stored procedures).


6. NoPI

NoPI tables are permanent tables that do not have a primary index defined on them. They provide a performance advantage when used as staging tables for loading data with FastLoad or TPump Array INSERT. They can have secondary indexes defined on them to avoid full-table scans during row access.
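
A sketch of a NoPI staging table (names are illustrative); NoPI tables are always MULTISET:

CREATE MULTISET TABLE STG_Sales
    (Sale_Id  INTEGER,
     Store_Cd INTEGER,
     Amount   DECIMAL(10,2))
NO PRIMARY INDEX;   -- rows are appended without hashing on a PI, which speeds up bulk loads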


7. Permanent

Permanent tables allow different sessions and users to share table content.


8. Queue

Queue tables are permanent tables with a timestamp column that indicates when each row was inserted into the table. They establish first-in first-out (FIFO) ordering of table contents, which is needed for customer applications requiring event processing.
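
A sketch of a queue table (names are illustrative). The first column must be the Queue Insertion TimeStamp (QITS) column, and SELECT AND CONSUME reads and deletes the oldest row in FIFO order:

CREATE TABLE Order_Events, QUEUE
    (QITS      TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
     Order_Id  INTEGER,
     Status    VARCHAR(20))
PRIMARY INDEX (Order_Id);

SELECT AND CONSUME TOP 1 * FROM Order_Events;   -- pops the oldest row off the queue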


9. Volatile

Volatile tables are used when only one session needs the table, only the creator needs to access it, you want better performance than a global temporary table, and you do not need the table definition after the session ends.

Note: The definition of a volatile table can survive across a system restart if it is contained in a macro.
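
A sketch of the explicit DDL form (the skew-handling post above shows the CREATE ... AS ... WITH DATA form); names are illustrative:

CREATE VOLATILE TABLE VT_Work, NO LOG   -- NO LOG skips transient journaling for faster loads
    (Cust_Id INTEGER,
     Score   DECIMAL(9,4))
ON COMMIT PRESERVE ROWS;                -- otherwise rows vanish at the end of the transaction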

Friday, August 26, 2016

Easy way to get Syntax for a command in teradata

exec dbc.getsyntax('sql','create table');

GetSyntax is a macro that accepts two parameters.

The first parameter is the type of utility the command belongs to. Valid values include the following:

PMPC
BULKLOAD
DUMP
ARCHIVE
MULTILOAD
VERSION
SPL
FASTEXPORT
DATA_TYPE
HELP
SQL
FASTLOAD
TPCCONS

The second parameter is the command name for which you need the syntax.

Examples include:
exec dbc.getsyntax('FASTLOAD','DEFINE');
exec dbc.getsyntax('MULTILOAD','.IMPORT');
exec dbc.getsyntax('SQL','UPDATE');

For reference, the GetSyntax macro itself is defined in DBC as follows:

REPLACE MACRO DBC.GetSyntax
      (
         TheFunction (CHAR(30)),
         Command  (CHAR(30))
      )
      AS
      (
         SELECT Syntax FROM SysAdmin.HelpSyntax
         WHERE TheFunction = :TheFunction AND
               Command  = :Command 
         ORDER BY LineNumber;
      );

Determining Privileges for a User

You can create an AllUserRights macro to list all the privileges a user has been granted, database by database.
The macro gets its information from the DBC.AllRights and DBC.AllRoleRights views (DBC.AllRightsV and DBC.AllRoleRightsV in newer releases) and spells out the two-character privilege code it finds in their AccessRight column, returning the description in the AccessRightDesc field.


Sample Macro for Determining User Privileges


create macro database_name.AllUserRights (UserName char(30)) as (
locking row for access select
UserName (varchar(30))
,AccessType (varchar(30))
,RoleName (varchar(30))
,DatabaseName (varchar(30))
,TableName (varchar(30))
,ColumnName (varchar(30))
,AccessRight
,case
when accessright='AE' then 'ALTER EXTERNALPROCEDURE'
when accessright='AF' then 'ALTER FUNCTION'
when accessright='AP' then 'ALTER PROCEDURE'
when accessright='AS' then 'ABORT SESSION'
when accessright='CA' then 'CREATE AUTHORIZATION'
when accessright='CD' then 'CREATE DATABASE'
when accessright='CE' then 'CREATE EXTERNAL PROCEDURE'
when accessright='CF' then 'CREATE FUNCTION'
when accessright='CG' then 'CREATE TRIGGER'
when accessright='CM' then 'CREATE MACRO'
when accessright='CO' then 'CREATE PROFILE'
when accessright='CP' then 'CHECKPOINT'
when accessright='CR' then 'CREATE ROLE'
when accessright='CT' then 'CREATE TABLE'
when accessright='CU' then 'CREATE USER'
when accessright='CV' then 'CREATE VIEW'
when accessright='D' then 'DELETE'
when accessright='DA' then 'DROP AUTHORIZATION'
when accessright='DD' then 'DROP DATABASE'
when accessright='DF' then 'DROP FUNCTION'
when accessright='DG' then 'DROP TRIGGER'
when accessright='DM' then 'DROP MACRO'
when accessright='DO' then 'DROP PROFILE'
when accessright='DP' then 'DUMP'
when accessright='DR' then 'DROP ROLE'
when accessright='DT' then 'DROP TABLE'
when accessright='DU' then 'DROP USER'
when accessright='DV' then 'DROP VIEW'
when accessright='E' then 'EXECUTE'
when accessright='EF' then 'EXECUTE FUNCTION'
when accessright='GC' then 'CREATE GLOP'
when accessright='GD' then 'DROP GLOP'
when accessright='GM' then 'GLOP MEMBER'
when accessright='I' then 'INSERT'
when accessright='IX' then 'INDEX'
when accessright='MR' then 'MONITOR RESOURCE'
when accessright='MS' then 'MONITOR SESSION'
when accessright='NT' then 'NONTEMPORAL'
when accessright='OD' then 'OVERRIDE DELETE POLICY'
when accessright='OI' then 'OVERRIDE INSERT POLICY'
when accessright='OP' then 'CREATE OWNER PROCEDURE'
when accessright='OS' then 'OVERRIDE SELECT POLICY'
when accessright='OU' then 'OVERRIDE UPDATE POLICY'
when accessright='PC' then 'CREATE PROCEDURE'
when accessright='PD' then 'DROP PROCEDURE'
when accessright='PE' then 'EXECUTE PROCEDURE'
when accessright='RO' then 'REPLICATION OVERRIDE'
when accessright='R' then 'RETRIEVE/SELECT'
when accessright='RF' then 'REFERENCES'
when accessright='RS' then 'RESTORE'
when accessright='SA' then 'SECURITY CONSTRAINT ASSIGNMENT'
when accessright='SD' then 'SECURITY CONSTRAINT DEFINITION'
when accessright='ST' then 'STATISTICS'
when accessright='SS' then 'SET SESSION RATE'
when accessright='SR' then 'SET RESOURCE RATE'
when accessright='TH' then 'CTCONTROL'
when accessright='U' then 'UPDATE'
when accessright='UU' then 'UDT Usage'
when accessright='UT' then 'UDT Type'
when accessright='UM' then 'UDT Method'
else ''
end (varchar(26)) as AccessRightDesc
,GrantAuthority
,GrantorName (varchar(30))
,AllnessFlag
,CreatorName (varchar(30))
,CreateTimeStamp
from
(
select
UserName
,'User' (varchar(30)) as AccessType
,'' (varchar(30)) as RoleName
,DatabaseName
,TableName
,ColumnName
,AccessRight
,GrantAuthority
,GrantorName
,AllnessFlag
,CreatorName
,CreateTimeStamp
from dbc.allrights
where UserName = :username
and CreatorName not = :username
union all
select
Grantee as UserName
,'Member' as UR
,r.RoleName
,DatabaseName
,TableName
,ColumnName
,AccessRight
,null (char(1)) as GrantAuthority
,GrantorName
,null (char(1)) as AllnessFlag
,null (char(1)) as CreatorName
,CreateTimeStamp
from dbc.allrolerights r
join dbc.rolemembers m
on m.RoleName = r.RoleName
where UserName = :username
union all
select
User as UserName
,m.Grantee as UR
,r.RoleName
,DatabaseName
,TableName
,ColumnName
,AccessRight
,null (char(1)) as GrantAuthority
,GrantorName
,null (char(1)) as AllnessFlag
,null (char(1)) as CreatorName
,CreateTimeStamp
from dbc.allrolerights r
join dbc.rolemembers m
on m.RoleName = r.RoleName
where m.grantee in (select rolename from dbc.rolemembers where grantee = :username)
) AllRights
order by 4,5,6,7; );


where database_name is the name of the database in which you create the macro.
For example, if you create the macro in the DBAdmin database, you identify the macro as DBAdmin.AllUserRights.
Note: This macro returns all privileges granted to a user either directly or through a role.

It does not return implicit (ownership) privileges.


Executing the Privilege Check Macro

After you create the macro, you can execute it to check access privileges for a particular user with the command:


execute database_name.AllUserRights ('username');
where:
• database_name is the name of the database that contains the macro
• username is the name of the permanent database user whose privileges the macro checks
Note: The user who executes the macro must have the EXECUTE privilege on the database that contains the macro.



Thanks

Friday, January 2, 2015

Database Query Log (DBQL) in Teradata

DBQL provides a series of pre-defined tables that can store, based on rules you specify, historical records of queries and their duration, performance, and target activity.

DBQL is flexible enough to log information on the variety of SQL requests that run on the Teradata Database, from short transactions to longer-running analysis and  mining queries. You begin and end collection for a user, group of users, account, or a list of accounts.

Collection options include:
  • Default logging reports, for each query, at least the leading SQL characters, the time of receipt, the number of processing steps completed, the time the first step was dispatched, and the times the result packets were returned to the host.
  • Summary logging reports, at each logging interval, a count of all queries that completed within each specified elapsed-time, I/O, or CPU-usage interval.
  • Threshold logging can log a combination of default and summary data:
    • Default data for each query that ran beyond the threshold limit
    • Summary counts of all queries that ran within the threshold time
    • Use I/O or CPU time as a criterion for detail or summary logging
  • Detail logging, which includes:
    • Default Data
    • Any or all of the following:
      • Step level Activity, including parallel steps
      • Object Usage per Query
      • Full SQL text
      • Explain Text


Note: Any DBC database tables and columns used by the system while processing a query are not reflected in the DBQL object rows for that query. This means, for example, that statements like CREATE TABLE or SELECT FROM DBC.xxx will not have objects logged through DBQL because they deal with DBC tables and columns.

Also, DBQL is limited to logging information about base tables and direct SQL statements; macros, views, and triggers are not reported.

The DBQL Components
The DBQL logs are a series of system tables created in database DBC during the Teradata Database installation process. The suite of DBQL components includes a security macro and a view for each table, which are created in database DBC by the DIP utility.

DBQL tables that are populated include:
  • DBQLogTbl stores default rows and is the foundation of the DBQL feature.
  • DBQLObjTbl stores information on the target objects of the query being logged. One row is logged for each object referenced in the query.
  • DBQLSQLTbl logs the full SQL statement text, no matter how large. Multiple rows are generated if necessary.
  • DBQLStepTbl logs the AMP step-level information of the query being logged.
  • DBQLExplainTbl logs the full, unformatted EXPLAIN text.
DBQL Data Capture

Like other system tables, the predefined DBQL logs are created as relational tables in database DBC during normal Teradata Database installation. However, while most system tables are populated automatically, you can choose whether you want to populate the DBQL tables.
If you choose not to use the feature, the tables remain empty. If you want to use the feature, simply submit a BEGIN QUERY LOGGING statement, with or without options.
The options enable you to control the volume and detail of the logged data. You can define rules, for instance, that log the first 5,000 characters of any query that runs during a session invoked by a specific user under a specific account. Or you could define a rule to log queries that take longer to complete than a specified time threshold, or queries whose CPU usage exceeds a specified number of hundredths of a second.
DBQL is controlled by the Teradata SQL statements BEGIN QUERY LOGGING and END QUERY LOGGING. Only a user with EXECUTE privilege on DBC.DBQLAccessMacro can invoke the statements.
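
A few illustrative rule definitions (the user name etl_user is a placeholder):

-- Default rows plus full SQL text and object rows for one user
BEGIN QUERY LOGGING WITH SQL, OBJECTS LIMIT SQLTEXT=5000 ON etl_user;

-- Summary counts only, bucketed at 5, 10, and 15 seconds of elapsed time, for all users
BEGIN QUERY LOGGING LIMIT SUMMARY = 5, 10, 15 ON ALL;

-- Stop logging for the user
END QUERY LOGGING ON etl_user;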

DBQL Log Table
Default rows are stored in DBQLogTbl, the foundation of the DBQL feature.
When you do not specify a LIMIT SUMMARY or LIMIT THRESHOLD option, one default row of query-level information is logged in DBQLogTbl for each query processed during a session that is initiated by any user for whom a query logging rule exists.
If you specify options that result in more detailed information, a default row is still generated in DBQLogTbl (except with the SUMMARY option or a query that completes within the limit specified with the THRESHOLD option), plus one or more additional logs are populated with one or more additional rows.
Note: DBQL is limited to logging information about base tables and logging direct SQL statements. Macros, views, and triggers do not result in complete logging information.

The Default Row
The fields of the default row provide general query information that is usually adequate for investigating a query that is interfering with performance. When no options are specified, a default row includes at least the:
  • User ID and user name under which the session being logged was initiated
  • Unique ID for the process, session, and host (client) connection
  • Account string, expanded as appropriate, that was current when the query completed
  • First 200 characters of the query SQL statement
  • Times - Start and Last Response
  • CPU and I/O statistics
  • Default Database name that was current when the query completed
  • Numerous other performance metrics
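
For example, the default rows can be inspected through the DBC.QryLogV view (a sketch; the exact set of columns varies by release):

SELECT UserName,
       QueryID,
       StartTime,
       AMPCPUTime,
       TotalIOCount,
       QueryText            -- leading characters of the SQL statement
FROM DBC.QryLogV
WHERE CAST(StartTime AS DATE) = CURRENT_DATE
ORDER BY AMPCPUTime DESC;
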
DBQL Object Table
With the OBJECTS option, DBQL logs one row in DBQLObjTbl for each data object referenced by the query. An object can be a database, data table, column, secondary index, join index, or journal table. (If the object is a secondary index, its number is logged rather than a name.) Also logged is a count of the number of times the object was accessed by the optimizer.
Note: Any DBC database tables and columns used by the system while processing a query are not reflected in the DBQL object rows for that query. This means, for example, that statements like CREATE TABLE or SELECT FROM DBC.xxx will not have objects logged through DBQL because they deal with DBC tables and columns.
Only objects from base tables are logged. Objects from views and macros are not logged.

Scenario 1
User runs a SELECT statement that joins two tables owned by the same database:
  • One row for the query is logged in DBQLogTbl
  • Rows are logged in DBQLObjTbl as follows:
    • A row for the database
    • One row for each table
    • One row for each selected column
Scenario 2
User runs a query that causes the Optimizer to reference the same object twice:
  • One row for the object is logged in DBQLObjTbl
  • The value in the FreqofUse field is incremented to 2.
DBQL SQL Table
The DBQLSQLTbl is populated if you specify the SQL option in addition to default information. The SQL option logs the full statement text, no matter how large, into DBQLSqlTbl. Multiple rows are generated as necessary.
Note: The full SQL text of a CREATE/ALTER/REPLACE PROCEDURE/ FUNCTION is not logged in DBQLSQLTbl when the statement is submitted with the option to not save the source in the database.

DBQL STEP Table
DBQLStepTbl is populated if you specify the STEPINFO option. When the query completes, the system logs one row for each query step, including parallel steps.

DBQL EXPLAIN Table
This table is populated if the WITH EXPLAIN option is requested in addition to the default information row. If the EXPLAIN text is greater than 64 KB, multiple rows are generated. DBQLExplainTbl is populated with unformatted EXPLAIN text.






Thursday, January 1, 2015

Collect Statistics Recommendations in Teradata TD14


Statistical information is vital for the optimizer when it builds query plans. But collecting statistics can involve time and resources. By understanding and combining several different statistics gathering techniques, users of Teradata can find the correct balance between good query plans and the time required to ensure adequate statistical information is always available.

Collect Full Statistics
• PI of small tables (less than 100 rows per AMP)
• Non-indexed columns used in WHERE or join constraints
• Composite columns that often appear together in equality conditions
• All NUSIs with an uneven distribution (highly biased values)
• NUSIs used in join conditions
• UPIs and USIs used in range constraints
• NUPIs that are significantly more or less than 75% unique (with no statistics, the optimizer assumes 75% uniqueness)

Can rely on Random AMP Sample
• UPIs or USIs used only in equality constraints
• NUSIs with an even distribution of values
• NUPIs with an even distribution

Option to use USING SAMPLE
• Unique index columns
• Nearly-unique columns or indexes (Columns or indexes which are over 95% unique)
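
For reference, typical collection statements for the cases above look like this (table and column names are illustrative):

/* Full statistics on a non-indexed join column and on a composite of columns
   that appear together in equality conditions */
COLLECT STATISTICS ON Sales_db.Daily_Sales COLUMN (Region_Cd);
COLLECT STATISTICS ON Sales_db.Daily_Sales COLUMN (Region_Cd, Store_Cd);

/* Sampled statistics on a nearly unique column */
COLLECT STATISTICS USING SAMPLE ON Sales_db.Orders COLUMN (Order_Id);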

Other Considerations

1. Optimizations such as nested join, partial GROUP BY, and dynamic partition elimination will not be chosen unless statistics have been collected on the relevant columns.

NUPIs that are used in join steps in the absence of collected statistics are assumed to be 75% unique, and the number of distinct values in the table is derived from that. A NUPI that is far off from being 75% unique (for example, 90% unique, or on the other side, 60% unique or less) will benefit from having statistics collected. However, if it is close to being 75% unique, then random AMP samples are adequate. To determine the uniqueness of a NUPI before collecting statistics, you can issue this SQL statement:

EXPLAIN SELECT DISTINCT nupi-column FROM table;

2. For a partitioned primary index table, it is recommended that you always collect statistics on:
• PARTITION. This tells the optimizer how many partitions are empty, and how many rows are in each partition. This statistic is used for optimizer costing.
• The partitioning column. This provides cardinality estimates to the optimizer when the partitioning column is part of a query’s selection criteria.

3. For a partitioned primary index table, consider collecting these statistics if the partitioning column is not part of the table’s primary index (PI):
• (PARTITION, PI). This statistic is most important when a given PI value may exist in multiple partitions, and can be skipped if a PI value only goes to one partition. It provides the optimizer with the distribution of primary index values across the partitions. It helps in costing the sliding-window and rowkey-based merge join, as well as dynamic partition elimination.
• (PARTITION, PI, partitioning column). This statistic provides the combined number of distinct values for the combination of PI and partitioning columns after partition elimination. It is used in rowkey join costing.
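
For a PPI table, recommendations 2 and 3 translate into statements like these (a sketch with illustrative names, where Sale_Dt is the partitioning column and Store_Id is the primary index):

COLLECT STATISTICS ON Sales_PPI COLUMN PARTITION;                       -- partition demographics
COLLECT STATISTICS ON Sales_PPI COLUMN (Sale_Dt);                       -- partitioning column
COLLECT STATISTICS ON Sales_PPI COLUMN (PARTITION, Store_Id);           -- (PARTITION, PI)
COLLECT STATISTICS ON Sales_PPI COLUMN (PARTITION, Store_Id, Sale_Dt);  -- (PARTITION, PI, partitioning column)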

4. Random AMP sampling has the option of pulling samples from all AMPs, rather than from a single AMP (the default). All-AMP random AMP sampling has these particular advantages:
• It provides a more accurate row count estimate for a table with a NUPI. This benefit becomes important when NUPI statistics have not been collected (as might be the case if the table is extraordinarily large), and the NUPI has an uneven distribution of values.
• Statistics extrapolation for any column in a table will not be attempted for small tables or tables whose primary index is skewed (based on full statistics having been collected on the PI), unless all-AMP random AMP sampling is turned on. Because a random AMP sample is compared against the table row count in the histogram as the first step in the extrapolation process, an accurate random AMP sample row count is critical for determining if collected statistics are stale, or not.
(Set the DBSControl setting for Random AMP samples to All AMPs instead of the default of single AMP to leverage extrapolation on small tables or skewed tables)