http://www.teradatamagazine.com/v09n02/tech2tech/applied-solutions-3-whats-a-dba-to-do/
A DATA WAREHOUSE FROM TERADATA ELIMINATES MANY MANUAL TASKS TYPICAL OF OTHER SYSTEMS.
What do experienced DBAs with an Oracle, IBM or Microsoft background need to know about managing a Teradata system? Basically, much less than they need to know about the others.
The Teradata Database is a shared-nothing massively parallel processing (MPP) relational database management system (RDBMS), making it the only commercially available RDBMS designed from the ground up for data warehousing.
Parallel processing and the automation of many typical DBA functions were created in the DNA of the Teradata Database. Because of that architecture, many functions that are manual or enhanced by wizards in other vendors’ systems are managed automatically. Consequently, the roles and responsibilities of the DBA are significantly different. Fewer tasks are required, making the system much easier to manage. Understanding those differences and how to exploit them with the RDBMS is key to driving the success of the organization.
The next sections break down how a data warehouse from Teradata differs from other systems, enabling the DBA to focus on more productive work and less on manually maintaining the data warehouse.
SYSTEM INSTALLATION
Data warehouse performance is achieved in a parallel database architecture by the “divide and conquer” method. First, the data is divided into small, equal units. Then independent software modules process those units simultaneously (i.e., in parallel) to conquer the problems (i.e., answer the queries).
The units of work and the hardware resources allocated to each parallel software module must be as equal as possible. Like a chain that is only as strong as its weakest link, the overall job cannot conclude until every unit has completed its processing. This “balanced processing” of workloads in the Teradata Database enables superior query performance.
The DBA’s objective, therefore, is to install a balanced processing platform, operating system (OS), database management software and disk subsystems. In a typical environment, these steps require careful planning with time-consuming analysis of user data and targeted queries. Anxious for a quick return on investment (ROI), however, management too often applies pressure to take shortcuts—which can have disastrous consequences.
With a Teradata purpose-built platform, the OS, database and disk subsystems are installed before system delivery. All that is needed is the size of the raw user data, number of concurrent users and some targeted queries. This information is inputted into a system-sizing calculator and a platform configuration is recommended. The result is a balanced parallel-processing platform that is tailored to the organization’s application requirements.
A pre-configured, purpose-built Teradata platform relieves DBAs of the responsibility of understanding and setting the myriad OS and database parameters and installation options. No longer do DBAs have to spend hours writing programs to analyze the data and determine partitioning and data placement. They are free from the burden of having to understand and be responsible for setting the various options and initialization parameters with both the database and OS.
Those runtime database control parameter settings are critical for performance tuning of transaction processing applications where queries are predetermined and must be tuned. However, a data warehouse is built on the concept that users are able to ask any question of any data at any time. There is no opportunity to know or tune the queries beforehand. This is best left to the Teradata Database to optimize and tune dynamically.
In short, once the Teradata platform is installed, the DBA can immediately define the databases, users and tables and load data. Users can then leverage the data warehouse as it is intended—to run queries that answer their business intelligence (BI) questions.
STORAGE MANAGEMENT
Because the Teradata storage subsystem is installed and balanced before delivery, management of the disk subsystem is greatly simplified. The DBA familiar with managing items such as disk groups, logical volumes, node groups, file system, files and tablespaces will find that those entities and concepts are nonexistent. (See table.)
All disk organization is entirely logical, as opposed to physical. (See figure) Initially, all space in the system is allocated to a predefined system database called DBC. Using the “CREATE” DML command, the DBA will define DATABASE and USER entities. The space parameter on the CREATE DML statement is not a physical “allocation” but is simply a size quota. If a database is allocated 5TB of space that is the maximum amount of space the database is allowed to use. Anytime that database attempts to use more than 5TB, an “out of space” message will result. However, the system is not out of space, it just exceeded its space quota.
DISK FILE SYSTEM
A database management system (DBMS) is designed for storing and retrieving data. Typical file-system architectures fragment and performance degrades over time as insert, updates and deletes are applied to the data.
Teradata broke all the rules with its file system design. Data is not stored in B-Tree indexes based on data values; rather, the file system is built on raw disk slices. There are no pages, BufferPools, tablespaces, extents, etc. to manage. Row lengths are variable and stored in blocks that can grow or shrink on demand. Maximum block size is configurable with row or block placement managed by the DBMS. Rows can dynamically be moved, space reclaimed or the file system defragmented on the fly. This is a transparent background process that runs continuously using available system resources. Because of this process, the DBA never has to do a re-organization, and performance is optimized on a continual basis and does not degrade with file updates.
USER MANAGEMENT
Defining users is easy with a Teradata system. There are two types: query users with no workspace capability; and power users who have the capability to create and manipulate tables within their own workspace based on whatever limitations the DBA placed on their space usage.
The DBA first adds the users with a CREATE USER DML command, then grants them security rights to database entities. Role-based security is supported for ease of maintenance.
INDEX MANAGEMENT
DBAs need to resist the temptation to over-index the tables. Because of the powerful parallel architecture of the Teradata platform, it is unnecessary to avoid full-table scans. Therefore, far fewer indexes are needed than in other RDBMSs. In fact, as experienced in several organizations, tables having more than 80 billion rows each can be scanned in less than five minutes.
These recommended steps will help determine the number of indexes needed in the Teradata Database:
- The DBA defines a primary index (PI) and a secondary index on any column that will participate as a foreign key in join operations. The PI is for data distribution and keyed access.
- Once the indexes are defined, the query workload is run and the query capture facility logs the query activity.
- The Teradata Index Wizard uses this information to recommend the addition or removal of indexes based on actual query usage.
This process saves the DBA from having to manually analyze the number of indexes.
Special indexes are available for specific performance needs. The partitioned primary index (PPI), for one, can deliver dramatic results. A large transaction file that is accessed with date parameter queries is a good candidate for creating a PPI on transaction date. The Optimizer then can eliminate partitions on any date-sensitive queries with dramatic response-time reductions.
The multi-level PPI, join, aggregate and aggregate join indexes are tools that can turbo-charge certain applications.
WORKLOAD MANAGEMENT
Teradata Active System Management provides the necessary tools for comprehensive workload management; therefore, no outside tools or resources are needed. The product has three components:
Dynamic Query Manager enables the DBA to classify and govern the query before its execution in the database.
Priority Scheduler defines resource partitions where varying workloads can be controlled and monitored.
Database Query Log provides post-execution performance analysis.
Because Teradata tools administer all performance management and tuning needs, the DBA no longer has to be an expert in the OS, database and third-party tools.
SYSTEM EXPANSION
One common characteristic of successful data warehouses is growth. Often, no matter how much detail is put into planning a data warehouse, expansion needs often arise unexpectedly and in unanticipated areas. The scalable Teradata system makes growth an easy process. In fact, Teradata’s Customer Support Team performs the expansion—the DBA’s role is simply as an observer.
Expanding the Teradata system is similar to ordering the initial platform. With assistance from Teradata Professional Services, the DBA determines the amount of raw data on the existing Teradata system and the number of concurrent users, as well as a few critical queries. Then those numbers are added to the anticipated growth in each area. These values are input into a system-sizing calculator, which produces the additional platform requirements.
As with the original Teradata system, the additional platform and software will be pre-configured, delivered, installed and connected to the existing platform. The data redistribution utility is then run, which automatically rebalances the data on the system. The tool relocates any data that belongs on the new platform nodes had it been installed at the time the data was loaded. Once that data is relocated, the utility tool removes it from the old nodes. (No data movement occurs between the original nodes.) Redistributing the data requires downtime on the system, but the process normally takes less than a shift to complete.
A BETTER VALUE
The features of the Teradata system make it ideal for data warehouse applications. The balanced, purpose-built platform arrives ready to deliver the first application generally in days, instead of weeks or months.
With the automated data management features, DBAs are freed from having to micro-manage the file system and can, therefore, engage in other tasks and responsibilities. For instance, instead of the DBA constantly writing and tuning queries, the query optimizer allows the user to ask any question, anytime. The support and freedom provided by a data warehouse from Teradata empowers DBAs to concentrate on working with the user community to deliver greater business value to their organization.
WHAT TERADATA DBAS DON’T DO:
With the automatic features included in a Teradata Database, DBAs have fewer tasks and responsibilities for implementing and maintaining the system. As identified in this partial list of duties, Teradata DBAs have never been required to:
- Install an operating system (OS)
- Understand and set extensive OS tuning parameters
- Install the Teradata Database
- Understand and set extensive Teradata Database parameters
- Write programs/execute utilities that determine how to divide data into partitions
- Determine size and physical location of each table and index partition or simple tablespace
- Code/allocate/format partitions or underlying file structures
- Embed partition assignment into CREATE TABLE statements
- Determine level/degrees of parallelism to be assigned to tables/partitions/databases
- Assign and manage special buffer pools for parallel processing
- Associate tables/queries with parallel degrees
- Code/allocate/format temporary work space

