Determining the Release of Your Teradata System:

SELECT * FROM DBC.DBCINFO;

Fundamental Structured Query Language (SQL)

The access language for all modern relational database systems (RDBMS) is Structured Query Language (SQL). It has evolved over time to be the standard. The ANSI SQL group defines which commands and functionality all vendors should provide within their RDBMS.

There are three levels of compliance within the standard: Entry, Intermediate and Full. The three level definitions are based on specific commands, data types and functionalities. So, it is not that a vendor has incorporated some percentage of the commands; it is more that each command is categorized as belonging to one of the three levels. For instance, most data types are Entry level compliant. Yet, there are some that fall into the Intermediate and Full definitions.

Since the standard continues to grow with more options being added, it is difficult to stay fully ANSI compliant. Additionally, all RDBMS vendors provide extra functionality and options that are not part of the standard. These extra functions are called extensions because they extend or offer a benefit beyond those in the standard definition.

At the writing of this book, Teradata was fully ANSI Entry level compliant based on the 1992 Standards document. NCR also provides much of the Intermediate and some of the Full capabilities. This book indicates feature by feature which SQL capabilities are ANSI and which are Teradata specific, or extensions. It is to NCR’s benefit to be as compliant as possible in order to make it easier for customers of other RDBMS vendors to port their data warehouse to Teradata.

As indicated earlier, SQL is used to access, store, remove and modify data stored within a relational database, like Teradata. The SQL is actually comprised of three types of statements. They are: Data Definition Language (DDL), Data Control Language (DCL) and Data Manipulation Language (DML). The primary focus of this book is on DML and DDL. Both DDL and DCL are, for the most part, used for administering an RDBMS. Since the SELECT statement is used the vast majority of the time, we are concentrating on its functionality, variations and capabilities.

Everything in the first part of this chapter describes ANSI standard capabilities of the SELECT command. As the statements become more involved, each capability will be designated as either ANSI or a Teradata Extension.

Basic SELECT Command

Using the SELECT has been described like playing the game, Jeopardy. The answer is there; all you have to do is come up with the correct question.

The basic structure of the SELECT statement indicates which column values are desired and the tables that contain them. To aid in the learning of SQL, this book will capitalize the SQL keywords. However, when SQL is written for Teradata, the case of the statement is not important. The SQL statements can be written using all uppercase, lowercase or a combination; it does not matter to the Teradata PE.

The SELECT is used to return the data value(s) stored in the columns named within the SELECT command. The requested columns must be valid names defined in the table(s) listed in the FROM portion of the SELECT.

The following shows the format of a basic SELECT statement. In this book, the syntax uses expressions like: <column-name> (see Figure 1-1) to represent the location of one or more names required to construct a valid SQL statement:

$image\img00005.gif$

The structure of the above command places all keywords on the left in uppercase and the variable information such as column and table names to the right. Like using capital letters, this positioning is to aid in learning SQL. Lastly, although the use of SEL is acceptable in Teradata, with [ECT] in square brackets being optional, it is not ANSI standard.

Lastly, when multiple column names are requested in the SELECT, a comma must separate them. Without the separator, the optimizer cannot determine where one ends and the next begins.

The following syntax format is also acceptable:

SEL[ECT] <column-name> FROM <table-name> ;

Both of these SELECT statements produce the output report, but the above style is easier to read and debug for complex queries. The output display might appear as:

3 Rows Returned

<column-name>

aaaaaaaaaaaaaaaaaa

bbbbbbbbbbbbbbbb

cccccccccccccccccc

In the output, the column name becomes the default heading for the report. Then, the data contained in the selected column is displayed once for each row returned.

The next variation of the SELECT statement returns all of the columns defined in the table indicated in the FROM portion of the SELECT.

$image\img00150.gif$

The output of the above request uses each column name as the heading and the columns are displayed in the same sequence as they are defined in the table. Depending on the tool used to submit the request, care should be taken, because if the returned display is wider than the media (i.e. terminal=80 and paper=133); it may be truncated.

At times, it is desirable to select the same column twice. This is permitted and to accomplish it, the column name is simply listed in the SELECT column list more than once. This technique might often be used when doing aggregations or calculating a value, both are covered in later chapters.

The table below is used to demonstrate the results of various requests. It is a small table with a total of ten rows for easy comparison.

Student Table - contains 10 students

Student_ID	Last_Name	First_name	Class_code	Grade_Pt
PK			FK
UPI	NUSI		NUSI
123250 125634 234121 231222 260000 280023 322133 324652 333450 423400	Phillips Hanson Thomas Wilson Johnson McRoberts Bond Delaney Smith Larkins	Martin Henry Wendy Susie Stanley Richard Jimmy Danny Andy Michael	SR FR FR SO JR JR SR SO FR	3.00 2.88 4.00 3.80 1.90 3.95 3.35 2.00 0.00

Figure 2-1

For Example: the next SELECT might be used with Figure 2-1, to display the student number, the last name, first name, the class code and grade point for all of the students in the Student table:

SELECT *

FROM Student_Table ;

10 Rows returned

Student_ID	Last_Name	First_Name	Class_Code	Grade_Pt
423400	Larkins	Michael	FR	0.00
125634	Hanson	Henry	FR	2.88
280023	McRoberts	Richard	JR	1.90
260000	Johnson	Stanley	?	?
231222	Wilson	Susie	SO	3.80
234121	Thomas	Wendy	FR	4.00
324652	Delaney	Danny	SR	3.35
123250	Phillips	Martin	SR	3.00
322133	Bond	Jimmy	JR	3.95
333450	Smith	Andy	SO	2.00

Notice that Johnson has question marks in the grade point and class code columns. Most client software uses the question mark to represent missing data or an unknown value (NULL). More discussion on this condition will appear throughout this book. The other thing to note is that character data is aligned from left to right, the same as we read it and numeric is from right to left, from the decimal.

This SELECT returns all of the columns except the Student ID from the Student table:

$image\img00006.gif$

10 Rows returned

First_Name	Last_Name	Class_Code	Grade_Pt
Michael	Larkins	FR	0.00
Henry	Hanson	FR	2.88
Richard	McRoberts	JR	1.90
Stanley	Johnson	?	?
Susie	Wilson	SO	3.80
Wendy	Thomas	FR	4.00
Danny	Delaney	SR	3.35
Martin	Phillips	SR	3.00
Jimmy	Bond	JR	3.95
Andy	Smith	SO	2.00

There is no short cut for selecting all columns except one or two. Also, notice that the columns are displayed in the output in the same sequence they are requested in the SELECT statement.

WHERE Clause

The previous “unconstrained” SELECT statement returned every row from the table. Since the Teradata database is most often used as a data warehouse, a table might contain millions of rows. So, it is wise to request only certain types of rows for return.

By adding a WHERE clause to the SELECT, a constraint is established to potentially limit which rows are returned based on a TRUE comparison to specific criteria or set of conditions.

$image\img00007.gif$

The conditional check in the WHERE can use the ANSI comparison operators (symbols are ANSI / alphabetic is Teradata Extension):

Equal	Not Equal	Less Than	Greater Than	Less Than or Equal	Greater Than or Equal
=	<>	<	>	<=	>=
EQ	NE	LT	GT	LE	GE

Figure 2-2

The following SELECT can be used to return the students with a B (3.0) average or better from the Student table:

$image\img00151.gif$

5 Rows returned

Student_ID	Last_Name	Grade_Pt
231222	Wilson	3.80
234121	Thomas	4.00
324652	Delaney	3.35
123250	Phillips	3.00
322133	Bond	3.95

Without the WHERE clause, the AMPs return all of the rows in the table to the user. More and more Teradata user systems are getting to the point where they are storing billions of rows in a single table. There must be a very good reason for needing to see all of them. More simply put, you will always use a WHERE clause whenever you want to see only a portion of the rows in a table.

Compound Comparisons ( AND / OR )

Many times a single comparison is not sufficient to specify the desired rows. To add more functionality to the WHERE it is common to use more than one comparison. The multiple condition checks and column names are not separated by a comma, like column names. Instead, they must be connected using a logical operator.

The following is the syntax for using the AND logical operator:

$image\img00008.gif$

Notice that the column name is listed for each comparison separated by a logical operator; this will be true even when it is the same column being compared twice. The AND signifies that each individual comparison on both sides of the AND must be true. The final result of the comparison must be TRUE for a row to be returned.

This Truth Table illustrates this point using AND.

First Test Result AND	Second Test Result	Final Result
True	True	True
True	False	False
False	True	False
False	False	False

Figure 2-3

When using AND, different columns must be used because a single column can never contain more than a single data value.

Therefore, it does not make good sense to issue the next SELECT using an AND on the same column because no rows will ever be returned.

$image\img00009.gif$

No rows found

The above SELECT will never return any rows. It is impossible for a column to contain more than one value. No student has a 3.0 grade average AND a 4.0 average. They might have one or the other, but not both. It might contain one or the other, but never

both at the same time. The AND operator indicates both must be TRUE and should never be used between two comparisons on the same column.

By substituting an OR logical operator for the previous AND, rows will now be returned.

The following is the syntax for using OR:

$image\img00010.gif$

2 Rows returned

Student_ID	Last_Name	First_Name	Grade_Pt
234121	Thomas	Wendy	4.00
123250	Phillips	Martin	3.00

The OR signifies that only one of the comparisons on each side of the OR needs to be true for the entire test to result in a true and the row to be selected.

This Truth Table illustrates the results for the OR:

First Test Result OR	Second Test Result	Final Result
True	True	True
True	False	True
False	True	True
False	False	False

Figure 2-4

When using the OR, the same column or different column names may be used. In this case, it makes sense to use the same column because a row is returned when a column contains either of the specified values as opposed to both values as seen with AND.

It is perfectly legal and common practice to combine the AND with the OR in a single SELECT statement.

The next SELECT contains both an AND as well as an OR:

$image\img00016.gif$

2 Rows returned

Student_ID	Last_Name	First_Name	Class_Code	Grade_Pt
234121	Thomas	Wendy	FR	4.00
123250	Phillips	Martin	SR	3.00

At first glance, it appears that the comparison worked correctly. However, upon closer evaluation it is incorrect because Phillips is a senior and not a freshman.

When mixing AND with OR in the same WHERE clause, it is important to know that the AND is evaluated first. The previous SELECT actually returns all rows with a grade point of 3.0. Hence, Phillips was returned. The second comparison returned Thomas with a grade point of 4.0 and a class code of ‘FR’.

When it is necessary for the OR to be evaluated before the AND the use of parentheses changes the priority of evaluation. A different result is seen when doing the OR first. Here is how the statement should be written:

$image\img00012.gif$

1 Row returned

Last_Name	Class_Code	Grade_Pt
Thomas	FR	4.00

Now, only Thomas is returned and the output is correct.

Impact of NULL on Compound Comparisons

NULL is an SQL reserved word. It represents missing or unknown data in a column. Since NULL is an unknown value, a normal comparison cannot be used to determine whether it is true or false. All comparisons of any value to a NULL result in an unknown; it is neither true nor false. The only valid test for a null uses the keyword NULL without the normal comparison symbols and is explained in this chapter.

When a table is created in Teradata, the default for a column is for it to allow a NULL value to be stored. So, unless the default is over-ridden and NULL values are not allowed, it is a good idea to understand how they work.

A SHOW TABLE command (chapter 3) can be used to determine whether a NULL is allowed. If the column contains a NOT NULL constraint, you need not be concerned about the presence of a NULL because it is disallowed.

This AND Truth Table must now be used for compound tests when NULL values are allowed:

First Test Result AND	Second Test Result	Final Result
True	Unknown	Unknown
Unknown	True	Unknown
False	Unknown	False
Unknown	False	False
Unknown	Unknown	Unknown

Figure 2-5

This OR Truth Table must now be used for compound tests when NULL values are allowed:

First Test Result OR	Second Test Result	Final Result
True	Unknown	True
Unknown	True	True
False	Unknown	Unknown
Unknown	False	Unknown
Unknown	Unknown	Unknown

Figure 2-6

For most comparisons, an unknown (null) is functionally equivalent to a false because it is not a true. Therefore, when using any comparison symbol a row is not returned when it contains a NULL.

At the same time, the next SELECT does not return Johnson because all comparisons against a NULL are unknown:

$image\img00017.gif$

No rows found

V2R5: *** Failure 3731 The user must use IS NULL or IS NOT NULL to test for NULL values.

As seen in the above Truth tables, a comparison test cannot be used to find a NULL.

To find a NULL, it becomes necessary to make a slight change in the syntax of the conditional comparison. The coding necessary to find a NULL is seen in the next section.

Using NOT in SQL Comparisons

It can be fairly straightforward to request exactly which rows are needed. However, sometimes rows are needed that contain any value other than a specific value. When this is the case, it might be easier to write the SELECT to find what is not needed instead of what is needed. Then convert it to return everything else. This might be the situation when there are 100 potential values stored in the database table and 99 of them are needed. So, it is easier to eliminate the one value than it is to specifically list the desired 99 different values individually.

Either of the next two SELECT formats can be used to accomplish the elimination of the one value:

$image\img00018.gif$

This second version of the SELECT is normally used when compound conditions are required. This is because it is usually easier to code the SELECT to get what is not wanted and then to enclose the entire set of comparisons in parentheses and put one NOT in front of it. Otherwise, with a single comparison, it is easier to put NOT in front of the comparison operator without requiring the use of parentheses.

The next SELECT uses the NOT with an AND comparison to display seniors and lower classmen with grade points less than 3.0:

$image\img00019.gif$

6 Rows returned

Last_Name	First_Name	Class_Code	Grade_Pt
McRoberts	Richard	JR	1.90
Hanson	Henry	FR	2.88
Delaney	Danny	SR	3.35
Larkins	Michael	FR	0.00
Phillips	Martin	SR	3.00
Smith	Andy	SO	2.00

Without using the above technique of a single NOT, it is necessary to change every individual comparison. The following SELECT shows this approach, notice the other change necessary below, NOT AND is an OR:

Since you cannot have conditions like: NOT >= and NOT <>, they must be converted to < (not < and not =) and = (not, not =). It returns the same 5 rows, but also notice that the AND is now an OR:

$image\img00020.gif$

6 Rows returned

Last_Name	First_Name	Class_Code	Grade_Pt
McRoberts	Richard	JR	1.90
Hanson	Henry	FR	2.88
Delaney	Danny	SR	3.35
Phillips	Martin	SR	3.00
Larkins	Michael	FR	0.00
Smith	Andy	SO	2.00

Chart of individual conditions and NOT:

Condition	Opposite condition	NOT condition
<=	<	NOT >=
<>	=	NOT <>
AND	OR	OR
OR	AND	AND

Figure 2-7

To maintain the integrity of the statement, all portions of the WHERE must be changed, including AND, as well as OR. The following two SELECT statements illustrate the same concept when using an OR:

$image\img00021.gif$

1 Row returned

Last_Name

Hanson

In the earlier Truth table, the NULL value returned an unknown when checked with a comparison operator. When looking for specific conditions, an unknown was functionally equivalent to a false, but really it is an unknown.

These two Truth tables can be used together as a tool when mixing AND and OR together in the WHERE clause along with NOT.

This Truth Table helps to gauge returned rows when using NOT with AND:

First Test Result AND	Second Test Result	Result
NOT(True) = False	NOT(Unknown) = Unknown	False
NOT(Unknown) = Unknown	NOT(True) = False	False
NOT(False) = True	NOT(Unknown) = Unknown	Unknown
NOT(Unknown) = Unknown	NOT(False) = True	Unknown
NOT(Unknown) = Unknown	NOT(Unknown) = Unknown	Unknown

Figure 2-8

This Truth Table can be used to gauge returned rows when using NOT with OR:

First Test Result OR	Second Test Result	Result
NOT(True) = False	NOT(Unknown) = Unknown	Unknown
NOT(Unknown) = Unknown	NOT(True) = False	Unknown
NOT(False) = True	NOT(Unknown) = Unknown	True
NOT(Unknown) = Unknown	NOT(False) = True	True
NOT(Unknown) = Unknown	NOT(Unknown) = Unknown	Unknown

Figure 2-9

There is an issue associated with using NOT. When a NOT is done on a true condition, the result is a false. Likewise, the NOT of a false is a true. However, when a NOT is done with an unknown, the result is still an unknown. Whenever a NULL appears in the data for any of the columns being compared, the row will never be returned and the answer set will not be what is expected.

Another area where care must be taken is when allowing NULL values to be stored in one or both of the columns. As mentioned earlier, previous versions of Teradata had no concept of “unknown” and if a compare didn’t result in a true, it was false. With the emphasis on ANSI compatibility the unknown was introduced.

If NULL values are allowed and there is potential for the NULL to impact the final outcome of compound tests, additional tests are required to eliminate them. One way to eliminate this concern is to never allow a NULL value in any columns. However, this may not be appropriate and it will require more storage space because a NULL can be compressed. Therefore, when a NULL is allowed, the SQL needs to simply check for a NULL.

Therefore, using the expression IS NOT NULL is a good technique when NULL is allowed in a column and the NOT is used with a single or a compound comparison. This does require another comparison and could be written as:

$image\img00022.gif$

7 Rows returned

Last_Name	First_Name	Class_Code	Grade_Pt
Larkins	Michael	FR	0.00
Hanson	Henry	FR	2.88
McRoberts	Richard	R	1.90
Johnson	Stanley	?	?
Delaney	Danny	SR	3.35
Phillips	Martin	SR	3.00
Smith	Andy	SO	2.00

Notice that Johnson came back this time and did not appear previously because of the NULL values.

Later in this book, the COALESCE will be explored as another way to eliminate NULL values directly in the SQL instead of in the database.

TERADATA

Thursday, March 21, 2013

TD SQL