1) Take out any / all-Scalar functions coded on columns in predicates. For example, this is the most common:
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE YEAR(HIREDATE) = 2005
Should be coded as:
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE HIREDATE BETWEEN ‘2005-01-01’ and ‘2005-12-31’
2) Take out any / all mathematics coded on columns in predicates. For example:
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE SALARY * 1.1 > 50000.00
Should be coded as:
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE SALARY > 50000.00 / 1.1
3) Stay away from ‘Distinct’ if at all possible.
If duplicates are to be eliminated from the result set, try:
- ‘Group By’ which looks to take advantage of any associated indexes to eliminate a sort for uniqueness.
- Rewriting the query using an ‘In’ or ‘Exists’ subquery.
This will work if the table causing the duplicates (due to a one to many relationship) does not have data being returned as part of the result set.
4) Minimize the SQL requests to DB2.
This is huge in performance tuning of programs, especially batch programs because they tend to process more data. Every time an SQL call is sent to the database manager, there is overhead in sending the SQL statement to DB2, going from one address space in the operating system to the DB2 address space for SQL execution.
In general, developers need to minimize:
- The number of time cursors are Opened/Closed
- The number of random SQL requests (noted as synchronized reads in DB2 monitors).
5) Give prominence to Stage 1 over Stage 2 Predicates.
Always try to code predicates as Stage 1 and index able. In general, Stage 2 predicates do not perform as well and consume extra CPU. See the IBM SQL Reference Guide to determine what predicates are Stage 1 vs. Stage 2 and make sure to go to the correct Version of DB2 when checking. Recommendation: Use Visual Explain.
IBM DB2 Manuals: Search on ==> Summary of Predicate Processing
6) Never put filtering logic within application code.
It is always best to have all the filtering logic that is needed written as predicates in a SQL statement. Do not leave some predicates out and have the database manager bring in extra rows and then eliminate / bypass some of the rows through program logic checks. (Some people call this Stage 3 processing)..
Deviate only when performance is an issue and not all other efforts have provided significant enough improvement in performance.
7) When using cursors, use ROWSET positioning and fetching using multi row fetch, multi row update, and multi row insert. New as of V8.
DB2 V8 introduced support for the manipulation of multiple rows on fetches, updates, and insert processing. Prior versions of DB2 would only allow for a program to process one row at a time during cursor processing. Now having the ability to fetch, update, or insert more than 1 row at a time reduces network traffic and other related costs associated with each call to DB2. The recommendation is to start with 100 row fetches, inserts, or updates, and then test other numbers. It has been proven many times that this process reduces runtime on average of 35%. Consult the IBM DB2 manuals for further detail and coding examples.
8) Take advantage of Scalar Full selects within the Select clause whenever possible. New as of V8:
Many times the output needed from SQL development requires a combination of Detail and aggregate data together. There are typically a number of ways to code this with SQL, but with the Scalar Full select now part of DB2 V8, there is now another option that is very efficient as long as indexes are being used.
For Example: Individual Employee Report with Aggregate Department Averages
SELECT E1.EMPNO, E1.LASTNAME,
E1.WORKDEPT, E1.SALARY, (SELECT AVG(E2.SALARY)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT)
AS DEPT_AVG_SAL
FROM EMPLOYEE E1
ORDER BY E1.WORKDEPT, E1.SALARY
9) Watch out for tablespace scans.
What do you do? If you as a developer see that a tablespace scan is occurring in your SQL execution, then go through the following checklist to help figure out why?
- The predicate(s) may be poorly coded in a non-indexable way.
- The predicates in the query do not match any available indexes on the table.
- The table could be small, and DB2 decides a tablespace scan may be faster than index processing.
- The catalog statistics say the table is small, or maybe there are no statistics on the table.
- The predicates are such that DB2 thinks the query is going to retrieve a large enough amount of data that would require a tablespace scan.
- The predicates are such that DB2 picks a non-clustered index, and the number of pages to retrieve is high enough based on total number of pages in the table to require a tablespace scan.
- The tablespace file or index files could physically be out of shape and need a REORG.
- The predicates in the query do not match any available indexes on the table
10) Only code the columns needed in the Select portion of the SQL statement.
Having extra columns can have an affect on:
- The optimizer choosing ‘Index Only’
- Expensiveness of any sorts
- Optimizer’s choice of join methods
0 comments:
Post a Comment