Escape SQL apostrophe

Many drivers have ways to escape SQL strings to make sure no malicious activity is going on.  Usually you can use a function in the driver that can take care of that.  However, if all you need is too escape a single quote, you can also use the double quote method:
 

reliability is key to gain customers' loyalty

becomes

reliability is key to gain customers'' loyalty

DB2 Table Structure Alterations

When an attribute on a table is dropped, or the data type changes, DB2 sets the table into REORG PENDING state.  That prevents a number of operations on the table, which can be problematic for a production database.  If there is no way around such a structural change, then a REORG command needs to be issued, such as:

REORG TABLE <table_name>;

to restore normal behavior for the table.


DB2 Query Optimization: Custom Stats Collection

One of the challenges in query optimization is to make sure that the query optimizer works with accurate estimates on cost of the different possible decision branches in the optimization process.   The accuracy of the estimates depends on the efficiency of the stats collection. 

DB2 assumes the data distribution is constant distribution. And that’s why there’s numfreq and numquantile options in RUNSTATS:

RUNSTATS ON TABLE table_name WITH DISTRIBUTION ON COLUMNS 
( col1 NUM_FREQVALUES N NUM_QUANTILES N, ...);
  • with numfreq, we pickup the top N values with most occurance (ex, if the data distribution is 1,1,1,1,1,1,2,2,2,3,3,5,6,7,8. with numfreq=3, we know “1” happened 6 times, “2” happened 3 times and “3” happened 2 times. So if the query predicate is looking for “1” or “2”, we’ll be get the correct estimation if stats is up to date).
  • with numquantile, it breaks the entire data distribution into N blocks, for example, if the distribution is “1,1,1,1,1,1,2,2,2,3,3,5,6,7,8”, when there’s no numfreq defined, and numquantile defines to 3, so it will be “1,1,1,1,1 | 1,2,2,2,3 | 3,5,6,7,8” We record the min/max for each block, and assume normal distribution inside each block. So if we are looking for c1=1, then we know the entire block 0 has 1, and lowest value in block 1 is 1, highest is 3, then the estimation will be 5 + 5/(3-1)=7.5. Same idea applied if c1=6, the search will be in block2, so the estimation could be 5/(8-3)=1

Once a good set of parameters is defined for RUNSTATS, you need to make sure that the automated maintenance doesn’t override the correct distributions with the default stats collection parameters.   You need to set up RUNSTATS PROFILE to define the autorunstats behavior. Otherwise the default behavior will be

RUNSTATS ON TABLE DB2USER.EMPLOYEE WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL

You can refer the following paper about RUNSTATS PROFILE:
http://www.ibm.com/developerworks/data/library/techarticle/dm-0509poon/
At this point, you might think: is there a way to automatically calculate based on statistics the correct parameters for the RUNSTATS sampling?  According to DB2 engineer Tao E Wang: “Usually I don’t have good method to get the exact value of numfreq/numquantile just based on calculation. There are too many variables in optimizer to consider, so usually if we want the optimizer pickup a specific index, we need to study the plan and understand why/how each filter factor coming from, and sometime we need to even setup optimizer profile to force the plan in order to study what the cost estimation looks like if it pickup the specified index… So basically I’d say numfreq/numquantile are the variables to “play with”. Once we understand the bad plan is caused by incorrect estimation, we can try to increase those 2 values until optimizer successfully get accurate estimation…”

More information is available here:
num_freqvalues – Number of frequent values retained configuration parameter
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000331.html

num_quantiles – Number of quantiles for columns configuration parameter
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000334.html

Collecting distribution statistics for specific columns
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.admin.perf.doc/doc/t0005073.html


DB2 Large Updates Without Logging

We have large amounts of data, and in development we have to often to bulk insert and updates by the millions.

DB2 allows us to dramatically speed up these operations by allowing to not log them.  The trick is to use:

ALTER TABLE <table_name> NOT LOGGED INITIALLY

However, one needs to be careful using this feature.  As DB2 support people point out:

If you performed rollback after ACTIVATE NOT LOGGED INITIALLY, the table will be marked as unavailable. You can use LOAD QUERY TABLE command to see the table state:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0002000.html

Here is a note about behavior of NOT LOGGED INITIALLY  you can find in manual of ALTER TABLE statement:

Note: If non-logged activity occurs against a table that has the NOT LOGGED INITIALLY attribute activated, and if a statement fails (causing a rollback), or a ROLLBACK TO SAVEPOINT is executed, the entire unit of work is rolled back (SQL1476N). Furthermore, the table for which the NOT LOGGED INITIALLY attribute was activated is marked inaccessible after the rollback has occurred and can only be dropped. Therefore, the opportunity for errors within the unit of work in which the NOT LOGGED INITIALLY attribute is activated should be minimized.

and some other notes in manual of CREATE TABLE statement:

The NOT LOGGED INITIALLY option is useful for situations where a large result set needs to be created with data from an alternate source (another table or a file) and recovery of the table is not necessary. Using this option will save the overhead of logging the data. The following considerations apply when this option is specified:

* When the unit of work is committed, all changes that were made to the table during the unit of work are flushed to disk.
* When you run the roll forward utility and it encounters a log record that indicates that a table in the database was either populated by the Load utility or created with the NOT LOGGED INITIALLY option, the table will be marked as unavailable. The table will be dropped by the roll forward utility if it later encounters a DROP TABLE log. Otherwise, after the database is recovered, an error will be issued if any attempt is made to access the table (SQLSTATE 55019). The only operation permitted is to drop the table.
* Once such a table is backed up as part of a database or table space back up, recovery of the table becomes possible.