DB2 Query Writing Best Practices

Problem

Writing efficient SQL SELECTqueries for DB2 can be tricky. Some general SQL guidelines apply, and there are also specific practices that apply only to DB2.

Solution

Here is a list of tips you can use to write good SELECT queries:

  • Avoid SELECT * in your queries. Instead, state the columns you want to select.
  • Use the consistent acronyms to alias your table names (makes it easier to read joins)
  • Use FETCH ONLY x ROWS to limit the results set
  • Use CASE or MERGE instead of UNION ALL

References


Creating a User Defined Function (UDF) in Java for IBM DB2 9.7

Intro

One of the great features of DB2 is that it can be extended with custom code using SQL, C/C++, Java and COBOL. One of the great new features added to DB2 in 9.7 is the ability to run native Oracle PL/SQL code, that capability opens up a world of possibilities, and at the same time, the need to extend DB2 with functions that might have been available on the Oracle install and now need to be replicated on DB2. In this short article I describe how to extend DB2 9.7 with Java UDFs.

Prerequisites

  • Check that Java is properly installed and configured on DB2:
    db2 get dbm config|grep JDK

    The output should show the path to the JDK installation in DB2. If that isn’t the case, you need to consult the DB2 install manual and make sure Java is installed and configured for DB2.
  • Make sure that your user has access to java and javac, and that it is the same version or compatible with DB2’s JDK
  • Set the database manager parameter KEEPFENCED to NO, so that the UDF will be reloaded in each call.

Writing the test code

  • Use your favorite text editor or IDE and create a file called udftest.java with the sample code:
    public class udftest {
            public static String udftest(String domainName) {
                    return "Domain Name: " + domainName;
            }
    }
  • Compile the java program:
    javac udftest.java
  • Create a jar file:
    jar cvf udftest.jar udftest.class
  • Register the jar file with DB2:
    db2 "CALL sqlj.install_jar('file:/home/bnagy/dev/java/udf/udftest.jar','UDFTESTJAR')"

Creating the UDF in DB2

  • Use your favorite text editor or IDE and create a file called udftest-create.sql with the sample code:
    CONNECT TO SAMPLE;
    CREATE FUNCTION udftest(VARCHAR(255)) RETURNS VARCHAR(255)
    FENCED
    EXTERNAL NAME 'UDFTESTJAR:udftest.udftest'
    NOT VARIANT NO SQL PARAMETER STYLE java LANGUAGE java
    NO EXTERNAL ACTION;
    CONNECT RESET;
  • Run the scritpt in DB2:
    db2 -tvf udftest-create.sql

Testing

To check that the function works, run the following command on the command line:
db2 "select udftest('test.com') from sysibm.sysdummy1"

You should get a regular SELECT result set with the expected values. If something goes wrong, you can look at the db2diag.log that will contain any Java stack trace that would be generated.

References


IBM Expands in Analytics Market

Infoworld’s Pete Babb has an article about IBM to acquire analytics firm SPSS. Pete sees this acquisition by IBM in line with the earlier acquisition of Cognos also by IBM. He underlines that the key aspect of this acquisition is the focus on predictive analysis. IBM clearly shows a preference for solutions that work best with its core information platform DB2. Some analysts question whether IBM will fund development for compatibility on competing information platforms such as Oracle and the emerging Open Source options. According to SAS – one of the main competitors of SPSS in that field – they may have an opportunity to increase they market share over SPSS if IBM will focus on DB2 at the expense of other information platforms.


Connection pooling with mod_perl

I found this info in the PostgreSQL archives. Here are  2 methods:

Best method from Dan Lyke: Apache::DBI will pool across Perl programs, and you don’t have to change anything in your scripts.

Next best method from Gilles DAROLD: in your perl script use the following code

use vars qw($dbh);

$dbh ||= DBI::connect($datasrc, $dbuser, $dbpwd);

These can be use to create a database connection class in Perl that can handle transparently the connection / reconnection and disconnection for your code. The end use code doesn’t have to be OO, thus the connection object can be reused in any legacy code.


How to delete records with duplicate fields, such as dupplicate emails?

To delete records where only a field is duplicated, we can use a similar technique to the general duplicate removal technique:

CREATE TABLE temp AS SELECT DISTINCT ON (email) * FROM table_to_deduplicate;
DROP TABLE table_to_deduplicate;
ALTER TABLE temp RENAME TO table_to_deduplicate;

Be mindful of the fact that this process will not preserve the constraints on the original table. So if you have indexes, NOT NULL attributes, primary keys, foreign keys, you’ll have to recreate them.


How to delete dupplicate records in DB2, Oracle, MySQL, and PostgreSQL

To delete dupplicate records in SQL, the following sequence of commands will do the trick:

CREATE TABLE temp AS SELECT DISTINCT * FROM table_to_deduplicate;
DROP TABLE table_to_deduplicate;
ALTER TABLE temp RENAME TO table_to_deduplicate;

Be mindful of the fact that this process will not preserve the constraints on the original table. So if you have indexes, NOT NULL attributes, primary keys, foreign keys, you’ll have to recreate them.


DB2 Client Install on Linux

Allowing client machines to connect to a DB2 database is not as simple as regular databases. If you are using a client application, such as a Java web application, usually the application server has what is needed to configure a DB2 connection pool. However, if you need to access DB2 from the command line, the process is a little more complicated. Here are steps that will work with IBM DB2 v9.5 (example is for AMD64 architecture):

  • Get download link from http://www-306.ibm.com/software/data/db2/9/download.html
  • Run the following commands:
    wget <use link retrieved above>
    tar xvzf v9.5_linuxx64_client.tar.gz
    client/db2_install (accept defaults)
    groupadd -g 999 db2iadm1
    groupadd -g 998 db2fadm1
    groupadd -g 997 dasadm1
    useradd -u 1004 -g db2iadm1 -m -d /home/db2inst1 db2inst1
    useradd -u 1003 -g db2fadm1 -m -d /home/db2fenc1 db2fenc1
    useradd -u 1002 -g dasadm1 -m -d /home/dasusr1 dasusr1
    /opt/ibm/db2/V9.5/instance/db2icrt -a server -u db2fenc1 db2inst1
    cp /home/db2inst1/sqllib/db2profile /etc/profile.d/db2profile.sh
  • If you run into errors during the install process such as:

    /root/client/db2/linuxamd64/install/db2_install: line 277: /root/tmp/db2_install.15043.rsp: Not a directory
    /root/client/db2/linuxamd64/install/db2_install: line 285: /root/tmp/db2_install.15043.rsp: Not a directory
    /root/client/db2/linuxamd64/install/db2setup: /tmp/db2.tmp.15043/db2/linuxamd64/install//db2chgpath: /bin/sh: bad interpreter: Permission denied
    /root/client/db2/linuxamd64/install/db2setup: /tmp/db2.tmp.15043/db2/linuxamd64/install//db2setup_exec: /bin/sh: bad interpreter: Permission denied

    Because /tmp is mounted with the noexec option the db2_install fails. It looks like it prefers to copy all the files to tmp then install (not a secure method).

    Open up the platform specific db2_install file (noted above in the errors)

    db2/linuxamd64/install/db2_install

    Edit the TMPDIR line:

    TMPDIR=${DB2TMPDIR:-/tmp}

    to something like

    TMPDIR=${DB2TMPDIR:-/root/tmp}

    (*directory must exist or be created!)

    Next, make sure DB2TMPDIR is a valid environment variable matching above:

     [root@server ~/] export DB2TMPDIR=~/tmp

    Then start the db2_install from the top level directory, IE

     ./db2_install
  • there is one more step: adding the catalog entry to for the remote database which consists of two steps
  • create the instance alias using the following commands:
        db2 "catalog tcpip node remoteinst remote remotehost server 50000"
        db2 "terminate"
    
  • create the database alias using the following commands:
        db2 "catalog database testdb as remotedb at node remoteinst authentication server"
        db2 "terminate"

DB2 crashes when LOAD used on generated field

When trying to LOAD table data where one of the rows are generated, DB2 9.1 and 9.5 on Linux and AIX will crash if the generated field uses a user defined function (fenced or unfenced).  IBM is aware of the issue, and the following APARs exist to track the problem:

  • V9.1 APAR is IZ48197
  • V9.5 APAR is IZ48200

In the mean time, the workaround is to put something like the following in a file:
alter table <table> activate not logged initially;
import from <file> of del insert into <table>;
commit;

And execute the above file with “db2 +c -tvf <filename>“.


PostgreSQL to DB2 export conversion

PostgreSQL and DB2 use different styles of conventions to represent NULL. In PostgreSQL, the convention is to have “N” in the data to be loaded with the COPY command. In DB2, the LOAD command can be set to consider and empty field as NULL. Thus, to convert from DB2 to POstgreSQL style data dump, assuming TAB as the delimiter, we would use:

sed -e 's/tt/t\Nt/g' -e 's/t$/t\N/g' -e 's/tt/t\Nt/g' db2_file > pg_file

Conversely, to convert from PostgreSQL to DB2 style, with TAB delimited data:

sed -e 's/t\Nt/tt/g' -e 's/t\N$/t/g' -e 's/t\Nt/tt/g' db2_file > pg_file

The reason we need to go with 3 steps is the following:

  • step 1: convert TAB NULL TAB
  • step2: convert TAB NULL at the end of line
  • step3: convert TAB NULL TAB from sequences where originally there was TAB NULL TAB NULL

All this has been tested on AIX and Linux.