Monday, 29 March 2010

Tuning a Oracle JDBC Variable In List Query

I recently showed how to create a variable in list for a JDBC query as follows. The problem with this demo is it won't be a very effecient query as the table grows as it will most likely do a full table scan.

Variable in list with Oracle JDBC and RDBMS
http://theblasfrompas.blogspot.com/2008/02/variable-in-list-with-oracle-jdbc-and.html

To show this lets trace the SQL to verify the path of a query against a table as follows. This table has a PRIMARY KEY defined on the ID column which will be what we drive our in list query from. The query we are running is defined as follows:


select loadall_type(id, message_type, message) as "Data"
from loadall_table
where id in
(SELECT * FROM TABLE
(CAST(loadall_pkg.in_number_list('4, 67, 88, 1001') as message_id_nt)));

Table Definition

create table loadall_table
(id number,
message_type varchar2(1),
message varchar2(100),
CONSTRAINT loadall_table_PK PRIMARY KEY (id))
/
So by turning autotrace on for the in list query we can see a full table scan needs to be performed to retrieve 4 rows from a table with 100,000 rows. Sure that won't take long but if the table grows to 10 million rows for example then a full table scan is what we want to avoid.

SCOTT@linux11g> set autotrace on
SCOTT@linux11g> select loadall_type(id, message_type, message) as "Data"
2 from loadall_table
3 where id in
4 (SELECT * FROM TABLE
5 (CAST(loadall_pkg.in_number_list('4, 67, 88, 1001') as message_id_nt)));

Data(MESSAGE_ID, MESSAGE_TYPE, MESSAGE)
------------------------------------------------------------------------------------------------------

LOADALL_TYPE(4, 'M', 'Message at 4')
LOADALL_TYPE(67, 'M', 'Message at 67')
LOADALL_TYPE(88, 'M', 'Message at 88')
LOADALL_TYPE(1001, 'M', 'Message at 1001')


Execution Plan
----------------------------------------------------------
Plan hash value: 3431305811

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 25 | 167 (2)| 00:00:03 |
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 25 | 167 (2)| 00:00:03 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| IN_NUMBER_LIST | | | | |
| 3 | TABLE ACCESS FULL | LOADALL_TABLE | 100K| 2246K| 137 (1)| 00:00:02 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("ID"=VALUE(KOKBF$))


Statistics
----------------------------------------------------------
6 recursive calls
0 db block gets
487 consistent gets
0 physical reads
0 redo size
4188 bytes sent via SQL*Net to client
1148 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed


So if we can assume only a small amount of rows will ever be queried using the in list then a query as follows would be a lot more efficient to return the same data.

SCOTT@linux11g> select /*+ cardinality(plist 2) */ loadall_type(id, message_type, message) as "Data"
2 from loadall_table,
3 (SELECT * FROM TABLE
4 (CAST(loadall_pkg.in_number_list('4, 67, 88, 1001') as message_id_nt))) plist
5 where id = plist.column_value;

Data(MESSAGE_ID, MESSAGE_TYPE, MESSAGE)
------------------------------------------------------------------------------------------------------

LOADALL_TYPE(4, 'M', 'Message at 4')
LOADALL_TYPE(67, 'M', 'Message at 67')
LOADALL_TYPE(88, 'M', 'Message at 88')
LOADALL_TYPE(1001, 'M', 'Message at 1001')


Execution Plan
----------------------------------------------------------
Plan hash value: 113937112

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 50 | 31 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 2 | 50 | 31 (0)| 00:00:01 |
| 3 | COLLECTION ITERATOR PICKLER FETCH| IN_NUMBER_LIST | | | | |
|* 4 | INDEX UNIQUE SCAN | SYS_C0050957 | 1 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID | LOADALL_TABLE | 1 | 23 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("ID"=VALUE(KOKBF$))


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
16 consistent gets
0 physical reads
0 redo size
1151 bytes sent via SQL*Net to client
428 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed

Be careful using this technique if your in list query contains more then 5% of the table as the newer query will force the use of the index and hence avoid a full table scan. If you know that a very small part of the table is returned and the records are not known until runtime a query as follows would work well for large tables, where the original query would not be as efficient.

We did two things here to tune this query.

1. Used a hint for force the use of an index

/*+ cardinality(plist 2) */

2. Altered the query to no longer use IN list. New Query as follows now.

select /*+ cardinality(plist 2) */
loadall_type(id, message_type, message) as "Data"
from
loadall_table,
(SELECT * FROM TABLE
(CAST(loadall_pkg.in_number_list('4, 67, 88, 1001') as message_id_nt))) plist
where id = plist.column_value;

No comments: