Friday, 16 March 2012

Using derby style table functions to load data into SQLFire

Loading data into SQLFire can be done various ways including using Spring Batch with CSV files, Apache DDLUtils or direct JDBC connections pulling data into SQLFire. The approach below is yet another way. In this example we load the Oracle HR schema table "departments" into a SQLFire distributed system.

1. Create a java class with code that simple queries the table data and returns it as aJDBC ResultSet.

package vmware.au.se.sqlfire.derby;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import oracle.jdbc.OracleDriver;

public class DepartmentTable 
{

 public static String SQL = "select * from departments";
 
 public static ResultSet read() throws SQLException
 {
        Connection conn = getConnection();
        Statement  stmt = conn.createStatement(); 
        ResultSet  rset = stmt.executeQuery(SQL);
        
        return rset;
 }
 
 public static Connection getConnection() throws SQLException
 {
     String username = "hr";
     String password = "hr";
     String thinConn = "jdbc:oracle:thin:@172.16.101.70:1521/linux11gr2";
     DriverManager.registerDriver(new OracleDriver());
     Connection conn = DriverManager.getConnection(thinConn,username,password);
     conn.setAutoCommit(false);
     return conn;
 }
 
} 

2. Add the JAR file with the class above as well as the Oracle JDBC jar file to the system CLASSPATH as shown below.

export CUR_DIR=`pwd`
export CLASSPATH=$CUR_DIR/lib/ojdbc6.jar:$CUR_DIR/lib/derby-funct.jar

Note: This ensures when we start our SQLFire nodes they will have the classes avaiable on the classpath

3. Start the SQLFire servers as shown below.

> sqlf server start -server-groups=MYGROUP -locators=localhost[41111] -client-bind-address=localhost -client-port=1528 -dir=server1 -classpath=$CLASSPATH &

> sqlf server start -server-groups=MYGROUP -locators=localhost[41111] -client-bind-address=localhost -client-port=1529 -dir=server2 -classpath=$CLASSPATH &

4. Log into the distributed system using the CLI and run the SQL as follows to create the table in SQLFire which will store the same dataset from the Oracle "departments" table.
create diskstore STORE1;
 
call sys.set_eviction_heap_percentage_sg (85, 'MYGROUP');

drop table departments;

create table departments
(department_id int NOT NULL CONSTRAINT department_id_PK PRIMARY KEY,
 department_name varchar(40),
 manager_id int,
 location_id int)
partition by column (department_id)
SERVER GROUPS (MYGROUP)
persistent 'STORE1'
REDUNDANCY 1;

5. Log into the distributed system using the CLI and run the SQL below to create a function
CREATE FUNCTION externalDepartments
()
RETURNS TABLE
(
  DEPARTMENT_ID    INT,
  DEPARTMENT_NAME  VARCHAR( 40 ),
  MANAGER_ID       INT,
  LOCATION_ID      INT
)
LANGUAGE JAVA
PARAMETER STYLE DERBY_JDBC_RESULT_SET
READS SQL DATA
EXTERNAL NAME 'vmware.au.se.sqlfire.derby.DepartmentTable.read'; 

6. Log into the distributed system using the CLI and run the SQL below to insert data into the "departments" table in SQLFire using the function we created at #5.
insert into departments
select s.*
FROM TABLE (externalDepartments()) s;

7. Verify we now have our departments table in SQLFire with data.
[Fri Mar 16 08:55:40 papicella@:~/sqlfire/vFabric_SQLFire_101/pasdemos/oraclehr ] $ sqlf
sqlf version 10.4
sqlf> connect client 'localhost:1527';
sqlf> select * from departments;
DEPARTMENT&|DEPARTMENT_NAME                         |MANAGER_ID |LOCATION_ID
----------------------------------------------------------------------------
230        |IT Helpdesk                             |NULL       |1700       
120        |Treasury                                |NULL       |1700       
10         |Administration                          |200        |1700       
240        |Government Sales                        |NULL       |1700       
130        |Corporate Tax                           |NULL       |1700       
20         |Marketing                               |201        |1800       
250        |Retail Sales                            |NULL       |1700       
140        |Control And Credit                      |NULL       |1700       
30         |Purchasing                              |114        |1700       
260        |Recruiting                              |NULL       |1700       
150        |Shareholder Services                    |NULL       |1700       
40         |Human Resources                         |203        |2400       
270        |Payroll                                 |NULL       |1700       
160        |Benefits                                |NULL       |1700       
50         |Shipping                                |121        |1500       
170        |Manufacturing                           |NULL       |1700       
60         |IT                                      |103        |1400       
180        |Construction                            |NULL       |1700       
70         |Public Relations                        |204        |2700       
190        |Contracting                             |NULL       |1700       
80         |Sales                                   |145        |2500       
200        |Operations                              |NULL       |1700       
90         |Executive                               |100        |1700       
210        |IT Support                              |NULL       |1700       
100        |Finance                                 |108        |1700       
220        |NOC                                     |NULL       |1700       
110        |Accounting                              |205        |1700       

27 rows selected 

More info on derby-style functions can be found here.

http://db.apache.org/derby/docs/10.4/devguide/cdevspecialtabfuncs.html

5 comments:

Carter Shanklin said...

Pas, also check out sysj.install_jar as (IMO) a better alternative to setting the classpath.

Example: "call sqlj.install_jar('/home/carter/examples.jar', 'TEST', 0)"

There is also a remove_jar and you can update classes without bouncing the server.

Drikus said...

Thanks for the good example. I have a problem with my own table function, i.e. where you have the reference vmware.au.se.sqlfire.derby.DepartmentTable.read, to the jar file. Mine is very similar. I have also added it to the classpath, with an explicit jar file refernece, but it simply won't see the package.

I have even placed the jar file in a folder structure that resembles the package name.

Creating the table function, does not fail [EXTERNAL NAME 'co.za.acme.trdSchema.GetPrivateTrades.read';], but when running the INSERT statement [FROM TABLE (EQUITYTRADE.externalTrades()) t;], it produces the 42X51 error.

I am running on Windows, which is the only difference, it would seem.

Your thoughts are appreciated.

Pas Apicella said...

Drikus,

1. What version of SQLFire are you using please ensure it's 102 which is the latest version

2. What does your classpath setting look like can you show me that?

3. The SQLFire logs for each server should show the JAR file with your class being picked up if not then the JAR your using is not being picked up.

Thanks..

Drikus said...

Pas, thanks for the follow-up, I appreciate it.

I am on version 102 (trial).

The class path has a reference to the correct location: C:\SQLFire\Java Classes\co\za\jse\trdSchema\GetPrivateTrades.jar, for the started server.

I make the client connection from a machine (not related) via a Locator (not related) to the one on which the database is hosted, i.e. three separate computers. The table function is created on the database, and the jar is file in the folder location (as below), on the actual machine, where the database runs. I then attempt to run the data-harvest, from the first computer, via the Locator, on the database computer.

I am able to log onto the external db, using sqlf, and query the database.

Class Path:
C:\SQLFire\vFabric_SQLFire_102\lib\sqlfire.jar
C:\SQLFire\vFabric_SQLFire_102\lib\sqlfiretools.jar
C:\SQLFire\vFabric_SQLFire_102\lib\sqlfireclient.jar
C:\SQLFire\vFabric_SQLFire_102\lib\jline.jar
C:\SQLFire\vFabric_SQLFire_102\lib\jna.jar
C:\SQLFire\vFabric_SQLFire_102\lib\commons-cli.jar
.
C:\Program Files (x86)\Java\jre7\lib\ext\
C:\Data Drivers\jtds-1.2.5.jar
C:\SQLFire\Java Classes\co\za\jse\trdSchema\GetPrivateTrades.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\ant\ant.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-beanutils-1.7.0.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-codec-1.3.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-collections-3.1.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-dbcp-1.2.1.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-digester-1.7.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-lang-2.1.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-logging-1.0.4.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\commons-pool-1.2.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\dom4j-1.4.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\log4j-1.2.8.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\stax-api-1.0.1.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\lib\wstx-asl-3.0.2.jar
C:\SQLFire\vFabric_SQLFire_102\lib\ddlutils\dist\DdlUtils-1.0.jar
C:\SQLFire\vFabric_SQLFire_102\lib\commons-logging.jar
C:\SQLFire\vFabric_SQLFire_102\lib\commons-modeler-2.0.jar
C:\SQLFire\vFabric_SQLFire_102\lib\mx4j.jar
C:\SQLFire\vFabric_SQLFire_102\lib\mx4j-remote.jar
C:\SQLFire\vFabric_SQLFire_102\lib\mx4j-tools.jar
C:\SQLFire\vFabric_SQLFire_102\lib\mail.jar

I have verified that the package name is correctly spelled throughout.

Thank you for your time and effort.

Drikus said...

Hi Pas,

I figured out what the problem is with my CLASSPATH.

The actual folder structure, where the class/ jar is stored, up to the start of the package name, is considered superfluous to the java class, but is important to the execution.

In your example, you expose your function externally as: vmware.au.se.sqlfire.derby.DepartmentTable, with "DepartmentTable", as the class name. When this function is called, it relies on this fully qualified name, starting at the point where the folder structure stops, i.e. the CLASSPATH.

Example: folder- c:\myClasses\vmware\au\se\sqlfire\derby. The actual jar is stored inside the folder "derby". Add the folder path: c:\myClasses to CLASSPATH. When the function is called, it looks at CLASSPATH, to find the starting point for the EXTERNAL NAME of the function, and follows the folder-structure via the name, to get to the jar that is executed.

I would imagine, if the entire path, i.e. C:\myClasses\"vmware.au.se.sqlfire.derby" is added to CLASSPATH, the function-EXTERNAL-NAME should be, "DepartmentTable.read" only.

I opted to put the short path (without jar file reference) in the CLASSPATH, and leave the fully qualified path in the EXTERNAL NAME of the function, while storing the jar in a folder structure that mimics the name. This works for me.

Thanks for your time and effort.