

Friday, 19 July 2013

Pivotal SQLFire - Shutting down members when authentication is on

In a previous blog entry I detailed how to use the BUILTIN authentication mode for SQLFire.

http://theblasfrompas.blogspot.com.au/2013/07/securing-schemas-in-pivotal-sqlfire.html

When you need to shut down members with authentication turned on, do the following.

1. Create a directory called "shutdown".

2. Copy sqlfire.properties into that directory

sqlfire.auth-provider=BUILTIN
sqlfire.sql-authorization=true
sqlfire.user.admin=adminpassword
sqlfire.user.locatoradmin=adminpassword
sqlfire.user.serveradmin=adminpassword

3. Change into the shutdown directory and run the shut-down-all command as follows.

sqlf shut-down-all -locators=localhost[41111] -user=admin -password=adminpassword

This will only shut down the members; you would still need to stop the locator separately if you wanted to do that.
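Putting the steps above together, the whole sequence can be sketched as a small shell script. The directory layout and locator port come from the example above; the locator stop assumes the locator was started with -dir=locator one level up, so adjust the paths for your install.

```shell
# Create a working directory holding credentials for shutdown
mkdir shutdown
cp sqlfire.properties shutdown/

# Run shut-down-all from that directory so sqlf picks up the properties
cd shutdown
sqlf shut-down-all -locators=localhost[41111] -user=admin -password=adminpassword

# The locator is still running at this point; stop it separately if needed
sqlf locator stop -dir=../locator
```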

Wednesday, 17 July 2013

Securing Schemas in Pivotal SQLFire


Authentication/authorization is off by default in SQLFire if the servers have not specified it, which is why SQLFire connections don't need a user/password unless it is configured explicitly on the server. Once authorization has been configured on the server, users will have access only to their own database objects. This is illustrated below. In this example we use PLAIN text passwords just to illustrate how to secure a schema.

1. Create a locator script as follows

sqlf locator start -user=admin -password=adminpassword \
  -sqlfire.sql-authorization=true \
  -peer-discovery-address=Pas-Apicellas-MacBook-Pro.local \
  -peer-discovery-port=41111 \
  -client-bind-address=Pas-Apicellas-MacBook-Pro.local \
  -client-port=1527 \
  -dir=locator

2. Create a server start script as follows

sqlf server start $JVM_OPTS -user=admin -password=adminpassword \
  -sqlfire.sql-authorization=true \
  -max-heap=$MAXHEAP -initial-heap=$MINHEAP \
  -critical-heap-percentage=85 -eviction-heap-percentage=75 \
  -server-groups=MYGROUP -conserve-sockets=false \
  -locators=Pas-Apicellas-MacBook-Pro.local[41111] \
  -client-bind-address=Pas-Apicellas-MacBook-Pro.local \
  -client-port=1528 \
  -dir=server1

Note: Ensure that a sqlfire.properties file exists for the locator/server as shown below. 

sqlfire.properties

sqlfire.auth-provider=BUILTIN
sqlfire.sql-authorization=true
sqlfire.user.admin=adminpassword
sqlfire.user.locatoradmin=adminpassword
sqlfire.user.serveradmin=adminpassword

3. As the admin user, connect with sqlf and create a database user account which will be the schema owner.
  
sqlf> connect client '127.0.0.1:1527;user=admin;password=adminpassword';
sqlf> CALL SYS.CREATE_USER('pas', 'pas');
Statement executed. 
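As a quick sanity check, the new user can connect and immediately create objects in its own schema. A sketch piping commands into sqlf (the table name here is just illustrative):

```shell
sqlf <<'EOF'
connect client '127.0.0.1:1527;user=pas;password=pas';
CREATE TABLE test_table (id INT PRIMARY KEY, data VARCHAR(20));
EOF
```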

4. Connect with a JDBC client as shown below.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

private Connection getConnection() throws SQLException
{
    // Connect as the schema owner "pas" created in step #3
    Connection conn = DriverManager.getConnection(
        "jdbc:sqlfire://127.0.0.1:1527/", "pas", "pas");
    return conn;
}

When the JDBC connection succeeds it will be in the context of the schema "pas" and will have full rights / access to that schema, allowing it to create database objects etc.

If we provide an invalid username/password we get an error as follows, hence securing access to the schema with the built-in user we created at step #3 above.


Exception in thread "main" java.sql.SQLNonTransientConnectionException: Connection authentication failure occurred.  Reason: userid 'pas' or password invalid.


Pivotal HD - Talking true SQL on HDFS

Pivotal HD is an Apache Hadoop distribution that natively integrates the industry-leading Pivotal Greenplum massively parallel processing (MPP) database technology with the Apache Hadoop framework. See this link for its release information. http://www.gopivotal.com/pivotal-products/pivotal-data-fabric/pivotal-hd

In this blog entry I want to detail how we bring fully ANSI-compliant SQL into Hadoop. The two components of Pivotal HD critical for this are as follows.

HAWQ - SQL query processor based on GPDB running on HDFS

HAWQ is a parallel SQL query engine that combines the merits of the Greenplum Database Massively Parallel Processing (MPP) relational database engine and the Hadoop parallel processing framework. HAWQ supports SQL and native querying capability against various data sources in different popular formats. It provides a linearly scalable storage solution for managing terabytes or petabytes of data at low cost.

GPXF - Extension Framework component of HAWQ to create external tables

GPXF is the external table interface to HAWQ that enables querying data stored in HDFS, HBase, and HIVE. In addition, it exposes an API that enables a parallel connection to additional data sources.

Example

So here is a very simple example of how this works. We already have an existing Pivotal HD install with the HAWQ and GPXF components.

1. First let's generate some data using a simple script (data2.sh) as follows

#!/bin/bash
for i in {1..1000000}
do
   echo "$i|person$i"
done


2. Run the script to create the file person.txt and verify we have 1 million person entries
  
[gpadmin@pivhdsne pas]$ ./data2.sh > person.txt
[gpadmin@pivhdsne pas]$ cat person.txt | wc -l
1000000
[gpadmin@pivhdsne pas]$ tail person.txt 
999991|person999991
999992|person999992
999993|person999993
999994|person999994
999995|person999995
999996|person999996
999997|person999997
999998|person999998
999999|person999999
1000000|person1000000

3. Now let's copy the file person.txt onto HDFS as shown below

hadoop fs -put person.txt /
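It's worth confirming the copy before moving on; the standard hadoop fs commands can verify that the file landed and still holds 1 million rows:

```shell
# List the file we just uploaded to the HDFS root
hadoop fs -ls /person.txt

# Count the rows directly on HDFS (should print 1000000)
hadoop fs -cat /person.txt | wc -l
```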

4. At this point we have the data file on HDFS, so we can create an external table in HAWQ and query the data using SQL. To create an external table we use SQL as follows
  
demo=# CREATE EXTERNAL TABLE person ( id int, name text)
demo-# LOCATION ('gpxf://pivhdsne:50070/person.txt?Fragmenter=HdfsDataFragmenter') FORMAT 'TEXT' (DELIMITER = '|');
CREATE EXTERNAL TABLE
Time: 3.843 ms

GPXF provides the external table interface inside HAWQ for reading data stored in the Hadoop ecosystem, enabling loading and querying of data stored in:
  • HDFS
  • HBase
  • Hive
The supported data formats for GPXF include the following:
  • Text
  • Avro
  • Hive - Text, Sequence and RCFile formats
  • HBase

Note: GPXF has an extensible framework API to enable custom connector development for other data sources and custom formats

5. At this point we can query the data sitting on our HDFS using SQL. Some examples are below.

- Getting a count(*)
  
demo=# select count(*) from person;
  count  
---------
 1000000
(1 row)

Time: 1396.451 ms

- Querying the first 10 records only
  
demo=# select * from person limit 10;
 id |   name   
----+----------
  1 | person1
  2 | person2
  3 | person3
  4 | person4
  5 | person5
  6 | person6
  7 | person7
  8 | person8
  9 | person9
 10 | person10
(10 rows)

Time: 263.435 ms
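Once finished, the external table definition can be dropped without touching the underlying file on HDFS, since dropping an external table only removes the metadata. A sketch assuming the same demo database used above:

```shell
psql demo <<'EOF'
-- Removes only the table definition; /person.txt stays on HDFS
DROP EXTERNAL TABLE person;
EOF
```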

More Information

For more information on Pivotal HD see the links below.

http://www.gopivotal.com/pivotal-products/pivotal-data-fabric/pivotal-hd

http://www.greenplum.com/products/pivotal-hd

http://pivotalhd.cloudfoundry.com/index.html