Friday, 26 July 2013
A while ago I started working on a web-based utility for Pivotal SQLFire known as SQLFire*Web. Today it was finally released as an open source project. See the link below for more details.
http://blog.gopivotal.com/products/new-open-source-download-browser-based-schema-management-for-pivotal-sqlfire
Friday, 19 July 2013
Pivotal SQLFire - Shutting down members when authentication is on
In a previous blog entry I detailed how to use the built-in (BUILTIN) authentication mode for SQLFire.
http://theblasfrompas.blogspot.com.au/2013/07/securing-schemas-in-pivotal-sqlfire.html
When you need to shut down members with authentication enabled, do the following.
1. Create a directory called "shutdown".
2. Copy sqlfire.properties into that directory
sqlfire.auth-provider=BUILTIN
sqlfire.sql-authorization=true
sqlfire.user.admin=adminpassword
sqlfire.user.locatoradmin=adminpassword
sqlfire.user.serveradmin=adminpassword
3. Change into the shutdown directory and run the shut-down-all command as follows.
sqlf shut-down-all -locators=localhost[41111] -user=admin -password=adminpassword
This only shuts down the data store members; the locator still needs to be stopped separately if you want to bring everything down.
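If you do want to stop the locator as well, a minimal sketch is shown below, run from the same "shutdown" directory so the credentials in sqlfire.properties are picked up. The directory name "locator" is just a placeholder for your locator's actual working directory.
# "locator" is a placeholder for the locator's working directory
sqlf locator stop -dir=locator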
Wednesday, 17 July 2013
Securing Schemas in Pivotal SQLFire
Authentication and authorization are off by default in SQLFire, which is why SQLFire connections do not need a user/password unless the servers are configured for it explicitly. Once authorization has been configured on the servers, each user only has access to their own database objects. This is illustrated below; in this example we use plain-text passwords purely to show how a schema can be secured.
1. Create a locator script as follows
sqlf locator start -user=admin -password=adminpassword -sqlfire.sql-authorization=true -peer-discovery-address=Pas-Apicellas-MacBook-Pro.local -peer-discovery-port=41111 -client-bind-address=Pas-Apicellas-MacBook-Pro.local -client-port=1527 -dir=locator
2. Create a server start script as follows
sqlf server start $JVM_OPTS -user=admin -password=adminpassword -sqlfire.sql-authorization=true -max-heap=$MAXHEAP -initial-heap=$MINHEAP -critical-heap-percentage=85 -eviction-heap-percentage=75 -server-groups=MYGROUP -conserve-sockets=false -locators=Pas-Apicellas-MacBook-Pro.local[41111] -client-bind-address=Pas-Apicellas-MacBook-Pro.local -client-port=1528 -dir=server1
Note: Ensure that a sqlfire.properties file exists for the locator/server as shown below.
sqlfire.properties
sqlfire.auth-provider=BUILTIN
sqlfire.sql-authorization=true
sqlfire.user.admin=adminpassword
sqlfire.user.locatoradmin=adminpassword
sqlfire.user.serveradmin=adminpassword
3. Connect as the admin user and create a new user called "pas" as shown below.
sqlf> connect client '127.0.0.1:1527;user=admin;password=adminpassword';
sqlf> CALL SYS.CREATE_USER('pas', 'pas');
Statement executed.
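As an optional sanity check (not part of the original steps), you can also connect as the new "pas" user directly from sqlf and exercise the schema; a minimal sketch, where test_table is just a throwaway name used for illustration.
sqlf> connect client '127.0.0.1:1527;user=pas;password=pas';
sqlf> create table test_table (id int primary key);
sqlf> drop table test_table;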
4. Connect with a JDBC client as shown below.
private Connection getConnection() throws SQLException {
    // Connect as the "pas" user created in step #3; the connection's
    // default schema will be PAS.
    return DriverManager.getConnection(
            "jdbc:sqlfire://127.0.0.1:1527/", "pas", "pas");
}
When the JDBC connection is established successfully it runs in the context of the schema "pas" and has full rights to that schema, so it can create tables and other database objects.
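For example, here is a minimal sketch of using such a connection to create a table in the "pas" schema. The class, table name and columns are purely illustrative, and the SQLFire client JDBC driver jar is assumed to be on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class PasSchemaExample {
    public static void main(String[] args) throws SQLException {
        // Connect as the "pas" user created in step #3; the connection's
        // default schema is PAS.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlfire://127.0.0.1:1527/", "pas", "pas");
             Statement stmt = conn.createStatement()) {
            // Hypothetical table, created inside the PAS schema.
            stmt.execute("CREATE TABLE customer (id INT PRIMARY KEY, name VARCHAR(50))");
        }
    }
}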
If we provide an invalid username/password we get an error as follows, hence access to the schema is secured with the built-in user we created at step #3 above.
Exception in thread "main" java.sql.SQLNonTransientConnectionException: Connection authentication failure occurred. Reason: userid 'pas' or password invalid.
Pivotal HD - Talking true SQL on HDFS
Pivotal HD is an Apache Hadoop distribution that natively integrates the industry-leading Pivotal Greenplum massively parallel processing (MPP) database technology with the Apache Hadoop framework. See this link for its release information: http://www.gopivotal.com/pivotal-products/pivotal-data-fabric/pivotal-hd
In this blog entry I want to detail how we bring fully compliant, ANSI-standard SQL to Hadoop. The two components of Pivotal HD critical for this are as follows.
HAWQ - SQL query processor based on GPDB running on HDFS
HAWQ is a parallel SQL query engine that combines the merits of the Greenplum Database Massively Parallel Processing (MPP) relational database engine and the Hadoop parallel processing framework. HAWQ supports SQL and native querying capability against various data sources in different popular formats. It provides a linearly scalable storage solution for managing terabytes or petabytes of data at low cost.
GPXF - Extension Framework component of HAWQ to create external tables
GPXF is the external table interface to HAWQ that enables querying data stored in HDFS, HBase, and HIVE. In addition, it exposes an API that enables a parallel connection to additional data sources.
Example
So here is a very simple example of how this works. We already have an existing Pivotal HD installation with the HAWQ and GPXF components.
1. First let's generate some data using a simple script as follows.
#!/bin/bash
# data2.sh - write one "id|person<id>" line per record
for i in {1..1000000}
do
  echo "$i|person$i"
done
2. Create the file person.txt as shown below and verify we have 1 million person entries
[gpadmin@pivhdsne pas]$ ./data2.sh > person.txt
[gpadmin@pivhdsne pas]$ cat person.txt | wc -l
1000000
[gpadmin@pivhdsne pas]$ tail person.txt
999991|person999991
999992|person999992
999993|person999993
999994|person999994
999995|person999995
999996|person999996
999997|person999997
999998|person999998
999999|person999999
1000000|person1000000
3. Now let's copy the file person.txt onto HDFS as shown below.
hadoop fs -put person.txt /
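Optionally, you can confirm the file landed on HDFS before creating the external table; a quick check, assuming the file was put at the root path as above.
hadoop fs -ls /person.txt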
4. At this point the data file is on HDFS, so we can create an external table with HAWQ and query the data using SQL. To create the external table we use the SQL below.
demo=# CREATE EXTERNAL TABLE person ( id int, name text)
demo-# LOCATION ('gpxf://pivhdsne:50070/person.txt?Fragmenter=HdfsDataFragmenter') FORMAT 'TEXT' (DELIMITER = '|');
CREATE EXTERNAL TABLE
Time: 3.843 ms
GPXF provides the external table interface inside HAWQ for reading data stored in the Hadoop ecosystem, which enables loading and querying data stored in:
- HDFS (Text and Avro formats)
- Hive (Text, Sequence and RCFile formats)
- HBase
Note: GPXF has an extensible framework API to enable custom connector development for other data sources and custom formats.
5. At this point we can query the data sitting on HDFS using SQL; some examples are below.
- Getting a count(*)
demo=# select count(*) from person;
  count
---------
 1000000
(1 row)

Time: 1396.451 ms
- Querying the first 10 records only
demo=# select * from person limit 10;
 id |   name
----+----------
  1 | person1
  2 | person2
  3 | person3
  4 | person4
  5 | person5
  6 | person6
  7 | person7
  8 | person8
  9 | person9
 10 | person10
(10 rows)

Time: 263.435 ms
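Standard SQL predicates work against the external table as well; for example, a simple filter like the sketch below should return the single row the script above generated for id 999999 (output omitted here).
demo=# select * from person where id = 999999;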
More Information
For more information on Pivotal HD see the links below.
http://www.gopivotal.com/pivotal-products/pivotal-data-fabric/pivotal-hd
http://www.greenplum.com/products/pivotal-hd
http://pivotalhd.cloudfoundry.com/index.html