Wednesday, 13 August 2014

Dept/Emp POJOs with sample data for Pivotal GemFire

I constantly blog about using DEPARTMENT/EMPLOYEE POJOs with sample data. Here is how to create files of data to load into GemFire to give you that sample set.

Note: You would need to create POJOs for the Department/Employee objects with getters/setters for the attributes mentioned below.

Dept Data (save as a file named dept-data)

put --key=10 --value=('deptno':10,'name':'ACCOUNTING') --value-class=pivotal.au.se.deptemp.beans.Department --region=departments;
put --key=20 --value=('deptno':20,'name':'RESEARCH') --value-class=pivotal.au.se.deptemp.beans.Department --region=departments;
put --key=30 --value=('deptno':30,'name':'SALES') --value-class=pivotal.au.se.deptemp.beans.Department --region=departments;
put --key=40 --value=('deptno':40,'name':'OPERATIONS') --value-class=pivotal.au.se.deptemp.beans.Department --region=departments;

Emp Data (save as a file named emp-data)

put --key=7369 --value=('empno':7369,'name':'SMITH','job':'CLERK','deptno':20) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7370 --value=('empno':7370,'name':'APPLES','job':'MANAGER','deptno':10) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7371 --value=('empno':7371,'name':'APICELLA','job':'SALESMAN','deptno':10) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7372 --value=('empno':7372,'name':'LUCIA','job':'PRESIDENT','deptno':30) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7373 --value=('empno':7373,'name':'SIENA','job':'CLERK','deptno':40) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7374 --value=('empno':7374,'name':'LUCAS','job':'SALESMAN','deptno':10) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7375 --value=('empno':7375,'name':'ROB','job':'CLERK','deptno':30) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7376 --value=('empno':7376,'name':'ADRIAN','job':'CLERK','deptno':20) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7377 --value=('empno':7377,'name':'ADAM','job':'CLERK','deptno':20) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7378 --value=('empno':7378,'name':'SALLY','job':'MANAGER','deptno':20) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7379 --value=('empno':7379,'name':'FRANK','job':'CLERK','deptno':10) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7380 --value=('empno':7380,'name':'BLACK','job':'CLERK','deptno':40) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;
put --key=7381 --value=('empno':7381,'name':'BROWN','job':'SALESMAN','deptno':40) --value-class=pivotal.au.se.deptemp.beans.Employee --region=employees;

Load into GemFire (assumes the JAR for the POJOs exists in the classpath of the GemFire cache servers)

The script below uses GFSH to load each file into the correct region, referencing the correct POJO class for the entries in the files created above.

export CUR_DIR=`pwd`

gfsh <<!
connect --locator=localhost[10334];
run --file=$CUR_DIR/dept-data
run --file=$CUR_DIR/emp-data
!
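
Once loaded, you can sanity-check the data with an OQL query from GFSH; for example:

query --query="select * from /departments";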

Below is what the Department.java POJO would look like, for example.

package pivotal.au.se.deptemp.beans;

public class Department
{
    private int deptno;
    private String name;

    public Department()
    {
    }

    public Department(int deptno, String name) {
        this.deptno = deptno;
        this.name = name;
    }

    public int getDeptno() {
        return deptno;
    }

    public void setDeptno(int deptno) {
        this.deptno = deptno;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Department [deptno=" + deptno + ", name=" + name + "]";
    }

}
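
The matching Employee.java isn't shown above; a minimal sketch, assuming only the four attributes used in the put commands, could look like this:

package pivotal.au.se.deptemp.beans;

public class Employee
{
    private int empno;
    private String name;
    private String job;
    private int deptno;

    public Employee()
    {
    }

    public Employee(int empno, String name, String job, int deptno) {
        this.empno = empno;
        this.name = name;
        this.job = job;
        this.deptno = deptno;
    }

    public int getEmpno() { return empno; }
    public void setEmpno(int empno) { this.empno = empno; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getJob() { return job; }
    public void setJob(String job) { this.job = job; }

    public int getDeptno() { return deptno; }
    public void setDeptno(int deptno) { this.deptno = deptno; }

    @Override
    public String toString() {
        return "Employee [empno=" + empno + ", name=" + name
                + ", job=" + job + ", deptno=" + deptno + "]";
    }
}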

Monday, 28 July 2014

Using HAWQ with PHD service in PCF 1.2

The following demo shows how to use the PCF 1.2 PHD service with HAWQ by loading data into the PCF PaaS platform.

1. First, let's set up our environment to use the correct version of HADOOP on our local laptop.

export HADOOP_INSTALL=/Users/papicella/vmware/software/hadoop/hadoop-2.0.5-alpha
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home

export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

export HADOOP_OPTS="$HADOOP_OPTS -Djava.awt.headless=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

export YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk -Djava.awt.headless=true"

hadoop version

2. Set the HADOOP_USER_NAME to ensure you have write access to load a file.

export HADOOP_USER_NAME=ucc3a04008db2486

3. Create a file called person.txt with some pipe-delimited data; example below.

[Mon Jul 28 21:47:37 papicella@:~/vmware/software/hadoop/cloud-foundry/pcf12/demo ] $ head person.txt
1|person1
2|person2
3|person3
4|person4
5|person5

4. Load the file into the PHD instance running in PCF 1.2. You will need to use the name node / path which is correct for your PHD instance.

[Mon Jul 28 21:51:43 papicella@:~/vmware/software/hadoop/cloud-foundry/pcf12/demo ] $ hadoop fs -put person.txt hdfs://x.x.x.x:8020/user/ucc3a04008db2486/
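
To confirm the file landed, you can list the target directory as follows (same name node address as above):

hadoop fs -ls hdfs://x.x.x.x:8020/user/ucc3a04008db2486/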

5. Create a HAWQ external table over the file person.txt using PXF as shown below.

CREATE EXTERNAL TABLE person (id int, name text)
LOCATION ('pxf://x.x.x.x:50070/user/ucc3a04008db2486/person.txt?Fragmenter=HdfsDataFragmenter&Accessor=TextFileAccessor&Resolver=TextResolver')
FORMAT 'TEXT' (DELIMITER = '|');

6. Query the table as shown below.
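
A simple query against the external table might look like this:

select * from person order by id;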



For more information on the PHD service see the link below.

http://docs.pivotal.io/pivotalhd-ds/index.html

Friday, 27 June 2014

Pivotal Cloud Foundry installed? Let's create an ORG / USER to get started

I installed Pivotal Cloud Foundry 1.2 recently, and the commands below are what I ran using the CLI to quickly create an ORG and a USER to get started with. The steps below assume you're connected as the ADMIN user to set up a new ORG.

Cloud Foundry CLI commands as follows:

cf api {cloud end point}
cf create-org pivotal
cf create-user pas pas
cf set-org-role pas pivotal OrgManager
cf target -o pivotal
cf create-space development
cf create-space test
cf create-space production
cf set-space-role pas pivotal production SpaceDeveloper
cf set-space-role pas pivotal development SpaceDeveloper
cf set-space-role pas pivotal test SpaceDeveloper
cf login -u pas -p pas -s development
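
To verify the setup, you can then target the new ORG and space and list what was created; for example:

cf target -o pivotal -s development
cf spaces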

Thursday, 15 May 2014

Pivotal GemFireXD*Web, Web based Interface For GemFireXD

Pivotal GemFire XD bridges GemFire’s proven in-memory intelligence and integrates it with Pivotal HD 2.0 and HAWQ. This enables businesses to make prescriptive decisions in real time for use cases such as stock trading, fraud detection, intelligence for energy companies, or routing for the telecom industry.

You can read more about GemFireXD and its integration with PHD here.

https://www.gopivotal.com/big-data/pivotal-hd

While the development team worked on GemFireXD, I produced another open-source web-based tool named GemFireXD*Web. It's available with source code as follows.

https://github.com/papicella/GemFireXD-Web

GemFireXD*Web enables schema management from a web browser with features as follows:

  • Create all Schema Objects via Dialogs
  • Generate DDL
  • Run multiple SQL Commands, upload SQL files
  • Browse / Administer Objects
  • Browse / Administer HDFS stores/tables 
  • Browse / Administer Async Event Listeners
  • View data distribution
  • View Members / start parameters


etc…

Tuesday, 15 April 2014

Creating some Pivotal Cloud Foundry (PCF) PHD services

After installing the PHD add-on for Pivotal Cloud Foundry 1.1, I quickly created some development services for PHD using the CLI as shown below.

[Tue Apr 15 22:40:08 papicella@:~/vmware/pivotal/products/cloud-foundry ] $ cf create-service p-hd-hawq-cf free dev-hawq
Creating service dev-hawq in org pivotal / space development as pas...
OK
[Tue Apr 15 22:42:31 papicella@:~/vmware/pivotal/products/cloud-foundry ] $ cf create-service p-hd-hbase-cf free dev-hbase
Creating service dev-hbase in org pivotal / space development as pas...
OK
[Tue Apr 15 22:44:10 papicella@:~/vmware/pivotal/products/cloud-foundry ] $ cf create-service p-hd-hive-cf free dev-hive
Creating service dev-hive in org pivotal / space development as pas...
OK
[Tue Apr 15 22:44:22 papicella@:~/vmware/pivotal/products/cloud-foundry ] $ cf create-service p-hd-yarn-cf free dev-yarn
Creating service dev-yarn in org pivotal / space development as pas...
OK

Finally, use the web console to browse the services in the "Development" space.


Wednesday, 9 April 2014

Pivotal Greenplum GPLOAD with multiple CSV files

I recently needed to set up a cron script which loads CSV files from a directory into Greenplum every 2 minutes. Once loaded, the files are moved onto Hadoop for archive purposes. The config below shows how to use the GPLOAD data load utility, which utilises GPFDIST.

1. Create a load table. In this example the data is then moved to the FACT table once the load is complete.

drop table rtiadmin.rtitrans_etl4;

CREATE TABLE rtiadmin.rtitrans_etl4 (
    imsi character varying(82),
    subscriber_mccmnc character varying(10),
    msisdn character varying(82),
    imei character varying(50),
    called_digits character varying(50),
    start_datetime integer,
    end_datetime integer,
    first_cell_lac integer,
    first_cell_idsac integer,
    current_cell_lac integer,
    current_cell_idsac integer,
    dr_type integer,
    status character varying(50),
    ingest_time bigint,
    processed_time bigint,
    export_time bigint,
    extra_col text,
    gploaded_time timestamp without time zone
)
WITH (appendonly=true) DISTRIBUTED BY (imsi); 

2. The GPLOAD YAML file is defined as follows.

VERSION: 1.0.0.1
DATABASE: mydb
USER: rtiadmin
HOST: 172.1.1.1
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
            - loadhost
         PORT: 8100
         FILE:
          - /data/rti/stage/run/*.csv
    - COLUMNS:
          - imsi : text
          - subscriber_mccmnc : text
          - msisdn : text
          - imei : text
          - called_digits : text
          - start_datetime : text
          - end_datetime : text
          - first_cell_lac : integer
          - first_cell_idsac : integer
          - current_cell_lac : integer
          - current_cell_idsac : integer
          - dr_type : integer
          - status : text
          - ingest_time : bigint
          - processed_time : bigint
          - export_time : bigint
          - extra_col : text
    - FORMAT: text
    - HEADER: false
    - DELIMITER: ','
    - NULL_AS : ''
    - ERROR_LIMIT: 999999
    - ERROR_TABLE: rtiadmin.rtitrans_etl4_err
   OUTPUT:
    - TABLE: rtiadmin.rtitrans_etl4
    - MODE: INSERT
    - MAPPING:
           imsi : imsi
           subscriber_mccmnc : subscriber_mccmnc
           msisdn : msisdn
           imei : imei
           called_digits : called_digits
           start_datetime : substr(start_datetime, 1, 10)::int
           end_datetime : substr(end_datetime, 1, 10)::int
           first_cell_lac : first_cell_lac
           first_cell_idsac : first_cell_idsac
           current_cell_lac : current_cell_lac
           current_cell_idsac : current_cell_idsac
           dr_type : dr_type
           status : status
           ingest_time : ingest_time
           processed_time : processed_time
           export_time : export_time
           extra_col : extra_col
           gploaded_time : current_timestamp
   PRELOAD:
    - TRUNCATE : true 
    - REUSE_TABLES : true
   SQL:
    - AFTER : "insert into rtitrans select * from rtitrans_etl4" 

3. Call GPLOAD as follows

source $HOME/.bash_profile
gpload -f rtidata.yml


Note: We set the $PGPASSWORD environment variable, which is used during the load when a password is required, as it was in this demo.

A few things are worth noting here.

REUSE_TABLES: This ensures the external tables created during the load are maintained and re-used on the next load.

TRUNCATE: This clears the load table prior to each load. We use this because the data is copied into the main FACT table once the load finishes, using the "AFTER" SQL hook.
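
Putting it together, the 2-minute cron job mentioned at the start might look something like the sketch below; the wrapper script name and the HDFS archive path are assumptions for illustration:

*/2 * * * * /home/gpadmin/scripts/load_rti.sh >> /tmp/load_rti.log 2>&1

And the wrapper script itself:

#!/bin/bash
# load_rti.sh - run the GPLOAD job, then archive the CSV files to Hadoop (sketch)
# .bash_profile is expected to set PGPASSWORD, per the note above
source $HOME/.bash_profile
gpload -f $HOME/rtidata.yml
if [ $? -eq 0 ]; then
    # archive the loaded files to HDFS and clear the staging directory
    hadoop fs -put /data/rti/stage/run/*.csv /archive/rti/
    rm -f /data/rti/stage/run/*.csv
fi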

Tuesday, 4 March 2014

Pivotal Cloud Foundry using App Direct "newrelic" Monitoring Service

The PCF marketplace on AWS provides AppDirect services, and in this example I am going to use the "newrelic" monitoring service to monitor my Spring-based Java application. It's really this simple.

1. Create a service as shown below.

[Tue Mar 04 17:19:34 papicella@:~/cfapps/spring-travel ] $ cf create-service newrelic standard dev-newrelic

2. Create a manifest.yml for my Spring application which uses the newrelic service above.

applications:
- name: pas-springtravel 
  memory: 1024M 
  instances: 1
  host: pas-springtravel 
  domain: cfapps.io 
  path: ./travel.war
  services:
  - dev-mysql
  - dev-newrelic

3. Push the application

[Tue Mar 04 17:19:34 papicella@:~/cfapps/spring-travel ] $ cf push -f manifest.yml 
Using manifest file manifest.yml

Creating app pas-springtravel in org papicella-org / space development as papicella@gopivotal.com...
OK

Using route pas-springtravel.cfapps.io
Binding pas-springtravel.cfapps.io to pas-springtravel...
OK

Uploading pas-springtravel...
Uploading from: travel.war
5.3M, 2748 files
OK
Binding service dev-mysql to pas-springtravel in org papicella-org / space development as papicella@gopivotal.com
OK
Binding service dev-newrelic to pas-springtravel in org papicella-org / space development as papicella@gopivotal.com
OK

Starting app pas-springtravel in org papicella-org / space development as papicella@gopivotal.com...
OK
-----> Downloaded app package (22M)
-----> Uploading droplet (67M)

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
1 of 1 instances running

App started

Showing health and status for app pas-springtravel in org papicella-org / space development as papicella@gopivotal.com...
OK

requested state: started
instances: 1/1
usage: 1G x 1 instances
urls: pas-springtravel.cfapps.io

     state     since                    cpu    memory         disk           
#0   running   2014-03-04 05:24:43 PM   0.0%   610.7M of 1G   155.9M of 1G 
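
Note: if the newrelic service had been created after the application was already running, it could also be bound manually and the app restarted; for example:

cf bind-service pas-springtravel dev-newrelic
cf restart pas-springtravel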

4. Under the services listed on AWS, click "Manage". Here are some screenshots of what the newrelic monitoring service provides with just a simple bind when we pushed the application.


Friday, 21 February 2014

Deploying Spring MVC application to Cloud Foundry from IntelliJ IDEA

I previously showed how to create a connection in IntelliJ IDEA to Cloud Foundry v2 in the post below.

http://theblasfrompas.blogspot.com.au/2014/02/intellij-idea-version-13-now-includes.html

With a Cloud Foundry CLOUD connection we can now PUSH our application directly from the IDE as shown below.

1. Create a run configuration for your project as shown below. We also specify the memory and number of instances on this page as part of the push / deployment process.

2. Select the created run configuration and deploy the application; output as follows.

3. View the deployed application on the AWS-hosted instance of Cloud Foundry.

IntelliJ IDEA version 13 now includes CloudFoundry Connection

Just installed IntelliJ IDEA version 13 and found that it now includes a CloudFoundry connection type. You define it under IDE settings as shown below.


Will test deploying to the publicly hosted AWS Cloud Foundry using this connection at some stage.

Thursday, 20 February 2014

PCF (Pivotal Cloud Foundry) cf push multiple applications using manifest file

By creating a manifest as follows, we can push multiple applications in one go.

1. manifest.yml

applications:
- name: pas-props
  memory: 256M
  instances: 1
  host: pas-props
  domain: cfapps.io
  path: ./props.war
- name: pas-httpsession
  memory: 256M
  instances: 1
  host: pas-httpsession
  domain: cfapps.io
  path: ./haclusterdemo.war

2. Push as follows

[Thu Feb 20 14:48:45 papicella@:~/vmware/pivotal/products/cloud-foundry/apps/other ] $ cf push -f manifest-twoapps.yml
Using manifest file manifest-twoapps.yml

Creating app pas-props in org papicella-org / space development as papicella@gopivotal.com...
OK

Using route pas-props.cfapps.io
Binding pas-props.cfapps.io to pas-props...
OK

Uploading pas-props...
Uploading from: props.war
2.7K, 5 files
OK

Starting app pas-props in org papicella-org / space development as papicella@gopivotal.com...
OK

1 of 1 instances running

App started

Showing health and status for app pas-props in org papicella-org / space development as papicella@gopivotal.com...
OK

requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: pas-props.cfapps.io

     state     since                    cpu    memory           disk           
#0   running   2014-02-20 02:51:14 PM   0.0%   193.2M of 256M   110.7M of 1G   
Creating app pas-httpsession in org papicella-org / space development as papicella@gopivotal.com...
OK

Creating route pas-httpsession.cfapps.io...
OK

Binding pas-httpsession.cfapps.io to pas-httpsession...
OK

Uploading pas-httpsession...
Uploading from: haclusterdemo.war
130.2K, 10 files
OK

Starting app pas-httpsession in org papicella-org / space development as papicella@gopivotal.com...
OK

0 of 1 instances running, 1 starting
1 of 1 instances running

App started

Showing health and status for app pas-httpsession in org papicella-org / space development as papicella@gopivotal.com...
OK

requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: pas-httpsession.cfapps.io

     state     since                    cpu    memory           disk           
#0   running   2014-02-20 02:53:53 PM   2.0%   214.3M of 256M   113.3M of 1G   

More Information

http://docs.cloudfoundry.org/devguide/deploy-apps/manifest.html