The following demo shows how to use the PCF 1.2 PHD (Pivotal HD) service with HAWQ by loading data into the PCF PaaS platform.
1. First, let's set up our environment to use the correct version of Hadoop on our local laptop.
export HADOOP_INSTALL=/Users/papicella/vmware/software/hadoop/hadoop-2.0.5-alpha
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.awt.headless=true -Djava.security.krb5.realm=-Djava.security.krb5.kdc="
export YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk -Djava.awt.headless=true"
hadoop version
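If the environment is set up correctly, hadoop version should report the release configured above; the first line of output should look something like the following (build details will vary):

Hadoop 2.0.5-alpha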
2. Set the HADOOP_USER_NAME to ensure you have write access to load a file.
export HADOOP_USER_NAME=ucc3a04008db2486
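To confirm that this user can access its home directory in HDFS, a quick listing can be run first (the NameNode address here is the same placeholder used in step 4):

hadoop fs -ls hdfs://x.x.x.x:8020/user/ucc3a04008db2486/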
3. Create a file called person.txt with some pipe-delimited data; an example is shown below.
[Mon Jul 28 21:47:37 papicella@:~/vmware/software/hadoop/cloud-foundry/pcf12/demo ] $ head person.txt
1|person1
2|person2
3|person3
4|person4
5|person5
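A one-liner such as the following can be used to generate this sample data (assuming a standard bash shell):

for i in $(seq 1 5); do echo "${i}|person${i}"; done > person.txt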
4. Load the file into the PHD instance running in PCF 1.2. You will need to use the NameNode host and HDFS path that match your PHD instance.
[Mon Jul 28 21:51:43 papicella@:~/vmware/software/hadoop/cloud-foundry/pcf12/demo ] $ hadoop fs -put person.txt hdfs://x.x.x.x:8020/user/ucc3a04008db2486/
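To verify the file landed in HDFS, it can be listed or read back (same NameNode address and path as above):

hadoop fs -ls hdfs://x.x.x.x:8020/user/ucc3a04008db2486/person.txt
hadoop fs -cat hdfs://x.x.x.x:8020/user/ucc3a04008db2486/person.txt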
5. Create a HAWQ external table over the file person.txt using PXF as shown below.
CREATE EXTERNAL TABLE person (id int, name text)
LOCATION ('pxf://x.x.x.x:50070/user/ucc3a04008db2486/person.txt?Fragmenter=HdfsDataFragmenter&Accessor=TextFileAccessor&Resolver=TextResolver')
FORMAT 'TEXT' (DELIMITER = '|');
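Once created, the table definition can be checked from a psql session connected to the HAWQ database using the \d meta-command (output omitted here):

\d person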
6. Query the table as shown below.
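A minimal example query, run from psql while connected to the HAWQ database; given the data loaded above, it should return something like the following:

SELECT * FROM person ORDER BY id;

 id |  name
----+---------
  1 | person1
  2 | person2
  3 | person3
  4 | person4
  5 | person5
(5 rows)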
For more information on the PHD service, see the link below.
http://docs.pivotal.io/pivotalhd-ds/index.html