1. Create a load table. In this example the data is then moved to the FACT table once the load is complete
drop table rtiadmin.rtitrans_etl4; CREATE TABLE rtiadmin.rtitrans_etl4 ( imsi character varying(82), subscriber_mccmnc character varying(10), msisdn character varying(82), imei character varying(50), called_digits character varying(50), start_datetime integer, end_datetime integer, first_cell_lac integer, first_cell_idsac integer, current_cell_lac integer, current_cell_idsac integer, dr_type integer, status character varying(50), ingest_time bigint, processed_time bigint, export_time bigint, extra_col text, gploaded_time timestamp without time zone ) WITH (appendonly=true) DISTRIBUTED BY (imsi);
2. GPLOAD yaml file defined as follows
VERSION: 1.0.0.1
DATABASE: mydb
USER: rtiadmin
HOST: 172.1.1.1
PORT: 5432
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- loadhost
PORT: 8100
FILE:
- /data/rti/stage/run/*.csv
- COLUMNS:
- imsi : text
- subscriber_mccmnc : text
- msisdn : text
- imei : text
- called_digits : text
- start_datetime : text
- end_datetime : text
- first_cell_lac : integer
- first_cell_idsac : integer
- current_cell_lac : integer
- current_cell_idsac : integer
- dr_type : integer
- status : text
- ingest_time : bigint
- processed_time : bigint
- export_time : bigint
- extra_col : text
- FORMAT: text
- HEADER: false
- DELIMITER: ','
- NULL_AS : ''
- ERROR_LIMIT: 999999
- ERROR_TABLE: rtiadmin.rtitrans_etl4_err
OUTPUT:
- TABLE: rtiadmin.rtitrans_etl4
- MODE: INSERT
- MAPPING:
imsi : imsi
subscriber_mccmnc : subscriber_mccmnc
msisdn : msisdn
imei : imei
called_digits : called_digits
start_datetime : substr(start_datetime, 1, 10)::int
end_datetime : substr(end_datetime, 1, 10)::int
first_cell_lac : first_cell_lac
first_cell_idsac : first_cell_idsac
current_cell_lac : current_cell_lac
current_cell_idsac : current_cell_idsac
dr_type : dr_type
status : status
ingest_time : ingest_time
processed_time : processed_time
export_time : export_time
extra_col : extra_col
gploaded_time : current_timestamp
PRELOAD:
- TRUNCATE : true
- REUSE_TABLES : true
SQL:
- AFTER : "insert into rtitrans select * from rtitrans_etl4"
source $HOME/.bash_profile
gpload -f rtidata.yml
Note: We use the ENV variable as $PGPASSWORD which is used during the load if a password is required which was in this demo
Few things worth noting here.
REUSE_TABLES : This ensures the external tables created during the load are maintained and re-used on next load.
TRUNCATE: This clears the load table prior to load and we use this as we COPY the data once the load is finished into the main FACT table using the "AFTER"
No comments:
Post a Comment