Each Mod provide some functionality useful to an ETL process. For examples, the Commands Mod will provide the ability to run commands on either the host or the client, to unzip files, run scripts, etc. The Files Mod is useful to download files from ftp servers, over http/https, etc.
This mod will prompt user to enter something.
None. Simply leverages readline
to prompt message and read answer.
root:
interactives:
sayhello:
prompt: _prompt_
var: _key_
root:
interactives:
sayhello:
prompt: "Name please:"
var: "name"
This mod simply downloads files.
This mods uses wget
to make download files.
Other aspects (like permissions) are done through bash commands (chmod
, chown
, chgrp
).
root:
files:
"_output_full_file_path_":
source: _url_
The template below will simply downloads the file from https://a.b.c/file1.zip
to /downloads/file1.zip
.
my_extract:
files:
"/downloads/file1.zip":
source: https://a.b.c/file1.zip
You can use the Commands Mod to execute commands. Commands are executed in the order provided.
None. This mod simply provides access to bash environment.
Property | Required? | Default value | Description |
---|---|---|---|
command | required | String specifying the command to run. Tags can be used | |
cwd | optional | Working directory | |
env | optional | Sets environment variables for the command. This property overwrites, rather than appends, the existing environment. | |
test | optional | A test command that determines whether to run the command specified. If the test fails (or in error, based on exit code), the command is skipped. For Linux, the test command must return an exit code of 0 for the test to pass. For Windows, the test command must return an %ERRORLEVEL% of 0. | |
ignore_errors | optional | false | If true, errors when executing command (not the test) are ignored and process continues as if command executed fine. |
exit_on_error | optional | false | Whether or not to exit the entire process upon error. |
var | optional | Save result of command into context under var key. | |
result_as_json | optional | false | Whether or not to parse command result (stdout) as JSON. This can be used with var to save JSON into context and make it available to other mods. |
Note that, if the command used in test is in error, the command will be skipped. This means both ignore_errors and exit_on_error are ignored in this situation.
The template below runs echo "hello"
.
commands:
000_say_hello:
command: echo "hello"
The following template will first run a test to decide whether or not to run the command:
commands:
000_say_hello:
command: echo "Now do something"
test: [$((`date +%d`%2)) -eq 0]
The following template make use of the var property to use the value in a subsequent command:
commands:
000_figure_out_day:
command: date +%A
var: today
001_printit:
command: echo "Today is {{today}}"
The following template make use of both var and result_as_json:
commands:
000_var:
command: echo '{ "toto":"titi" }'
var: myvar
result_as_json: true
001_use:
command: echo "According to myvar, toto={{myvar.toto}}"
To prevent the executing of the process, whatever the reason, a simple script exiting with an exit code different from 0 will do:
commands:
000_pre_flight_check:
command: /opt/something/check_for_new_data.sh
exit_on_error: true
This mod execute ECL code against an HPCC System cluster.
This mods requires HPCC ECL Client Tools.
The program eclplus
, specifically, is used here.
root:
ecls:
_key_:
# required
cluster: _cluster_,
# one or the other, required
content: _ecl_,
file: _file_,
# optional
queue: _queue_,
graph: _graph_,
timeout: _timeout_,
format: _format_,
output: _output_,
jobname: _output_,
pagesize: _pagesize_
The following example runs OUTPUT('Hello world!)
on the cluster:
root:
ecls:
output_hello:
# required
cluster: thor,
content: |
OUTPUT('Hello world!')
The following example will generate a CSV file with headers:
root:
ecls:
output_hello:
# required
cluster: thor,
output: /tmp/test.csv
format: csvh
content: |
layout := RECORD
STRING name;
STRING num;
END;
ds := DATASET([ { 'Toto', 4 }, { 'Titi', 4 }, ( 'Anderson', 8 } ], layout );
OUTPUT(ds);
Use the HPCC Systems Distributed File Utility (DFU) (dfuplus
) to load file into HPCC Systems Cluster.
This mods requires HPCC ECL Client Tools.
The program dfuplus
, specifically, is used here.
root:
hpcc-sprays:
# required
format: _format_,
destinationgroup: null,
destinationlogicalname: null,
# usually defined in settings or in dfuplus.ini
server: _server_,
username: _username_,
password: _password_,
# optional
sourceip: _sourceip_,
sourcepath: _sourcepath_,
maxrecordsize: _maxrecordsize_,
srccsvseparator: _csvseparator_,
srccsvterminator: _csvterminator_,
srccsvquote: _csvquote_,
timeout: _timeout_,
maxconnections: _maxconnections_,
allowoverwrite: _allowoverwrite_,
replicate: _replicate_,
compress: _compress_,
sourcecsvescape: _sourcecsvescape_,
failifnosourcefile: _failifnosourcefile_,
recordstructurepresent: _recordstructurepresent_,
quotedterminator: _quotedterminator_,
encoding: _encoding_,
expiredays: _expiredays_
The following example sprays the file /tmp/2018.csv
onto the cluster:
load:
hpcc-sprays:
"noaa::ghcn::daily::2018::raw":
format: csv
destinationGroup: "mythor"
allowoverwrite: true
sourcePath: /tmp/2018.csv
This mod can be used to query a MySQL database.
This mods requires MySQL client.
The program mysql
, specifically, is used here.
root:
mysqls:
auto_rehash: _auto_rehash_
auto_vertical_output: _auto_vertical_output_
batch: _batch_
binary_as_hex: _binary_as_hex_
binary_mode: _binary_mode_
bind_address: _bind_address_
columns: _columns_
compress: _compress_
debug: _debug_
debug_check: _debug_check_
debug_info: _debug_info_
default_auth: _default_auth_
default_character_set: _default_character_set_
defaults_extra_file: _defaults_extra_file_
defaults_file: defaults_file_
defaults_group_suffix: _defaults_group_suffix_
delete: _delete_
enable_cleartext_plugin: _enable_cleartext_plugin_
fields_enclosed_by: _fields_enclosed_by_
fields_escaped_by: _fields_escaped_by_
fields_optionally_enclosed_by: _fields_optionally_enclosed_by_
fields_terminated_by: _fields_terminated_by_
force: _force_
get_server_public_key: _get_server_public_key_
host: _host_
ignore: _ignore_
ignore_lines: _ignore_lines_
lines_terminated_by: _lines_terminated_by_
local: _local_
lock_tables: _lock_tables_
login_path: _login_path_
low_priority: _low_priority_
no_defaults: _no_defaults_
output: _file_to_output_results_to
password: _password_
pipe: _pipe_
plugin_dir: _plugin_dir_
port: _port_
protocol: _protocol_
replace: _replace_
secure_auth: _secure_auth_
server_public_key_path: _server_public_key_path_
shared_memory_base_name: _shared_memory_base_name_
silent: _silent_
skip_named_commands: _skip_named_commands_
skip_pager: _skip_pager_
skip_reconnect: _skip_reconnect_
socket: _socket_
ssl_ca: _ssl_ca_
ssl_capath: _ssl_capath_
ssl_cert: _ssl_cert_
ssl_cipher: _ssl_cipher_
ssl_crl: _ssl_crl_
ssl_crlpath: _ssl_crlpath_
ssl_fips_mode: _ssl_fips_mode_
ssl_key: _ssl_key_
ssl_mode: _ssl_mode_
syslog: _syslog_
table: _table_
tee: _tee_
tls_cipheruites: _tls_ciphersuites_
tls_version: _tls_version_
unbuffered: _unbuffered_
user: _user_
vertical: _vertical_
wait: _wait_
xml: _xml_
Querying some records and saving the results in a file:
query:
mysqls:
user_report:
db_name: reports
execute: |
SELECT
email,
COUNT(DISTINCT(attempts)) as total_attempts,
SUM(IF(failed,1,0)) as total_failed_attempts
FROM
user_logs;
output: /opt/example/user_reports.txt
Creating a table (no output necessary):
ddl:
mysqls:
create_users_tables:
db_name: idp
execute: |
DROP TABLE IF EXISTS users;
CREATE TABLE users (id BIGINT NOT NULL PRIMARY KEY, email VARCHAR(200), first_name VARCHAR(255), last_name VARCHAR(255), UNIQUE INDEX (email));
Imports file into MySQL database through mysqlimport
.
This mods requires MySQL client.
The program mysqlimport
, specifically, is used here.
root:
mysqlimports:
bind_address: _bind_address_
columns: _columns_
compress: _compress_
debug: _debug_
debug_check: _debug_check_
debug_info: _debug_info_
default_auth: _default_auth_
default_character_set: _default_character_set_
defaults_extra_file: _defaults_extra_file_
defaults_file: defaults_file_
defaults_group_suffix: _defaults_group_suffix_
delete: _delete_
enable_cleartext_plugin: _enable_cleartext_plugin_
fields_enclosed_by: _fields_enclosed_by_
fields_escaped_by: _fields_escaped_by_
fields_optionally_enclosed_by: _fields_optionally_enclosed_by_
fields_terminated_by: _fields_terminated_by_
force: _force_
get_server_public_key: _get_server_public_key_
host: _host_
ignore: _ignore_
ignore_lines: _ignore_lines_
lines_terminated_by: _lines_terminated_by_
local: _local_
lock_tables: _lock_tables_
login_path: _login_path_
low_priority: _low_priority_
no_defaults: _no_defaults_
password: _password_
pipe: _pipe_
plugin_dir: _plugin_dir_
port: _port_
protocol: _protocol_
replace: _replace_
secure_auth: _secure_auth_
server_public_key_path: _server_public_key_path_
shared_memory_base_name: _shared_memory_base_name_
silent: _silent_
socket: _socket_
ssl_ca: _ssl_ca_
ssl_capath: _ssl_capath_
ssl_cert: _ssl_cert_
ssl_cipher: _ssl_cipher_
ssl_crl: _ssl_crl_
ssl_crlpath: _ssl_crlpath_
ssl_fips_mode: _ssl_fips_mode_
ssl_key: _ssl_key_
ssl_mode: _ssl_mode_
tls_cipheruites: _tls_ciphersuites_
tls_version: _tls_version_
use_threads: _use_threads_
user: _user_