- add entry point `mara.commands` (for mara-cli support)
- add mara-pipelines click command group (#104)
- add dynamic Task (#106)
- add max_retries for parallel tasks (#105)
- fix html_doc_items tuple of command ReadScriptOutput
- upgrade to bootstrap 4.6 (from alpha 4) (#66)
Caution: Some CSS classes changed between Bootstrap Alpha 4 and 4.6. You might need to upgrade other mara packages as well, e.g. mara-pipelines and mara-app.
- add HttpRequest command #78 (#79)
- add WriteFile command (#89)
- add support for formats in file operations (#95)
- add typing (#91)
- add before/after task to ParallelTask only when the command list is not empty (#93)
- fix get_user_display_name on docker (#90)
- fix small issues (#91)
- fix SQLAlchemy warning about declarative_base moved in 2.0 (#99)
required changes
You might need to review your custom CSS styling; see the Bootstrap upgrade above.
- Add option to hide system stats in the UI #72
- Add option to disable the collection of system statistics (#72)
- Add syntax highlighting for TSQL and SQLite3 (#86)
- Support pipeline execution without 'mara' database (#71)
- Fix getting exitcode from process issue since python 3.8 (#87)
- Use client-side rendering for graphviz when shell command is not available (#70)
- Fix CopyIncrementally with no data (#54)
- Add ability to specify modification value type in CopyIncrementally (#53)
- Fix read stderr during command execution (#47)
- Use echo_queries from mara_db.config.default_echo_queries (#58)
- Include all versioned package files in wheel
- Fix passwords being visible in the logs despite `mara_pipelines.config.password_masks()` being set. The bug was introduced in 3.0.0.
- Modify shell command to support the Google BigQuery integration
- Add file_dependencies argument to Python commands
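To illustrate what the password masking fix above is about, here is a minimal sketch of masking secrets in a log line before it is written. It is not the mara-pipelines implementation: `mask_passwords`, its parameters and the `***` placeholder are hypothetical names chosen for this example, and in a real project the list of secrets would presumably come from `mara_pipelines.config.password_masks()`.

```python
from typing import Iterable


def mask_passwords(text: str, secrets: Iterable[str], placeholder: str = "***") -> str:
    """Replace every occurrence of each secret in `text` with `placeholder`.

    Hypothetical helper for illustration only, not part of mara-pipelines.
    """
    # Longest secrets first, so a secret that contains another one is masked whole
    for secret in sorted(set(secrets), key=len, reverse=True):
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, placeholder)
    return text


if __name__ == "__main__":
    log_line = "PGPASSWORD=s3cret psql --host db --command 'select 1'"
    print(mask_passwords(log_line, ["s3cret"]))
```

Masking at the point where output is emitted (rather than where the command is built) keeps the executed command intact while preventing the secret from reaching logs or stored events.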
Rename package from `data_integration` to `mara_pipelines`.
required changes
- In requirements.txt, change `-e git+https://github.com/mara/[email protected]#egg=data-integration` to `-e git+https://github.com/mara/[email protected]#egg=mara-pipelines`
- If you use the `mara-etl-tools` package, update to version `4.0.0`
- In your project code, replace all imports of `data_integration` with `mara_pipelines`
- Adapt navigation and ACL entries, if you have any (their names changed from "Data integration" to "Pipelines")
Here's an example of how that looks in the mara example project 2: https://github.com/mara/mara-example-project-2/commit/fa2fba148e65533f821a70c18bb0c05c37706a83
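The import replacement in the required changes above can be done by hand or with search-and-replace. As a rough sketch, the following rewrites only `import` / `from ... import` lines and leaves everything else untouched. `rewrite_imports` is a hypothetical helper written for this example; it is not shipped with the package, and other references to the old module name in code bodies would still need a manual check.

```python
import re

# Matches "import data_integration..." and "from data_integration... import ..."
# at the start of a line (optionally indented). The trailing \b prevents matches
# on longer names such as data_integration_run.
_OLD_IMPORT = re.compile(r"^(\s*(?:from|import)\s+)data_integration\b")


def rewrite_imports(source: str) -> str:
    """Return `source` with data_integration imports pointing at mara_pipelines."""
    lines = source.splitlines(keepends=True)
    return "".join(_OLD_IMPORT.sub(r"\1mara_pipelines", line) for line in lines)


if __name__ == "__main__":
    old_code = (
        "import data_integration.config\n"
        "from data_integration.pipelines import Pipeline\n"
        "table = 'data_integration_run'\n"
    )
    print(rewrite_imports(old_code))
```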
- Fix duplicated system stats if you run multiple ETLs in parallel (#38)
- Add config default_task_max_retries (#39)
- Cleaner shutdown (#41)
- Ignore not succeeded executions in cost calculation (#36)
- Ensure we log errors via events in case of error/shutdown (#33)
- Fix a bug where we reported the wrong error to chat channels when running in the browser and did not restart between failed runs (#33)
- Fix problems when frontend and database are in a different timezone (#34)
- Implement pipeline notifications via Microsoft Teams #28
- Make it possible to disable output coloring in command line etl runs (#31)
- Make event handlers configurable: this allows for e.g. adding your own notifier for specific events
- Switch slack to use events for notifications of interactive pipeline runs
- Fix an edge case bug where reverting a commit after an error in the table creation for an incremental load job would not recreate the original tables leading to a failed load
- Fix an edge case bug where crashing during a triggered (code change, TRUNCATE) full load of an incremental load job after the table was already loaded would not rerun the full load leading to missing data
- Optimize how we set the spawning method in multiprocessing
- Fix for Python 3.7 ("RuntimeError: context has already been set")
- Python 3.8 compatibility (explicitly set process spawning method to 'fork')
- Fix open runs after browser reload
- Add workaround for system statistics on wsl1
- Speedup incremental insert into partitioned tables
- Show warning when graphviz is not installed
- Include file_dependencies as a variable for Copy commands: this can handle cases in an ETL pipeline where the copy command should be skipped if the sql_files stay the same.
- Bug fix: make last modification timestamp of parallel file reading time zone aware (fixes `TypeError: can't compare offset-naive and offset-aware datetimes` error)
- Add Travis integration and PyPI upload
- Add parameter `csv_format` and `delimiter_char` to `Copy` and `CopyIncrementally` commands.
- Changed all `TIMESTAMP` to `TIMESTAMPTZ` in the mara tables. You have to manually run the migration commands below, as `make migrate-mara-db` won't pick up this change.
required changes
You need to manually convert the mara tables to `TIMESTAMPTZ`:
-- Change the timezone to whatever your ETL process is running in
ALTER TABLE data_integration_run ALTER start_time TYPE timestamptz
USING start_time AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_run ALTER end_time TYPE timestamptz
USING end_time AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_processed_file ALTER last_modified_timestamp TYPE timestamptz
USING last_modified_timestamp AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_node_run ALTER start_time TYPE timestamptz
USING start_time AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_node_run ALTER end_time TYPE timestamptz
USING end_time AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_node_output ALTER timestamp TYPE timestamptz
USING timestamp AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_file_dependency ALTER timestamp TYPE timestamptz
USING timestamp AT TIME ZONE 'Europe/Berlin';
ALTER TABLE data_integration_system_statistics ALTER timestamp TYPE timestamptz
USING timestamp AT TIME ZONE 'Europe/Berlin';
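Since the time zone appears in every statement, it can be convenient to generate the migration rather than edit it by hand. The following sketch builds the same statements from the table and column names listed above; `timestamptz_migration` and `MARA_TIMESTAMP_COLUMNS` are hypothetical names introduced for this example, not part of the package. Review the generated SQL before running it.

```python
from typing import Dict, List

# Tables and timestamp columns taken from the migration statements above
MARA_TIMESTAMP_COLUMNS: Dict[str, List[str]] = {
    "data_integration_run": ["start_time", "end_time"],
    "data_integration_processed_file": ["last_modified_timestamp"],
    "data_integration_node_run": ["start_time", "end_time"],
    "data_integration_node_output": ["timestamp"],
    "data_integration_file_dependency": ["timestamp"],
    "data_integration_system_statistics": ["timestamp"],
}


def timestamptz_migration(time_zone: str) -> List[str]:
    """Build one ALTER TABLE statement per timestamp column.

    `time_zone` is the zone the ETL process was running in, e.g. 'Europe/Berlin'.
    """
    if "'" in time_zone:
        # Keep the generated SQL literal well-formed
        raise ValueError("time zone name must not contain quotes")
    return [
        f"ALTER TABLE {table} ALTER {column} TYPE timestamptz "
        f"USING {column} AT TIME ZONE '{time_zone}';"
        for table, columns in MARA_TIMESTAMP_COLUMNS.items()
        for column in columns
    ]


if __name__ == "__main__":
    print("\n".join(timestamptz_migration("Europe/Berlin")))
```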
- Track and visualize also unfinished pipeline runs
- Speed up computation of node durations and node cost
- Improve error handling in launching of parallel tasks
- Improve run times visualization (better axis labels, independent tooltips)
- Smaller ui improvements
- Remove dependency_links from setup.py to regain compatibility with recent pip versions
- Change MARA_XXX variables to functions to delay imports
- move some imports into the functions that use them in order to improve loading speed
- Add ability to mask passwords in `Command`s, so they no longer show up in the UI and are not written to the database in saved Events (config `data_integration.config.password_masks()`). See the example in the original function for how to keep passwords out of the settings UI. (gh #14)
required changes
- Update `mara-app` to `>=2.0.0`
- Use postgresql 10 native partitioning for creating day_id partitions in ParallelReadFile
- Catch and display exceptions when creating html command documentation
- Add python ParallelRunFunction
- Add option to use explicit upsert on incremental load (explicit UPDATE + INSERT)
- Emit a proper NodeFinished event when the launching of a parallel task failed
- Add option truncate_partition to parallel tasks
- Fix bug in run_interactively cli command
- Make it possible to run the ExecuteSQL command outside of a pipeline via .run()
- Add args parameter to RunFunction command
- Show redundant node upstreams as dashed line in pipeline graphs
- Fix problems with too long bash commands by using multiple commands for partition generation in ParallelReadXXX tasks
required changes
- When using `ParallelReadFile` with parameter `partition_target_table_by_day_id=True`, then make sure the target table is natively partitioned by adding `PARTITION BY LIST (day_id)`.
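For readers unfamiliar with PostgreSQL 10 native partitioning, the sketch below generates the kind of DDL the required change refers to: a parent table declared with `PARTITION BY LIST (day_id)` and one child partition per day. The function name `partition_ddl`, the `payload` column and the `<table>_<day_id>` partition naming are assumptions made for this example; how `ParallelReadFile` names and creates its partitions is not stated here.

```python
from typing import List


def partition_ddl(target_table: str, day_ids: List[int]) -> List[str]:
    """Build DDL for a list-partitioned table and one partition per day_id.

    Hypothetical helper for illustration; column layout and partition names
    are assumptions, not what mara-pipelines generates.
    """
    statements = [
        f"CREATE TABLE {target_table} (day_id INTEGER NOT NULL, payload TEXT) "
        "PARTITION BY LIST (day_id);"
    ]
    # One child table per distinct day, in ascending order
    for day_id in sorted(set(day_ids)):
        statements.append(
            f"CREATE TABLE {target_table}_{day_id} PARTITION OF {target_table} "
            f"FOR VALUES IN ({day_id});"
        )
    return statements


if __name__ == "__main__":
    for statement in partition_ddl("demo.events", [20190102, 20190101]):
        print(statement)
```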
- Add possibility to continue running child nodes on error (new `Pipeline` parameters `continue_on_error` and `force_run_all_children`)
- Make dependency on requests explicit
- Implement ReadMode ONLY_CHANGED that reads all new or modified files
- Show node links in run output only relative to current node (to save space)
- Add slack notifications to "run_interactively" cli command
- Add parameter max_retries to class Task
- Fix typos in Readme
- Optimize imports
- Move to Github
- Improve documentation
- Add ReadMode 'ONLY_LATEST'
- Add new command `ReadScriptOutput`
- Add slack bot configuration
- Fix url in slack event handler