DIEM

NodeJS, Angular, Python, TypeScript, Webpack, Docker, Kubernetes, GitHub Actions

Python, Spark, REST, Scala, Pipelines, Scheduling, API, Custom Jobs, SQL Statements, Openshift, Cloud Native, Machine Learning, Sendgrid, Kubernetes, Slack, Cloud Object Storage, JDBC, Box

Diem can be used to create, display, execute, and maintain data transfers between hardware and database platforms. This document covers how to create and manage transfers and how to assign them to a schedule so that they run regularly without human intervention.

Diem provides a front end for Spark ETL (Extract, Transform, Load): an SQL data pipeline that can be used to synchronize data between RDBMS platforms. A pipeline is composed of individual transfer operations called jobs; each job executes SQL statements that select data from a source system and insert or replicate it on a target system.
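To make the select-then-insert idea concrete, here is a minimal sketch of a single transfer job. It is not Diem's implementation: it assumes two in-memory SQLite databases as stand-ins for the source and target systems, and the `transfer` function, table names, and batch size are hypothetical names chosen for illustration.

```python
import sqlite3


def transfer(source, target, select_sql, insert_sql, batch_size=1000):
    """Copy rows from source to target in batches and return the row count."""
    cursor = source.execute(select_sql)
    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        target.executemany(insert_sql, rows)
        copied += len(rows)
    target.commit()
    return copied


# Assumption: in-memory SQLite databases stand in for real RDBMS platforms.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)
source.commit()
target.execute("CREATE TABLE customers_copy (id INTEGER, name TEXT)")

copied = transfer(
    source,
    target,
    "SELECT id, name FROM customers ORDER BY id",
    "INSERT INTO customers_copy VALUES (?, ?)",
    batch_size=2,
)
print(f"{copied} rows transferred")
```

Reading in batches keeps memory use bounded for larger tables; a real job would swap the SQLite connections for connections to the actual source and target.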

Diem lets the user write scripts in Python, an interpreted programming language, and create sophisticated schedules using cron (a job scheduler for Unix systems). The combination of Python and cron, along with the built-in ability to define and execute custom SQL statements, supports a range of activities from simple data transfers to more sophisticated job streams.
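As a rough illustration of how a cron schedule selects run times, the sketch below checks whether a given moment matches a standard five-field cron expression (minute, hour, day of month, month, day of week). It is a simplified stand-in, not Diem's scheduler: the names `expand_field` and `cron_matches` are hypothetical, and it assumes numeric fields only (no month or weekday names, and Sunday written as `0`).

```python
from datetime import datetime

# (lowest, highest) allowed value for each of the five cron fields.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field, low, high):
    """Expand one cron field such as '*', '5', '1-5', '*/15' or '1,15'."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            # 'N/step' means "from N to the top of the range".
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression, moment):
    """Return True if the datetime falls on the cron schedule."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if moment.minute not in minute or moment.hour not in hour:
        return False
    if moment.month not in month:
        return False
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    dom_ok = moment.day in dom
    dow_ok = cron_weekday in dow
    # Standard cron rule: if both day fields are restricted, either may match.
    if fields[2] != "*" and fields[4] != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


# "30 2 * * 1-5" means 02:30 on weekdays.
print(cron_matches("30 2 * * 1-5", datetime(2024, 1, 1, 2, 30)))  # a Monday
print(cron_matches("30 2 * * 1-5", datetime(2024, 1, 6, 2, 30)))  # a Saturday
```

A scheduler would typically run such a check once a minute and start every job whose expression matches.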

Diem also allows quick and easy definition of connections, and includes a scheduler and a log display. A Slack interface can send job results to a specified Slack channel.
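For readers who want to see what sending a job result to Slack can look like from Python, here is a minimal sketch using a Slack incoming webhook, which accepts a JSON body with a `text` field. It is an illustration only and may differ from how Diem talks to Slack; the function names, the job name, and the message wording are hypothetical, and the webhook URL must come from your own Slack workspace.

```python
import json
import urllib.request


def build_slack_message(job_name, status, rows=None):
    """Build the JSON payload for a Slack incoming webhook."""
    text = f"Job `{job_name}` finished with status *{status}*"
    if rows is not None:
        text += f" ({rows} rows transferred)"
    return {"text": text}


def post_to_slack(webhook_url, message, timeout=10):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical job result; only the payload is built here, nothing is sent.
message = build_slack_message("daily_customer_sync", "Completed", rows=1250)
print(json.dumps(message))
```

To actually send the message, call `post_to_slack(webhook_url, message)` with the webhook URL issued by Slack for the target channel.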

Application Features

| Feature | Feature Summary | Benefits |
| --- | --- | --- |
| Spaces | Support for multiple organisations | Multiple organisations can use DIEM, and each can have its own space. An organisation can also have multiple spaces and use them for test, pre-prod, or production. |
| Data Transfer NodePy | Fast transfer of small data sets | Under 100 k, using pandas, JDBC, and SQLAlchemy. |
| Data Transfer Spark | Bulk transfer of big data | Uses Spark, with both PySpark and Scala.<br>Partition your data for parallel inserts.<br>Write your SQL online and easily manage your job.<br>Include it in a pipeline.<br>Get notified via Slack or mail. |
| Custom Code | Write your own Python code | Write your own code using PySpark or Python. Integrate your favorite library, use your JDBC connection, and integrate your config maps, code snippets, and webhooks, all in one place, creating a unique experience. |
| API Services | REST services for external use | Create jobs that provide REST services. Connect external applications to your code and provide REST services for them. |
| Machine Learning | Embed machine learning in your code | Make use of the latest ML libraries such as SciPy, matplotlib, seaborn, and pandas to create machine learning models that can be used in your code. |
| Connections | DB2<br>Netezza<br>PostgreSQL<br>Many more | JDBC connections to various sources, easy to add and manage.<br>Secrets are kept secure if personal. |
| Webhooks | Bring in your own webhook | Webhooks can be used to integrate into your applications. You can bring in your Git or Slack webhook and use it in your applications. |
| Slack | Slack integration | Use the default Slack channels or bring in your own Slack API key. All job progress is logged to your Slack channels. You can also integrate Slack in your custom jobs and provide custom content and subject messages. |
| Pipelines | Pipelines of jobs | Group your jobs together to form a pipeline. Start the jobs at the same time or in order. Manage dependencies and organize them in steps. |
| Scheduling | Cron schedule | Schedule your jobs to run using an advanced cron schedule that can handle any type of timeframe and schedule. |
| Mail | Mail functionality | Send mail to your audience on completion or failure of your job. |
| Mail Integration | Mail functionality for your code | Integrate mail functionality in your code, and send data reports as HTML, CSV, or XLS to your audience based on your query. Customize headers and body content. |
| Files | Upload, download, or integrate files | Each space is connected to its own Cloud Object Storage bucket, which can be integrated in your code. You can also specify any other COS instance. |
| Box | Upload, download from Box | You can now directly download and upload files from Box. |
| Config Maps | Manage parameters and config values | Config maps are very useful because they let you separate your code from its values. They can be kept private and secure, so you can use them for storing your own tokens. |
| Tags | Define your own tags | Set up your own tags for easy job search, classification, and job management. |
| Templates | Reusable or shared templates | Your code can be based on a template that you clone from. You can also have shared code that is the same across your jobs and differs only in configuration. |
| Code Snippets | Reusable and sharable code | Create reusable code and share it in your jobs.<br>This allows you to reuse your code in multiple jobs, maintaining key code centrally. |
| Job Log | Audit trails of completed jobs | Each started job has its own audit trail, so you can go back to view errors and integrate it in your reporting for performance review. |
| Organization | Organization profile | View your profile and your access rights in the organisation. |
| Organizations | Organizations you belong to | See all organisations you belong to. |
| Space Selector | Easy move between spaces | You can easily switch at any time between the organisations you belong to. |
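
The Pipelines feature describes jobs that start together or in order, with dependencies organized in steps. The sketch below shows one way such steps can be derived from a dependency map, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). It is an assumption about how step ordering could work, not Diem's own pipeline engine; the `plan_steps` function and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_steps(dependencies):
    """Group jobs into ordered steps.

    `dependencies` maps each job to the set of jobs it must wait for.
    Jobs within one step have no unmet dependencies and could start together.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    steps = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        steps.append(ready)
        sorter.done(*ready)
    return steps


# Hypothetical pipeline: two extracts feed a join, which feeds a report.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_tables": {"extract_orders", "extract_customers"},
    "send_report": {"join_tables"},
}

steps = plan_steps(pipeline)
for number, jobs in enumerate(steps, start=1):
    print(f"step {number}: {', '.join(jobs)}")
```

Here the two extract jobs land in the first step because neither depends on anything, while the join and the report each wait for the step before them.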