Releases · MarquezProject/marquez

20 Mar 18:20

merobi-hub

0.32.0

0227d72

Marquez 0.32.0

Fixed

API: improve dataset facets access #2407 @pawel-big-lebowski
Improves database query performance when accessing dataset facets by rewriting SQL queries in DatasetDao and DatasetVersionDao.
Chart: fix communication between the UI and the API #2430 @thomas-delrue
Defines the value for MARQUEZ_PORT as .Values.marquez.port (80) in the Helm Chart so the Marquez Web component can communicate with the API.
UI: always render MqCode #2454 @JDarDagran
Fixes rendering of DatasetInfo and RunInfo pages when no SqlJobFacet exists.

Removed

API: remove job context #2373 @JDarDagran
Removes the use of job context and adds two endpoints for job/run facets per run. These are called from Web components to replace the job context with SQLJobFacet.
API: remove jobs_fqn table and move FQN into jobs directly #2448 @collado-mike
Fixes loading of certain jobs caused by the inability to enforce uniqueness constraints on fully qualified job names.

16 Feb 22:19

merobi-hub

0.31.0

6f04b8f

Marquez 0.31.0

Added

UI: add facet view enhancements #2336 @tito12
Creates a dynamic component offering the ability to navigate and search the JSON, expand sections and click on links.
UI: highlight selected path on graph and display status of jobs and datasets based on last 14 runs or latest quality facets #2384 @tito12
Adds highlighting of the visual graph based on upstream and downstream dependencies of selected nodes; makes the displayed status reflect the last 14 runs in the case of jobs and the latest quality facets in the case of datasets.
UI: enable auto-accessibility feature on graph nodes #2388 @merobi-hub
Adds attributes to the FontAwesomeIcons to enable a built-in accessibility feature.

Fixed

API: add index to jobs_fqn table using namespace_name and job_fqn columns #2357 @collado-mike
Optimizes read queries by adding an index to this table.
API: add missing indices to column_lineage, dataset_facets, job_facets tables #2419 @pawel-big-lebowski
Creates missing indices on reference columns in a number of database tables.
Spec: make data version and dataset types the same #2400 @phixme
Makes the fields property the same for datasets and dataset versions, allowing type-generating systems to treat them the same way.
UI: show location button only when link to code exists #2409 @tito12
Makes the button visible only if the link is not empty.
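
As a rough illustration of the jobs_fqn index change listed above (#2357), the sketch below uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL database Marquez runs on. The simplified table layout, the index name, and the sample lookup values are assumptions made for the example; they are not the actual Marquez schema or migration.

```python
import sqlite3


def plan_for_fqn_lookup():
    """Return SQLite's query plan for a lookup by namespace and job FQN."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the jobs_fqn table.
    conn.execute(
        "CREATE TABLE jobs_fqn ("
        "uuid TEXT PRIMARY KEY, namespace_name TEXT, job_fqn TEXT)"
    )
    # Composite index on the two columns named in the release note.
    # The index name is made up for this sketch.
    conn.execute(
        "CREATE INDEX jobs_fqn_namespace_fqn_idx "
        "ON jobs_fqn (namespace_name, job_fqn)"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT uuid FROM jobs_fqn WHERE namespace_name = ? AND job_fqn = ?",
        ("food_delivery", "etl.orders"),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(plan_for_fqn_lookup())
```

The printed plan should mention the index name, which indicates the read is served by an index search instead of a full table scan.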

31 Jan 22:47

merobi-hub

0.30.0

8a072ff

Marquez 0.30.0

Added

Proposals: add proposal for OL facet tables #2076 @wslulciuc
Adds the proposal Optimize query performance for OpenLineage facets.
UI: display column lineage of a dataset #2293 @pawel-big-lebowski @tito12
Adds a JSON preview of column-level lineage of a selected dataset to the UI.
UI: add soft delete option to UI #2343 @tito12
Adds option to soft delete a data record with a dialog component and double confirmation.
API: split lineage_events table into dataset_facets, run_facets, and job_facets tables #2350 #2355 #2359 @wslulciuc @pawel-big-lebowski
Improves performance when storing and querying facets.
The migration procedure requires manual steps if the database contains more than 100K lineage events.
We highly encourage users to review our migration plan.
Docker: add new script for stopping Docker #2380 @rossturk
Provides a clean way to stop a deployment via docker-compose down.
Docker: seed data for column lineage #2381 @rossturk
Adds some ColumnLineageDatasetFacet JSON snippets to docker/metadata.json to seed data for column-level lineage facets.

Fixed

API: validate RunLink and JobLink #2342 @pawel-big-lebowski
Fixes validation of the ParentRunFacet to avoid NullPointerExceptions in the case of empty run sections.
Docker: use docker-compose.web.yml as base compose file #2360 @wslulciuc
Fixes the Marquez HTTP server set in docker/up.sh so the script uses docker-compose.web.yml with overrides for dev set via docker-compose.web-dev.yml.
Docs: update copyright headers #2353 @merobi-hub
Updates the headers with the current year.
Chart: fix Helm chart #2374 @perttus
Fixes minor issues with the Helm chart.
Spec: update dataset version API spec #2389 @phixme
Adds limit and offset to the openapi.yml spec file as query parameters.
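
To make the limit and offset query parameters from the spec update above (#2389) concrete, here is a minimal sketch of building a paginated request URL. The base URL and the path layout are assumptions for illustration; check openapi.yml for the authoritative path and parameter definitions.

```python
from urllib.parse import quote, urlencode


def dataset_versions_url(base_url, namespace, dataset, limit=100, offset=0):
    """Build a paginated dataset-versions URL (path layout is an assumption)."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    # Percent-encode path segments so names with spaces or slashes stay intact.
    path = "/api/v1/namespaces/{}/datasets/{}/versions".format(
        quote(namespace, safe=""), quote(dataset, safe="")
    )
    query = urlencode({"limit": limit, "offset": offset})
    return base_url.rstrip("/") + path + "?" + query


if __name__ == "__main__":
    # Second page of 10 results for a hypothetical dataset.
    print(dataset_versions_url("http://localhost:5000", "my ns", "public.orders", 10, 10))
```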

19 Dec 19:22

merobi-hub

0.29.0

7aa6ed0

Marquez 0.29.0

Added

Add point-in-time requests support to column-lineage endpoints #2265 @pawel-big-lebowski
Add column lineage point-in-time Java client methods #2269 @pawel-big-lebowski
Add raw event viewer to UI #2249 @tito12
Update events page with styling synchronization #2324 @phixMe
Update Helm Ingress template to be cross-compatible with recent k8s versions #2275 @jlukenoff
Add delete namespace endpoint doc to OpenAPI docs #2295 @mobuchowski
Add i18next and language switcher for i18n of UI #2254 @merobi-hub @phixMe
Add indexed created_at column to lineage events table #2299 @prachim-collab

Fixed

Allow null column type in column lineage #2272 @pawel-big-lebowski
Include error message for JSON processing exception #2271 @pawel-big-lebowski
Fix column lineage when multiple jobs write to same dataset #2289 @pawel-big-lebowski
Use raw link for iconSearchArrow.svg #2280 @wslulciuc
Fill run state of parent run when created by child run #2296 @fm100
Update migration query to make it work with existing view #2308 @fm100
Fix lineage for orphaned datasets #2314 @collado-mike
Ensure job data in lineage query is not null or empty #2253 @wslulciuc
Make name and type required for datasets #2305 @wslulciuc
Remove unused filter on RunDao.updateStartState() #2319 @wslulciuc
Update linter #2322 @phixMe
Fix asset loading for web #2323 @phixMe

21 Nov 21:21

merobi-hub

0.28.0

3e074d8

Marquez 0.28.0

Added

Optimize current runs query for lineage API #2211 @prachim-collab
Add Code Quality, DCO and Governance docs to project #2237 #2241 @merobi-hub
Add possibility to soft-delete namespaces #2244 @mobuchowski
Add search service proposal #2203 @pawel-big-lebowski

Fixed

Show facets even when dataset has no fields #2214 @JDarDagran
Respect column prefix when given for ended_at #2231 @fm100
Fix bug keeping jobs from being properly deleted #2244 @mobuchowski
Fix symlink table column length #2217 @pawel-big-lebowski

24 Oct 20:24

merobi-hub

0.27.0

e6b0dc3

Marquez 0.27.0

Added

Implement dataset symlink feature #2066 @pawel-big-lebowski
Store column lineage facets in separate table #2096 @mzareba382 @pawel-big-lebowski
Add a lineage graph endpoint for column lineage #2124 @pawel-big-lebowski
Enrich returned dataset resource with column lineage information #2113 @pawel-big-lebowski
Add downstream column lineage #2159 @pawel-big-lebowski
Implement column lineage within Marquez Java client #2163 @pawel-big-lebowski
Provide dataset_symlinks table for SymlinkDatasetFacet #2087 @pawel-big-lebowski
Display current run state for job node in lineage graph #2146 @wslulciuc
Include column lineage in dataset resource #2148 @pawel-big-lebowski
Add indices on the job table #2161 @phixMe
Add endpoint to get column lineage by a job #2204 @pawel-big-lebowski
Add column lineage methods to Python client #2209 @pawel-big-lebowski

Changed

Update insert job function to avoid joining on symlinks for jobs with no symlinks #2144 @collado-mike
Increase size of column-lineage.description column #2205 @pawel-big-lebowski

Fixed

Add support for parentRun facet as reported by older Airflow OpenLineage versions #2130 @collado-mike
Add fix and tests for handling Airflow DAGs with dots and task groups #2126 @collado-mike @wslulciuc
Fix version bump in docker/up.sh #2129 @wslulciuc
Use clean when running shadowJar in Dockerfile #2145 @wslulciuc
Fix bug that caused a single run event to create multiple jobs #2162 @collado-mike
Fix column lineage returning multiple entries for job run multiple times #2176 @pawel-big-lebowski
Fix API spec issues #2178 @phixMe
Fix downstream recursion #2181 @pawel-big-lebowski
Update jobs_current_version_uuid_index and jobs_symlink_target_uuid_index to ignore NULL values #2186 @collado-mike
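
The last fix above (#2186) makes two indexes ignore NULL values. One common way to do that is a partial index, sketched here with Python's built-in sqlite3 module in place of PostgreSQL. The simplified jobs table and the sample rows are assumptions, and the actual Marquez migration may define its indexes differently.

```python
import sqlite3


def build_partial_index_demo():
    """Create a partial index that skips NULLs and run a lookup against it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the jobs table.
    conn.execute(
        "CREATE TABLE jobs ("
        "uuid TEXT PRIMARY KEY, name TEXT, symlink_target_uuid TEXT)"
    )
    # Only rows with a non-NULL symlink target are stored in the index.
    conn.execute(
        "CREATE INDEX jobs_symlink_target_uuid_index "
        "ON jobs (symlink_target_uuid) "
        "WHERE symlink_target_uuid IS NOT NULL"
    )
    conn.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?)",
        [("a", "job_a", None), ("b", "job_b", "a"), ("c", "job_c", None)],
    )
    index_sql = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?",
        ("jobs_symlink_target_uuid_index",),
    ).fetchone()[0]
    matches = [
        row[0]
        for row in conn.execute(
            "SELECT uuid FROM jobs WHERE symlink_target_uuid = ?", ("a",)
        )
    ]
    conn.close()
    return index_sql, matches


if __name__ == "__main__":
    print(build_partial_index_demo())
```

When most jobs have no symlink target, an index of this kind stays small because the NULL rows are never written to it.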

15 Sep 19:07

merobi-hub

0.26.0

4610b8d

Marquez 0.26.0

Added

Update FlywayFactory to support an argument to customize the schema programmatically #2055 @collado-mike
Note: this change does not aim to support custom schemas from configuration.
Add steps on proposing changes to Marquez #2065 @wslulciuc
Adds steps on how to submit a proposal for review along with a design doc template.
Add --metadata option to seed backend with OpenLineage events #2082 @wslulciuc
Updates the seed command to load metadata from a file containing an array of OpenLineage events via the --metadata option. (Metadata used in the command was not being defined using the OpenLineage standard.)
Improve documentation on nodeId in the spec #2084 @howardyoo
Adds complete examples of nodeId to the spec.
Add metadata cmd #2091 @wslulciuc
Adds cmd metadata to generate OpenLineage events; generated events will be saved to a file called metadata.json that can be used to seed Marquez via the seed cmd. (We lacked a way to performance test the data model of Marquez with significantly large OL events.)
Add possibility to soft-delete datasets and jobs #2032 #2099 #2101 @mobuchowski
Adds the ability to "hide" inactive datasets and jobs through the UI. (This PR does not include the UI part.) The feature works by adding an is_hidden flag to both datasets and jobs tables. Then, it changes jobs_view and adds datasets_view, which hides rows where the is_hidden flag is set to True. This makes writing proper queries easier since there is no need to do this filtering manually. The soft-delete is reversed if the job or dataset is updated again because the new version reverts the flag.
Add raw OpenLineage events API #2070 @mobuchowski
Adds an API that returns raw OpenLineage events sorted by time and optionally filtered by namespace. Filtering by namespace takes into account both job and dataset namespaces.
Create column lineage endpoint proposal #2077 @julienledem @pawel-big-lebowski
Adds a proposal to implement a column-level lineage endpoint in Marquez to leverage the column-level lineage facet in OpenLineage.
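
The soft-delete entry above (#2032 #2099 #2101) describes an is_hidden flag, a view that filters hidden rows, and a flag reset when a new version arrives. A minimal sketch of that pattern follows, using Python's built-in sqlite3 module instead of PostgreSQL. The column set, the version counter, and the sample dataset names are assumptions for illustration and do not reproduce the Marquez schema.

```python
import sqlite3


def soft_delete_demo():
    """Show a view hiding flagged rows and an update reverting the flag."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified datasets table with the is_hidden flag.
    conn.execute(
        "CREATE TABLE datasets ("
        "name TEXT PRIMARY KEY, "
        "version INTEGER NOT NULL, "
        "is_hidden INTEGER NOT NULL DEFAULT 0)"
    )
    # Queries read from the view, so they never filter on is_hidden by hand.
    conn.execute(
        "CREATE VIEW datasets_view AS "
        "SELECT name, version FROM datasets WHERE is_hidden = 0"
    )
    conn.executemany(
        "INSERT INTO datasets (name, version) VALUES (?, 1)",
        [("orders",), ("customers",)],
    )

    def visible():
        cursor = conn.execute("SELECT name FROM datasets_view ORDER BY name")
        return [row[0] for row in cursor]

    before = visible()
    # Soft delete: the row stays in the table but drops out of the view.
    conn.execute("UPDATE datasets SET is_hidden = 1 WHERE name = ?", ("orders",))
    hidden = visible()
    # A new version of the dataset reverts the flag.
    conn.execute(
        "UPDATE datasets SET version = version + 1, is_hidden = 0 WHERE name = ?",
        ("orders",),
    )
    after = visible()
    conn.close()
    return before, hidden, after


if __name__ == "__main__":
    print(soft_delete_demo())
```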

Changed

Update lineage query to only look at jobs with inputs or outputs #2068 @collado-mike
Changes the lineage query to query the job_versions_io_mapping table and INNER join with the jobs_view so that only jobs that have inputs or outputs are present in the jobs_io CTE. Hence, the table becomes very small and the recursive join in the lineage CTE very fast. (In many environments, a large number of jobs reporting events have no inputs or outputs - e.g., PythonOperators in an Airflow deployment. If a Marquez installation has many of these, the lineage query spends much of its time searching for overlaps with jobs that have no inputs or outputs.)
Persist OpenLineage event before updating Marquez model #2069 @fm100
Switches the order of the code in order to persist the OpenLineage event first and then update the Marquez model. (When the RunTransitionListener was invoked, the OpenLineage event was not persisted to the database. Because the OpenLineage event is the source of truth for all Marquez run transitions, it should be available from RunTransitionListener.)
Drop requirement to provide marquez.yml for seed cmd #2094 @wslulciuc
Uses io.dropwizard.cli.Command instead of io.dropwizard.cli.ConfiguredCommand to no longer require passing marquez.yml as an argument to the seed cmd. (The marquez.yml argument is not used in the seed cmd.)
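
The lineage query change above (#2068) relies on an inner join so that jobs without inputs or outputs never enter the jobs_io CTE. The sketch below shows that filtering effect with Python's built-in sqlite3 module. The column names, the use of a plain table in place of the jobs_view view, and the sample jobs are assumptions; the real query and schema are more involved.

```python
import sqlite3


def jobs_with_io():
    """Return only the jobs that have at least one input or output."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified tables; jobs_view is a plain table here.
    conn.executescript(
        """
        CREATE TABLE jobs_view (uuid TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE job_versions_io_mapping (
            job_uuid TEXT, dataset_uuid TEXT, io_type TEXT
        );
        INSERT INTO jobs_view VALUES
            ('j1', 'load_orders'),
            ('j2', 'send_email'),
            ('j3', 'build_report');
        INSERT INTO job_versions_io_mapping VALUES
            ('j1', 'd1', 'OUTPUT'),
            ('j3', 'd1', 'INPUT'),
            ('j3', 'd2', 'OUTPUT');
        """
    )
    # Starting from the mapping table means jobs with no rows there
    # (send_email) are dropped before any recursive lineage step runs.
    rows = conn.execute(
        """
        WITH jobs_io AS (
            SELECT DISTINCT j.uuid, j.name
            FROM job_versions_io_mapping io
            INNER JOIN jobs_view j ON j.uuid = io.job_uuid
        )
        SELECT name FROM jobs_io ORDER BY name
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(jobs_with_io())
```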

Fixed

Fix/rewrite jobs fqn locks #2067 @collado-mike
Updates the function to only update the table if the job is a new record or if the symlink_target_uuid is distinct from the previous value. (The rewrite_jobs_fqn_table function was inadvertently updating jobs even when no metadata about the job had changed. Under load, this caused significant locking issues, as the jobs_fqn table must be locked for every job update.)
Fix enum string types in the OpenAPI spec #2086 @studiosciences
Changes the type to string. (type: enum was not valid in OpenAPI spec.)
Fix incorrect PostgreSQL version #2089 @jabbera
Corrects the tag for PostgreSQL.
Update OpenLineageDao to handle Airflow run UUID conflicts #2097 @collado-mike
Alleviates the problem for Airflow installations that will continue to publish events with the older OpenLineage library. This checks the namespace of the parent run and verifies that it matches the namespace in the ParentRunFacet. If not, it generates a new parent run ID that will be written with the correct namespace. (The Airflow integration was generating conflicting UUIDs based on the DAG name and the DagRun ID without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generated jobs whose parents have the wrong namespace.)
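
The namespace check described in the last fix above (#2097) can be sketched as follows. The function name and its parameters are hypothetical, and deriving the replacement ID with uuid5 is an assumption made for this example; the release note does not say how OpenLineageDao generates the new parent run ID.

```python
import uuid


def resolve_parent_run_id(parent_run_id, stored_namespace, facet_namespace):
    """Return the parent run ID to write for an incoming event.

    parent_run_id: uuid.UUID reported in the ParentRunFacet.
    stored_namespace: namespace of the existing run with that ID, or None.
    facet_namespace: namespace reported in the ParentRunFacet.
    """
    # No existing run, or the namespaces agree: keep the reported ID.
    if stored_namespace is None or stored_namespace == facet_namespace:
        return parent_run_id
    # Conflict: another deployment already owns this ID. Derive a new,
    # deterministic ID (assumption: uuid5 over the old ID and the namespace)
    # so repeated events from the same deployment map to the same parent.
    return uuid.uuid5(parent_run_id, facet_namespace)


if __name__ == "__main__":
    reported = uuid.UUID("12345678-1234-5678-1234-567812345678")
    print(resolve_parent_run_id(reported, "airflow_a", "airflow_b"))
```

A deterministic derivation is used in the sketch so that every event from the same deployment resolves to the same parent run; a random ID would split one parent into many.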

08 Aug 20:29

merobi-hub

0.25.0

1d02e9b

Marquez 0.25.0

Fixed

Fix py module release #2057 @wslulciuc
Use /bin/sh in web/docker/entrypoint.sh #2059 @wslulciuc

02 Aug 18:45

merobi-hub

0.24.0

6802ea9

Marquez 0.24.0

Added

Add copyright and license guidelines in CONTRIBUTING.md @wslulciuc
Add @FlywayTarget annotation to migration tests to control flyway upgrades #2035 @collado-mike

Changed

Update jobs_view to stop computing FQN on reads and to compute it on writes instead #2036 @collado-mike
Runs row reduction #2041 @collado-mike

Fixed

Update Run in the openapi spec to include a context field #2020 @esaych
Fix dataset openapi model #2038 @esaych
Fix casing on lastLifecycleState #2039 @esaych
Fix V45 migration to include initial population of jobs_fqn table #2051 @collado-mike
Fix symlinked jobs in queries #2053 @collado-mike

16 Jun 20:31

merobi-hub

0.23.0

2c5bea0

Marquez 0.23.0

Added

Update docker-compose.yml: Randomly map postgres db port #2000 @RNHTTR
Job parent hierarchy #1935 #1980 #1992 @collado-mike

Changed

Set default limit for listing datasets and jobs in UI from 2000 to 25 #2018 @wslulciuc

Fixed

Return the tag for postgresql to 12.1.0 #2015 @rossturk
