Releases: MarquezProject/marquez
Marquez 0.32.0
Fixed
- API: improve dataset facets access #2407 @pawel-big-lebowski
  Improves database query performance when accessing dataset facets by rewriting SQL queries in `DatasetDao` and `DatasetVersionDao`.
- Chart: fix communication between the UI and the API #2430 @thomas-delrue
  Defines the value for `MARQUEZ_PORT` as `.Values.marquez.port` (80) in the Helm Chart so the Marquez Web component can communicate with the API.
- UI: always render `MqCode` #2454 @JDarDagran
  Fixes rendering of `DatasetInfo` and `RunInfo` pages when no `SqlJobFacet` exists.
Removed
- API: remove job context #2373 @JDarDagran
  Removes the use of job context and adds two endpoints for job/run facets per run. These are called from Web components to replace the job context with `SQLJobFacet`.
- API: remove `jobs_fqn` table and move FQN into jobs directly #2448 @collado-mike
  Fixes loading of certain jobs caused by the inability to enforce uniqueness constraints on fully qualified job names.
Marquez 0.31.0
Added
- UI: add facet view enhancements #2336 @tito12
  Creates a dynamic component offering the ability to navigate and search the JSON, expand sections and click on links.
- UI: highlight selected path on graph and display status of jobs and datasets based on last 14 runs or latest quality facets #2384 @tito12
  Adds highlighting of the visual graph based on upstream and downstream dependencies of selected nodes; makes the displayed status reflect the last 14 runs in the case of jobs and the latest quality facets in the case of datasets.
- UI: enable auto-accessibility feature on graph nodes #2388 @merobi-hub
  Adds attributes to the `FontAwesomeIcon`s to enable a built-in accessibility feature.
Fixed
- API: add index to `jobs_fqn` table using `namespace_name` and `job_fqn` columns #2357 @collado-mike
  Optimizes read queries by adding an index to this table.
- API: add missing indices to `column_lineage`, `dataset_facets`, `job_facets` tables #2419 @pawel-big-lebowski
  Creates missing indices on reference columns in a number of database tables.
- Spec: make data version and dataset types the same #2400 @phixme
  Makes the `fields` property the same for datasets and dataset versions, allowing type-generating systems to treat them the same way.
- UI: show location button only when link to code exists #2409 @tito12
  Makes the button visible only if the link is not empty.
Marquez 0.30.0
Added
- Proposals: add proposal for OL facet tables #2076 @wslulciuc
  Adds the proposal "Optimize query performance for OpenLineage facets".
- UI: display column lineage of a dataset #2293 @pawel-big-lebowski @tito12
  Adds a JSON preview of column-level lineage of a selected dataset to the UI.
- UI: Add soft delete option to UI #2343 @tito12
  Adds option to soft delete a data record with a dialog component and double confirmation.
- API: split `lineage_events` table to `dataset_facets`, `run_facets`, and `job_facets` tables #2350, #2355, #2359 @wslulciuc, @pawel-big-lebowski
  Improves performance when storing and querying facets. The migration procedure requires manual steps if the database has more than 100K lineage events. We highly encourage users to review our migration plan.
- Docker: add new script for stopping Docker #2380 @rossturk
  Provides a clean way to stop a deployment via `docker-compose down`.
- Docker: seed data for column lineage #2381 @rossturk
  Adds some `ColumnLineageDatasetFacet` JSON snippets to `docker/metadata.json` to seed data for column-level lineage facets.
Fixed
- API: validate `RunLink` and `JobLink` #2342 @pawel-big-lebowski
  Fixes validation of the `ParentRunFacet` to avoid `NullPointerException`s in the case of empty run sections.
- Docker: use `docker-compose.web.yml` as base compose file #2360 @wslulciuc
  Fixes the Marquez HTTP server set in `docker/up.sh` so the script uses `docker-compose.web.yml` with overrides for `dev` set via `docker-compose.web-dev.yml`.
- Docs: update copyright headers #2353 @merobi-hub
  Updates the headers with the current year.
- Chart: fix Helm chart #2374 @perttus
  Fixes minor issues with the Helm chart.
- Spec: update dataset version API spec #2389 @phixme
  Adds `limit` and `offset` to the openapi.yml spec file as query parameters.
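As a rough illustration of the `limit` and `offset` query parameters mentioned in the last entry above, the sketch below builds a paginated request URL for listing dataset versions. The base URL, the endpoint path layout, and the helper name `dataset_versions_url` are assumptions made for illustration and should be checked against the openapi.yml spec.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of a local Marquez API; adjust for your deployment.
BASE_URL = "http://localhost:5000/api/v1"


def dataset_versions_url(namespace, dataset, limit=100, offset=0):
    """Build a paginated URL for listing dataset versions (hypothetical helper)."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    # The path layout is an assumption; names are percent-encoded so that
    # characters such as "/" cannot break the path.
    path = "/namespaces/{}/datasets/{}/versions".format(
        quote(namespace, safe=""), quote(dataset, safe="")
    )
    query = urlencode({"limit": limit, "offset": offset})
    return BASE_URL + path + "?" + query


# Third page of 25 versions for an example dataset.
print(dataset_versions_url("food_delivery", "public.orders", limit=25, offset=50))
```

Paging through all versions would then amount to increasing `offset` by `limit` until a request returns fewer than `limit` entries.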
Marquez 0.29.0
Added
- Add point-in-time requests support to column-lineage endpoints #2265 @pawel-big-lebowski
- Add column lineage point-in-time Java client methods #2269 @pawel-big-lebowski
- Add raw event viewer to UI #2249 @tito12
- Update events page with styling synchronization #2324 @phixMe
- Update helm Ingress template to be cross-compatible with recent k8s versions #2275 @jlukenoff
- Add delete namespace endpoint doc to OpenAPI docs #2295 @mobuchowski
- Add i18next and language switcher for i18n of UI #2254 @merobi-hub @phixMe
- Add indexed `created_at` column to lineage events table #2299 @prachim-collab
Fixed
- Allow null column type in column lineage #2272 @pawel-big-lebowski
- Include error message for JSON processing exception #2271 @pawel-big-lebowski
- Fix column lineage when multiple jobs write to same dataset #2289 @pawel-big-lebowski
- Use raw link for `iconSearchArrow.svg` #2280 @wslulciuc
- Fill run state of parent run when created by child run #2296 @fm100
- Update migration query to make it work with existing view #2308 @fm100
- Fix lineage for orphaned datasets #2314 @collado-mike
- Ensure job data in lineage query is not null or empty #2253 @wslulciuc
- Make name and type required for datasets #2305 @wslulciuc
- Remove unused filter on `RunDao.updateStartState()` #2319 @wslulciuc
- Update linter #2322 @phixMe
- Fix asset loading for web #2323 @phixMe
Marquez 0.28.0
Added
- Optimize current runs query for lineage API #2211 @prachim-collab
- Add Code Quality, DCO and Governance docs to project #2237 #2241 @merobi-hub
- Add possibility to soft-delete namespaces #2244 @mobuchowski
- Add search service proposal #2203 @pawel-big-lebowski
Fixed
- Show facets even when dataset has no fields #2214 @JDarDagran
- Appreciate column prefix when given for ended_at #2231 @fm100
- Fix bug keeping jobs from being properly deleted #2244 @mobuchowski
- Fix symlink table column length #2217 @pawel-big-lebowski
Marquez 0.27.0
Added
- Implement dataset symlink feature #2066 @pawel-big-lebowski
- Store column lineage facets in separate table #2096 @mzareba382 @pawel-big-lebowski
- Add a lineage graph endpoint for column lineage #2124 @pawel-big-lebowski
- Enrich returned dataset resource with column lineage information #2113 @pawel-big-lebowski
- Add downstream column lineage #2159 @pawel-big-lebowski
- Implement column lineage within Marquez Java client #2163 @pawel-big-lebowski
- Provide `dataset_symlinks` table for `SymlinkDatasetFacet` #2087 @pawel-big-lebowski
- Display current run state for job node in lineage graph #2146 @wslulciuc
- Include column lineage in dataset resource #2148 @pawel-big-lebowski
- Add indices on the job table #2161 @phixMe
- Add endpoint to get column lineage by a job #2204 @pawel-big-lebowski
- Add column lineage methods to Python client #2209 @pawel-big-lebowski
Changed
- Update insert job function to avoid joining on symlinks for jobs with no symlinks #2144 @collado-mike
- Increase size of `column-lineage.description` column #2205 @pawel-big-lebowski
Fixed
- Add support for `parentRun` facet as reported by older Airflow OpenLineage versions #2130 @collado-mike
- Add fix and tests for handling Airflow DAGs with dots and task groups #2126 @collado-mike @wslulciuc
- Fix version bump in docker/up.sh #2129 @wslulciuc
- Use clean when running shadowJar in Dockerfile #2145 @wslulciuc
- Fix bug that caused a single run event to create multiple jobs #2162 @collado-mike
- Fix column lineage returning multiple entries for job run multiple times #2176 @pawel-big-lebowski
- Fix API spec issues #2178 @phixMe
- Fix downstream recursion #2181 @pawel-big-lebowski
- Update `jobs_current_version_uuid_index` and `jobs_symlink_target_uuid_index` to ignore NULL values #2186 @collado-mike
Marquez 0.26.0
Added
- Update FlywayFactory to support an argument to customize the schema programmatically #2055 @collado-mike
  Note: this change does not aim to support custom schemas from configuration.
- Add steps on proposing changes to Marquez #2065 @wslulciuc
  Adds steps on how to submit a proposal for review along with a design doc template.
- Add `--metadata` option to seed backend with OpenLineage events #2082 @wslulciuc
  Updates the seed command to load metadata from a file containing an array of OpenLineage events via the `--metadata` option. (Metadata used in the command was not being defined using the OpenLineage standard.)
- Improve documentation on `nodeId` in the spec #2084 @howardyoo
  Adds complete examples of nodeId to the spec.
- Add `metadata` cmd #2091 @wslulciuc
  Adds cmd `metadata` to generate OpenLineage events; generated events will be saved to a file called `metadata.json` that can be used to seed Marquez via the seed cmd. (We lacked a way to performance test the data model of Marquez with significantly large OL events.)
- Add possibility to soft-delete datasets and jobs #2032 #2099 #2101 @mobuchowski
  Adds the ability to "hide" inactive datasets and jobs through the UI. (This PR does not include the UI part.) The feature works by adding an is_hidden flag to both datasets and jobs tables. Then, it changes jobs_view and adds datasets_view, which hides rows where the is_hidden flag is set to True. This makes writing proper queries easier since there is no need to do this filtering manually. The soft-delete is reversed if the job or dataset is updated again because the new version reverts the flag.
- Add raw OpenLineage events API #2070 @mobuchowski
  Adds an API that returns raw OpenLineage events sorted by time and optionally filtered by namespace. Filtering by namespace takes into account both job and dataset namespaces.
- Create column lineage endpoint proposal #2077 @julienledem @pawel-big-lebowski
  Adds a proposal to implement a column-level lineage endpoint in Marquez to leverage the column-level lineage facet in OpenLineage.
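The soft-delete entry above describes an is_hidden flag combined with a view that filters hidden rows. A minimal sketch of that pattern follows, using an in-memory SQLite database in place of Marquez's PostgreSQL schema; the single-column table layout and the helper names are simplifications assumed for illustration, not the actual migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-in for the datasets table (assumed layout).
    CREATE TABLE datasets (
        name TEXT PRIMARY KEY,
        is_hidden INTEGER NOT NULL DEFAULT 0
    );
    -- The view hides soft-deleted rows so readers need no manual filtering.
    CREATE VIEW datasets_view AS
        SELECT name FROM datasets WHERE is_hidden = 0;
    """
)


def upsert_dataset(name):
    """Insert a dataset or, if it already exists, record an update that clears is_hidden."""
    conn.execute(
        "INSERT INTO datasets (name) VALUES (?) "
        "ON CONFLICT(name) DO UPDATE SET is_hidden = 0",
        (name,),
    )


def soft_delete_dataset(name):
    """Hide the dataset without removing its row."""
    conn.execute("UPDATE datasets SET is_hidden = 1 WHERE name = ?", (name,))


def visible_datasets():
    """Read through the view, which never returns hidden rows."""
    return [row[0] for row in conn.execute("SELECT name FROM datasets_view ORDER BY name")]


upsert_dataset("orders")
upsert_dataset("customers")
soft_delete_dataset("orders")
hidden_state = visible_datasets()    # "orders" is filtered out by the view
upsert_dataset("orders")             # a later update reverts the soft delete
restored_state = visible_datasets()
print(hidden_state, restored_state)
```

Because every read goes through the view, queries do not need their own `WHERE is_hidden = 0` clause, which is the convenience the entry refers to.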
Changed
- Update lineage query to only look at jobs with inputs or outputs #2068 @collado-mike
  Changes the lineage query to query the job_versions_io_mapping table and INNER join with the jobs_view so that only jobs that have inputs or outputs are present in the jobs_io CTE. Hence, the table becomes very small and the recursive join in the lineage CTE very fast. (In many environments, a large number of jobs reporting events have no inputs or outputs - e.g., PythonOperators in an Airflow deployment. If a Marquez installation has many of these, the lineage query spends much of its time searching for overlaps with jobs that have no inputs or outputs.)
- Persist OpenLineage event before updating Marquez model #2069 @fm100
  Switches the order of the code in order to persist the OpenLineage event first and then update the Marquez model. (When the RunTransitionListener was invoked, the OpenLineage event was not persisted to the database. Because the OpenLineage event is the source of truth for all Marquez run transitions, it should be available from RunTransitionListener.)
- Drop requirement to provide marquez.yml for seed cmd #2094 @wslulciuc
  Uses `io.dropwizard.cli.Command` instead of `io.dropwizard.cli.ConfiguredCommand` to no longer require passing `marquez.yml` as an argument to the seed cmd. (The marquez.yml argument is not used in the seed cmd.)
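To illustrate the lineage query change described in the first entry above, the sketch below shows how an INNER JOIN from job_versions_io_mapping to jobs_view keeps jobs without inputs or outputs out of the jobs_io CTE. It runs on in-memory SQLite with heavily simplified tables; the column names and sample rows are assumptions, and the real query is recursive and considerably larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for Marquez tables; column names are assumptions.
    CREATE TABLE jobs_view (uuid TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE job_versions_io_mapping (
        job_uuid TEXT NOT NULL,
        dataset_uuid TEXT NOT NULL,
        io_type TEXT NOT NULL
    );
    INSERT INTO jobs_view VALUES
        ('j1', 'etl_orders'),
        ('j2', 'python_operator_noop'),
        ('j3', 'report_orders');
    -- j2 reports events but has neither inputs nor outputs.
    INSERT INTO job_versions_io_mapping VALUES
        ('j1', 'd1', 'OUTPUT'),
        ('j3', 'd1', 'INPUT');
    """
)

# Starting from the IO mapping and inner joining to jobs_view means a job
# only appears in jobs_io if it has at least one input or output.
JOBS_IO_QUERY = """
WITH jobs_io AS (
    SELECT j.uuid, j.name, io.dataset_uuid, io.io_type
    FROM job_versions_io_mapping io
    INNER JOIN jobs_view j ON j.uuid = io.job_uuid
)
SELECT DISTINCT name FROM jobs_io ORDER BY name
"""

jobs_with_io = [row[0] for row in conn.execute(JOBS_IO_QUERY)]
print(jobs_with_io)
```

In this toy data set, `python_operator_noop` never enters `jobs_io`, so any later join over the CTE has fewer rows to scan.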
Fixed
- Fix/rewrite jobs fqn locks #2067 @collado-mike
  Updates the function to only update the table if the job is a new record or if the symlink_target_uuid is distinct from the previous value. (The rewrite_jobs_fqn_table function was inadvertently updating jobs even when no metadata about the job had changed. Under load, this caused significant locking issues, as the jobs_fqn table must be locked for every job update.)
- Fix enum string types in the OpenAPI spec #2086 @studiosciences
  Changes the type to string. (type: enum was not valid in OpenAPI spec.)
- Fix incorrect PostgreSQL version #2089 @jabbera
  Corrects the tag for PostgreSQL.
- Update `OpenLineageDao` to handle Airflow run UUID conflicts #2097 @collado-mike
  Alleviates the problem for Airflow installations that will continue to publish events with the older OpenLineage library. This checks the namespace of the parent run and verifies that it matches the namespace in the ParentRunFacet. If not, it generates a new parent run ID that will be written with the correct namespace. (The Airflow integration was generating conflicting UUIDs based on the DAG name and the DagRun ID without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generated jobs whose parents have the wrong namespace.)
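The namespace check in the last entry above can be sketched roughly as follows. The entry does not say how the replacement parent run ID is generated, so the name-based UUID used here is purely an illustrative assumption, as are the function name and its parameters.

```python
import uuid


def resolve_parent_run_id(parent_run_id, stored_namespace, facet_namespace, facet_job_name):
    """Return the parent run ID a child run should reference (hypothetical helper).

    stored_namespace is the namespace of the parent run already recorded under
    parent_run_id; facet_namespace and facet_job_name come from the ParentRunFacet.
    """
    if stored_namespace == facet_namespace:
        # Namespaces agree, so the existing parent run is the right one.
        return parent_run_id
    # Assumption: derive a deterministic replacement ID from the facet's
    # namespace, job name, and the original ID, so repeated events from the
    # same deployment resolve to the same new parent run.
    seed = "{}/{}/{}".format(facet_namespace, facet_job_name, parent_run_id)
    return str(uuid.uuid5(uuid.NAMESPACE_URL, seed))


original_id = "3f5e83fa-3480-44ff-99c5-ff943904e5e8"
same = resolve_parent_run_id(original_id, "airflow-a", "airflow-a", "daily_dag")
other = resolve_parent_run_id(original_id, "airflow-a", "airflow-b", "daily_dag")
print(same == original_id, other != original_id)
```

A deterministic derivation is chosen in this sketch so that two Airflow deployments with the same DAG name end up with distinct but stable parent runs.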
Marquez 0.25.0
Fixed
- Fix py module release #2057 @wslulciuc
- Use /bin/sh in web/docker/entrypoint.sh #2059 @wslulciuc
Marquez 0.24.0
Added
- Add copyright lines to all source files #1996 @merobi-hub
- Add copyright and license guidelines in CONTRIBUTING.md @wslulciuc
- Add @FlywayTarget annotation to migration tests to control flyway upgrades #2035 @collado-mike
Changed
- Updated `jobs_view` to stop computing FQN on reads and to compute on writes instead #2036 @collado-mike
- Runs row reduction #2041 @collado-mike
Fixed
- Update `Run` in the openapi spec to include a `context` field #2020 @esaych
- Fix dataset openapi model #2038 @esaych
- Fix casing on lastLifecycleState #2039 @esaych
- Fix V45 migration to include initial population of jobs_fqn table #2051 @collado-mike
- Fix symlinked jobs in queries #2053 @collado-mike
Marquez 0.23.0
Added
- Update docker-compose.yml: Randomly map postgres db port #2000 @RNHTTR
- Job parent hierarchy #1935 #1980 #1992 @collado-mike
Changed
- Set default limit for listing datasets and jobs in UI from 2000 to 25 #2018 @wslulciuc