Skip to content

Latest commit

 

History

History
923 lines (687 loc) · 41.5 KB

CHANGES.md

File metadata and controls

923 lines (687 loc) · 41.5 KB

Changelog

(Future) 0.7.7

Breaking Changes

  • The default sqlite and dagster-postgres implementations have been altered to extract the event step_key field as a column, to enable faster per-step queries. You will need to run dagster instance migrate to update the schema. You may optionally migrate your historical event log data to extract the step_key using the migrate_event_log_data function. This will ensure that your historical event log data will be captured in future step-key based views. This event_log data migration can be invoked as follows:

    from dagster.core.storage.event_log.migration import migrate_event_log_data
    from dagster import DagsterInstance
    
    migrate_event_log_data(instance=DagsterInstance.get())
  • We have made pipeline metadata serializable and persist that along with run information. While there are no user-facing features to leverage this yet, it does require an instance migration. dagster instance migrate. If you have already run the migration for the event_log changes above, you do not need to run it again. Any unforeseen errors related the the new snapshot_id in the runs table or the new snapshots table are related to this migration.

  • dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and ColumnDTypeInSetConstraint.

New

  • You can now specify that dagstermill output notebooks be yielded as an output from dagstermill solids, in addition to being materialized.
  • You may now set the extension on files created using the FileManager machinery.
  • dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
  • The Dagit Playground has been restructured to make the relationship between Preset, Partition Sets, Modes, and subsets more clear. All of these buttons have be reconciled and moved to the left side of the Playground.
  • Config sections that are required but not filled out in the Dagit playground are now detected and labeled in orange.
  • dagster-celery config now support using env: to load from environment variables.

Bugfix

  • Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline definition.
  • Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in dagit.
  • Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of the parameters cell of a dagstermill notebook could lead to unexpected behavior.
  • Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using the Postgres-backed storages.

Experimental

  • Added a longitudinal view of runs for on the Schedule tab for scheduled, partitioned pipelines. Includes views of run status, execution time, and materializations across partitions. The UI is in flux and is currently optimized for daily schedules, but feedback is welcome.

0.7.6

Breaking Changes

  • default_value in Field no longer accepts native instances of python enums. Instead the underlying string representation in the config system must be used.
  • default_value in Field no longer accepts callables.
  • The dagster_aws imports have been reorganized; you should now import resources from dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch modules.
  • The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can now just use any boto3 S3 API directly on a S3 resource, e.g. context.resources.s3.list_objects_v2. (#2292)

New

  • New Playground view in dagit showing an interactive config map
  • Improved storage and UI for showing schedule attempts
  • Added the ability to set default values in InputDefinition
  • Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
  • Added ability to specify pipeline run tags using the CLI
  • Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid as context.pdb
  • Added PresetDefinition.with_additional_config to allow for config overrides
  • Added resource name to log messages generated during resource initialization
  • Added grouping tags for runs that have been retried / reexecuted.

Bugfix

  • Fixed a bug where date range partitions with a specified end date was clipping the last day
  • Fixed an issue where some schedule attempts that failed to start would be marked running forever.
  • Fixed the @weekly partitioned schedule decorator
  • Fixed timezone inconsistencies between the runs view and the schedules view
  • Integers are now accepted as valid values for Float config fields
  • Fixed an issue when executing dagstermill solids with config that contained quote characters.

dagstermill

  • The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel flag.

dagster-dbt

  • dbt_solid now has a Nothing input to allow for sequencing

dagster-k8s

  • Added get_celery_engine_config to select celery engine, leveraging Celery infrastructure

Documentation

  • Improvements to the airline and bay bikes demos
  • Improvements to our dask deployment docs (Thanks jswaney!!)

0.7.5

New

  • Added the IntSource type, which lets integers be set from environment variables in config.

  • You may now set tags on pipeline definitions. These will resolve in the following cases:

    1. Loading in the playground view in Dagit will pre-populate the tag container.
    2. Loading partition sets from the preset/config picker will pre-populate the tag container with the union of pipeline tags and partition tags, with partition tags taking precedence.
    3. Executing from the CLI will generate runs with the pipeline tags.
    4. Executing programmatically using the execute_pipeline api will create a run with the union of pipeline tags and RunConfig tags, with RunConfig tags taking precedence.
    5. Scheduled runs (both launched and executed) will have the union of pipeline tags and the schedule tags function, with the schedule tags taking precedence.
  • Output materialization configs may now yield multiple Materializations, and the tutorial has been updated to reflect this.

  • We now export the SolidExecutionContext in the public API so that users can correctly type hint solid compute functions.

Dagit

  • Pipeline run tags are now preserved when resuming/retrying from Dagit.
  • Scheduled run stats are now grouped by partition.
  • A "preparing" section has been added to the execution viewer. This shows steps that are in progress of starting execution.
  • Markers emitted by the underlying execution engines are now visualized in the Dagit execution timeline.

Bugfix

  • Resume/retry now works as expected in the presence of solids that yield optional outputs.
  • Fixed an issue where dagster-celery workers were failing to start in the presence of config values that were None.
  • Fixed an issue with attempting to set threads_per_worker on Dask distributed clusters.

dagster-postgres

  • All postgres config may now be set using environment variables in config.

dagster-aws

  • The s3_resource now exposes a list_objects_v2 method corresponding to the underlying boto3 API. (Thanks, @basilvetas!)
  • Added the redshift_resource to access Redshift databases.

dagster-k8s

  • The K8sRunLauncher config now includes the load_kubeconfig and kubeconfig_file options.

Documentation

  • Fixes and improvements.

Dependencies

  • dagster-airflow no longer pins its werkzeug dependency.

Community

  • We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform development priorities. Telemetry data will motivate projects such as adding features in frequently-used parts of the CLI and adding more examples in the docs in areas where users encounter more errors.

    We will not see or store solid definitions (including generated context) or pipeline definitions (including modes and resources). We will not see or store any data that is processed within solids and pipelines.

    If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:

    telemetry:
      enabled: true
    
  • Thanks to @basilvetas and @hspak for their contributions!

0.7.4

New

  • It is now possible to use Postgres to back schedule storage by configuring dagster_postgres.PostgresScheduleStorage on the instance.
  • Added the execute_pipeline_with_mode API to allow executing a pipeline in test with a specific mode without having to specify RunConfig.
  • Experimental support for retries in the Celery executor.
  • It is now possible to set run-level priorities for backfills run using the Celery executor by passing --celery-base-priority to dagster pipeline backfill.
  • Added the @weekly schedule decorator.

Deprecations

  • The dagster-ge library has been removed from this release due to drift from the underlying Great Expectations implementation.

dagster-pandas

  • PandasColumn now includes an is_optional flag, replacing the previous ColumnExistsConstraint.
  • You can now pass the ignore_missing_values flag to PandasColumn in order to apply column constraints only to the non-missing rows in a column.

dagster-k8s

  • The Helm chart now includes provision for an Ingress and for multiple Celery queues.

Documentation

  • Improvements and fixes.

0.7.3

New

  • It is now possible to configure a dagit instance to disable executing pipeline runs in a local subprocess.
  • Resource initialization, teardown, and associated failure states now emit structured events visible in Dagit. Structured events for pipeline errors and multiprocess execution have been consolidated and rationalized.
  • Support Redis queue provider in dagster-k8s Helm chart.
  • Support external postgresql in dagster-k8s Helm chart.

Bugfix

  • Fixed an issue with inaccurate timings on some resource initializations.
  • Fixed an issue that could cause the multiprocess engine to spin forever.
  • Fixed an issue with default value resolution when a config value was set using SourceString.
  • Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
  • Fixed an issue with where the CLI command dagster schedule up would fail in certain scenarios with the SystemCronScheduler.

Pandas

  • Column constraints can now be configured to permit NaN values.

Dagstermill

  • Removed a spurious dependency on sklearn.

Docs

  • Improvements and fixes to docs.
  • Restored dagster.readthedocs.io.

Experimental

  • An initial implementation of solid retries, throwing a RetryRequested exception, was added. This API is experimental and likely to change.

Other

  • Renamed property runtime_type to dagster_type in definitions. The following are deprecated and will be removed in a future version.
    • InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
    • OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
    • CompositeSolidDefinition.all_runtime_types is deprecated. Use CompositeSolidDefinition.all_dagster_types instead.
    • SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types instead.
    • PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type instead.
    • PipelineDefinition.runtime_type_named is deprecated. Use PipelineDefinition.dagster_type_named instead.
    • PipelineDefinition.all_runtime_types is deprecated. Use PipelineDefinition.all_dagster_types instead.

0.7.2

Docs

  • New docs site at docs.dagster.io.
  • dagster.readthedocs.io is currently stale due to availability issues.

New

  • Improvements to S3 Resource. (Thanks @dwallace0723!)
  • Better error messages in Dagit.
  • Better font/styling support in Dagit.
  • Changed OutputDefinition to take is_required rather than is_optional argument. This is to remain consistent with changes to Field in 0.7.1 and to avoid confusion with python's typing and dagster's definition of Optional, which indicates None-ability, rather than existence. is_optional is deprecated and will be removed in a future version.
  • Added support for Flower in dagster-k8s.
  • Added support for environment variable config in dagster-snowflake.

Bugfixes

  • Improved performance in Dagit waterfall view.
  • Fixed bug when executing solids downstream of a skipped solid.
  • Improved navigation experience for pipelines in Dagit.
  • Fixed for the dagster-aws CLI tool.
  • Fixed issue starting Dagit without DAGSTER_HOME set on windows.
  • Fixed pipeline subset execution in partition-based schedules.

0.7.1

Dagit

  • Dagit now looks up an available port on which to run when the default port is not available. (Thanks @rparrapy!)

dagster_pandas

  • Hydration and materialization are now configurable on dagster_pandas dataframes.

dagster_aws

  • The s3_resource no longer uses an unsigned session by default.

Bugfixes

  • Type check messages are now displayed in Dagit.
  • Failure metadata is now surfaced in Dagit.
  • Dagit now correctly displays the execution time of steps that error.
  • Error messages now appear correctly in console logging.
  • GCS storage is now more robust to transient failures.
  • Fixed an issue where some event logs could be duplicated in Dagit.
  • Fixed an issue when reading config from an environment variable that wasn't set.
  • Fixed an issue when loading a repository or pipeline from a file target on Windows.
  • Fixed an issue where deleted runs could cause the scheduler page to crash in Dagit.

Documentation

  • Expanded and improved docs and error messages.

0.7.0

Breaking Changes

There are a substantial number of breaking changes in the 0.7.0 release. Please see 070_MIGRATION.md for instructions regarding migrating old code.

Scheduler

  • The scheduler configuration has been moved from the @schedules decorator to DagsterInstance. Existing schedules that have been running are no longer compatible with current storage. To migrate, remove the scheduler argument on all @schedules decorators:

    instead of:

    @schedules(scheduler=SystemCronScheduler)
    def define_schedules():
      ...
    

    Remove the scheduler argument:

    @schedules
    def define_schedules():
      ...
    

    Next, configure the scheduler on your instance by adding the following to $DAGSTER_HOME/dagster.yaml:

    scheduler:
      module: dagster_cron.cron_scheduler
      class: SystemCronScheduler
    

    Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules directory and run dagster schedule wipe && dagster schedule up to re-instatiate schedules in a valid state.

  • The should_execute and environment_dict_fn argument to ScheduleDefinition now have a required first argument context, representing the ScheduleExecutionContext

Config System Changes

  • In the config system, Dict has been renamed to Shape; List to Array; Optional to Noneable; and PermissiveDict to Permissive. The motivation here is to clearly delineate config use cases versus cases where you are using types as the inputs and outputs of solids as well as python typing types (for mypy and friends). We believe this will be clearer to users in addition to simplifying our own implementation and internal abstractions.

    Our recommended fix is not to used Shape and Array, but instead to use our new condensed config specification API. This allow one to use bare dictionaries instead of Shape, lists with one member instead of Array, bare types instead of Field with a single argument, and python primitive types (int, bool etc) instead of the dagster equivalents. These result in dramatically less verbose config specs in most cases.

    So instead of

    from dagster import Shape, Field, Int, Array, String
    # ... code
    config=Shape({ # Dict prior to change
          'some_int' : Field(Int),
          'some_list: Field(Array[String]) # List prior to change
      })
    

    one can instead write:

    config={'some_int': int, 'some_list': [str]}
    

    No imports and much simpler, cleaner syntax.

  • config_field is no longer a valid argument on solid, SolidDefinition, ExecutorDefintion, executor, LoggerDefinition, logger, ResourceDefinition, resource, system_storage, and SystemStorageDefinition. Use config instead.

  • For composite solids, the config_fn no longer takes a ConfigMappingContext, and the context has been deleted. To upgrade, remove the first argument to config_fn.

    So instead of

    @composite_solid(config={}, config_fn=lambda context, config: {})
    

    one must instead write:

    @composite_solid(config={}, config_fn=lambda config: {})
    
  • Field takes a is_required rather than a is_optional argument. This is to avoid confusion with python's typing and dagster's definition of Optional, which indicates None-ability, rather than existence. is_optional is deprecated and will be removed in a future version.

Required Resources

  • All solids, types, and config functions that use a resource must explicitly list that resource using the argument required_resource_keys. This is to enable efficient resource management during pipeline execution, especially in a multiprocessing or remote execution environment.

  • The @system_storage decorator now requires argument required_resource_keys, which was previously optional.

Dagster Type System Changes

  • dagster.Set and dagster.Tuple can no longer be used within the config system.
  • Dagster types are now instances of DagsterType, rather than a class than inherits from RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just create an instance of a DagsterType. The type checking function is now an argument to the DagsterType, rather than an abstract method that has to be implemented in a subclass.
  • RuntimeType has been renamed to DagsterType is now an encouraged API for type creation.
  • Core type check function of DagsterType can now return a naked bool in addition to a TypeCheck object.
  • type_check_fn on DagsterType (formerly type_check and RuntimeType, respectively) now takes a first argument context of type TypeCheckContext in addition to the second argument of value.
  • define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType .
  • dagster_type has been renamed to usable_as_dagster_type.
  • as_dagster_type has been removed and similar capabilities added as make_python_type_usable_as_dagster_type.
  • PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If a custom type_check is needed, use DagsterType.
  • As a consequence of these changes, if you were previously using dagster_pyspark or dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type annotations to functions decorated with @solid to indicate that they are input or output types for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in order to map the Python types to the Dagster types, or just use the Dagster types (dagster_pandas.DataFrame instead of pandas.DataFrame) directly.

Other

  • We no longer publish base Docker images. Please see the updated deployment docs for an example Dockerfile off of which you can work.
  • step_metadata_fn has been removed from SolidDefinition & @solid.
  • SolidDefinition & @solid now takes tags and enforces that values are strings or are safely encoded as JSON. metadata is deprecated and will be removed in a future version.
  • resource_mapper_fn has been removed from SolidInvocation.

New

  • Dagit now includes a much richer execution view, with a Gantt-style visualization of step execution and a live timeline.

  • Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries are now tested against 3.8. Note that several of our upstream dependencies have yet to publish wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some dependencies from source.

  • dagster/priority tags can now be used to prioritize the order of execution for the built-in in-process and multiprocess engines.

  • dagster-postgres storages can now be configured with separate arguments and environment variables, such as:

    run_storage:
      module: dagster_postgres.run_storage
      class: PostgresRunStorage
      config:
        postgres_db:
          username: test
          password:
            env: ENV_VAR_FOR_PG_PASSWORD
          hostname: localhost
          db_name: test
    
  • Support for RunLaunchers on DagsterInstance allows for execution to be "launched" outside of the Dagit/Dagster process. As one example, this is used by dagster-k8s to submit pipeline execution as a Kubernetes Job.

  • Added support for adding tags to runs initiated from the Playground view in dagit.

  • Added @monthly_schedule decorator.

  • Added Enum.from_python_enum helper to wrap Python enums for config. (Thanks @kdungs!)

  • [dagster-bash] The Dagster bash solid factory now passes along kwargs to the underlying solid construction, and now has a single Nothing input by default to make it easier to create a sequencing dependency. Also, logs are now buffered by default to make execution less noisy.

  • [dagster-aws] We've improved our EMR support substantially in this release. The dagster_aws.emr library now provides an EmrJobRunner with various utilities for creating EMR clusters, submitting jobs, and waiting for jobs/logs. We also now provide a emr_pyspark_resource, which together with the new @pyspark_solid decorator makes moving pyspark execution from your laptop to EMR as simple as changing modes. [dagster-pandas] Added create_dagster_pandas_dataframe_type, PandasColumn, and Constraint API's in order for users to create custom types which perform column validation, dataframe validation, summary statistics emission, and dataframe serialization/deserialization.

  • [dagster-gcp] GCS is now supported for system storage, as well as being supported with the Dask executor. (Thanks @habibutsu!) Bigquery solids have also been updated to support the new API.

Bugfix

  • Ensured that all implementations of RunStorage clean up pipeline run tags when a run is deleted. Requires a storage migration, using dagster instance migrate.
  • The multiprocess and Celery engines now handle solid subsets correctly.
  • The multiprocess and Celery engines will now correctly emit skip events for steps downstream of failures and other skips.
  • The @solid and @lambda_solid decorators now correctly wrap their decorated functions, in the sense of functools.wraps.
  • Performance improvements in Dagit when working with runs with large configurations.
  • The Helm chart in dagster_k8s has been hardened against various failure modes and is now compatible with Helm 2.
  • SQLite run and event log storages are more robust to concurrent use.
  • Improvements to error messages and to handling of user code errors in input hydration and output materialization logic.
  • Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow pipelines.
  • We now handle our SQLAlchemy connections in a more canonical way (thanks @zzztimbo!).
  • Fixed an issue using S3 system storage with certain custom serialization strategies.
  • Fixed an issue leaking orphan processes from compute logging.
  • Fixed an issue leaking semaphores from Dagit.
  • Setting the raise_error flag in execute_pipeline now actually raises user exceptions instead of a wrapper type.

Documentation

  • Our docs have been reorganized and expanded (thanks @habibutsu, @vatervonacht, @zzztimbo). We'd love feedback and contributions!

Thank you Thank you to all of the community contributors to this release!! In alphabetical order: @habibutsu, @kdungs, @vatervonacht, @zzztimbo.

0.6.9

Bugfix

  • Improved SQLite concurrency issues, uncovered while using concurrent nodes in Airflow
  • Fixed sqlalchemy warnings (thanks @zzztimbo!)
  • Fixed Airflow integration issue where a Dagster child process triggered a signal handler of a parent Airflow process via a process fork
  • Fixed GCS and AWS intermediate store implementations to be compatible with read/write mode serialization strategies
  • Improve test stability

Documentation

  • Improved descriptions for setting up the cron scheduler (thanks @zzztimbo!)

0.6.8

New

  • Added the dagster-github library, a community contribution from @Ramshackle-Jamathon and @k-mahoney!

dagster-celery

  • Simplified and improved config handling.
  • An engine event is now emitted when the engine fails to connect to a broker.

Bugfix

  • Fixes a file descriptor leak when running many concurrent dagster-graphql queries (e.g., for backfill).
  • The @pyspark_solid decorator now handles inputs correctly.
  • The handling of solid compute functions that accept kwargs but which are decorated with explicit input definitions has been rationalized.
  • Fixed race conditions in concurrent execution using SQLite event log storage with concurrent execution, uncovered by upstream improvements in the Python inotify library we use.

Documentation

  • Improved error messages when using system storages that don't fulfill executor requirements.

0.6.7

New

  • We are now more permissive when specifying configuration schema in order make constructing configuration schema more concise.
  • When specifying the value of scalar inputs in config, one can now specify that value directly as the key of the input, rather than having to embed it within a value key.

Breaking

  • The implementation of SQL-based event log storages has been consolidated, which has entailed a schema change. If you have event logs stored in a Postgres- or SQLite-backed event log storage, and you would like to maintain access to these logs, you should run dagster instance migrate. To check what event log storages you are using, run dagster instance info.
  • Type matches on both sides of an InputMapping or OutputMapping are now enforced.

New

  • Dagster is now tested on Python 3.8
  • Added the dagster-celery library, which implements a Celery-based engine for parallel pipeline execution.
  • Added the dagster-k8s library, which includes a Helm chart for a simple Dagit installation on a Kubernetes cluster.

Dagit

  • The Explore UI now allows you to render a subset of a large DAG via a new solid query bar that accepts terms like solid_name+* and +solid_name+. When viewing very large DAGs, nothing is displayed by default and * produces the original behavior.
  • Performance improvements in the Explore UI and config editor for large pipelines.
  • The Explore UI now includes a zoom slider that makes it easier to navigate large DAGs.
  • Dagit pages now render more gracefully in the presence of inconsistent run storage and event logs.
  • Improved handling of GraphQL errors and backend programming errors.
  • Minor display improvements.

dagster-aws

  • A default prefix is now configurable on APIs that use S3.
  • S3 APIs now parametrize region_name and endpoint_url.

dagster-gcp

  • A default prefix is now configurable on APIs that use GCS.

dagster-postgres

  • Performance improvements for Postgres-backed storages.

dagster-pyspark

  • Pyspark sessions may now be configured to be held open after pipeline execution completes, to enable extended test cases.

dagster-spark

  • spark_outputs must now be specified when initializing a SparkSolidDefinition, rather than in config.
  • Added new create_spark_solid helper and new spark_resource.
  • Improved EMR implementation.

Bugfix

  • Fixed an issue retrieving output values using SolidExecutionResult (e.g., in test) for dagster-pyspark solids.
  • Fixes an issue when expanding composite solids in Dagit.
  • Better errors when solid names collide.
  • Config mapping in composite solids now works as expected when the composite solid has no top level config.
  • Compute log filenames are now guaranteed not to exceed the POSIX limit of 255 chars.
  • Fixes an issue when copying and pasting solid names from Dagit.
  • Termination now works as expected in the multiprocessing executor.
  • The multiprocessing executor now executes parallel steps in the expected order.
  • The multiprocessing executor now correctly handles solid subsets.
  • Fixed a bad error condition in dagster_ssh.sftp_solid.
  • Fixed a bad error message giving incorrect log level suggestions.

Documentation

  • Minor fixes and improvements.

Thank you Thank you to all of the community contributors to this release!! In alphabetical order: @cclauss, @deem0n, @irabinovitch, @pseudoPixels, @Ramshackle-Jamathon, @rparrapy, @yamrzou.

0.6.6

Breaking

  • The selector argument to PipelineDefinition has been removed. This API made it possible to construct a PipelineDefinition in an invalid state. Use PipelineDefinition.build_sub_pipeline instead.

New

  • Added the dagster_prometheus library, which exposes a basic Prometheus resource.
  • Dagster Airflow DAGs may now use GCS instead of S3 for storage.
  • Expanded interface for schedule management in Dagit.

Dagit

  • Performance improvements when loading, displaying, and editing config for large pipelines.
  • Smooth scrolling zoom in the explore tab replaces the previous two-step zoom.
  • No longer depends on internet fonts to run, allowing fully offline dev.
  • Typeahead behavior in search has improved.
  • Invocations of composite solids remain visible in the sidebar when the solid is expanded.
  • The config schema panel now appears when the config editor is first opened.
  • Interface now includes hints for autocompletion in the config editor.
  • Improved display of solid inputs and output in the explore tab.
  • Provides visual feedback while filter results are loading.
  • Better handling of pipelines that aren't present in the currently loaded repo.

Bugfix

  • Dagster Airflow DAGs previously could crash while handling Python errors in DAG logic.
  • Step failures when running Dagster Airflow DAGs were previously not being surfaced as task failures in Airflow.
  • Dagit could previously get into an invalid state when switching pipelines in the context of a solid subselection.
  • frozenlist and frozendict now pass Dagster's parameter type checks for list and dict.
  • The GraphQL playground in Dagit is now working again.

Nits

  • Dagit now prints its pid when it loads.
  • Third-party dependencies have been relaxed to reduce the risk of version conflicts.
  • Improvements to docs and example code.

0.6.5

Breaking

  • The interface for type checks has changed. Previously the type_check_fn on a custom type was required to return None (=passed) or else raise Failure (=failed). Now, a type_check_fn may return True/False to indicate success/failure in the ordinary case, or else return a TypeCheck. The newsuccess field on TypeCheck now indicates success/failure. This obviates the need for the typecheck_metadata_fn, which has been removed.
  • Executions of individual composite solids (e.g. in test) now produce a CompositeSolidExecutionResult rather than a SolidExecutionResult.
  • dagster.core.storage.sqlite_run_storage.SqliteRunStorage has moved to dagster.core.storage.runs.SqliteRunStorage. Any persisted dagster.yaml files should be updated with the new classpath.
  • is_secret has been removed from Field. It was not being used to any effect.
  • The environmentType and configTypes fields have been removed from the dagster-graphql Pipeline type. The configDefinition field on SolidDefinition has been renamed to configField.

Bugfix

  • PresetDefinition.from_files is now guaranteed to give identical results across all Python minor versions.
  • Nested composite solids with no config, but with config mapping functions, now behave as expected.
  • The dagster-airflow DagsterKubernetesPodOperator has been fixed.
  • Dagit is more robust to changes in repositories.
  • Improvements to Dagit interface.

New

  • dagster_pyspark now supports remote execution on EMR with the @pyspark_solid decorator.

Nits

  • Documentation has been improved.
  • The top level config field features in the dagster.yaml will no longer have any effect.
  • Third-party dependencies have been relaxed to reduce the risk of version conflicts.

0.6.4

  • Scheduler errors are now visible in dagit
  • Run termination button no longer persists past execution completion
  • Fixes run termination for multiprocess execution
  • Fixes run termination on Windows
  • dagit no longer prematurely returns control to terminal on Windows
  • raise_on_error is now available on the execute_solid test utility
  • check_dagster_type added as a utility to help test type checks on custom types
  • Improved support in the type system for Set and Tuple types
  • Allow composite solids with config mapping to expose an empty config schema
  • Simplified graphql API arguments to single-step re-execution to use retryRunId, stepKeys execution parameters instead of a reexecutionConfig input object
  • Fixes missing step-level stdout/stderr from dagster CLI

0.6.3

  • Adds a type_check parameter to PythonObjectType, as_dagster_type, and @as_dagster_type to enable custom type checks in place of default isinstance checks. See documentation here: https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/types.html#custom-type-checks

  • Improved the type inference experience by automatically wrapping bare python types as dagster types.

  • Reworked our tutorial (now with more compelling/scary breakfast cereal examples) and public API documentation. See the new tutorial here: https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/index.html

  • New solids explorer in Dagit allows you to browse and search for solids used across the repository.

    Solid Explorer Solid Explorer

  • Enabled solid dependency selection in the Dagit search filter.

    • To select a solid and its upstream dependencies, search +{solid_name}.
    • To select a solid and its downstream dependents, search {solid_name}+.
    • For both search +{solid_name}+.

    For example. In the Airline demo, searching +join_q2_data will get the following:

    Screenshot

  • Added a terminate button in Dagit to terminate an active run.

    Stop Button

  • Added an --output flag to dagster-graphql CLI.

  • Added confirmation step for dagster run wipe and dagster schedule wipe commands (Thanks @shahvineet98).

  • Fixed a wrong title in the dagster-snowflake library README (Thanks @Step2Web).

0.6.2

  • Changed composition functions @pipeline and @composite_solid to automatically give solids aliases with an incrementing integer suffix when there are conflicts. This removes to the need to manually alias solid definitions that are used multiple times.
  • Add dagster schedule wipe command to delete all schedules and remove all schedule cron jobs
  • execute_solid test util now works on composite solids.
  • Docs and example improvements: https://dagster.readthedocs.io/
  • Added --remote flag to dagster-graphql for querying remote dagit servers.
  • Fixed issue with duplicate run tag autocomplete suggestions in dagit (#1839)
  • Fixed Windows 10 / py3.6+ bug causing pipeline execution failures

0.6.1

  • Fixed an issue where Dagster public images tagged latest on Docker Hub were erroneously published with an older version of Dagster (#1814)
  • Fixed an issue where the most recent scheduled run was not displayed in dagit (#1815)
  • Fixed a bug with the dagster schedule start --start-all command (#1812)
  • Added a new scheduler command to restart a schedule: dagster schedule restart. Also added a flag to restart all running schedules: dagster schedule restart --restart-all-running.

0.6.0

New

This major release includes features for scheduling, operating, and executing pipelines that elevate dagit and dagster from a local development tool to a deployable service.

  • DagsterInstance introduced as centralized system to control run, event, compute log, and local intermediates storage.
  • A Scheduler abstraction has been introduced along side an initial implementation of SystemCronScheduler in dagster-cron.
  • dagster-aws has been extended with a CLI for deploying dagster to AWS. This can spin up a Dagit node and all the supporting infrastructure—security group, RDS PostgreSQL instance, etc.—without having to touch the AWS console, and for deploying your code to that instance.
  • Dagit
    • Runs: a completely overhauled Runs history page. Includes the ability to Retry, Cancel, and Delete pipeline runs from the new runs page.
    • Scheduler: a page for viewing and interacting with schedules.
    • Compute Logs: stdout and stderr are now viewable on a per execution step basis in each run. This is available in real time for currently executing runs and for historical runs.
    • A Reload button in the top right in dagit restarts the web-server process and updates the UI to reflect repo changes, including DAG structure, solid names, type names, etc. This replaces the previous file system watching behavior.

Breaking Changes

  • --log and --log-dir no longer supported as CLI args. Existing runs and events stored via these flags are no longer compatible with current storage.
  • raise_on_error moved from in process executor config to argument to arguments in python API methods such as execute_pipeline

0.5.9

  • Fixes an issue using custom types for fan-in dependencies with intermediate storage.

0.5.8

  • Fixes an issue running some Dagstermill notebooks on Windows.
  • Fixes a transitive dependency issue with Airflow.
  • Bugfixes, performance improvements, and better documentation.

0.5.7

  • Fixed an issue with specifying composite output mappings (#1674)
  • Added support for specifying Dask worker resources (#1679)
  • Fixed an issue with launching Dagit on Windows

0.5.6

  • Execution details are now configurable. The new top-level ExecutorDefinition and @executor APIs are used to define in-process, multiprocess, and Dask executors, and may be used by users to define new executors. Like loggers and storage, executors may be added to a ModeDefinition and may be selected and configured through the execution field in the environment dict or YAML, including through Dagit. Executors may no longer be configured through the RunConfig.
  • The API of dagster-dask has changed. Pipelines are now executed on Dask using the ordinary execute_pipeline API, and the Dask executor is configured through the environment. (See the dagster-dask README for details.)
  • Added the PresetDefinition.from_files API for constructing a preset from a list of environment files (replacing the old usage of this class). PresetDefinition may now be directly instantiated with an environment dict.
  • Added a prototype integration with dbt.
  • Added a prototype integration with Great Expectations.
  • Added a prototype integration with Papertrail.
  • Added the dagster-bash library.
  • Added the dagster-ssh library.
  • Added the dagster-sftp library.
  • Loosened the PyYAML compatibility requirement.
  • The dagster CLI no longer takes a --raise-on-error or --no-raise-on-error flag. Set this option in executor config.
  • Added a MarkdownMetadataEntryData class, so events yielded from client code may now render markdown in their metadata.
  • Bug fixes, documentation improvements, and improvements to error display.

0.5.5

  • Dagit now accepts parameters via environment variables prefixed with DAGIT_, e.g. DAGIT_PORT.
  • Fixes an issue with reexecuting Dagstermill notebooks from Dagit.
  • Bug fixes and display improvments in Dagit.

0.5.4

  • Reworked the display of structured log information and system events in Dagit, including support for structured rendering of client-provided event metadata.
  • Dagster now generates events when intermediates are written to filesystem and S3 storage, and these events are displayed in Dagit and exposed in the GraphQL API.
  • Whitespace display styling in Dagit can now be toggled on and off.
  • Bug fixes, display nits and improvements, and improvements to JS build process, including better display for some classes of errors in Dagit and improvements to the config editor in Dagit.

0.5.3

  • Pinned RxPY to 1.6.1 to avoid breaking changes in 3.0.0 (py3-only).
  • Most definition objects are now read-only, with getters corresponding to the previous properties.
  • The valueRepr field has been removed from ExecutionStepInputEvent and ExecutionStepOutputEvent.
  • Bug fixes and dagit UX improvements, including SQL highlighting and error handling.

0.5.2

  • Added top-level define_python_dagster_type function.
  • Renamed metadata_fn to typecheck_metadata_fn in all runtime type creation APIs.
  • Renamed result_value and result_values to output_value and output_values on SolidExecutionResult
  • Dagstermill: Reworked public API now contains only define_dagstermill_solid, get_context, yield_event, yield_result, DagstermillExecutionContext, DagstermillError, and DagstermillExecutionError. Please see the new guide for details.
  • Bug fixes, including failures for some dagster CLI invocations and incorrect handling of Airflow timestamps.