DagsterDocs

Execution

Executing pipelines

dagster.execute_pipeline(pipeline, run_config=None, mode=None, preset=None, tags=None, solid_selection=None, instance=None, raise_on_error=True)[source]

Execute a pipeline synchronously.

Users will typically call this API when testing pipeline execution, or running standalone scripts.

Parameters
  • pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.

  • run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

  • raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in test.

  • solid_selection (Optional[List[str]]) –

    A list of solid selection queries (including single solid names) to execute. For example:

    • ['some_solid']: selects some_solid itself.

    • ['*some_solid']: select some_solid and all its ancestors (upstream dependencies).

    • ['*some_solid+++']: select some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.

    • ['*some_solid', 'other_solid_a', 'other_solid_b+']: select some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.

Returns

The result of pipeline execution.

Return type

PipelineExecutionResult

For the asynchronous version, see execute_pipeline_iterator().

dagster.execute_pipeline_iterator(pipeline, run_config=None, mode=None, preset=None, tags=None, solid_selection=None, instance=None)[source]

Execute a pipeline iteratively.

Rather than package up the result of running a pipeline into a single object, like execute_pipeline(), this function yields the stream of events resulting from pipeline execution.

This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.

Parameters
  • pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.

  • run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • solid_selection (Optional[List[str]]) –

    A list of solid selection queries (including single solid names) to execute. For example:

    • ['some_solid']: selects some_solid itself.

    • ['*some_solid']: select some_solid and all its ancestors (upstream dependencies).

    • ['*some_solid+++']: select some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.

    • ['*some_solid', 'other_solid_a', 'other_solid_b+']: select some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

Returns

The stream of events resulting from pipeline execution.

Return type

Iterator[DagsterEvent]

Re-executing pipelines

dagster.reexecute_pipeline(pipeline, parent_run_id, run_config=None, step_selection=None, mode=None, preset=None, tags=None, instance=None, raise_on_error=True)[source]

Reexecute an existing pipeline run.

Users will typically call this API when testing pipeline reexecution, or running standalone scripts.

Parameters
  • pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.

  • parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.

  • run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.

  • solid_selection (Optional[List[str]]) –

    A list of solid selection queries (including single solid names) to execute. For example:

    • ['some_solid']: selects some_solid itself.

    • ['*some_solid']: select some_solid and all its ancestors (upstream dependencies).

    • ['*some_solid+++']: select some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.

    • ['*some_solid', 'other_solid_a', 'other_solid_b+']: select some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

  • raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in test.

Returns

The result of pipeline execution.

Return type

PipelineExecutionResult

For the asynchronous version, see reexecute_pipeline_iterator().

dagster.reexecute_pipeline_iterator(pipeline, parent_run_id, run_config=None, step_selection=None, mode=None, preset=None, tags=None, instance=None)[source]

Reexecute a pipeline iteratively.

Rather than package up the result of running a pipeline into a single object, like reexecute_pipeline(), this function yields the stream of events resulting from pipeline reexecution.

This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.

Parameters
  • pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.

  • parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.

  • run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.

  • solid_selection (Optional[List[str]]) –

    A list of solid selection queries (including single solid names) to execute. For example:

    • ['some_solid']: selects some_solid itself.

    • ['*some_solid']: select some_solid and all its ancestors (upstream dependencies).

    • ['*some_solid+++']: select some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.

    • ['*some_solid', 'other_solid_a', 'other_solid_b+']: select some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

Returns

The stream of events resulting from pipeline reexecution.

Return type

Iterator[DagsterEvent]

Executing solids

dagster.execute_solid(solid_def, mode_def=None, input_values=None, tags=None, run_config=None, raise_on_error=True)[source]

Execute a single solid in an ephemeral pipeline.

Intended to support unit tests. Input values may be passed directly, and no pipeline need be specified – an ephemeral pipeline will be constructed.

Parameters
  • solid_def (SolidDefinition) – The solid to execute.

  • mode_def (Optional[ModeDefinition]) – The mode within which to execute the solid. Use this if, e.g., custom resources, loggers, or executors are desired.

  • input_values (Optional[Dict[str, Any]]) – A dict of input names to input values, used to pass inputs to the solid directly. You may also use the run_config to configure any inputs that are configurable.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • run_config (Optional[dict]) – The configuration that parameterized this execution, as a dict.

  • raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in test.

Returns

The result of executing the solid.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

dagster.execute_solid_within_pipeline(pipeline_def, solid_name, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]

Execute a single solid within an existing pipeline.

Intended to support tests. Input values may be passed directly.

Parameters
  • pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.

  • solid_name (str) – The name of the solid, or the aliased solid, to execute.

  • inputs (Optional[Dict[str, Any]]) – A dict of input names to input values, used to pass input values to the solid directly. You may also use the run_config to configure any inputs that are configurable.

  • run_config (Optional[dict]) – The configuration that parameterized this execution, as a dict.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

Returns

The result of executing the solid.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

dagster.execute_solids_within_pipeline(pipeline_def, solid_names, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]

Execute a set of solids within an existing pipeline.

Intended to support tests. Input values may be passed directly.

Parameters
  • pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.

  • solid_names (FrozenSet[str]) – A set of the solid names, or the aliased solids, to execute.

  • inputs (Optional[Dict[str, Dict[str, Any]]]) – A dict keyed on solid names, whose values are dicts of input names to input values, used to pass input values to the solids directly. You may also use the run_config to configure any inputs that are configurable.

  • run_config (Optional[dict]) – The configuration that parameterized this execution, as a dict.

  • mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.

  • preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.

  • tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.

  • instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.

Returns

The results of executing the solids, keyed by solid name.

Return type

Dict[str, Union[CompositeSolidExecutionResult, SolidExecutionResult]]

Execution context

class dagster.SolidExecutionContext(step_execution_context)[source]

The context object that can be made available as the first argument to a solid’s compute function.

The context object provides system information such as resources, config, and logging to a solid’s compute function. Users should not instantiate this object directly.

Example:

@solid
def hello_world(context: SolidExecutionContext):
    context.log.info("Hello, world!")
get_tag(key)[source]

Get a logging tag.

Parameters

key (tag) – The tag to get.

Returns

The value of the tag.

Return type

str

has_tag(key)[source]

Check if a logging tag is set.

Parameters

key (str) – The tag to check.

Returns

Whether the tag is set.

Return type

bool

property instance

The current Dagster instance

Type

DagsterInstance

property log

The log manager available in the execution context.

Type

DagsterLogManager

property mode_def

The mode of the current execution.

Type

ModeDefinition

property pdb

Gives access to pdb debugging from within the solid.

Example:

@solid
def debug_solid(context):
    context.pdb.set_trace()
Type

dagster.utils.forked_pdb.ForkedPdb

property pipeline_def

The currently executing pipeline.

Type

PipelineDefinition

property pipeline_name

The name of the currently executing pipeline.

Type

str

property pipeline_run

The current pipeline run

Type

PipelineRun

property resources

The currently available resources.

Type

Resources

property retry_number

Which retry attempt is currently executing i.e. 0 for initial attempt, 1 for first retry, etc.

property run_config

The run config for the current execution.

Type

dict

property run_id

The id of the current execution’s run.

Type

str

property solid_config

The parsed config specific to this solid.

property solid_def

The current solid definition.

Type

SolidDefinition

property step_launcher

The current step launcher, if any.

Type

Optional[StepLauncher]

dagster.build_solid_context(resources=None, config=None, instance=None)[source]

Builds solid execution context from provided parameters.

build_solid_context can be used as either a function or context manager. If there is a provided resource that is a context manager, then build_solid_context must be used as a context manager. This function can be used to provide the context argument when directly invoking a solid.

Parameters
  • resources (Optional[Dict[str, Any]]) – The resources to provide to the context. These can be either values or resource definitions.

  • config (Optional[Any]) – The solid config to provide to the context.

  • instance (Optional[DagsterInstance]) – The dagster instance configured for the context. Defaults to DagsterInstance.ephemeral().

Examples

context = build_solid_context()
solid_to_invoke(context)

with build_solid_context(resources={"foo": context_manager_resource}) as context:
    solid_to_invoke(context)

Validating Execution

dagster.validate_run_config(pipeline_def, run_config=None, mode=None)[source]

Function to validate a provided run config blob against a given pipeline and mode.

If validation is successful, this function will return a dictionary representation of the validated config actually used during execution.

Parameters
  • pipeline_def (PipelineDefinition) – The pipeline definition to validate run config against

  • run_config (Optional[Dict[str, Any]]) – The run config to validate

  • mode (str) – The mode of the pipeline to validate against (different modes may require different config)

Returns

A dictionary representation of the validated config.

Return type

Dict[str, Any]

Reconstructable pipelines

class dagster.reconstructable(target)[source]

Create a ReconstructablePipeline from a function that returns a PipelineDefinition, or a function decorated with @pipeline

When your pipeline must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline on the other side of the process boundary.

This function implements a very conservative strategy for reconstructing pipelines, so that its behavior is easy to predict, but as a consequence it is not able to reconstruct certain kinds of pipelines, such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.

If you need to reconstruct pipelines constructed in these ways, you should use build_reconstructable_pipeline() instead, which allows you to specify your own strategy for reconstructing a pipeline.

Examples:

from dagster import PipelineDefinition, pipeline, reconstructable

@pipeline
def foo_pipeline():
    ...

reconstructable_foo_pipeline = reconstructable(foo_pipeline)


def make_bar_pipeline():
    return PipelineDefinition(...)

reconstructable_bar_pipeline = reconstructable(bar_pipeline)
dagster.core.definitions.reconstructable.build_reconstructable_pipeline(reconstructor_module_name, reconstructor_function_name, reconstructable_args=None, reconstructable_kwargs=None)[source]

Create a dagster.core.definitions.reconstructable.ReconstructablePipeline.

When your pipeline must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline on the other side of the process boundary.

This function allows you to use the strategy of your choice for reconstructing pipelines, so that you can reconstruct certain kinds of pipelines that are not supported by reconstructable(), such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.

If you need to reconstruct pipelines constructed in these ways, use this function instead of reconstructable().

Parameters
  • reconstructor_module_name (str) – The name of the module containing the function to use to reconstruct the pipeline.

  • reconstructor_function_name (str) – The name of the function to use to reconstruct the pipeline.

  • reconstructable_args (Tuple) – Args to the function to use to reconstruct the pipeline. Values of the tuple must be JSON serializable.

  • reconstructable_kwargs (Dict[str, Any]) – Kwargs to the function to use to reconstruct the pipeline. Values of the dict must be JSON serializable.

Examples:

# module: mymodule

from dagster import PipelineDefinition, pipeline, build_reconstructable_pipeline

class PipelineFactory:
    def make_pipeline(*args, **kwargs):

        @pipeline
        def _pipeline(...):
            ...

        return _pipeline

def reconstruct_pipeline(*args):
    factory = PipelineFactory()
    return factory.make_pipeline(*args)

factory = PipelineFactory()

foo_pipeline_args = (...,...)

foo_pipeline_kwargs = {...:...}

foo_pipeline = factory.make_pipeline(*foo_pipeline_args, **foo_pipeline_kwargs)

reconstructable_foo_pipeline = build_reconstructable_pipeline(
    'mymodule',
    'reconstruct_pipeline',
    foo_pipeline_args,
    foo_pipeline_kwargs,
)

Pipeline and solid results

class dagster.PipelineExecutionResult(pipeline_def, run_id, event_list, reconstruct_context, output_capture=None)[source]

The result of executing a pipeline.

Returned by execute_pipeline(). Users should not instantiate this class directly.

output_for_solid(handle_str, output_name='result')

Get the output of a solid by its solid handle string and output name.

Parameters
  • handle_str (str) – The string handle for the solid.

  • output_name (str) – Optional. The name of the output, default to DEFAULT_OUTPUT.

Returns

The output value for the handle and output_name.

result_for_handle(handle)

Get the result of a solid by its solid handle.

This allows indexing into top-level solids to retrieve the results of children of composite solids.

Parameters

handle (Union[str,SolidHandle]) – The handle for the solid.

Returns

The result of the given solid.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

result_for_solid(name)

Get the result of a top level solid.

Parameters

name (str) – The name of the top-level solid or aliased solid for which to retrieve the result.

Returns

The result of the solid execution within the pipeline.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

property solid_result_list

The results for each top level solid.

Type

List[Union[CompositeSolidExecutionResult, SolidExecutionResult]]

property step_event_list

List[DagsterEvent] The full list of events generated by steps in the execution.

Excludes events generated by the pipeline lifecycle, e.g., PIPELINE_START.

property success

Whether all steps in the execution were successful.

Type

bool

class dagster.SolidExecutionResult(solid, step_events_by_kind, reconstruct_context, pipeline_def, output_capture=None)[source]

Execution result for a leaf solid in a pipeline.

Users should not instantiate this class.

property compute_input_event_dict

All events of type STEP_INPUT, keyed by input name.

Type

Dict[str, DagsterEvent]

property compute_output_events_dict

All events of type STEP_OUTPUT, keyed by output name

Type

Dict[str, List[DagsterEvent]]

property compute_step_events

All events generated by execution of the solid compute function.

Type

List[DagsterEvent]

property compute_step_failure_event

The STEP_FAILURE event, throws if it did not fail.

Type

DagsterEvent

property expectation_events_during_compute

All events of type STEP_EXPECTATION_RESULT.

Type

List[DagsterEvent]

property expectation_results_during_compute

All expectation results yielded by the solid

Type

List[ExpectationResult]

property failure_data

Any data corresponding to this step’s failure, if it failed.

Type

Union[None, StepFailureData]

get_output_event_for_compute(output_name='result')[source]

The STEP_OUTPUT event for the given output name.

Throws if not present.

Parameters

output_name (Optional[str]) – The name of the output. (default: ‘result’)

Returns

The corresponding event.

Return type

DagsterEvent

get_output_events_for_compute(output_name='result')[source]

The STEP_OUTPUT event for the given output name.

Throws if not present.

Parameters

output_name (Optional[str]) – The name of the output. (default: ‘result’)

Returns

The corresponding events.

Return type

List[DagsterEvent]

get_step_success_event()[source]

DagsterEvent: The STEP_SUCCESS event, throws if not present.

property input_events_during_compute

All events of type STEP_INPUT.

Type

List[DagsterEvent]

property materialization_events_during_compute

All events of type ASSET_MATERIALIZATION.

Type

List[DagsterEvent]

property materializations_during_compute

All materializations yielded by the solid.

Type

List[Materialization]

property output_events_during_compute

All events of type STEP_OUTPUT.

Type

List[DagsterEvent]

output_value(output_name='result')[source]

Get a computed output value.

Note that calling this method will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.

Parameters

output_name (str) – The output name for which to retrieve the value. (default: ‘result’)

Returns

None if execution did not succeed, the output value

in the normal case, and a dict of mapping keys to values in the mapped case.

Return type

Union[None, Any, Dict[str, Any]]

property output_values

The computed output values.

Returns None if execution did not succeed.

Returns a dictionary where keys are output names and the values are:
  • the output values in the normal case

  • a dictionary from mapping key to corresponding value in the mapped case

Note that accessing this property will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.

Type

Union[None, Dict[str, Union[Any, Dict[str, Any]]]

property retry_attempts

Number of times this step retried

property skipped

Whether solid execution was skipped.

Type

bool

property success

Whether solid execution was successful.

Type

bool

class dagster.CompositeSolidExecutionResult(solid, event_list, step_events_by_kind, reconstruct_context, pipeline_def, handle=None, output_capture=None)[source]

Execution result for a composite solid in a pipeline.

Users should not instantiate this class directly.

output_for_solid(handle_str, output_name='result')

Get the output of a solid by its solid handle string and output name.

Parameters
  • handle_str (str) – The string handle for the solid.

  • output_name (str) – Optional. The name of the output, default to DEFAULT_OUTPUT.

Returns

The output value for the handle and output_name.

result_for_handle(handle)

Get the result of a solid by its solid handle.

This allows indexing into top-level solids to retrieve the results of children of composite solids.

Parameters

handle (Union[str,SolidHandle]) – The handle for the solid.

Returns

The result of the given solid.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

result_for_solid(name)

Get the result of a top level solid.

Parameters

name (str) – The name of the top-level solid or aliased solid for which to retrieve the result.

Returns

The result of the solid execution within the pipeline.

Return type

Union[CompositeSolidExecutionResult, SolidExecutionResult]

property solid_result_list

The results for each top level solid.

Type

List[Union[CompositeSolidExecutionResult, SolidExecutionResult]]

property step_event_list

List[DagsterEvent] The full list of events generated by steps in the execution.

Excludes events generated by the pipeline lifecycle, e.g., PIPELINE_START.

property success

Whether all steps in the execution were successful.

Type

bool

class dagster.DagsterEvent(event_type_value, pipeline_name, step_handle=None, solid_handle=None, step_kind_value=None, logging_tags=None, event_specific_data=None, message=None, pid=None, step_key=None)[source]

Events yielded by solid and pipeline execution.

Users should not instantiate this class.

event_type_value

Value for a DagsterEventType.

Type

str

pipeline_name
Type

str

step_key
Type

str

solid_handle
Type

SolidHandle

step_kind_value

Value for a StepKind.

Type

str

logging_tags
Type

Dict[str, str]

event_specific_data

Type must correspond to event_type_value.

Type

Any

message
Type

str

pid
Type

int

step_key

DEPRECATED

Type

Optional[str]

property event_type

The type of this event.

Type

DagsterEventType

class dagster.DagsterEventType(value)[source]

The types of events that may be yielded by solid and pipeline execution.

ALERT_START = 'ALERT_START'
ALERT_SUCCESS = 'ALERT_SUCCESS'
ASSET_MATERIALIZATION = 'ASSET_MATERIALIZATION'
ASSET_STORE_OPERATION = 'ASSET_STORE_OPERATION'
ENGINE_EVENT = 'ENGINE_EVENT'
HANDLED_OUTPUT = 'HANDLED_OUTPUT'
HOOK_COMPLETED = 'HOOK_COMPLETED'
HOOK_ERRORED = 'HOOK_ERRORED'
HOOK_SKIPPED = 'HOOK_SKIPPED'
LOADED_INPUT = 'LOADED_INPUT'
LOGS_CAPTURED = 'LOGS_CAPTURED'
OBJECT_STORE_OPERATION = 'OBJECT_STORE_OPERATION'
PIPELINE_CANCELED = 'PIPELINE_CANCELED'
PIPELINE_CANCELING = 'PIPELINE_CANCELING'
PIPELINE_DEQUEUED = 'PIPELINE_DEQUEUED'
PIPELINE_ENQUEUED = 'PIPELINE_ENQUEUED'
PIPELINE_FAILURE = 'PIPELINE_FAILURE'
PIPELINE_START = 'PIPELINE_START'
PIPELINE_STARTING = 'PIPELINE_STARTING'
PIPELINE_SUCCESS = 'PIPELINE_SUCCESS'
STEP_EXPECTATION_RESULT = 'STEP_EXPECTATION_RESULT'
STEP_FAILURE = 'STEP_FAILURE'
STEP_INPUT = 'STEP_INPUT'
STEP_OUTPUT = 'STEP_OUTPUT'
STEP_RESTARTED = 'STEP_RESTARTED'
STEP_SKIPPED = 'STEP_SKIPPED'
STEP_START = 'STEP_START'
STEP_SUCCESS = 'STEP_SUCCESS'
STEP_UP_FOR_RETRY = 'STEP_UP_FOR_RETRY'

Pipeline configuration

Run Config Schema

The run_config used by execute_pipeline() and execute_pipeline_iterator() has the following schema:

{
  # configuration for execution, required if executors require config
  execution: {
    # the name of one, and only one available executor, typically 'in_process' or 'multiprocess'
    __executor_name__: {
      # executor-specific config, if required or permitted
      config: {
        ...
      }
    }
  },

  # configuration for loggers, required if loggers require config
  loggers: {
    # the name of an available logger
    __logger_name__: {
      # logger-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for resources, required if resources require config
  resources: {
    # the name of a resource
    __resource_name__: {
      # resource-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for solids, required if solids require config
  solids: {

    # these keys align with the names of the solids, or their alias in this pipeline
    __solid_name__: {

      # pass any data that was defined via config_field
      config: ...,

      # configurably specify input values, keyed by input name
      inputs: {
        __input_name__: {
          # if an dagster_type_loader is specified, that schema must be satisfied here;
          # scalar, built-in types will generally allow their values to be specified directly:
          value: ...
        }
      },

      # configurably materialize output values
      outputs: {
        __output_name__: {
          # if an dagster_type_materializer is specified, that schema must be satisfied
          # here; pickleable types will generally allow output as follows:
          pickle: {
            path: String
          }
        }
      }
    }
  },

  # optionally use an available system storage for intermediates etc.
  intermediate_storage: {
    # the name of one, and only one available system storage, typically 'filesystem' or
    # 'in_memory'
    __storage_name__: {
      config: {
        ...
      }
    }
  }
}

Intermediate Storage

dagster.io_manager_from_intermediate_storage(intermediate_storage_def)[source]

Define an IOManagerDefinition from an existing IntermediateStorageDefinition.

This method is used to adapt an existing user-defined intermediate storage to a IO manager resource, for example:

my_io_manager_def = io_manager_from_intermediate_storage(my_intermediate_storage_def)

@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": my_io_manager_def})])
def my_pipeline():
    ...
Parameters

intermediate_storage_def (IntermediateStorageDefinition) – The intermediate storage definition to be converted to an IO manager definition.

Returns

IOManagerDefinition

dagster.mem_intermediate_storage IntermediateStorageDefinition[source]

The default in-memory intermediate storage.

In-memory intermediate storage is the default on any pipeline run that does not configure any custom intermediate storage.

Keep in mind when using this storage that intermediates will not be persisted after the pipeline run ends. Use a persistent intermediate storage like fs_intermediate_storage() to persist intermediates and take advantage of advanced features like pipeline re-execution.

dagster.fs_intermediate_storage IntermediateStorageDefinition[source]

The default filesystem intermediate storage.

Filesystem system storage is available by default on any ModeDefinition that does not provide custom system storages. To select it, include a fragment such as the following in config:

intermediate_storage:
  filesystem:
    base_dir: '/path/to/dir/'

You may omit the base_dir config value, in which case the filesystem storage will use the DagsterInstance-provided default.

dagster.default_intermediate_storage_defs List[IntermediateStorageDefinition]

list() -> new empty list list(iterable) -> new list initialized from iterable’s items

The default intermediate storages available on any ModeDefinition that does not provide custom intermediate storages. These are currently [mem_intermediate_storage, fs_intermediate_storage].

Executors

dagster.in_process_executor ExecutorDefinition[source]

The default in-process executor.

In most Dagster environments, this will be the default executor. It is available by default on any ModeDefinition that does not provide custom executors. To select it explicitly, include the following top-level fragment in config:

execution:
  in_process:

Execution priority can be configured using the dagster/priority tag via solid metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.

dagster.multiprocess_executor ExecutorDefinition[source]

The default multiprocess executor.

This simple multiprocess executor is available by default on any ModeDefinition that does not provide custom executors. To select the multiprocess executor, include a fragment such as the following in your config:

execution:
  multiprocess:
    config:
      max_concurrent: 4

The max_concurrent arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set max_concurrent to be 0, this is the return value of multiprocessing.cpu_count().

Execution priority can be configured using the dagster/priority tag via solid metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.

dagster.default_executors List[ExecutorDefinition]

list() -> new empty list list(iterable) -> new list initialized from iterable’s items

The default executors available on any ModeDefinition that does not provide custom executors. These are currently [in_process_executor, multiprocess_executor].

Contexts

class dagster.TypeCheckContext(run_id, log_manager, scoped_resources_builder, dagster_type)[source]

The context object available to a type check function on a DagsterType.

log

Centralized log dispatch from user code.

Type

DagsterLogManager

resources

An object whose attributes contain the resources available to this solid.

Type

Any

run_id

The id of this pipeline run.

Type

str