dagster.execute_pipeline(pipeline, run_config=None, mode=None, preset=None, tags=None, solid_selection=None, instance=None, raise_on_error=True)[source]
Execute a pipeline synchronously.
Users will typically call this API when testing pipeline execution, or running standalone scripts.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in tests.
solid_selection (Optional[List[str]]) – A list of solid selection queries (including single solid names) to execute. For example:
['some_solid']: selects some_solid itself.
['*some_solid']: selects some_solid and all its ancestors (upstream dependencies).
['*some_solid+++']: selects some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
['*some_solid', 'other_solid_a', 'other_solid_b+']: selects some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.
The result of pipeline execution.
PipelineExecutionResult
For the asynchronous version, see execute_pipeline_iterator().
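A minimal sketch of calling execute_pipeline from a test or script; the solid and pipeline names here are hypothetical:
from dagster import execute_pipeline, pipeline, solid

@solid
def hello(context):
    context.log.info("Hello, world!")

@pipeline
def hello_pipeline():
    hello()

# Runs synchronously against an ephemeral instance by default and returns a
# PipelineExecutionResult.
result = execute_pipeline(hello_pipeline)
assert result.success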
dagster.execute_pipeline_iterator(pipeline, run_config=None, mode=None, preset=None, tags=None, solid_selection=None, instance=None)[source]
Execute a pipeline iteratively.
Rather than packaging up the result of running a pipeline into a single object, as execute_pipeline() does, this function yields the stream of events resulting from pipeline execution.
This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
solid_selection (Optional[List[str]]) – A list of solid selection queries (including single solid names) to execute. For example:
['some_solid']: selects some_solid itself.
['*some_solid']: selects some_solid and all its ancestors (upstream dependencies).
['*some_solid+++']: selects some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
['*some_solid', 'other_solid_a', 'other_solid_b+']: selects some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
The stream of events resulting from pipeline execution.
Iterator[DagsterEvent]
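A minimal sketch of consuming the event stream, assuming the hypothetical hello_pipeline from the previous example:
from dagster import DagsterEventType, execute_pipeline_iterator

# Handle events as they are emitted rather than waiting for the run to finish.
for event in execute_pipeline_iterator(hello_pipeline):
    if event.event_type == DagsterEventType.STEP_SUCCESS:
        print(f"step {event.step_key} succeeded")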
dagster.reexecute_pipeline(pipeline, parent_run_id, run_config=None, step_selection=None, mode=None, preset=None, tags=None, instance=None, raise_on_error=True)[source]
Reexecute an existing pipeline run.
Users will typically call this API when testing pipeline reexecution, or running standalone scripts.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.
run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.
step_selection (Optional[List[str]]) – A list of step selection queries (including single solid names) to execute. For example:
['some_solid']: selects some_solid itself.
['*some_solid']: selects some_solid and all its ancestors (upstream dependencies).
['*some_solid+++']: selects some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
['*some_solid', 'other_solid_a', 'other_solid_b+']: selects some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in tests.
The result of pipeline execution.
PipelineExecutionResult
For the asynchronous version, see reexecute_pipeline_iterator().
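A minimal sketch of reexecution, again assuming the hypothetical hello_pipeline; the parent run must exist in the instance passed to reexecute_pipeline, so the same instance is shared between both calls:
from dagster import DagsterInstance, execute_pipeline, reexecute_pipeline

instance = DagsterInstance.ephemeral()
parent_result = execute_pipeline(hello_pipeline, instance=instance)

# Reexecute the full pipeline using the prior run as the parent.
result = reexecute_pipeline(
    hello_pipeline,
    parent_run_id=parent_result.run_id,
    instance=instance,
)
assert result.success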
dagster.reexecute_pipeline_iterator(pipeline, parent_run_id, run_config=None, step_selection=None, mode=None, preset=None, tags=None, instance=None)[source]
Reexecute a pipeline iteratively.
Rather than packaging up the result of running a pipeline into a single object, as reexecute_pipeline() does, this function yields the stream of events resulting from pipeline reexecution.
This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.
run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.
step_selection (Optional[List[str]]) – A list of step selection queries (including single solid names) to execute. For example:
['some_solid']: selects some_solid itself.
['*some_solid']: selects some_solid and all its ancestors (upstream dependencies).
['*some_solid+++']: selects some_solid, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
['*some_solid', 'other_solid_a', 'other_solid_b+']: selects some_solid and all its ancestors, other_solid_a itself, and other_solid_b and its direct child solids.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
The stream of events resulting from pipeline reexecution.
Iterator[DagsterEvent]
dagster.execute_solid(solid_def, mode_def=None, input_values=None, tags=None, run_config=None, raise_on_error=True)[source]
Execute a single solid in an ephemeral pipeline.
Intended to support unit tests. Input values may be passed directly, and no pipeline need be specified – an ephemeral pipeline will be constructed.
solid_def (SolidDefinition) – The solid to execute.
mode_def (Optional[ModeDefinition]) – The mode within which to execute the solid. Use this if, e.g., custom resources, loggers, or executors are desired.
input_values (Optional[Dict[str, Any]]) – A dict of input names to input values, used to pass inputs to the solid directly. You may also use the run_config to configure any inputs that are configurable.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
run_config (Optional[dict]) – The configuration that parametrizes this execution, as a dict.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True, since this is the most useful behavior in tests.
The result of executing the solid.
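A minimal sketch of unit-testing a single solid with directly supplied inputs; the solid is hypothetical:
from dagster import execute_solid, solid

@solid
def add_one(context, num: int) -> int:
    return num + 1

# No pipeline needs to be defined; an ephemeral pipeline is constructed.
result = execute_solid(add_one, input_values={"num": 2})
assert result.success
assert result.output_value() == 3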
dagster.execute_solid_within_pipeline(pipeline_def, solid_name, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]
Execute a single solid within an existing pipeline.
Intended to support tests. Input values may be passed directly.
pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.
solid_name (str) – The name of the solid, or the aliased solid, to execute.
inputs (Optional[Dict[str, Any]]) – A dict of input names to input values, used to pass input values to the solid directly. You may also use the run_config to configure any inputs that are configurable.
run_config (Optional[dict]) – The configuration that parametrizes this execution, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
The result of executing the solid.
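A minimal sketch of executing one solid inside a pipeline while overriding its input; the pipeline and solids are hypothetical:
from dagster import execute_solid_within_pipeline, pipeline, solid

@solid
def emit_one(context) -> int:
    return 1

@solid
def add_one(context, num: int) -> int:
    return num + 1

@pipeline
def arithmetic_pipeline():
    add_one(emit_one())

# Only add_one runs; its input is supplied directly instead of by emit_one.
result = execute_solid_within_pipeline(arithmetic_pipeline, "add_one", inputs={"num": 5})
assert result.output_value() == 6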
dagster.execute_solids_within_pipeline(pipeline_def, solid_names, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]
Execute a set of solids within an existing pipeline.
Intended to support tests. Input values may be passed directly.
pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.
solid_names (FrozenSet[str]) – A set of the solid names, or the aliased solids, to execute.
inputs (Optional[Dict[str, Dict[str, Any]]]) – A dict keyed on solid names, whose values are dicts of input names to input values, used to pass input values to the solids directly. You may also use the run_config to configure any inputs that are configurable.
run_config (Optional[dict]) – The configuration that parametrizes this execution, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode and preset.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both mode and preset.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None, an ephemeral instance will be used, and no artifacts will be persisted from the run.
The results of executing the solids, keyed by solid name.
Dict[str, Union[CompositeSolidExecutionResult, SolidExecutionResult]]
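A minimal sketch of executing a subset of solids, assuming the hypothetical arithmetic_pipeline above; results are keyed by solid name:
from dagster import execute_solids_within_pipeline

results = execute_solids_within_pipeline(
    arithmetic_pipeline,
    solid_names={"add_one"},
    inputs={"add_one": {"num": 10}},
)
assert results["add_one"].output_value() == 11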
dagster.SolidExecutionContext(step_execution_context)[source]
The context object that can be made available as the first argument to a solid's compute function.
The context object provides system information such as resources, config, and logging to a solid's compute function. Users should not instantiate this object directly.
Example:
@solid
def hello_world(context: SolidExecutionContext):
    context.log.info("Hello, world!")
get_tag(key)[source]
Get a logging tag.
key (str) – The tag to get.
The value of the tag.
instance
The current Dagster instance.
log
The log manager available in the execution context.
mode_def
The mode of the current execution.
pdb
Gives access to pdb debugging from within the solid.
Example:
@solid
def debug_solid(context):
    context.pdb.set_trace()
pipeline_def
The currently executing pipeline.
pipeline_run
The current pipeline run.
resources
The currently available resources.
Resources
retry_number
Which retry attempt is currently executing, i.e. 0 for the initial attempt, 1 for the first retry, etc.
solid_config
The parsed config specific to this solid.
solid_def
The current solid definition.
step_launcher
The current step launcher, if any.
Optional[StepLauncher]
dagster.build_solid_context(resources=None, config=None, instance=None)[source]
Builds solid execution context from provided parameters.
build_solid_context can be used as either a function or context manager. If there is a provided resource that is a context manager, then build_solid_context must be used as a context manager. This function can be used to provide the context argument when directly invoking a solid.
resources (Optional[Dict[str, Any]]) – The resources to provide to the context. These can be either values or resource definitions.
config (Optional[Any]) – The solid config to provide to the context.
instance (Optional[DagsterInstance]) – The dagster instance configured for the context. Defaults to DagsterInstance.ephemeral().
Examples
context = build_solid_context()
solid_to_invoke(context)
with build_solid_context(resources={"foo": context_manager_resource}) as context:
    solid_to_invoke(context)
dagster.validate_run_config(pipeline_def, run_config=None, mode=None)[source]
Function to validate a provided run config blob against a given pipeline and mode.
If validation is successful, this function will return a dictionary representation of the validated config actually used during execution.
pipeline_def (PipelineDefinition) – The pipeline definition to validate run config against
run_config (Optional[Dict[str, Any]]) – The run config to validate
mode (str) – The mode of the pipeline to validate against (different modes may require different config)
A dictionary representation of the validated config.
Dict[str, Any]
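A minimal sketch of validating config before launching a run, assuming the hypothetical arithmetic_pipeline above; invalid config raises a DagsterInvalidConfigError:
from dagster import validate_run_config

# Returns the validated (and defaulted) config dictionary on success.
validated = validate_run_config(arithmetic_pipeline, run_config={})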
dagster.reconstructable(target)[source]
Create a ReconstructablePipeline from a function that returns a PipelineDefinition, or a function decorated with @pipeline.
When your pipeline must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline on the other side of the process boundary.
This function implements a very conservative strategy for reconstructing pipelines, so that its behavior is easy to predict, but as a consequence it is not able to reconstruct certain kinds of pipelines, such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.
If you need to reconstruct pipelines constructed in these ways, you should use build_reconstructable_pipeline() instead, which allows you to specify your own strategy for reconstructing a pipeline.
Examples:
from dagster import PipelineDefinition, pipeline, reconstructable

@pipeline
def foo_pipeline():
    ...

reconstructable_foo_pipeline = reconstructable(foo_pipeline)

def make_bar_pipeline():
    return PipelineDefinition(...)

reconstructable_bar_pipeline = reconstructable(make_bar_pipeline)
dagster.core.definitions.reconstructable.build_reconstructable_pipeline(reconstructor_module_name, reconstructor_function_name, reconstructable_args=None, reconstructable_kwargs=None)[source]
Create a dagster.core.definitions.reconstructable.ReconstructablePipeline.
When your pipeline must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline on the other side of the process boundary.
This function allows you to use the strategy of your choice for reconstructing pipelines, so that you can reconstruct certain kinds of pipelines that are not supported by reconstructable(), such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.
If you need to reconstruct pipelines constructed in these ways, use this function instead of reconstructable().
reconstructor_module_name (str) – The name of the module containing the function to use to reconstruct the pipeline.
reconstructor_function_name (str) – The name of the function to use to reconstruct the pipeline.
reconstructable_args (Tuple) – Args to the function to use to reconstruct the pipeline. Values of the tuple must be JSON serializable.
reconstructable_kwargs (Dict[str, Any]) – Kwargs to the function to use to reconstruct the pipeline. Values of the dict must be JSON serializable.
Examples:
# module: mymodule

from dagster import PipelineDefinition, pipeline, build_reconstructable_pipeline

class PipelineFactory:
    def make_pipeline(self, *args, **kwargs):

        @pipeline
        def _pipeline(...):
            ...

        return _pipeline

def reconstruct_pipeline(*args):
    factory = PipelineFactory()
    return factory.make_pipeline(*args)

factory = PipelineFactory()
foo_pipeline_args = (...,...)
foo_pipeline_kwargs = {...:...}
foo_pipeline = factory.make_pipeline(*foo_pipeline_args, **foo_pipeline_kwargs)

reconstructable_foo_pipeline = build_reconstructable_pipeline(
    'mymodule',
    'reconstruct_pipeline',
    foo_pipeline_args,
    foo_pipeline_kwargs,
)
dagster.PipelineExecutionResult(pipeline_def, run_id, event_list, reconstruct_context, output_capture=None)[source]
The result of executing a pipeline.
Returned by execute_pipeline(). Users should not instantiate this class directly.
output_for_solid(handle_str, output_name='result')
Get the output of a solid by its solid handle string and output name.
result_for_handle(handle)
Get the result of a solid by its solid handle.
This allows indexing into top-level solids to retrieve the results of children of composite solids.
handle (Union[str, SolidHandle]) – The handle for the solid.
The result of the given solid.
result_for_solid(name)
Get the result of a top-level solid.
name (str) – The name of the top-level solid or aliased solid for which to retrieve the result.
The result of the solid execution within the pipeline.
solid_result_list
The results for each top-level solid.
List[Union[CompositeSolidExecutionResult, SolidExecutionResult]]
step_event_list
The full list of events generated by steps in the execution. Excludes events generated by the pipeline lifecycle, e.g., PIPELINE_START.
List[DagsterEvent]
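A minimal sketch of navigating a PipelineExecutionResult, assuming the hypothetical arithmetic_pipeline above:
from dagster import execute_pipeline

result = execute_pipeline(arithmetic_pipeline)
assert result.success

# Drill into a single solid's result, or fetch an output value directly.
add_one_result = result.result_for_solid("add_one")
assert add_one_result.output_value() == 2
assert result.output_for_solid("add_one") == 2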
dagster.SolidExecutionResult(solid, step_events_by_kind, reconstruct_context, pipeline_def, output_capture=None)[source]
Execution result for a leaf solid in a pipeline.
Users should not instantiate this class directly.
compute_input_event_dict
All events of type STEP_INPUT, keyed by input name.
Dict[str, DagsterEvent]
compute_output_events_dict
All events of type STEP_OUTPUT, keyed by output name.
Dict[str, List[DagsterEvent]]
compute_step_events
All events generated by execution of the solid compute function.
List[DagsterEvent]
compute_step_failure_event
The STEP_FAILURE event; throws if the step did not fail.
expectation_events_during_compute
All events of type STEP_EXPECTATION_RESULT.
List[DagsterEvent]
expectation_results_during_compute
All expectation results yielded by the solid.
List[ExpectationResult]
failure_data
Any data corresponding to this step's failure, if it failed.
Union[None, StepFailureData]
get_output_event_for_compute(output_name='result')[source]
The STEP_OUTPUT event for the given output name. Throws if not present.
output_name (Optional[str]) – The name of the output. (default: 'result')
The corresponding event.
get_output_events_for_compute(output_name='result')[source]
The STEP_OUTPUT events for the given output name. Throws if not present.
output_name (Optional[str]) – The name of the output. (default: 'result')
The corresponding events.
List[DagsterEvent]
input_events_during_compute
All events of type STEP_INPUT.
List[DagsterEvent]
materialization_events_during_compute
All events of type ASSET_MATERIALIZATION.
List[DagsterEvent]
materializations_during_compute
All materializations yielded by the solid.
List[Materialization]
output_events_during_compute
All events of type STEP_OUTPUT.
List[DagsterEvent]
output_value(output_name='result')[source]
Get a computed output value.
Note that calling this method will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.
output_values
The computed output values.
Returns None if execution did not succeed. Returns the output values in the normal case, or a dictionary from mapping key to corresponding value in the mapped case.
Note that accessing this property will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.
retry_attempts
Number of times this step retried.
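A minimal sketch of inspecting a failed solid's result without raising; failing_pipeline and the solid name "flaky" are hypothetical:
from dagster import execute_pipeline

result = execute_pipeline(failing_pipeline, raise_on_error=False)
flaky_result = result.result_for_solid("flaky")
if not flaky_result.success:
    # failure_data carries the serialized error information for the step.
    print(flaky_result.failure_data.error)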
dagster.CompositeSolidExecutionResult(solid, event_list, step_events_by_kind, reconstruct_context, pipeline_def, handle=None, output_capture=None)[source]
Execution result for a composite solid in a pipeline.
Users should not instantiate this class directly.
output_for_solid(handle_str, output_name='result')
Get the output of a solid by its solid handle string and output name.
result_for_handle(handle)
Get the result of a solid by its solid handle.
This allows indexing into top-level solids to retrieve the results of children of composite solids.
handle (Union[str, SolidHandle]) – The handle for the solid.
The result of the given solid.
result_for_solid(name)
Get the result of a top-level solid.
name (str) – The name of the top-level solid or aliased solid for which to retrieve the result.
The result of the solid execution within the pipeline.
solid_result_list
The results for each top-level solid.
List[Union[CompositeSolidExecutionResult, SolidExecutionResult]]
step_event_list
The full list of events generated by steps in the execution. Excludes events generated by the pipeline lifecycle, e.g., PIPELINE_START.
List[DagsterEvent]
dagster.DagsterEvent(event_type_value, pipeline_name, step_handle=None, solid_handle=None, step_kind_value=None, logging_tags=None, event_specific_data=None, message=None, pid=None, step_key=None)[source]
Events yielded by solid and pipeline execution.
Users should not instantiate this class.
solid_handle
SolidHandle
event_specific_data
Type must correspond to event_type_value.
Any
event_type
The type of this event.
DagsterEventType
dagster.DagsterEventType(value)[source]
The types of events that may be yielded by solid and pipeline execution.
ALERT_START = 'ALERT_START'
ALERT_SUCCESS = 'ALERT_SUCCESS'
ASSET_MATERIALIZATION = 'ASSET_MATERIALIZATION'
ASSET_STORE_OPERATION = 'ASSET_STORE_OPERATION'
ENGINE_EVENT = 'ENGINE_EVENT'
HANDLED_OUTPUT = 'HANDLED_OUTPUT'
HOOK_COMPLETED = 'HOOK_COMPLETED'
HOOK_ERRORED = 'HOOK_ERRORED'
HOOK_SKIPPED = 'HOOK_SKIPPED'
LOADED_INPUT = 'LOADED_INPUT'
LOGS_CAPTURED = 'LOGS_CAPTURED'
OBJECT_STORE_OPERATION = 'OBJECT_STORE_OPERATION'
PIPELINE_CANCELED = 'PIPELINE_CANCELED'
PIPELINE_CANCELING = 'PIPELINE_CANCELING'
PIPELINE_DEQUEUED = 'PIPELINE_DEQUEUED'
PIPELINE_ENQUEUED = 'PIPELINE_ENQUEUED'
PIPELINE_FAILURE = 'PIPELINE_FAILURE'
PIPELINE_START = 'PIPELINE_START'
PIPELINE_STARTING = 'PIPELINE_STARTING'
PIPELINE_SUCCESS = 'PIPELINE_SUCCESS'
STEP_EXPECTATION_RESULT = 'STEP_EXPECTATION_RESULT'
STEP_FAILURE = 'STEP_FAILURE'
STEP_INPUT = 'STEP_INPUT'
STEP_OUTPUT = 'STEP_OUTPUT'
STEP_RESTARTED = 'STEP_RESTARTED'
STEP_SKIPPED = 'STEP_SKIPPED'
STEP_START = 'STEP_START'
STEP_SUCCESS = 'STEP_SUCCESS'
STEP_UP_FOR_RETRY = 'STEP_UP_FOR_RETRY'
The run_config used by execute_pipeline() and execute_pipeline_iterator() has the following schema:
{
  # configuration for execution, required if executors require config
  execution: {
    # the name of one, and only one available executor, typically 'in_process' or 'multiprocess'
    __executor_name__: {
      # executor-specific config, if required or permitted
      config: {
        ...
      }
    }
  },

  # configuration for loggers, required if loggers require config
  loggers: {
    # the name of an available logger
    __logger_name__: {
      # logger-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for resources, required if resources require config
  resources: {
    # the name of a resource
    __resource_name__: {
      # resource-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for solids, required if solids require config
  solids: {
    # these keys align with the names of the solids, or their alias in this pipeline
    __solid_name__: {
      # pass any data that was defined via config_field
      config: ...,

      # configurably specify input values, keyed by input name
      inputs: {
        __input_name__: {
          # if a dagster_type_loader is specified, that schema must be satisfied here;
          # scalar, built-in types will generally allow their values to be specified directly:
          value: ...
        }
      },

      # configurably materialize output values
      outputs: {
        __output_name__: {
          # if a dagster_type_materializer is specified, that schema must be satisfied
          # here; pickleable types will generally allow output as follows:
          pickle: {
            path: String
          }
        }
      }
    }
  },

  # optionally use an available system storage for intermediates etc.
  intermediate_storage: {
    # the name of one, and only one available system storage, typically 'filesystem' or
    # 'in_memory'
    __storage_name__: {
      config: {
        ...
      }
    }
  }
}
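For instance, a run_config selecting the multiprocess executor and supplying a value for an unconnected, configurable input might look like the following; the solid name read_file and its input path are hypothetical:
run_config = {
    "execution": {"multiprocess": {"config": {"max_concurrent": 2}}},
    "solids": {"read_file": {"inputs": {"path": {"value": "data.csv"}}}},
}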
dagster.io_manager_from_intermediate_storage(intermediate_storage_def)[source]
Define an IOManagerDefinition from an existing IntermediateStorageDefinition.
This method is used to adapt an existing user-defined intermediate storage to an IO manager resource, for example:
my_io_manager_def = io_manager_from_intermediate_storage(my_intermediate_storage_def)

@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": my_io_manager_def})])
def my_pipeline():
    ...
intermediate_storage_def (IntermediateStorageDefinition) – The intermediate storage definition to be converted to an IO manager definition.
IOManagerDefinition
dagster.mem_intermediate_storage IntermediateStorageDefinition[source]
The default in-memory intermediate storage.
In-memory intermediate storage is the default on any pipeline run that does not configure any custom intermediate storage.
Keep in mind when using this storage that intermediates will not be persisted after the pipeline run ends. Use a persistent intermediate storage like fs_intermediate_storage() to persist intermediates and take advantage of advanced features like pipeline re-execution.
dagster.fs_intermediate_storage IntermediateStorageDefinition[source]
The default filesystem intermediate storage.
Filesystem intermediate storage is available by default on any ModeDefinition that does not provide custom intermediate storages. To select it, include a fragment such as the following in config:
intermediate_storage:
  filesystem:
    base_dir: '/path/to/dir/'
You may omit the base_dir config value, in which case the filesystem storage will use the DagsterInstance-provided default.
dagster.default_intermediate_storage_defs List[IntermediateStorageDefinition]
The default intermediate storages available on any ModeDefinition that does not provide custom intermediate storages. These are currently [mem_intermediate_storage, fs_intermediate_storage].
dagster.in_process_executor ExecutorDefinition[source]
The default in-process executor.
In most Dagster environments, this will be the default executor. It is available by default on any ModeDefinition that does not provide custom executors. To select it explicitly, include the following top-level fragment in config:
execution:
  in_process:
Execution priority can be configured using the dagster/priority tag via solid metadata, where the higher the number the higher the priority. 0 is the default, and both positive and negative numbers can be used.
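A minimal sketch of setting execution priority via the dagster/priority tag; the solid is hypothetical:
from dagster import solid

# Higher values are scheduled ahead of lower-priority steps by the executor.
@solid(tags={"dagster/priority": 3})
def high_priority_solid(context):
    context.log.info("runs before lower-priority solids")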
dagster.multiprocess_executor ExecutorDefinition[source]
The default multiprocess executor.
This simple multiprocess executor is available by default on any ModeDefinition that does not provide custom executors. To select the multiprocess executor, include a fragment such as the following in your config:
execution:
  multiprocess:
    config:
      max_concurrent: 4
The max_concurrent arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set max_concurrent to 0, this is the return value of multiprocessing.cpu_count().
Execution priority can be configured using the dagster/priority tag via solid metadata, where the higher the number the higher the priority. 0 is the default, and both positive and negative numbers can be used.
dagster.default_executors List[ExecutorDefinition]
The default executors available on any ModeDefinition that does not provide custom executors. These are currently [in_process_executor, multiprocess_executor].
dagster.TypeCheckContext(run_id, log_manager, scoped_resources_builder, dagster_type)[source]
The context object available to a type check function on a DagsterType.
log
Centralized log dispatch from user code.
resources
An object whose attributes contain the resources available to this solid.
Any
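A minimal sketch of a type check function that receives a TypeCheckContext; the type name and check logic are hypothetical:
from dagster import DagsterType, TypeCheck

def positive_int_check(context, value):
    # context is a TypeCheckContext; context.log dispatches to the run's loggers.
    context.log.info(f"type-checking value: {value!r}")
    return TypeCheck(success=isinstance(value, int) and value > 0)

PositiveInt = DagsterType(name="PositiveInt", type_check_fn=positive_int_check)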