An asset is an object in persistent storage, such as a table, file, or persisted machine learning model. A software-defined asset is a Dagster object that couples an asset to the function and upstream assets that are used to produce its contents.
Create a definition for how to compute an asset.
An asset key, e.g. the name of a table.
A function, which can be run to compute the contents of the asset.
A set of upstream assets that are provided as inputs to the function when computing the asset.
Unlike an op, whose dependencies are determined by the graph it lives inside, an asset knows about the upstream assets it depends on. The upstream assets are inferred from the arguments to the decorated function. The name of the argument designates the name of the upstream asset.
An asset has an op inside it to represent the function that computes it. The name of the op will be the segments of the asset key, separated by double-underscores.
name (Optional[str]) – The name of the asset. If not provided, defaults to the name of the decorated function. The asset’s name must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, the asset’s key is the concatenation of the key_prefix and the asset’s name, which defaults to the name of the decorated function. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – Set of asset keys that are upstream dependencies, but do not pass an input to the asset.
config_schema (Optional[ConfigSchema) – The configuration schema for the asset’s underlying op. If set, Dagster will check that config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.
metadata (Optional[Dict[str, Any]]) – A dict of metadata entries for the asset.
required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the op.
io_manager_key (Optional[str]) – The resource key of the IOManager used for storing the output of the op as an asset, and for loading it in downstream ops (default: “io_manager”). Only one of io_manager_key and io_manager_def can be provided.
io_manager_def (Optional[IOManagerDefinition]) – (Experimental) The definition of the IOManager used for storing the output of the op as an asset, and for loading it in downstream ops. Only one of io_manager_def and io_manager_key can be provided.
compute_kind (Optional[str]) – A string to represent the kind of computation that produces the asset, e.g. “dbt” or “spark”. It will be displayed in Dagit as a badge on the asset.
dagster_type (Optional[DagsterType]) – Allows specifying type validation functions that will be executed on the output of the decorated function after it runs.
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the asset.
op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
group_name (Optional[str]) – A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
resource_defs (Optional[Mapping[str, ResourceDefinition]]) – (Experimental) A mapping of resource keys to resource definitions. These resources will be initialized during execution, and can be accessed from the context within the body of the function.
output_required (bool) – Whether the decorated function will always materialize an asset. Defaults to True. If False, the function can return None, which will not be materialized to storage and will halt execution of downstream assets.
freshness_policy (FreshnessPolicy) – A constraint telling Dagster how often this asset is intended to be updated with respect to its root data.
retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
code_version (Optional[str]) – (Experimental) Version of the code that generates this asset. In general, versions should be set only for code that deterministically produces the same output when given the same inputs.
Examples
@asset
def my_asset(my_upstream_asset: int) -> int:
return my_upstream_asset + 1
Defines an asset dependency.
If provided, the asset’s key is the concatenation of the key_prefix and the input name. Only one of the “key_prefix” and “key” arguments should be provided.
Optional[Union[str, Sequence[str]]]
The asset’s key. Only one of the “key_prefix” and “key” arguments should be provided.
Optional[Union[str, Sequence[str], AssetKey]]
A dict of the metadata for the input. For example, if you only need a subset of columns from an upstream table, you could include that in metadata and the IO manager that loads the upstream table could use the metadata to determine which columns to load.
Optional[Dict[str, Any]]
Defines what partitions to depend on in the upstream asset. If not provided, defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
Optional[PartitionMapping]
Allows specifying type validation functions that will be executed on the input of the decorated function before it runs.
A SourceAsset represents an asset that will be loaded by (but not updated by) Dagster.
Metadata associated with the asset.
List[MetadataEntry]
The key for the IOManager that will be used to load the contents of the asset when it’s used as an input to other assets inside a job.
Optional[str]
(Experimental) The definition of the IOManager that will be used to load the contents of the asset when it’s used as an input to other assets inside a job.
Optional[IOManagerDefinition]
(Experimental) resource definitions that may be required by the dagster.IOManagerDefinition
provided in the io_manager_def argument.
Optional[Mapping[str, ResourceDefinition]]
The description of the asset.
Optional[str]
Defines the set of partition keys that compose the asset.
Optional[PartitionsDefinition]
Optional[SourceAssetObserveFunction]
Creates a definition of a job which will materialize a selection of assets. This will only be resolved to a JobDefinition once placed in a code location.
name (str) – The name for the job.
selection (Union[str, Sequence[str], Sequence[AssetKey], Sequence[AssetsDefinition], AssetSelection]) –
The assets that will be materialized when the job is run.
The selected assets must all be included in the assets that are passed to the assets argument of the Definitions object that this job is included on.
The string “my_asset*” selects my_asset and all downstream assets within the code location. A list of strings represents the union of all assets selected by strings within the list.
The selection will be resolved to a set of assets once the when location is loaded.
config –
Describes how the Job is parameterized at runtime.
If no value is provided, then the schema for the job’s run config is a standard format based on its solids and resources.
If a dictionary is provided, then it must conform to the standard config schema, and it will be used as the job’s run config for the job whenever the job is executed. The values provided will be viewable and editable in the Dagit playground, so be careful with secrets.
If a ConfigMapping
object is provided, then the schema for the job’s run config is
determined by the config mapping, and the ConfigMapping, which should return
configuration in the standard format to configure the job.
tags (Optional[Mapping[str, Any]]) – Arbitrary information that will be attached to the execution of the Job. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value. These tag values may be overwritten by tag values provided at invocation time.
description (Optional[str]) – A description for the Job.
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partitions for this job. All AssetDefinitions selected for this job must have a matching PartitionsDefinition.
executor_def (Optional[ExecutorDefinition]) – How this Job will be executed. Defaults to multi_or_in_process_executor
,
which can be switched between multi-process and in-process modes of execution. The
default mode of execution is multi-process.
The job, which can be placed inside a code location.
UnresolvedAssetJobDefinition
Examples
# A job that targets all assets in the code location:
@asset
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets")],
)
# A job that targets a single asset
@asset
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets", selection=[asset1])],
)
# A job that targets all the assets in a group:
defs = Definitions(
assets=assets,
jobs=[define_asset_job("marketing_job", selection=AssetSelection.groups("marketing"))],
)
# Resources are supplied to the assets, not the job:
@asset(required_resource_keys={"slack_client"})
def asset1():
...
defs = Definitions(
assets=[asset1],
jobs=[define_asset_job("all_assets")],
resources={"slack_client": prod_slack_client},
)
An AssetSelection defines a query over a set of assets, normally all the assets in a code location.
You can use the “|”, “&”, and “-” operators to create unions, intersections, and differences of asset selections, respectively.
AssetSelections are typically used with define_asset_job()
.
Examples
# Select all assets in group "marketing":
AssetSelection.groups("marketing")
# Select all assets in group "marketing", as well as the asset with key "promotion":
AssetSelection.groups("marketing") | AssetSelection.keys("promotion")
# Select all assets in group "marketing" that are downstream of asset "leads":
AssetSelection.groups("marketing") & AssetSelection.keys("leads").downstream()
# Select all assets in a list of assets:
AssetSelection.assets(*my_assets_list)
# Select all assets except for those in group "marketing"
AssetSelection.all() - AssetSelection.groups("marketing")
Returns a selection that includes all assets that are downstream of any of the assets in this selection, selecting the assets in this selection by default. Iterates through each asset in this selection and returns the union of all downstream assets.
of 2 means all assets that are children or grandchildren of the assets in this selection.
If the include_self flag is False, return each downstream asset that is not part of the original selection. By default, set to True.
Returns a selection that includes assets that belong to any of the provided groups.
Returns a selection that includes assets with any of the provided keys.
Examples
AssetSelection.keys(AssetKey(["a"]))
AssetSelection.keys("a")
AssetSelection.keys(AssetKey(["a"]), AssetKey(["b"]))
AssetSelection.keys("a", "b")
asset_key_list = [AssetKey(["a"]), AssetKey(["b"])]
AssetSelection.keys(*asset_key_list)
Given an asset selection, returns a new asset selection that contains all of the sink assets within the original asset selection.
A sink asset is an asset that has no downstream dependencies within the asset selection. The sink asset can have downstream dependencies outside of the asset selection.
Given an asset selection, returns a new asset selection that contains all of the source assets within the original asset selection.
A source asset is an asset that has no upstream dependencies within the asset selection. The source asset can have downstream dependencies outside of the asset selection.
Returns a selection that includes all assets that are upstream of any of the assets in this selection, selecting the assets in this selection by default. Iterates through each asset in this selection and returns the union of all downstream assets.
depth (Optional[int]) – If provided, then only include assets to the given depth. A depth of 2 means all assets that are parents or grandparents of the assets in this selection.
include_self (bool) – If True, then include the assets in this selection in the result. If the include_self flag is False, return each upstream asset that is not part of the original selection. By default, set to True.
A FreshnessPolicy specifies how up-to-date you want a given asset to be.
Attaching a FreshnessPolicy to an asset definition encodes an expectation on the upstream data that you expect to be incorporated into the current state of that asset at certain points in time. How this is calculated differs depending on if the asset is unpartitioned or time-partitioned (other partitioning schemes are not supported).
For time-partitioned assets, the current data time for the asset is simple to calculate. The upstream data that is incorporated into the asset is exactly the set of materialized partitions for that asset. Thus, the current data time for the asset is simply the time up to which all partitions have been materialized.
For unpartitioned assets, the current data time is based on the upstream materialization records that were read to generate the current state of the asset. More specifically, imagine you have two assets, where A depends on B. If B has a FreshnessPolicy defined, this means that at time T, the most recent materialization of B should have come after a materialization of A which was no more than maximum_lag_minutes ago. This calculation is recursive: any given asset is expected to incorporate up-to-date data from all of its upstream assets.
It is assumed that all asset definitions with no upstream asset definitions consume from some always-updating source. That is, if you materialize that asset at time T, it will incorporate all data up to time T.
If cron_schedule is not defined, the given asset will be expected to incorporate upstream data from no more than maximum_lag_minutes ago at all points in time. For example, “The events table should always have data from at most 1 hour ago”.
If cron_schedule is defined, the given asset will be expected to incorporate upstream data from no more than maximum_lag_minutes ago at each cron schedule tick. For example, “By 9AM, the signups table should contain all of yesterday’s data”.
The freshness status of assets with policies defined will be visible in the UI. If you are using an asset reconciliation sensor, this sensor will kick off runs to help keep your assets up to date with respect to their FreshnessPolicy.
maximum_lag_minutes (float) – An upper bound for how old the data contained within this asset may be.
cron_schedule (Optional[str]) – A cron schedule string (e.g. "0 1 * * *"
) specifying a
series of times by which the maximum_lag_minutes constraint must be satisfied. If
no cron schedule is provided, then this constraint must be satisfied at all times.
cron_schedule_timezone (Optional[str]) – Timezone in which the cron schedule should be evaluated. If not specified, defaults to UTC. Supported strings for timezones are the ones provided by the IANA time zone database <https://www.iana.org/time-zones> - e.g. “America/Los_Angeles”.
# At any point in time, this asset must incorporate all upstream data from at least 30 minutes ago.
@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=30))
def fresh_asset():
...
# At any point in time, this asset must incorporate all upstream data from at least 30 minutes ago.
@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=30))
def cron_up_to_date_asset():
...
Constructs a list of assets and source assets from the given modules.
modules (Iterable[ModuleType]) – The Python modules to look for assets inside.
group_name (Optional[str]) – Group name to apply to the loaded assets. The returned assets will be copies of the loaded objects, with the group name added.
key_prefix (Optional[Union[str, Sequence[str]]]) – Prefix to prepend to the keys of the loaded assets. The returned assets will be copies of the loaded objects, with the prefix prepended.
A list containing assets and source assets defined in the given modules.
Sequence[Union[AssetsDefinition, SourceAsset]]
Constructs a list of assets, source assets, and cacheable assets from the module where this function is called.
group_name (Optional[str]) – Group name to apply to the loaded assets. The returned assets will be copies of the loaded objects, with the group name added.
key_prefix (Optional[Union[str, Sequence[str]]]) – Prefix to prepend to the keys of the loaded assets. The returned assets will be copies of the loaded objects, with the prefix prepended.
A list containing assets, source assets, and cacheable assets defined in the module.
Sequence[Union[AssetsDefinition, SourceAsset, CachableAssetsDefinition]]
Constructs a list of assets and source assets that includes all asset definitions, source assets, and cacheable assets in all sub-modules of the given package module.
A package module is the result of importing a package.
package_module (ModuleType) – The package module to looks for assets inside.
group_name (Optional[str]) – Group name to apply to the loaded assets. The returned assets will be copies of the loaded objects, with the group name added.
key_prefix (Optional[Union[str, Sequence[str]]]) – Prefix to prepend to the keys of the loaded assets. The returned assets will be copies of the loaded objects, with the prefix prepended.
A list containing assets, source assets, and cacheable assets defined in the module.
Sequence[Union[AssetsDefinition, SourceAsset, CacheableAssetsDefinition]]
Constructs a list of assets, source assets, and cacheable assets that includes all asset definitions and source assets in all sub-modules of the given package.
package_name (str) – The name of a Python package to look for assets inside.
group_name (Optional[str]) – Group name to apply to the loaded assets. The returned assets will be copies of the loaded objects, with the group name added.
key_prefix (Optional[Union[str, Sequence[str]]]) – Prefix to prepend to the keys of the loaded assets. The returned assets will be copies of the loaded objects, with the prefix prepended.
A list containing assets, source assets, and cacheable assets defined in the module.
Sequence[Union[AssetsDefinition, SourceAsset, CacheableAssetsDefinition]]
Defines a set of assets that are produced by the same op or graph.
AssetsDefinitions are typically not instantiated directly, but rather produced using the
@asset
or @multi_asset
decorators.
Maps assets that are produced by this definition to assets that they depend on. The dependencies can be either “internal”, meaning that they refer to other assets that are produced by this definition, or “external”, meaning that they refer to assets that aren’t produced by this definition.
Constructs an AssetsDefinition from a GraphDefinition.
graph_def (GraphDefinition) – The GraphDefinition that is an asset.
keys_by_input_name (Optional[Mapping[str, AssetKey]]) – A mapping of the input names of the decorated graph to their corresponding asset keys. If not provided, the input asset keys will be created from the graph input names.
keys_by_output_name (Optional[Mapping[str, AssetKey]]) – A mapping of the output names of the decorated graph to their corresponding asset keys. If not provided, the output asset keys will be created from the graph output names.
key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, key_prefix will be prepended to each key in keys_by_output_name. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by the graph depend on all assets that are consumed by that graph. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the graph.
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
group_name (Optional[str]) – A group name for the constructed asset. Assets without a group name are assigned to a group called “default”.
resource_defs (Optional[Mapping[str, ResourceDefinition]]) – (Experimental) A mapping of resource keys to resource definitions. These resources will be initialized during execution, and can be accessed from the body of ops in the graph during execution.
partition_mappings (Optional[Mapping[str, PartitionMapping]]) – Defines how to map partition keys for this asset to partition keys of upstream assets. Each key in the dictionary correponds to one of the input assets, and each value is a PartitionMapping. If no entry is provided for a particular asset dependency, the partition mapping defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
metadata_by_output_name (Optional[Mapping[str, MetadataUserInput]]) – Defines metadata to be associated with each of the output assets for this node. Keys are names of the outputs, and values are dictionaries of metadata to be associated with the related asset.
freshness_policies_by_output_name_ouptut_name (Optional[Mapping[str, FreshnessPolicy]]) – Defines a FreshnessPolicy to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the FreshnessPolicies to be attached to the associated asset.
descriptions_by_output_name (Optional[Mapping[str, str]]) – Defines a description to be associated with each of the output asstes for this graph.
Constructs an AssetsDefinition from an OpDefinition.
op_def (OpDefinition) – The OpDefinition that is an asset.
keys_by_input_name (Optional[Mapping[str, AssetKey]]) – A mapping of the input names of the decorated op to their corresponding asset keys. If not provided, the input asset keys will be created from the op input names.
keys_by_output_name (Optional[Mapping[str, AssetKey]]) – A mapping of the output names of the decorated op to their corresponding asset keys. If not provided, the output asset keys will be created from the op output names.
key_prefix (Optional[Union[str, Sequence[str]]]) – If provided, key_prefix will be prepended to each key in keys_by_output_name. Each item in key_prefix must be a valid name in dagster (ie only contains letters, numbers, and _) and may not contain python reserved keywords.
internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by the op depend on all assets that are consumed by that op. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the op.
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
group_name (Optional[str]) – A group name for the constructed asset. Assets without a group name are assigned to a group called “default”.
partition_mappings (Optional[Mapping[str, PartitionMapping]]) – Defines how to map partition keys for this asset to partition keys of upstream assets. Each key in the dictionary correponds to one of the input assets, and each value is a PartitionMapping. If no entry is provided for a particular asset dependency, the partition mapping defaults to the default partition mapping for the partitions definition, which is typically maps partition keys to the same partition keys in upstream assets.
metadata_by_output_name (Optional[Mapping[str, MetadataUserInput]]) – Defines metadata to be associated with each of the output assets for this node. Keys are names of the outputs, and values are dictionaries of metadata to be associated with the related asset.
freshness_policies_by_output_name_ouptut_name (Optional[Mapping[str, FreshnessPolicy]]) – Defines a FreshnessPolicy to be associated with some or all of the output assets for this node. Keys are the names of the outputs, and values are the FreshnessPolicies to be attached to the associated asset.
Returns a representation of this asset as a SourceAsset
.
If this is a multi-asset, the “key” argument allows selecting which asset to return a SourceAsset representation of.
key (Optional[Union[str, Sequence[str], AssetKey]]]) – If this is a multi-asset, select which asset to return a SourceAsset representation of. If not a multi-asset, this can be left as None.
SourceAsset
Create a combined definition of multiple assets that are computed using the same op and same upstream assets.
Each argument to the decorated function references an upstream asset that this asset depends on. The name of the argument designates the name of the upstream asset.
name (Optional[str]) – The name of the op.
outs – (Optional[Dict[str, AssetOut]]): The AssetOuts representing the produced assets.
ins (Optional[Mapping[str, AssetIn]]) – A dictionary that maps input names to information about the input.
non_argument_deps (Optional[Union[Set[AssetKey], Set[str]]]) – Set of asset keys that are upstream dependencies, but do not pass an input to the multi_asset.
config_schema (Optional[ConfigSchema) – The configuration schema for the asset’s underlying op. If set, Dagster will check that config provided for the op matches this schema and fail if it does not. If not set, Dagster will accept any config provided for the op.
required_resource_keys (Optional[Set[str]]) – Set of resource handles required by the underlying op.
compute_kind (Optional[str]) – A string to represent the kind of computation that produces the asset, e.g. “dbt” or “spark”. It will be displayed in Dagit as a badge on the asset.
internal_asset_deps (Optional[Mapping[str, Set[AssetKey]]]) – By default, it is assumed that all assets produced by a multi_asset depend on all assets that are consumed by that multi asset. If this default is not correct, you pass in a map of output names to a corrected set of AssetKeys that they depend on. Any AssetKeys in this list must be either used as input to the asset or produced within the op.
partitions_def (Optional[PartitionsDefinition]) – Defines the set of partition keys that compose the assets.
op_tags (Optional[Dict[str, Any]]) – A dictionary of tags for the op that computes the asset. Frameworks may expect and require certain metadata to be attached to a op. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value.
can_subset (bool) – If this asset’s computation can emit a subset of the asset keys based on the context.selected_assets argument. Defaults to False.
resource_defs (Optional[Mapping[str, ResourceDefinition]]) – (Experimental) A mapping of resource keys to resource definitions. These resources will be initialized during execution, and can be accessed from the context within the body of the function.
group_name (Optional[str]) – A string name used to organize multiple assets into groups. This group name will be applied to all assets produced by this multi_asset.
retry_policy (Optional[RetryPolicy]) – The retry policy for the op that computes the asset.
code_version (Optional[str]) – (Experimental) Version of the code encapsulated by the multi-asset. If set, this is used as a default code version for all defined assets.
Defines one of the assets produced by a @multi_asset
.
If provided, the asset’s key is the
concatenation of the key_prefix and the asset’s name. When using @multi_asset
, the
asset name defaults to the key of the “outs” dictionary Only one of the “key_prefix” and
“key” arguments should be provided.
Optional[Union[str, Sequence[str]]]
The asset’s key. Only one of the “key_prefix” and “key” arguments should be provided.
Optional[Union[str, Sequence[str], AssetKey]]
The type of this output. Should only be set if the correct type can not be inferred directly from the type signature of the decorated function.
Optional[Union[Type, DagsterType]]]
Human-readable description of the output.
Optional[str]
Whether the presence of this field is required. (default: True)
bool
The resource key of the IO manager used for this output. (default: “io_manager”).
Optional[str]
A dict of the metadata for the output. For example, users can provide a file path if the data object will be stored in a filesystem, or provide information of a database table when it is going to load the data into the table.
Optional[Dict[str, Any]]
A string name used to organize multiple assets into groups. If not provided, the name “default” is used.
Optional[str]
The version of the code that generates this asset.
Optional[str]
A policy which indicates how up to date this asset is intended to be.
Optional[FreshnessPolicy]
Caches resource definitions that are used to load asset values across multiple load invocations.
Should not be instantiated directly. Instead, use
get_asset_value_loader()
.
Loads the contents of an asset as a Python object.
Invokes load_input on the IOManager
associated with the asset.
asset_key (Union[AssetKey, Sequence[str], str]) – The key of the asset to load.
python_type (Optional[Type]) – The python type to load the asset as. This is what will be returned inside load_input by context.dagster_type.typing_type.
partition_key (Optional[str]) – The partition of the asset to load.
resource_config (Optional[Any]) – A dictionary of resource configurations to be passed
to the IOManager
.
The contents of an asset as a Python object.