Sharing data between tasks is a very common use case in Airflow. If you've been writing DAGs, you probably know that breaking them up into appropriately small tasks is best practice for debugging and recovering quickly from failures. But maybe one of your downstream tasks requires metadata about an upstream task, or processes the results of the task immediately before it.

There are a few methods you can use to implement data sharing between your Airflow tasks. In this guide, we'll walk through the two most commonly used methods, discuss when to use each, and show some example DAGs to demonstrate the implementation.

Before we dive into the specifics, there are a couple of high-level concepts that are important when writing DAGs where data is shared between tasks.

Ensure Idempotency

An important concept for any data pipeline, including an Airflow DAG, is idempotency. This is the property whereby an operation can be applied multiple times without changing the result. We often hear about this concept as it applies to your entire DAG: if you execute the same DAG run multiple times, you will get the same result. However, this concept also applies to tasks within your DAG: if every task in your DAG is idempotent, your full DAG will be idempotent as well. When designing a DAG that passes data between tasks, it is important to ensure that each task is idempotent. This will help you recover and ensure no data is lost should you have any failures.

Knowing the size of the data you are passing between Airflow tasks is important when deciding which implementation method to use. As we'll describe in detail below, XComs are one method of passing data between tasks, but they are only appropriate for small amounts of data. Large data sets will require a method that makes use of intermediate storage, and possibly an external processing framework.

The first method for passing data between Airflow tasks is to use XCom, which is a key Airflow feature for sharing task data. XCom (short for cross-communication) is a native feature within Airflow. XComs allow tasks to exchange task metadata or small amounts of data. They are defined by a key, value, and timestamp. XComs can be "pushed", meaning sent by a task, or "pulled", meaning received by a task.