如何使用气流中的执行日期创建路径?

2024-04-20 13:50:50 发布

您现在位置:Python中文网/ 问答频道 /正文

我有以下气流dag:

    start_task = DummyOperator(task_id='start_task', dag=dag)

    gcs_export_uri_template = 'adstest/2018/08/31/*'
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery',
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=[gcs_export_uri_template],
        schema_fields=dc(),
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_append',
        skip_leading_rows = 1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )

start_task >> update_bigquery

这个dag将数据从adstest/2018/08/31/*加载到BigQuery,效果非常好。在

我想修改Dag以基于execution date运行日期:

^{pr2}$

例如,如果执行日期是2018-09-02,我希望DAG转到:

Execution date : adstest/2018/09/02/*
Execution date - 1 days : adstest/2018/09/01/*
Execution date - 2 days : adstest/2018/08/31/*

我怎么能做到呢?在

编辑: 这是我更新的代码:

for i in range(5, 0, -1):
    gcs_export_uri_template = ['''adstest/{{ macros.ds_format(macros.ds_add(ds, -{0}), '%Y-%m-%d', '%Y/%m/%d') }}/*'''.format(i)]
    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery-{}'.format(i),
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=gcs_export_uri_template,
        schema_fields=dc(),
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_APPEND',
        skip_leading_rows=1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )
    start_task >> update_bigquery

编辑2:

我的代码:

for i in range(5, 0, -1):
    gcs_export_uri_template = ['''adstest/{{ macros.ds_format(macros.ds_add(ds, -params.i), '%Y-%m-%d', '%Y/%m/%d') }}/*'''.format(i)]

    update_bigquery = GoogleCloudStorageToBigQueryOperator(
        dag=dag,
        task_id='load_ads_to_BigQuery-{}'.format(i),
        bucket=GCS_BUCKET_ID,
        destination_project_dataset_table=table_name_template,
        source_format='CSV',
        source_objects=gcs_export_uri_template,
        schema_fields=dc(),
        params={'i': i},
        create_disposition='CREATE_IF_NEEDED',
        write_disposition='WRITE_APPEND',
        skip_leading_rows=1,
        google_cloud_storage_conn_id=CONNECTION_ID,
        bigquery_conn_id=CONNECTION_ID
    )

模板: enter image description here

代码给出以下错误:

"Source URI must not contain the ',' character: gs://adstest/{ macros.ds_format(macros.ds_add(ds, -params.i), '%Y-%m-%d', '%Y/%m/%d') }/*">

Tags: idformatsourcetasktabledstemplateexport
1条回答
网友
1楼 · 发布于 2024-04-20 13:50:50

您可以使用Airflow Macros来实现这一点,如下所示:

gcs_export_uri_template=[
    "adstest/{{ macros.ds_format(ds, '%Y-%m-%d', '%Y/%m/%d') }}/*",
    "adstest/{{ macros.ds_format(prev_ds, '%Y-%m-%d', '%Y/%m/%d') }}/*",
    "adstest/{{ macros.ds_format(macros.ds_add(ds, -2), '%Y-%m-%d', '%Y/%m/%d') }}/*"
]

update_bigquery = GoogleCloudStorageToBigQueryOperator(
    dag=dag,
    task_id='load_ads_to_BigQuery',
    bucket=GCS_BUCKET_ID,
    destination_project_dataset_table=table_name_template,
    source_format='CSV',
    source_objects=gcs_export_uri_template,
    schema_fields=dc(),
    create_disposition='CREATE_IF_NEEDED',
    write_disposition='WRITE_APPEND',
    skip_leading_rows = 1,
    google_cloud_storage_conn_id=CONNECTION_ID,
    bigquery_conn_id=CONNECTION_ID
)

运行上述代码时,可以在Web UI中签入呈现的参数:

Airflow Macros Date


对于编辑的评论

您需要在params参数中传递循环变量i的值,并将其作为params.i在字符串中使用,如下所示:

^{pr2}$

相关问题 更多 >