BigQuery: creating an external table from a Parquet file with schema autodetection in Python

Posted 2024-05-15 05:01:25


I can't find any example of creating an external table from a Parquet file with schema autodetection. This is my code so far:

    from google.cloud import bigquery

    bq_client = bigquery.Client.from_service_account_json(key_path)
    table_name = "my_table"
    table_id = f"{PROJECT_ID}.{DATASET}.{table_name}"
    dataset_ref = bigquery.DatasetReference(PROJECT_ID, DATASET)

    # TableReference expects the bare table name, not the fully qualified ID.
    table_ref = bigquery.TableReference(dataset_ref, table_name)
    table_schema = [bigquery.SchemaField("example", "STRING")]  # I don't want this
    table = bigquery.Table(table_ref, schema=table_schema)  # I don't want this

    external_config = bigquery.ExternalConfig(source_format="PARQUET")
    source_uris = ["gs://path/to/file_name.snappy.parquet"]

    external_config.source_uris = source_uris
    external_config.autodetect = True
    table.external_data_configuration = external_config  # Not sure how to do this

    bq_client.create_table(table)  # and this without table schema
    logger.debug("Created table '{}'.".format(table_id))

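At the moment I have to specify the table schema. I would like the schema to be autodetected instead. Please help. Thanks, everyone.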


1 answer
User
Answer #1 · posted 2024-05-15 05:01:25

See the documentation: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet#loading_parquet_data_into_a_new_table

    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "your-project.your_dataset.your_table_name"

    # No schema is given: a Parquet load job reads it from the file itself.
    job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
    uri = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.

    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)
    print("Loaded {} rows.".format(destination_table.num_rows))
