Creating a range-partitioned table in BigQuery using a job config

Posted 2024-04-19 22:31:04


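I'm trying to read a CSV file into a DataFrame and load that DataFrame into a BigQuery table with range partitioning, but I get a 400 POST error saying a value for Long is invalid: not the correct type.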

Steps to reproduce: using google-cloud-bigquery v1.24.0

Test.csv

Name, Age, DOB
"rona", 10, 01-01-2010
"king", 20, 05-01-2000

Code to reproduce:

import pandas as pd
from google.cloud import bigquery

def Range_Partitioning(field, dict_range):
    cRangePartition = bigquery.RangePartitioning(range_=bigquery.PartitionRange(start=dict_range.get("Start"), interval=dict_range.get("Interval"), end=dict_range.get("End")),
                field=field)
    return cRangePartition

df = pd.read_csv("Test.txt", dtype={ "Name": "str", "Age": "int64", "DOB": "str"}, parse_dates=["DOB"])
BQClient = bigquery.Client()
Dataset = "Test"
TableName = "Load_Range_Test"
schema = [
    {
        "name": "Name",
        "type": "STRING",
        "mode": "REQUIRED"
    },
    {
        "name": "Age",
        "type": "INTEGER",
        "mode": "REQUIRED"
    },
    {
        "name": "DOB",
        "type": "DATE",
        "mode": "REQUIRED"
    }
]
TableRef = sProjectId + "." + Dataset + "." + TableName
RangePartition = Range_Partitioning("Age", {"start":0,  "interval":1, "end":100})
WriteOption = "WRITE_TRUNCATE"
JobConfig = bigquery.LoadJobConfig(
                    schema=schema,
                    write_disposition=WriteOption,
                    range_partitioning=RangePartition)
Job = BQClient.load_table_from_dataframe(df, TableRef, job_config=JobConfig)
Job.result()

Error: 400 POST, invalid value for Long: not the correct type

It works when I don't use range partitioning; I only get this error when range partitioning is used.


1 answer
User
#1 · posted 2024-04-19 22:31:04

I got the code working by dropping the Range_Partitioning(field, dict_range) helper and passing the bigquery.RangePartitioning arguments explicitly:

RangePartition = bigquery.RangePartitioning(
    field="Age",
    range_=bigquery.PartitionRange(start=0, end=100, interval=1)
)
WriteOption = "WRITE_TRUNCATE"
JobConfig = bigquery.LoadJobConfig(
                    schema=schema,
                    write_disposition=WriteOption,
                    range_partitioning=RangePartition)
Job = BQClient.load_table_from_dataframe(df, TableRef, job_config=JobConfig)
Job.result()
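
For reference, here is a rough sketch of what start=0, end=100, interval=1 means for the Age column, based on my understanding of BigQuery's integer-range partitioning rules: values in [start, end) go to the partition named after the first value of their range, out-of-range values go to __UNPARTITIONED__, and NULLs go to __NULL__. The partition_id function is a hypothetical illustration only; it is not part of google-cloud-bigquery and makes no API call.

```python
def partition_id(value, start=0, end=100, interval=1):
    """Hypothetical helper: name the partition a value would fall into."""
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        # `end` is exclusive, so Age == 100 is already out of range.
        return "__UNPARTITIONED__"
    # Each partition is identified by the first value of its range.
    return str(start + ((value - start) // interval) * interval)


for age in (10, 20, 100, None):
    print(age, "->", partition_id(age))
```

With the two rows from Test.csv, Age 10 and Age 20 each land in their own partition because the interval is 1.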

It seems the cRangePartition returned by the helper carries inconsistent range values.
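
A likely reason, judging from the question's code alone: the helper looks up "Start", "Interval" and "End", while the call site passes the lowercase keys "start", "interval" and "end". Each dict.get therefore returns None, and those None values end up where the API expects integers. The sketch below reproduces this with plain dicts and no BigQuery call; extract_range is a hypothetical stricter helper, not something from the library.

```python
def lookup_like_question(dict_range):
    """Mimic the key lookups done inside Range_Partitioning in the question."""
    return (
        dict_range.get("Start"),
        dict_range.get("Interval"),
        dict_range.get("End"),
    )


def extract_range(dict_range):
    """Hypothetical helper: case-insensitive keys, fail fast when one is missing."""
    normalized = {str(key).lower(): value for key, value in dict_range.items()}
    missing = [k for k in ("start", "interval", "end") if normalized.get(k) is None]
    if missing:
        raise ValueError(f"missing range keys: {missing}")
    return normalized["start"], normalized["interval"], normalized["end"]


passed = {"start": 0, "interval": 1, "end": 100}  # as passed in the question
print(lookup_like_question(passed))  # (None, None, None)
print(extract_range(passed))         # (0, 1, 100)
```

If you want to keep the helper, making the keys match (or validating them as above) should have the same effect as passing the arguments explicitly.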
