How to convert a pandas column to the BigQuery table DATE format

Posted 2024-04-25 21:48:07


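I have a pandas DataFrame with a date column in the following format:

PublishDate = 2018-08-31

I use the pandas to_gbq() function to dump the data into a BigQuery table. Before dumping the data, I make sure the column formats match the table schema. In the BigQuery table, PublishDate is a DATE column. How can I achieve something like: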

     df['PublishDate'] = df['PublishDate'].astype('?????')

I tried datetime, but none of these worked!
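For reference, a minimal sketch (assuming pandas is installed and PublishDate holds 'YYYY-MM-DD' strings) of one common way to get date-only values in pandas: parse with pd.to_datetime() and then take .dt.date, which yields datetime.date objects. The sample DataFrame is hypothetical, and whether to_gbq() then writes such a column as DATE depends on your pandas-gbq version, so treat this as data preparation only.

```python
import pandas as pd

# Hypothetical sample data: PublishDate holds 'YYYY-MM-DD' strings.
df = pd.DataFrame({"PublishDate": ["2018-08-31", "2018-09-01", "not a date"]})

# Parse to datetime64; values that do not match the format become NaT.
parsed = pd.to_datetime(df["PublishDate"], format="%Y-%m-%d", errors="coerce")

# Drop the time part: an object column of datetime.date values (NaT stays missing).
df["PublishDate"] = parsed.dt.date

print(df["PublishDate"].tolist())
```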


2 Answers

I couldn't find support for the DATE type in pandas-gbq.

Another option is to insert the rows with the BigQuery client:

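from google.cloud import bigquery


def chunks(l, chunk_size):
    """Yield successive slices of at most chunk_size items."""
    for i in range(0, len(l), chunk_size):
        yield l[i:i + chunk_size]


# Maximum number of rows sent per insert_rows() call.
CLIENT_ROW_LIMIT = 10000
# Destination schema; SchemaField needs a name and a type (e.g. 'DATE').
SCHEMA = [
    bigquery.SchemaField('...', '...'),
]


def push_with_date(df):
    client = bigquery.Client(project='...')
    table_ref = bigquery.DatasetReference(client.project, '...').table('...')
    # One tuple per DataFrame row; the client serialises each value
    # (e.g. datetime.date for a DATE field) according to SCHEMA.
    rows = list(df.itertuples(index=False, name=None))
    for i, chunk in enumerate(chunks(rows, CLIENT_ROW_LIMIT)):
        print('pushing', i)
        # With a TableReference, the schema must be passed as selected_fields.
        errors = client.insert_rows(table_ref, chunk, selected_fields=SCHEMA)
        if errors:
            # insert_rows returns a list of per-row error mappings.
            raise RuntimeError(errors)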
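A stdlib-only sketch of the batching in push_with_date: chunks() splits the row list into slices of at most CLIENT_ROW_LIMIT rows (10 here, to keep the output short). The to_json_row helper and the sample rows are hypothetical; they only illustrate that a DATE value travels as a 'YYYY-MM-DD' string, which the client library is expected to produce from datetime.date values when SCHEMA is passed to insert_rows.

```python
import datetime
import json


def chunks(items, chunk_size):
    """Yield successive slices of at most chunk_size items (same helper as above)."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]


def to_json_row(row):
    """Hypothetical helper: make one row JSON-serialisable.

    datetime.date values become 'YYYY-MM-DD' strings, the text form
    BigQuery accepts for a DATE column; other values pass through.
    """
    return [v.isoformat() if isinstance(v, datetime.date) else v for v in row]


# Hypothetical data: 25 rows of (title, publish date).
start = datetime.date(2018, 8, 31)
rows = [(f"post-{n}", start + datetime.timedelta(days=n)) for n in range(25)]

batches = [
    [to_json_row(r) for r in batch]
    for batch in chunks(rows, 10)  # 10 stands in for CLIENT_ROW_LIMIT
]

print([len(b) for b in batches])  # → [10, 10, 5]
print(json.dumps(batches[0][0]))  # → ["post-0", "2018-08-31"]
```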

AFAIK, pandas-gbq doesn't seem to have support for the DATE type. So the best option may be to export the column as a timestamp and then convert it to a date with a SQL query.

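import pandas as pd
from google.cloud import bigquery

# Upload the parsed column as a timestamp next to the original string column.
df['PublishTimestamp'] = pd.to_datetime(
    df['PublishDate'],
    format='%Y-%m-%d',
    errors='coerce'  # unparseable dates become NaT
)
df.to_gbq("YOUR-DATASET.YOUR-TABLE", project_id="YOUR-PROJECT")

client = bigquery.Client()

# Overwrite the same table with the query result.
job_config = bigquery.QueryJobConfig()
job_config.destination = bigquery.TableReference.from_string(
    "YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE"
)
job_config.write_disposition = "WRITE_TRUNCATE"

# Replace the string PublishDate with a real DATE derived from the timestamp;
# EXCEPT avoids a duplicate PublishDate column in the result.
sql = """
    SELECT
      * EXCEPT (PublishDate),
      DATE(PublishTimestamp) AS PublishDate
    FROM
      `YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE`
"""

query_job = client.query(
    sql,
    job_config=job_config
)
query_job.result()  # wait for the query job to finish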
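The SQL step relies on BigQuery's DATE(timestamp) function, which evaluates in UTC unless a time zone argument is given. The stdlib sketch below mimics that rule with a hypothetical bq_date helper, to show why a timestamp that is not at UTC midnight can land on a different calendar day. It assumes PublishTimestamp arrives as a UTC TIMESTAMP and does not call BigQuery.

```python
from datetime import datetime, timedelta, timezone


def bq_date(ts, tz=timezone.utc):
    """Hypothetical helper mimicking BigQuery DATE(timestamp[, time_zone]).

    The tz-aware timestamp is converted to the given zone (UTC by default,
    as in BigQuery) and then truncated to its calendar date.
    """
    return ts.astimezone(tz).date()


# Assumption: a parsed 'YYYY-MM-DD' value lands as midnight UTC,
# so DATE() returns the same calendar day.
midnight_utc = datetime(2018, 8, 31, tzinfo=timezone.utc)
print(bq_date(midnight_utc))  # → 2018-08-31

# A late-evening time at UTC-5 is already the next day in UTC.
utc_minus_5 = timezone(timedelta(hours=-5))
evening = datetime(2018, 8, 31, 23, 30, tzinfo=utc_minus_5)
print(bq_date(evening))  # → 2018-09-01
print(bq_date(evening, utc_minus_5))  # → 2018-08-31
```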
