Loading a large file into BigQuery in chunks

import pandas as pd

destination_table = 'product_data.FS_orders'
project_id = '##'
pkey = '##'

chunks = []

# Read the CSV 1000 rows at a time; the file has no header row, so column names are given explicitly
for chunk in pd.read_csv('Historic_orders.csv', chunksize=1000, encoding='windows-1252',
                         names=['Orderdate', 'Weborderno', 'Productcode', 'Quantitysold',
                                'Paymentmethod', 'ProductGender', 'DeviceType', 'Brand',
                                'ProductDescription', 'OrderType', 'ProductCategory',
                                'UnitpriceGBP', 'Webtype1', 'CostPrice', 'Webtype2',
                                'Webtype3', 'Variant', 'Orderlinetax']):
    # replace() returns a new DataFrame, so the result has to be assigned back
    chunk = chunk.replace(r' *!', 'Null', regex=True)
    chunk.to_gbq(destination_table, project_id, if_exists='append', private_key=pkey)
    chunks.append(chunk)

df = pd.concat(chunks, axis=0)
print(df.head(5))
# to_csv is a DataFrame method, not a pandas module-level function
df.to_csv('Historic_orders_cleaned.csv')

1 answer

User

#1 · Posted 2024-04-20 11:22:53

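Question: why streaming rather than a simple load job? With a load job you can upload batches of 1 GB instead of 1,000 rows. Streaming is usually meant for continuous data that needs to be appended as it arrives. If there is a gap of a day between collecting the data and loading it, a plain load job is usually the safer option (see here).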
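A load job of the kind suggested above could look roughly like the sketch below. It assumes the `google-cloud-bigquery` client library is installed and that the file has already been converted to UTF-8, which is the default encoding for load jobs. The helper names `build_job_config` and `load_csv`, and the choice of `autodetect=True`, are illustrative assumptions and not part of the original question.

```python
def build_job_config():
    """Build a CSV append configuration.

    The import is done lazily so this module can be imported even where
    google-cloud-bigquery is not installed.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    return bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,  # assumption: let BigQuery infer the schema
    )


def load_csv(client, csv_path, table_id, job_config):
    """Submit one load job for the whole file and wait for it to finish."""
    with open(csv_path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    return job.result()  # blocks until the job is done; raises if it failed


# Hypothetical usage (project and file names are placeholders):
#   from google.cloud import bigquery
#   client = bigquery.Client(project="your-project")
#   load_csv(client, "Historic_orders_utf8.csv",
#            "your-project.product_data.FS_orders", build_job_config())
```

Passing the client and job configuration in as arguments keeps the upload step easy to exercise with a stand-in client before pointing it at a real project.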

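Apart from that, I have also had problems loading BigQuery tables from CSV files. Most of the time the cause was either 1) encoding (I see you are using a non-UTF-8 encoding) or 2) invalid characters, such as a misplaced comma in the middle of the file that broke a row.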
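Both suspects can be checked locally before anything is uploaded. The sketch below uses only the standard library: it rewrites the file as UTF-8 and reports records whose field count differs from what is expected. The function name `reencode_and_check` and its parameters are hypothetical, and skipping the malformed records is just one possible policy.

```python
import csv


def reencode_and_check(src_path, dst_path, expected_fields,
                       src_encoding="windows-1252"):
    """Rewrite src_path as UTF-8 at dst_path.

    Records with the wrong number of fields are skipped and their
    1-based record numbers are returned for inspection.
    """
    bad_records = []
    with open(src_path, newline="", encoding=src_encoding) as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for record_number, row in enumerate(reader, start=1):
            if len(row) != expected_fields:
                bad_records.append(record_number)
                continue
            writer.writerow(row)
    return bad_records
```

Here `expected_fields` would be the number of column names passed to `read_csv`. A non-empty result points at the records worth opening in a text editor.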

To check this, what happens if you insert the rows in reverse order? Do you get the same error?
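One way to run that experiment is sketched below, assuming the chunks are already held in a list and that `upload` is whatever callable sends a single chunk (for example a small wrapper around `to_gbq`). The function name `upload_in_reverse` is hypothetical.

```python
def upload_in_reverse(chunks, upload):
    """Try the chunks from last to first.

    Returns a list of (original_index, error_message) for every chunk
    whose upload raised, so the failing positions can be compared with
    a forward run.
    """
    failures = []
    for index in range(len(chunks) - 1, -1, -1):
        try:
            upload(chunks[index])
        except Exception as exc:  # broad on purpose: we only locate bad chunks
            failures.append((index, str(exc)))
    return failures
```

If the same chunk fails in both directions, the data in that chunk is the likely culprit. If the failure instead follows the position in the run, something outside the data, such as a limit on the service side, may be worth investigating.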
