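<p>The code snippet posted above by andreapierleoni works with older versions of the <code>google-cloud-bigquery</code> Python client, for example version <code>0.25.0</code>, which happens to be what <code>pip install apache-beam[gcp]</code> installs.</p>
<p>However, the BigQuery Python client API changed drastically in newer versions of <code>google-cloud-bigquery</code>. For example, in version <code>1.8.0</code>, which I'm currently using, neither <code>bigquery.TableFieldSchema()</code> nor {<cd8>} works.</p>
<p>If you are using a recent version of the <code>google-cloud-bigquery</code> package, here is how to build the required list of <code>SchemaField</code> objects (needed, for example, to create a table) from a JSON file. This is an adaptation of the code posted by andreapierleoni (thanks!).</p>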
<pre><code>import json

from google.cloud import bigquery


def _get_field_schema(field):
    """Recursively convert one JSON field definition into a SchemaField."""
    name = field['name']
    field_type = field.get('type', 'STRING')
    mode = field.get('mode', 'NULLABLE')
    # RECORD fields list their nested columns under 'fields'.
    subschema = [_get_field_schema(f) for f in field.get('fields', [])]
    return bigquery.SchemaField(name=name,
                                field_type=field_type,
                                mode=mode,
                                fields=subschema)


def parse_bq_json_schema(schema_filename):
    """Load a BigQuery JSON schema file into a list of SchemaField objects."""
    with open(schema_filename, 'r') as infile:
        jsonschema = json.load(infile)
    return [_get_field_schema(field) for field in jsonschema]
</code></pre>
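<p>To see what the recursion in <code>_get_field_schema</code> does with nested <code>RECORD</code> columns, here is a minimal sketch that runs without the library or credentials. It is not the client's API: the <code>Field</code> namedtuple is a hypothetical stand-in for <code>bigquery.SchemaField</code>, and <code>to_field</code> and <code>sample_schema</code> are made up for illustration.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for bigquery.SchemaField, so this runs without the library.
Field = namedtuple('Field', ['name', 'field_type', 'mode', 'fields'])


def to_field(field):
    """Recursively convert one JSON field definition into a Field."""
    # Nested definitions are converted first, then attached to the parent.
    children = tuple(to_field(f) for f in field.get('fields', []))
    return Field(name=field['name'],
                 field_type=field.get('type', 'STRING'),   # same default as the helper
                 mode=field.get('mode', 'NULLABLE'),       # same default as the helper
                 fields=children)


# Made-up schema: one scalar column and one RECORD column with two nested columns.
sample_schema = [
    {'name': 'event_id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
    {'name': 'user', 'type': 'RECORD', 'fields': [
        {'name': 'email'},
        {'name': 'signup', 'type': 'TIMESTAMP'},
    ]},
]

parsed = [to_field(f) for f in sample_schema]
for top in parsed:
    print(top.name, top.field_type, top.mode, [c.name for c in top.fields])
```

<p>Fields that omit <code>type</code> or <code>mode</code> fall back to <code>STRING</code> and <code>NULLABLE</code>, which mirrors the defaults used in the helper.</p>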
<p>Now, assume you have a table's <a href="https://cloud.google.com/bigquery/docs/schemas#specifying_a_json_schema_file" rel="nofollow noreferrer">schema already defined in JSON</a>. Say you have <a href="https://gist.github.com/nonbeing/9e1cbec94a4ad7a17cf3948db5ccb901" rel="nofollow noreferrer">this particular "schema.json" file</a>. Using the helper above, you can get the <code>SchemaField</code> representation that the Python client needs by calling <code>res_schema = parse_bq_json_schema('schema.json')</code>.</p>
<p>Now, to <a href="https://cloud.google.com/bigquery/docs/tables#bigquery-create-table-python" rel="nofollow noreferrer">create a table having the above schema using the Python SDK</a>, you would do the following:</p>
<pre><code>bqclient = bigquery.Client()  # uses your default project and credentials
dataset_ref = bqclient.dataset('YOUR_DATASET')
table_ref = dataset_ref.table('YOUR_TABLE')
table = bigquery.Table(table_ref, schema=res_schema)
</code></pre>
<p>Optionally, you can set up time-based partitioning like this:</p>
<pre><code>table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field='timestamp'  # name of column to use for partitioning
)
</code></pre>
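<p>Partitioning on a column only works if that column is a suitable top-level field in the schema, so it can help to check the JSON schema before creating the table. The following is a rough sketch using only the standard library. The helper <code>check_partition_column</code>, the <code>PARTITIONABLE_TYPES</code> set, and the sample schema are all hypothetical, and the set of accepted types is an assumption that you should adjust to whatever your BigQuery version supports.</p>

```python
import json

# Assumption: column types accepted for DAY partitioning; adjust as needed.
PARTITIONABLE_TYPES = {'DATE', 'TIMESTAMP'}


def check_partition_column(json_fields, column):
    """Return None if `column` looks usable for partitioning, else a reason string."""
    for field in json_fields:  # only top-level fields are inspected
        if field['name'] != column:
            continue
        field_type = field.get('type', 'STRING').upper()
        if field_type not in PARTITIONABLE_TYPES:
            return 'column {!r} has type {}'.format(column, field_type)
        if field.get('mode', 'NULLABLE').upper() == 'REPEATED':
            return 'column {!r} is REPEATED'.format(column)
        return None
    return 'no top-level column named {!r}'.format(column)


# Made-up schema content, in the same shape as a BigQuery JSON schema file.
sample_fields = json.loads('''[
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"}
]''')

for candidate in ('timestamp', 'tags', 'missing'):
    problem = check_partition_column(sample_fields, candidate)
    print(candidate, '->', 'ok' if problem is None else problem)
```

<p>With a real schema file you would pass the result of <code>json.load</code> on that file instead of <code>sample_fields</code>.</p>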
<p>Finally, create the table:</p>
<pre><code>table = bqclient.create_table(table)
print('Created table {}, partitioned on column {}'.format(
    table.table_id, table.time_partitioning.field))
</code></pre>