<p>I ran some tests with both of your queries, and they executed in the same way.</p>
<p>First, I should point out that the <strong>query()</strong> method receives a string and uses the <em>job configuration</em> to configure the job. In addition, the documentation, <a href="https://googleapis.dev/python/bigquery/latest/_modules/google/cloud/bigquery/client.html#Client.query" rel="nofollow noreferrer">here</a>, does not mention any issue related to extra whitespace in the <em>query string</em>.</p>
<p>Furthermore, if you navigate to the BigQuery UI, copy and paste each query one at a time and run it, you will see under <strong><em>Job information</em></strong> that both queries process about 23 GB of data, and the same amount appears as <strong><em>bytes billed</em></strong>. Therefore, if you set <code>bigquery.QueryJobConfig(maximum_bytes_billed=23000000000)</code> and <em>omit the <code>to_dataframe()</code> method</em>, both of the queries above run just fine.</p>
<p><strong>Update</strong>:</p>
<p>According to the <a href="https://cloud.google.com/bigquery/docs/cached-results" rel="nofollow noreferrer">documentation</a>, <code>use_query_cache</code> is set to true by default, which means that if you run the same query again, the results are retrieved from the previous run and no bytes are processed. Thus, if the query was previously run without <code>maximum_bytes_billed</code> set, and you then run the same query with a maximum set, the query still runs, even though it processes more data than the limit you now set.</p>
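<p>To make that interaction concrete, here is a toy model of the billing check (illustrative Python only, not the real client library): a cache hit bills 0 bytes, so it passes any cap, while a fresh run is compared against the cap. The byte count below is the one reported by the error message further down.</p>
<pre><code>def passes_byte_cap(bytes_billed: int, maximum_bytes_billed: int) -> bool:
    """Toy model: a BigQuery job is rejected if it would bill more bytes than the cap."""
    return bytes_billed &lt;= maximum_bytes_billed

FRESH_RUN_BYTES = 24_460_132_352  # bytes billed by a fresh (uncached) run of the query
CACHE_HIT_BYTES = 0               # a cached result bills no bytes

# A fresh run exceeds a 10 GB cap and is rejected...
print(passes_byte_cap(FRESH_RUN_BYTES, 10**10))   # False
# ...but a cache hit passes the very same cap.
print(passes_byte_cap(CACHE_HIT_BYTES, 10**10))   # True
</code></pre>
<p>This is why toggling <code>use_query_cache</code> changes whether <code>maximum_bytes_billed</code> is ever hit at all.</p>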
<p>In your case, I used a Python 3 notebook from AI Platform and a .py file in the Shell to run the following code.</p>
<p>First code:</p>
<pre><code>from google.cloud import bigquery
import pandas
client = bigquery.Client()
dataset_ref = client.dataset("stackoverflow", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
job_config.use_query_cache = False
working_query = """
SELECT a.id, a.body, a.owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_answers` AS a
INNER JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
ON q.id = a.parent_id
WHERE q.tags LIKE '%bigquery%'
"""
answers_query_job = client.query(working_query, job_config=job_config)
answers_query_job.to_dataframe()
</code></pre>
<p>Second code:</p>
<pre><code>from google.cloud import bigquery
import pandas
client = bigquery.Client()
dataset_ref = client.dataset("stackoverflow", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
job_config.use_query_cache = False
bad_query = """
SELECT a.id, a.body, a.owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_answers` AS a
INNER JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
ON q.id = a.parent_id
WHERE q.tags LIKE '%bigquery%'
"""
answers_query_job = client.query(bad_query, job_config=job_config)
answers_query_job.to_dataframe()
</code></pre>
<p>Neither of the above snippets worked. Both resulted in the following error:</p>
<pre><code>Query exceeded limit for bytes billed: 10000000000. 24460132352 or higher required.
</code></pre>
<p>On the other hand, if <code>job_config = bigquery.QueryJobConfig(maximum_bytes_billed=25000000000)</code> is set, both queries run normally.</p>
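<p>The numbers line up: the error reports 24,460,132,352 bytes billed, which exceeds the 10,000,000,000-byte cap but fits under 25,000,000,000. A quick check:</p>
<pre><code>bytes_billed = 24_460_132_352   # reported by the error message above
failing_cap  = 10**10           # maximum_bytes_billed that fails
passing_cap  = 25_000_000_000   # maximum_bytes_billed that succeeds

print(bytes_billed > failing_cap)    # True  -> job rejected
print(bytes_billed &lt;= passing_cap)   # True  -> job allowed
print(bytes_billed / 2**30)          # roughly 22.8, i.e. the ~23 GB shown in the UI
</code></pre>
<p>(The UI's "~23 GB" appears to be a binary, GiB-based figure; in decimal bytes the scan is about 24.5 GB, which is why the cap has to be set well above 23,000,000,000 bytes.)</p>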