<p>I ran some tests with both of your queries, and they executed in the same way.</p>
<p>First, I should point out that the <strong>query()</strong> method receives a string and uses the <em>job configuration</em> to configure the job. In addition, the documentation, <a href="https://googleapis.dev/python/bigquery/latest/_modules/google/cloud/bigquery/client.html#Client.query" rel="nofollow noreferrer">here</a>, does not mention any issue related to extra whitespace in the <em>query string</em>.</p>
<p>Furthermore, if you navigate to the BigQuery UI, copy and paste each query one at a time and run it, you will see under <strong><em>Job information</em></strong> that both queries process about 23 GB of data, and the same amount appears as <strong><em>bytes billed</em></strong>. Therefore, if you set <code>bigquery.QueryJobConfig(maximum_bytes_billed=23000000000)</code> and <em>omit the <code>to_dataframe()</code> method</em>, both of the queries above run just fine.</p>
<p><strong>Update</strong>:</p>
<p>According to the <a href="https://cloud.google.com/bigquery/docs/cached-results" rel="nofollow noreferrer">documentation</a>, <code>use_query_cache</code> is set to true by default, which means that if you run the same query again, the results are retrieved from the previous run and no bytes are processed. Thus, if the query was previously run without <code>maximum_bytes_billed</code> set, and you then run the same query with a maximum set, the query still runs, even though it processes more data than the limit you now set.</p>
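<p>To make that interaction concrete, here is a toy model of the billing check (illustrative Python only, not the real client library): a cache hit bills 0 bytes, so it passes any cap, while a fresh run is compared against the cap. The byte count below is the one reported by the error message further down.</p>
<pre><code>def passes_byte_cap(bytes_billed: int, maximum_bytes_billed: int) -> bool:
    """Toy model: a BigQuery job is rejected if it would bill more bytes than the cap."""
    return bytes_billed &lt;= maximum_bytes_billed

FRESH_RUN_BYTES = 24_460_132_352  # bytes billed by a fresh (uncached) run of the query
CACHE_HIT_BYTES = 0               # a cached result bills no bytes

# A fresh run exceeds a 10 GB cap and is rejected...
print(passes_byte_cap(FRESH_RUN_BYTES, 10**10))   # False
# ...but a cache hit passes the very same cap.
print(passes_byte_cap(CACHE_HIT_BYTES, 10**10))   # True
</code></pre>
<p>This is why toggling <code>use_query_cache</code> changes whether <code>maximum_bytes_billed</code> is ever hit at all.</p>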
<p>In your case, I used a Python 3 notebook from AI Platform and a .py file in the Shell to run the following code.</p>
<p>First code:</p>
<pre><code>from google.cloud import bigquery
import pandas
client = bigquery.Client()
dataset_ref = client.dataset("stackoverflow", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
job_config.use_query_cache = False
working_query = """
SELECT a.id, a.body, a.owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_answers` AS a
INNER JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
ON q.id = a.parent_id
WHERE q.tags LIKE '%bigquery%'
"""
answers_query_job = client.query(working_query, job_config=job_config)
answers_query_job.to_dataframe()
</code></pre>
<p>Second code:</p>
<pre><code>from google.cloud import bigquery
import pandas
client = bigquery.Client()
dataset_ref = client.dataset("stackoverflow", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
job_config.use_query_cache = False
bad_query = """
SELECT a.id, a.body, a.owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_answers` AS a
INNER JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
ON q.id = a.parent_id
WHERE q.tags LIKE '%bigquery%'
"""
answers_query_job = client.query(bad_query, job_config=job_config)
answers_query_job.to_dataframe()
</code></pre>
<p>Neither of the above snippets worked. Both resulted in the following error:</p>
<pre><code>Query exceeded limit for bytes billed: 10000000000. 24460132352 or higher required.
</code></pre>
<p>On the other hand, if <code>job_config = bigquery.QueryJobConfig(maximum_bytes_billed=25000000000)</code> is set, both queries run normally.</p>
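<p>The numbers line up: the error reports 24,460,132,352 bytes billed, which exceeds the 10,000,000,000-byte cap but fits under 25,000,000,000. A quick check:</p>
<pre><code>bytes_billed = 24_460_132_352   # reported by the error message above
failing_cap  = 10**10           # maximum_bytes_billed that fails
passing_cap  = 25_000_000_000   # maximum_bytes_billed that succeeds

print(bytes_billed > failing_cap)    # True  -> job rejected
print(bytes_billed &lt;= passing_cap)   # True  -> job allowed
print(bytes_billed / 2**30)          # roughly 22.8, i.e. the ~23 GB shown in the UI
</code></pre>
<p>(The UI's "~23 GB" appears to be a binary, GiB-based figure; in decimal bytes the scan is about 24.5 GB, which is why the cap has to be set well above 23,000,000,000 bytes.)</p>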