在每日批处理作业中间获取“查询执行期间超出资源”

2024-04-19 15:17:03 发布

您现在位置:Python中文网/ 问答频道 /正文

对于bigquery python客户端,已配置为使用标准SQL

query_job = self.client.run_async_query(str(uuid.uuid4()), query_str)
query_job.use_query_cache = True     #  query cache 
query_job.use_legacy_sql = False

但是,在发送查询时,会在批处理作业执行过程中出现以下400个错误-抱怨执行过程中超出了资源。这个查询相当简单——在一个每日分区表(每天大约有4000万行,总数据量为15-20G)中,在30分钟范围内获得及时排序的行。由于每个查询覆盖30分钟的范围,因此同一个查询将运行48次以覆盖一天。每个查询返回500k-150万行,数据量在数百兆字节的范围内。下面的查询最初执行得很好,但只有在10-20次迭代之后,才会弹出RESOURCES exceeds错误。你知道吗

bigquery的专家、专家、以前遇到过同样问题的开发人员是否可以帮助您给出一些提示,说明这里可能出了什么问题。非常感谢!你知道吗

罗伊

SELECT
  user_id,
  client_ip,
  url,
  req_ts,
  req_body,
  resp_body,
  status
FROM
  xxxx.table
WHERE
  DATE(_PARTITIONTIME) = '2017-09-16'
  AND req_ts >= '2017-09-16 15:30:00'
  AND req_ts < '2017-09-16 16:00:00' order by req_ts



File "../datastore/bigquery.py", line 202, in sendQuery

    query_job.result()  #Wait for job to complete

  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 492, in result

    return super(_AsyncJob, self).result(timeout=timeout)

  File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 104, in result

    self._blocking_poll(timeout=timeout)

  File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 84, in _blocking_poll

    retry_(self._done_or_raise)()

  File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 258, in retry_wrapped_func

    on_error=on_error,

  File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 175, in retry_target

    return target()

  File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 62, in _done_or_raise

    if not self.done():

  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 1301, in done

    self._query_results = self._client.get_query_results(self.name)

  File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 196, in get_query_results

    method='GET', path=path, query_params=extra_params)

  File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 293, in api_request

    raise exceptions.from_http_response(response)

BadRequest: 400 GET https://www.googleapis.com/bigquery/v2/projects/fluted-house-161501/queries/ab8534f8-fe52-448c-84fe-b8702ee7b87c?maxResults=0: Resources exceeded during query execution: The query could not be executed in the allotted memory.

Tags: inpyselfapilibpackagesusrlocal
1条回答
网友
1楼 · 发布于 2024-04-19 15:17:03

问题在于,在输出结果之前,将整个结果移动到一个辅助进程进行最终排序的顺序。如果结果足够大,这通常会导致“查询执行期间超出资源”

这里的建议是要么增加一些合理的数量限制-在这种情况下-偏序发生在所有工人和最终的顺序是在一个节点上,但现在相当减少的结果或只是删除所有的顺序,并在客户端进行您的排序

有关ORDER BY的更多信息,请参见^{}第二段

相关问题 更多 >