The BigQuery Python client has been configured to use standard SQL:
query_job = self.client.run_async_query(str(uuid.uuid4()), query_str)
query_job.use_query_cache = True # query cache
query_job.use_legacy_sql = False
However, when sending queries as batch jobs, the following 400 error pops up partway through, complaining that resources were exceeded during execution. The query is fairly simple: fetch time-ordered rows within a 30-minute range from a daily-partitioned table (roughly 40 million rows per day, 15-20 GB of data in total). Since each query covers a 30-minute range, the same query runs 48 times to cover a full day. Each query returns 500k to 1.5 million rows, on the order of a few hundred megabytes of data. The query below initially executes fine, but after only 10-20 iterations the "Resources exceeded" error pops up.

Could any BigQuery experts, or developers who have run into the same problem before, give some hints on what might be going wrong here? Many thanks!

Roy
SELECT
user_id,
client_ip,
url,
req_ts,
req_body,
resp_body,
status
FROM
xxxx.table
WHERE
DATE(_PARTITIONTIME) = '2017-09-16'
AND req_ts >= '2017-09-16 15:30:00'
  AND req_ts < '2017-09-16 16:00:00'
ORDER BY
  req_ts
File "../datastore/bigquery.py", line 202, in sendQuery
query_job.result() #Wait for job to complete
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 492, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 104, in result
self._blocking_poll(timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 84, in _blocking_poll
retry_(self._done_or_raise)()
File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 258, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python2.7/dist-packages/google/api/core/retry.py", line 175, in retry_target
return target()
File "/usr/local/lib/python2.7/dist-packages/google/api/core/future/polling.py", line 62, in _done_or_raise
if not self.done():
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/job.py", line 1301, in done
self._query_results = self._client.get_query_results(self.name)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 196, in get_query_results
method='GET', path=path, query_params=extra_params)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/_http.py", line 293, in api_request
raise exceptions.from_http_response(response)
BadRequest: 400 GET https://www.googleapis.com/bigquery/v2/projects/fluted-house-161501/queries/ab8534f8-fe52-448c-84fe-b8702ee7b87c?maxResults=0: Resources exceeded during query execution: The query could not be executed in the allotted memory.
The problem is the ORDER BY: before the results are output, the entire result set is moved to a single worker for the final sort. If the result is large enough, this commonly leads to "Resources exceeded during query execution".

The suggestion here is either to add a LIMIT with some reasonable value (in that case the partial sorts happen on all the workers, and the final sort runs on a single node but over a much smaller result), or to simply remove the ORDER BY entirely and do the sorting on the client side.

For more about ORDER BY, see ^{}, second paragraph.
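A minimal sketch of the client-side alternative, assuming the query is issued without the ORDER BY and each fetched row is exposed as a mapping with a `req_ts` field (the field names are taken from the question's query; nothing here is specific to the BigQuery client):

```python
from operator import itemgetter

def sort_rows_client_side(rows):
    """Sort fetched result rows by req_ts locally,
    instead of forcing a final ORDER BY sort on one BigQuery worker."""
    return sorted(rows, key=itemgetter("req_ts"))

# Plain dicts standing in for fetched result rows:
rows = [
    {"req_ts": "2017-09-16 15:45:00", "user_id": "b"},
    {"req_ts": "2017-09-16 15:31:00", "user_id": "a"},
    {"req_ts": "2017-09-16 15:59:59", "user_id": "c"},
]
ordered = sort_rows_client_side(rows)
print([r["user_id"] for r in ordered])  # ['a', 'b', 'c']
```

Since the timestamps are zero-padded strings in a fixed `YYYY-MM-DD HH:MM:SS` format, lexicographic order matches chronological order, so no parsing is needed for the sort key.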