简化配置单元日期范围。
hpdr的Python项目详细描述
当您的配置单元分区是yyyy、mm、dd、hh而不是yyyymmddhh时,组合日期 有他们的靶场会失控。
hpdr解决了这个问题
>>>fromhpdrimportapi>>>rng=api.build('2016102612','2017122612').partition_range>>>print(rng.build_display())((YYYY=2016ANDMM=10ANDDD=26ANDHH>=12)OR(YYYY=2016ANDMM=10ANDDD>26)OR(YYYY=2016ANDMM>10)OR(YYYY=2017ANDMM<12)OR(YYYY=2017ANDMM=12ANDDD<26)OR(YYYY=2017ANDMM=12ANDDD=26ANDHH<12))>>>print(rng.build_display(pretty=True))((YYYY=2016ANDMM=10ANDDD=26ANDHH>=12)OR(YYYY=2016ANDMM=10ANDDD>26)OR(YYYY=2016ANDMM>10)OR(YYYY=2017ANDMM<12)OR(YYYY=2017ANDMM=12ANDDD<26)OR(YYYY=2017ANDMM=12ANDDD=26ANDHH<12))
也许你认为在当地时间,但存储你的数据在UTC
>>>fromhpdrimportapi>>>rng=api.build('2016102612','2017122612',...dzone='America/Los_Angeles',...qzone='UTC').partition_range>>>print(rng.build_display(pretty=True))((YYYY=2016ANDMM=10ANDDD=26ANDHH>=19)OR(YYYY=2016ANDMM=10ANDDD>26)OR(YYYY=2016ANDMM>10)OR(YYYY=2017ANDMM<12)OR(YYYY=2017ANDMM=12ANDDD<26)OR(YYYY=2017ANDMM=12ANDDD=26ANDHH<20)
或者,您的日期范围太大,无法在一个查询中运行,而将其分解是一种痛苦
importsubprocess,os,os.path,tempfile,datetimefromhpdrimportapiQUERY_FILE='myquery.hql'OUT_FILE='out.txt'begin=datetime.datetime(2016,11,1)end=datetime.datetime(2016,11,30)step='5days'withopen(QUERY_FILE)asf:template=f.read()specs=api.build_with_steps(begin=begin,end=end,step=step)ifos.path.isfile(OUT_FILE):os.remove(OUT_FILE)forspecinspecs:query=spec.substitute(template)withtempfile.NamedTemporaryFile()asf:f.write(query)f.flush()cmd=['/usr/bin/hive','-f',f.name]print(spec.partition_range.build_display())withopen(OUT_FILE,'a')asoutfile:subprocess.check_call(cmd,stdout=outfile)
打印:
(YYYY=2016 AND MM=11 AND DD>=01 AND DD<06) (YYYY=2016 AND MM=11 AND DD>=06 AND DD<11) (YYYY=2016 AND MM=11 AND DD>=11 AND DD<16) (YYYY=2016 AND MM=11 AND DD>=16 AND DD<21) (YYYY=2016 AND MM=11 AND DD>=21 AND DD<26) (YYYY=2016 AND MM=11 AND DD>=26 AND DD<30)
它运行从包含hpdr_u变量的模板生成的6个配置单元查询。像这样的:
SELECT YEAR(event_timestamp), MONTH(event_timestamp), DAY(event_timestamp), FROM my_table WHERE event_timestamp >= '${HPDR_begin_ts}' AND event_timestamp < '${HPDR_end_ts}' AND ${HPDR_range}
第一个查询如下所示。
SELECT YEAR(event_timestamp), MONTH(event_timestamp), DAY(event_timestamp), FROM my_table WHERE event_timestamp >= '2016-11-01 00:00:00' AND event_timestamp < '2016-11-06 00:00:00' AND (YYYY=2016 AND MM=11 AND DD>=01 AND DD<06)
第一个查询可用的HPDR_u变量的完整列表是:
variable value --------------------------- ------------------- HPDR_dzone UTC HPDR_qzone UTC HPDR_begin_ts 2016-11-01 00:00:00 HPDR_end_ts 2016-11-06 00:00:00 HPDR_slop_begin_ts 2016-11-01 00:00:00 HPDR_slop_end_ts 2016-11-06 00:00:00 HPDR_begin_unixtime 1477983600 HPDR_begin_unixtime_ms 1477983600000 HPDR_begin_yyyymmdd 20161101 HPDR_begin_yyyy 2016 HPDR_begin_mm 11 HPDR_begin_dd 01 HPDR_begin_hh 00 HPDR_begin_min 00 HPDR_begin_sec 00 HPDR_end_unixtime 1478415600 HPDR_end_unixtime_ms 1478415600000 HPDR_end_yyyymmdd 20161106 HPDR_end_yyyy 2016 HPDR_end_mm 11 HPDR_end_dd 06 HPDR_end_hh 00 HPDR_end_min 00 HPDR_end_sec 00 HPDR_slop_begin_unixtime 1477983600 HPDR_slop_begin_unixtime_ms 1477983600000 HPDR_slop_begin_yyyymmdd 20161101 HPDR_slop_begin_yyyy 2016 HPDR_slop_begin_mm 11 HPDR_slop_begin_dd 01 HPDR_slop_begin_hh 00 HPDR_slop_begin_min 00 HPDR_slop_begin_sec 00 HPDR_slop_end_unixtime 1478415600 HPDR_slop_end_unixtime_ms 1478415600000 HPDR_slop_end_yyyymmdd 20161106 HPDR_slop_end_yyyy 2016 HPDR_slop_end_mm 11 HPDR_slop_end_dd 06 HPDR_slop_end_hh 00 HPDR_slop_end_min 00 HPDR_slop_end_sec 00