如何使用时间戳间隔条件加速python循环

2024-06-07 17:13:43 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这段代码,这是相当匆忙,但它的工作一般。它唯一能永远运行的东西。其思想是更新包含1495748行的表上的2列,以便首先查询时间戳列表的编号。对于每个更新值,必须进行比较,其中时间戳必须在每小时间隔内,该间隔由来自api的两个不同DICT中的两个时间戳形成。有没有一种方法可以稍微加快速度,或者多处理它

提示:db_mac=db_连接到Postgres数据库

响应如下所示: {'meta':{'source':'National Oceanical and Atmospheric Administration,Deutscher Wetterdianst'},'数据':[时间:2019-11-26 23:00:00;当地时间:2019-11-27 00:00;温度:8.3;露点:5.9;湿度:85;降水:0;降水:无;降水:无;雪深:无;风速:11;峰值:21;风向:160;压力:1004.2;条件:4};时间:2019-11-27 00:00

import requests
import db_mac
from collections import defaultdict
import datetime
import time
t = time.time()

station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]

dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)

hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=<APIKEY>".format(station=station[0], start=start, end=end, timezone=station[-1])

    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

for row in result:
    try:
        ts_dump = datetime.datetime.timestamp(row[0])
        for i, hour in enumerate(hist_weather_list):
            ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))

            ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))

            if ts1 <= ts_dump and ts_dump < ts2:

                insert_dict[row[0]] = (hour["temperature"], hour["pressure"])

    except Exception as e:
        pass



hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))


for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)


hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

更新用于多处理的代码。我将让它在夜间运行,并更新运行时间

import requests
import db_mac
from collections import defaultdict
import datetime
import time
import multiprocessing as mp
t = time.time()



station = [10382,"DE","Berlin / Tegel",52.5667,13.3167,37,"EDDT",10382,"TXL","Europe/Berlin"]

dates = [("2019-11-20","2019-11-22"), ("2019-11-27","2019-12-02") ]
insert_dict = defaultdict(tuple)

hist_weather_list = []

for d in dates:
    end = d[1]
    start = d[0]
    print(start, end)
    url = "https://api.meteostat.net/v1/history/hourly?station={station}&start={start}&end={end}&time_zone={timezone}&&time_format=Y-m-d%20H:i&key=wzwi2YR5".format(station=station[0], start=start, end=end, timezone=station[-1])

    response = requests.get(url)
    weather = response.json()
    print(weather)
    for i in weather["data"]:
        hist_weather_list.append(i)

sql = "select timestamp from dump order by timestamp asc"
result = db_mac.execute(sql)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step1 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))


def find_parameters(x):

    for row in result[x[0]:x[1]]:
        try:
            ts_dump = datetime.datetime.timestamp(row[0])
            for i, hour in enumerate(hist_weather_list):
                ts1 = datetime.datetime.timestamp(datetime.datetime.strptime(hour["time"], '%Y-%m-%d %H:%M:%S'))

                ts2 = datetime.datetime.timestamp(datetime.datetime.strptime(hist_weather_list[i + 1]["time"], '%Y-%m-%d %H:%M:%S'))

                if ts1 <= ts_dump and ts_dump < ts2:

                    insert_dict[row[0]] = (hour["temperature"], hour["pressure"])

        except Exception as e:
            pass

step1 = int(len(result) /4)
step2 = 2 * step1
step3 = 3 * step1
step4 = len(result)
steps = [[0,step1],[step1,step2],[step2,step3], [step3,step4]]


pool = mp.Pool(mp.cpu_count())
pool.map(find_parameters, steps)

hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step2 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))


for key, value in insert_dict.items():
    sql2 = """UPDATE dump SET temperature = """ + str(value[0]) + """, pressure = """+ str(value[1]) + """ WHERE timestamp = '"""+ str(key) + """';"""
    db_mac.execute(sql2)


hours, rem = divmod(time.time() - t, 3600)
minutes, seconds = divmod(rem, 60)
print("step3 {:0>2}:{:0>2}:{:05.2f}".format(int(hours),int(minutes),seconds))

更新2 它完成了,并在一个树莓圆周率上运行了4个内核,运行了2:45个小时。但是有没有更有效的方法来做这些事情呢


Tags: importfordatetimetimestarttimestampintend
1条回答
网友
1楼 · 发布于 2024-06-07 17:13:43

因此,我可以想出一些小办法来加快速度。我认为任何一点都有帮助,尤其是当你有很多行要处理的时候。首先,print语句会大大降低代码的速度。如果不需要的话,我会把它们扔掉

最重要的是,在循环的每次迭代中都调用api。等待API的响应可能占用了您的大部分时间。我看了一下您正在使用的api,但不知道您使用它的确切情况,也不知道您的日期“开始”和“结束”是什么样子,但是如果您可以在更少的调用中完成,那么肯定会大大加快这个循环。另一种方法是,api似乎有一个可以下载和使用的.csv版本的数据。在本地数据上运行这个会快得多。如果你选择这条路线,我建议你使用熊猫。(对不起,如果你已经知道熊猫了,我解释得太多了)你可以使用:df=pd.read\u csv(“filename.csv”)并从那里轻松编辑表格。您还可以执行df.to_sql(params)来写入数据库。如果您需要帮助形成此代码的熊猫版本,请告诉我

另外,从您的代码中不确定这是否会导致错误,但我会尝试,而不是您的for循环(在weather[“data”]中为I)。 历史天气列表+=天气[“数据”] 或者可能 历史天气列表+=[天气[“数据”]

让我知道进展如何

相关问题 更多 >

    热门问题