Rewriting loops / speeding up code

Posted 2024-04-28 23:39:36


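I wrote a piece of code and I believe it can be implemented in a faster way. Does anyone have any suggestions? I know that nested for loops are not very Pythonic.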

The code computes the KL divergence (KLD) between documents in a document-topic matrix while varying the window size.
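For reference, the quantities the code computes for document $i$ and window size $w$ can be written as follows. This is a reading of the loops in the code, with $s_i$ denoting row $i$ of `doc_topic`; the definition of $\mathrm{KLD}$ itself is not shown in the post, so the standard form is assumed here.

```latex
\begin{aligned}
\mathrm{KLD}(p \,\|\, q) &= \sum_{k} p_k \log\frac{p_k}{q_k}
  && \text{(assumed standard definition)} \\
\mathcal{N}_w(i) &= \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\bigl(s_i \,\|\, s_{i-d}\bigr)
  && \text{(novelty)} \\
\mathcal{T}_w(i) &= \frac{1}{w} \sum_{d=1}^{w} \mathrm{KLD}\bigl(s_i \,\|\, s_{i+d}\bigr)
  && \text{(transience)} \\
\mathcal{R}_w(i) &= \mathcal{N}_w(i) - \mathcal{T}_w(i)
  && \text{(resonance)}
\end{aligned}
```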

import numpy as np
import pandas as pd

# doc_topic (2-D array, one topic distribution per row) and KLD(p, q)
# are assumed to be defined elsewhere; the post does not show them.
def calculate(dataframe):
    dates = []
    novelty = []
    transience = []
    time_frame = []
    for w in range(1, 500):
        print(w)
        for i in range(w, doc_topic.shape[0] - w):
            time_frame.append(w)
            dates.append(dataframe.iloc[i]['date'])
            novelties = []
            transiences = []
            for d in range(1, w + 1):
                novelties.append(KLD(doc_topic[i], doc_topic[i - d]))
                transiences.append(KLD(doc_topic[i], doc_topic[i + d]))
            # Mean divergence from the w preceding / following documents
            novelty.append(np.mean(novelties))
            transience.append(np.mean(transiences))
    # Computed once after the loops; the original rebuilt this list on every iteration
    resonance = np.array(novelty) - np.array(transience)
    df_kld = pd.DataFrame({'transience': transience,
                           'novelty': novelty,
                           'resonance': resonance})
    df_kld['time_frame'] = time_frame
    df_kld['dates'] = dates
    df_kld.to_pickle('df_kld_final.pkl')
    return df_kld
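
One possible way to speed this up is sketched here. The divergence at lag d does not depend on the window size, so it can be computed once per lag for all documents with NumPy and accumulated into running sums, which removes the two inner Python loops. The names `kld_rows`, `calculate_fast`, and `max_window` are hypothetical, and the sketch assumes `KLD` is the standard KL divergence and that every row of `doc_topic` is strictly positive; if the actual `KLD` differs, `kld_rows` would need to be adapted.

```python
import numpy as np
import pandas as pd


def kld_rows(p, q):
    """Row-wise KL divergence sum(p * log(p / q)).

    Assumption: all entries are strictly positive (no smoothing is applied).
    """
    return np.sum(p * np.log(p / q), axis=1)


def calculate_fast(doc_topic, dates, max_window=499):
    """Novelty, transience and resonance for window sizes 1..max_window."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    dates = np.asarray(dates)
    n = doc_topic.shape[0]
    columns = ['transience', 'novelty', 'resonance', 'time_frame', 'dates']

    # Running sums over lags 1..w of KLD(doc[i], doc[i-d]) and KLD(doc[i], doc[i+d])
    cum_novelty = np.zeros(n)
    cum_transience = np.zeros(n)
    frames = []

    for w in range(1, max_window + 1):
        if n - 2 * w <= 0:
            break  # no document has w neighbours on both sides
        later, earlier = doc_topic[w:], doc_topic[:-w]
        # Lag-w divergence for every document that has a neighbour at that lag
        cum_novelty[w:] += kld_rows(later, earlier)      # doc[i] vs doc[i-w]
        cum_transience[:-w] += kld_rows(earlier, later)  # doc[i] vs doc[i+w]

        idx = np.arange(w, n - w)  # same index range as the original loop
        nov = cum_novelty[idx] / w
        tra = cum_transience[idx] / w
        frames.append(pd.DataFrame({
            'transience': tra,
            'novelty': nov,
            'resonance': nov - tra,
            'time_frame': w,
            'dates': dates[idx],
        }))

    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)[columns]
```

It could be called as `calculate_fast(doc_topic, dataframe['date'].to_numpy())`, followed by `to_pickle` if the result should be saved. The rows come out in the same order as in the original loops, while the number of divergence evaluations grows linearly with the maximum window size rather than quadratically.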
