Iterating efficiently through nested Python dictionaries

Posted 2024-04-25 19:55:51


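I have a dataset in which values are checked against specific criteria and then used in probability calculations that form part of a summation. At the moment I keep the data in nested dictionaries to simplify that checking step.

The algorithm I am using is proving very expensive, and after running for a while it exhausts memory.

Pseudocode for the processing is as follows: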

for businessA in business:  # iterate over 77039 values
    for businessB in business:  # iterate over 77039 values
        if businessA != businessB:
            for rating in business[businessB]:  # where rating is 1 - 5
                for review in business[businessB][rating]:
                    user = reviewMap[review]['user']
                    if user in business[businessA]['users']:
                        for users in business[businessA]['users']:
                            # do something
                # do probability
                # a print is here

How can I write the above more efficiently while still keeping an accurate probability sum for each businessA?


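Edit to include source code: here businessA and businessB live in separate dictionaries, but note that both dictionaries hold the same business IDs (bid) as keys. Only the value in each key:value pair differs between them.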

# LbidMap, reviewMap and pos_list are module-level globals defined elsewhere.
def crossMatch(TbidMap):
    for Tbid in TbidMap:
        for Lbid in LbidMap:
            # Ensure T and L aren't the same business
            if Tbid != Lbid:
                # Get number of reviews at EACH STAR rate for L
                for stars in LbidMap[Lbid]:
                    posTbid = 0
                    # For each review check if user rated the Tbid
                    for Lreview in LbidMap[Lbid][stars]:
                        user = reviewMap[Lreview]['user']
                        if user in TbidMap[Tbid]:
                            # user rev'd Tbid, get their Trid & see if gave Tbid pos rev
                            for Trid in TbidMap[Tbid][user]:
                                Tstar = reviewMap[Trid]['stars']
                                if Tstar in pos_list:
                                    posTbid += 1
                    # probability calculations happen here
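
For readers following the function above, here is a minimal sketch of the dictionary shapes it appears to read, inferred only from how the code indexes them. All sample IDs and values are invented, the choice of 4 and 5 stars as "positive" is an assumption, and `count_positive` is a hypothetical helper that isolates the innermost counting step for one (Tbid, Lbid, stars) combination.

```python
# Toy data, invented purely for illustration.
pos_list = {4, 5}  # star values treated as "positive" (assumption)

# review id -> review record
reviewMap = {
    "r1": {"user": "u1", "stars": 5},
    "r2": {"user": "u2", "stars": 2},
    "r3": {"user": "u1", "stars": 4},
    "r4": {"user": "u2", "stars": 1},
}

# bid -> user -> list of that user's review ids for the business
TbidMap = {
    "b1": {"u1": ["r1"], "u2": ["r2"]},
    "b2": {"u1": ["r3"], "u2": ["r4"]},
}

# bid -> star rating -> list of review ids with that rating
LbidMap = {
    "b1": {5: ["r1"], 2: ["r2"]},
    "b2": {4: ["r3"], 1: ["r4"]},
}


def count_positive(Tbid, Lbid, stars):
    """Count positive reviews of Tbid written by users who gave Lbid `stars`."""
    count = 0
    for Lreview in LbidMap[Lbid][stars]:
        user = reviewMap[Lreview]["user"]
        # .get() covers users who never reviewed Tbid
        for Trid in TbidMap[Tbid].get(user, []):
            if reviewMap[Trid]["stars"] in pos_list:
                count += 1
    return count


print(count_positive("b1", "b2", 4))
print(count_positive("b1", "b2", 1))
```

Both maps share the same bid keys, as the edit note says; only the nested values differ.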

1 answer

Answer #1 · posted 2024-04-25 19:55:51

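There are over 5 billion combinations of businesses in your dataset (77039 × 77039), and that is going to put real pressure on your memory. I assume you are storing all the results in memory; instead, I would do interim dumps to a database and release your containers as you go. This is only a sketch of the approach, since I don't have real data to test with, and it will probably be easier to respond to specific difficulties as you run into them. Ideally there would be an interim container holding a nested list so that you could use executemany, but the code is so deeply nested, with abbreviated names and no test data, that it is hard to follow.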

import sqlite3

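def create_interim_mem_dump(cursor, connection):
    # Create the on-disk table that receives the interim results.
    # posTbid is a count, so it is declared INTEGER rather than TEXT.
    query = """CREATE TABLE IF NOT EXISTS ratings(
            Tbid TEXT,
            Lbid TEXT,
            posTbid INTEGER)
            """
    cursor.execute(query)
    connection.commit()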


# LbidMap, reviewMap and pos_list are assumed to be module-level globals,
# as in the question.
def crossMatch(TbidMap, cursor, connection):
    for Tbid in TbidMap:
        for Lbid in LbidMap:
            # Ensure T and L aren't the same business
            if Tbid != Lbid:
                # Get number of reviews at EACH STAR rate for L
                for stars in LbidMap[Lbid]:
                    posTbid = 0
                    # For each review check if user rated the Tbid
                    for Lreview in LbidMap[Lbid][stars]:
                        user = reviewMap[Lreview]['user']
                        if user in TbidMap[Tbid]:
                            # user rev'd Tbid, get their Trid & see if gave Tbid pos rev
                            for Trid in TbidMap[Tbid][user]:
                                Tstar = reviewMap[Trid]['stars']
                                if Tstar in pos_list:
                                    posTbid += 1
                    # One row is written per (Tbid, Lbid, star level)
                    query = """INSERT INTO ratings (Tbid, Lbid, posTbid)
                            VALUES (?, ?, ?)"""
                    cursor.execute(query, (Tbid, Lbid, posTbid))
        # Commit once per Tbid so results reach disk instead of piling up
        connection.commit()



if __name__ == '__main__':
    conn = sqlite3.connect('collated_ratings.db')
    c = conn.cursor()

    create_interim_mem_dump(c, conn)
    # Placeholder: load your real TbidMap dictionary here, then run
    # crossMatch(your_data, c, conn)
    your_data = 'Some kind of dictionary into crossMatch()'
    c.close()
    conn.close()
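
The executemany idea mentioned in the answer text could look roughly like the sketch below. This is not the answerer's code: `store_counts`, `flush` and `BATCH_SIZE` are hypothetical names, the batch size is an arbitrary assumption, and the sample rows are invented. It buffers (Tbid, Lbid, posTbid) tuples in a list, writes each full batch with a single `cursor.executemany` call, and clears the list so memory use stays bounded.

```python
import sqlite3

BATCH_SIZE = 10_000  # hypothetical value; tune to the memory you can spare


def flush(cursor, connection, rows):
    """Write the buffered rows with one executemany call, then empty the buffer."""
    if rows:
        cursor.executemany(
            "INSERT INTO ratings (Tbid, Lbid, posTbid) VALUES (?, ?, ?)", rows
        )
        connection.commit()
        rows.clear()


def store_counts(results, cursor, connection, batch_size=BATCH_SIZE):
    """Consume an iterable of (Tbid, Lbid, posTbid) tuples in bounded batches."""
    rows = []
    for row in results:
        rows.append(row)
        if len(rows) >= batch_size:
            flush(cursor, connection, rows)
    flush(cursor, connection, rows)  # write whatever is left over


# Demo on an in-memory database with invented rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ratings (Tbid TEXT, Lbid TEXT, posTbid INTEGER)")

sample = [("b1", "b2", 1), ("b1", "b3", 0), ("b2", "b1", 2)]
store_counts(iter(sample), cur, conn, batch_size=2)

# Per-Tbid totals can then be computed by the database instead of in Python.
totals = cur.execute(
    "SELECT Tbid, SUM(posTbid) FROM ratings GROUP BY Tbid ORDER BY Tbid"
).fetchall()
print(totals)
conn.close()
```

To use this with the function above, crossMatch could yield each (Tbid, Lbid, posTbid) tuple instead of calling cursor.execute itself, and the resulting generator could be passed to store_counts.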
