How to insert a dictionary as a value into a dictionary using a loop in Python

import numpy as np
import pandas as pd

reader = np.array(pd.read_csv("rating_final.csv"))
included_cols = [0, 1, 2]
sample = {}
target = []
target1 = []
for row in reader:
    content = list(row[i] for i in included_cols)
    target.append(content[0])
    target1.append(content[1:3])
sample = dict(zip(target, target1))

2 Answers

User

#1 · Edited 2024-04-20 15:59:59

This should be what you want:

import collections

reader = ...
sample = collections.defaultdict(dict)

for user_id, place_id, rating in reader:
    rating = int(rating)
    sample[user_id][place_id] = rating

print(sample)
# -> {'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}}

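collections.defaultdict is a convenience utility that supplies a default value whenever you access a key that does not exist in the dictionary. If you don't like that (for example, because you want sample['non-existent-user-id'] to fail with a KeyError), use this instead: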

reader = ...
sample = {}

for user_id, place_id, rating in reader:
    rating = int(rating)
    if user_id not in sample:
        sample[user_id] = {}
    sample[user_id][place_id] = rating
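
To check that the two variants above build the same structure, here is a minimal self-contained sketch. The `rows` list is made-up sample data standing in for `reader` (an assumption, not the contents of rating_final.csv), and the helper names `build_with_defaultdict` and `build_with_plain_dict` are hypothetical.

```python
import collections

# Hypothetical sample rows standing in for `reader`: (user_id, place_id, rating)
rows = [
    ("U1000", "12222", "3"),
    ("U1000", "1333", "2"),
    ("U1001", "13333", "4"),
]

def build_with_defaultdict(rows):
    # Missing user_ids get an empty inner dict automatically
    sample = collections.defaultdict(dict)
    for user_id, place_id, rating in rows:
        sample[user_id][place_id] = int(rating)
    return sample

def build_with_plain_dict(rows):
    # Missing user_ids are created explicitly before use
    sample = {}
    for user_id, place_id, rating in rows:
        if user_id not in sample:
            sample[user_id] = {}
        sample[user_id][place_id] = int(rating)
    return sample

nested_default = build_with_defaultdict(rows)
nested_plain = build_with_plain_dict(rows)

# A defaultdict compares equal to a plain dict holding the same items
print(nested_default == nested_plain)
print(nested_plain)
```

The difference only shows up on a missing key: `nested_plain["nope"]` raises KeyError, while `nested_default["nope"]` quietly inserts and returns an empty dict.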

User

#2 · Edited 2024-04-20 15:59:59

The expected output in your example is impossible, because {'1333': 2} would not be associated with a key. With a dict of dicts, however, you can get {'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}}:

sample = {}
for row in reader:
    userID, placeID, rating = row[:3]
    sample.setdefault(userID, {})[placeID] = rating  # Possibly int(rating)?
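
As a quick illustration of what setdefault does in that loop: it returns the inner dict already stored under the key if there is one, and otherwise stores the supplied default and returns it. This is only a sketch; the `rows` below are made-up sample data, not the real file.

```python
sample = {}

# Hypothetical rows in (userID, placeID, rating) order
rows = [("U1000", "12222", 3), ("U1000", "1333", 2), ("U1001", "13333", 4)]

for userID, placeID, rating in rows:
    # The first time a userID is seen, setdefault stores the new {} and returns it;
    # afterwards it returns the dict that is already stored.
    inner = sample.setdefault(userID, {})
    inner[placeID] = rating

# Calling setdefault on an existing key does not replace the stored value
same = sample.setdefault("U1000", {})
print(same is sample["U1000"])
print(sample)
```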

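Alternatively, use collections.defaultdict to avoid the need for setdefault (or for alternative approaches involving try/except KeyError or if userID in sample:, which sacrifice the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):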

import collections

sample = collections.defaultdict(dict)
for row in reader:
    userID, placeID, rating = row[:3]
    sample[userID][placeID] = rating

# Optional conversion back to plain dict
sample = dict(sample)

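Converting back to a plain dict ensures that future lookups don't autovivify keys and instead raise KeyError as normal, and that the result looks like a normal dict if you print it.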
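
The autovivification mentioned above can be seen in a small sketch. The keys "U9999" and "U8888" are made up for illustration.

```python
import collections

sample = collections.defaultdict(dict)
sample["U1000"]["12222"] = 3

# Merely reading a missing key creates it on a defaultdict
_ = sample["U9999"]
print("U9999" in sample)   # the lookup itself inserted an empty dict

plain = dict(sample)       # shallow copy into a regular dict
del plain["U9999"]         # drop the accidentally created entry from the copy

try:
    plain["U8888"]
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError:", missing_key)

# The failed lookup on the plain dict did not create anything
print("U8888" in plain)
```

Note that dict(sample) is a shallow copy, so the inner dicts are still shared with the original defaultdict.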

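If included_cols matters (because the names or column indices might change), you can use operator.itemgetter to speed up and simplify extracting all the desired columns at once: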

from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
# If columns in data were actually:
# rating, foo, bar, userID, placeID
# we'd do this instead, itemgetter will handle all the rest:
# included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)  # Create function to get needed indices at once

sample = defaultdict(dict)
# map(get_cols, ...) efficiently converts each row to a tuple of just 
# the three desired values as it goes, which also lets us unpack directly
# in the for loop, simplifying code even more by naming all variables directly
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = rating  # Possibly int(rating)?
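
To make the reordered-column case from the comments concrete, here is a small runnable sketch. The `rows` are made-up data laid out as rating, foo, bar, userID, placeID; they are an assumption for illustration, not the real CSV.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows laid out as: rating, foo, bar, userID, placeID
rows = [
    (3, "x", "y", "U1000", "12222"),
    (2, "x", "y", "U1000", "1333"),
    (4, "x", "y", "U1001", "13333"),
]

included_cols = (3, 4, 0)              # positions of userID, placeID, rating
get_cols = itemgetter(*included_cols)  # returns a tuple of the selected items

first = get_cols(rows[0])
print(first)

sample = defaultdict(dict)
for userID, placeID, rating in map(get_cols, rows):
    sample[userID][placeID] = rating

result = dict(sample)
print(result)
```

One caveat: itemgetter called with a single index returns the bare item rather than a 1-tuple, so this unpacking pattern assumes at least two columns are selected.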
