Looping through data in a CSV file to output "1"s and "0"s to a text file (Python)

data = {}
productIds = []

for row in reader:
    productIds.append(row['productCode'])
    if row['basketID'] not in data:
        data[row['basketID']] = [row['productCode']]
    else:
        data[row['basketID']].append(row['productCode'])

productIds = sorted(set(productIds))

for item in productIds:
    txtFile.write("%s " % item)
txtFile.write('\n')

for key in data: # Will loop through each basket
    for value in data[key]: # Loop through each product in basket
        for i in productIds: # Go through list of available products
            if value == i:
                txtFile.write('1 ')
            else:
                txtFile.write('0 ')
    txtFile.write('\n')

3 Answers

User

#1 · Edited 2024-05-16 09:01:57

Try this:

data = {} 
productIds = [] 

for row in reader:
    productIds.append(row['productCode']) 
    if row['basketID'] not in data:
        data[row['basketID']] = {row['productCode']}  # set literal keeps the code as one element
    else:
        data[row['basketID']].add(row['productCode'])

productIds = sorted(set(productIds))

for item in productIds:
    txtFile.write("%s " % item)
txtFile.write('\n')

for key in data: # Will loop through each basket
    for i in productIds: # Go through list of available products
        if i in data[key]: # Is this product in the current basket?
            txtFile.write('1 ')
        else:
            txtFile.write('0 ')
    txtFile.write('\n')
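
One detail worth checking when a basket is stored as a set: assuming the product codes arrive as strings (which is what csv.DictReader yields), calling set() on a single code iterates over its characters, while a set literal keeps the code whole. A minimal sketch with a made-up code value:

```python
code = "23"  # hypothetical product code, read from the CSV as a string

from_call = set(code)   # iterates the string: one element per character
from_literal = {code}   # one element: the whole code

print(sorted(from_call))     # ['2', '3']
print(sorted(from_literal))  # ['23']
```

With the character-wise version, a later check such as "23" in basket would fail even though the basket contains product 23.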

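User

#2 · Edited 2024-05-16 09:01:57

The problem is in the last for loop. You loop over every basket and, within it, over every product in that basket. For each such product you then compare it against every entry in productIds and write a flag for each comparison. Since there are 3 product IDs, you get 3 entries for every item in the basket instead of 3 entries per basket.

For example, for basket1 the loop starts with the first item, 23. For that single item you write 3 entries to the output file, one for each i in productIds:

1. 23 == 23 => 1
2. 23 == 24 => 0
3. 23 == 25 => 0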
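
To make the over-counting concrete, here is a small sketch that replays the nested loops for basket1 alone and collects the flags in a list instead of writing them to a file. The variable names are illustrative, not taken from the question.

```python
product_ids = ["23", "24", "25"]  # all known product codes
basket = ["23", "24", "25"]       # products in basket1

flags = []
for value in basket:          # each product in the basket
    for i in product_ids:     # compared against every known product
        flags.append("1" if value == i else "0")

print(len(flags))       # 9
print(" ".join(flags))  # 1 0 0 0 1 0 0 0 1
```

Nine flags are written for a basket that should produce a single row of three.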

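There is also a second problem. A dict does not keep its keys sorted (at best it preserves insertion order, as in Python 3.7+), so the baskets are not guaranteed to be visited in increasing order from basket1 to basket5.

Replace the last for loop with the following (sort the dict, then iterate correctly):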

import collections

data = collections.OrderedDict(sorted(data.items()))  # order baskets by ID
for key in data: # Will loop through each basket
    for productId in productIds: # Loop through each productId
        if productId in data[key]: # Check whether productId is in the basket's products
            txtFile.write('1 ')
        else:
            txtFile.write('0 ')
    txtFile.write('\n')

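Output (as expected for the sample data shown in answer #3, with baskets 1 to 5 and products 23, 24, 25):

23 24 25
1 1 1
1 0 0
1 0 0
0 0 1
0 1 1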
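
For anyone who wants to run the whole thing end to end, the following is a self-contained sketch of the same idea, not the exact code from the question. It assumes the CSV has basketID and productCode columns; the inline sample data, the build_matrix_lines helper, and the use of defaultdict are my own choices for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input with the two columns used in this thread.
SAMPLE_CSV = """basketID,productCode
1,23
1,24
1,25
2,23
3,23
4,25
5,24
5,25
"""


def build_matrix_lines(csv_file):
    """Return the header line plus one line of 0/1 flags per basket."""
    baskets = defaultdict(set)  # basketID -> set of product codes
    for row in csv.DictReader(csv_file):
        baskets[row["basketID"]].add(row["productCode"])

    # Every distinct product code, in a stable column order.
    product_ids = sorted(set().union(*baskets.values()))

    lines = [" ".join(product_ids)]
    # IDs are strings here, so this sorts lexicographically;
    # use sorted(baskets, key=int) if the IDs are numeric.
    for basket_id in sorted(baskets):
        flags = ["1" if pid in baskets[basket_id] else "0" for pid in product_ids]
        lines.append(" ".join(flags))
    return lines


lines = build_matrix_lines(io.StringIO(SAMPLE_CSV))
print("\n".join(lines))
```

To write to a file instead of printing, pass an opened CSV file to build_matrix_lines and write the joined lines to the text file.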

User

#3 · Edited 2024-05-16 09:01:57

I think you should try this. First, read the file into a DataFrame:

>>> import pandas as pd
>>> df = pd.read_csv("lia.csv")
>>> df
   basketID  productCode
0         1           23
1         1           24
2         1           25
3         2           23
4         3           23
5         4           25
6         5           24
7         5           25

Then:

>>> g1 = df.groupby(["productCode", "basketID"]).count()
>>> g1
Empty DataFrame
Columns: []
Index: [(23, 1), (23, 2), (23, 3), (24, 1), (24, 5), (25, 1), (25, 4), (25, 5)]
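
The groupby result is empty because both columns became the index, so nothing is left to count. One possible way to get from the DataFrame to the 0/1 table is pd.crosstab. This is my own suggestion rather than part of the answer above, and the sketch builds the sample rows inline instead of reading lia.csv.

```python
import pandas as pd

# Same sample rows as the DataFrame printed above.
df = pd.DataFrame({
    "basketID":    [1, 1, 1, 2, 3, 4, 5, 5],
    "productCode": [23, 24, 25, 23, 23, 25, 24, 25],
})

# crosstab counts how often each (basket, product) pair occurs;
# clip(upper=1) turns repeated purchases into a plain 0/1 flag.
matrix = pd.crosstab(df["basketID"], df["productCode"]).clip(upper=1)

# Space-separated text: product codes as the header row, one row per basket.
text = matrix.to_csv(sep=" ", index=False)
print(text)
```

Passing a file path as the first argument to to_csv writes the same text straight to disk.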
