检查CSV中的下一行是否与当前行中的值具有相同的ID

2条回答

网友

1楼 · 编辑于 2024-04-19 06:09:06

考虑使用^{}
使用^{}选择组
- 当对单个列执行.groupby时，该组作为str返回，如果对多个列执行.groupby，则返回tuple
- o_id是str，表示用于groupby的值
- o_id必须是list或tuple才能zip与groupby_list一起创建dict
- d是每个groupby组的数据帧
使用^{}遍历每个组的行
- 返回由第一个_表示的index，因为不需要它
- 返回data，从中删除groupby_list中的标签，然后使用^{}将余数转换为dict，并将其附加到list，att_list
- 循环遍历组的所有行后，将items_list作为值分配给group['items']
迭代每个组后，将dict、group附加到dict_list
dict_list可以通过以下方式转换回数据帧：
- df = pd.json_normalize(dict_list, 'items', meta=groupby_list)

预期产量

{'items': [{'name': 'item6', 'product_code': '323098', 'purchase_price': 1990},
           {'name': 'item7', 'product_code': '3209233', 'purchase_price': 590}],
 'orderId': '012006192158-63823'}

生成预期输出的代码

import pandas as pd
import json

# read in the file
df = pd.read_csv('test.csv')

dict_list = list()
groupby_list = ['order id']

for o_id, d in df.groupby(groupby_list):
    if type(o_id) != tuple:
        o_id = [o_id]
    group = dict(zip(groupby_list, o_id))
    items_list = list()
    for _, data in d.iterrows():
        data = data.drop(labels=groupby_list)
        items_list.append(data.to_dict())
    group['items'] = items_list
    dict_list.append(group)

# save to a file
with open('test.json', 'w') as f:
    json.dump(dict_list, f, ensure_ascii=False)
    jsonfile.write('\n')

最终输出：`dict_list`

[{
        'items': [{
                'name': 'item6',
                'product_code': 323098,
                'purchase_price': 1990
            }, {
                'name': 'item7',
                'product_code': 320923,
                'purchase_price': 590
            }, {
                'name': 'item8',
                'product_code': 325051,
                'purchase_price': 590
            }, {
                'name': 'item9',
                'product_code': 325446,
                'purchase_price': 1990
            }
        ],
        'order id': '012006192158-63823'
    }, {
        'items': [{
                'name': 'item4',
                'product_code': 326353,
                'purchase_price': 1990
            }, {
                'name': 'item5',
                'product_code': 321625,
                'purchase_price': 1490
            }
        ],
        'order id': '012006221101-40527'
    }, {
        'items': [{
                'name': 'item3',
                'product_code': 329452,
                'purchase_price': 1490
            }
        ],
        'order id': '012006241026-13750'
    }, {
        'items': [{
                'name': 'item1',
                'product_code': 321618,
                'purchase_price': 1380
            }, {
                'name': 'item2',
                'product_code': 321618,
                'purchase_price': 690
            }
        ],
        'order id': '012006251700-68811'
    }
]

`test.csv`

order id,name,product_code,purchase_price
012006251700-68811,item1,321618,1380
012006251700-68811,item2,321618,690
012006241026-13750,item3,329452,1490
012006221101-40527,item4,326353,1990
012006221101-40527,item5,321625,1490
012006192158-63823,item6,323098,1990
012006192158-63823,item7,320923,590
012006192158-63823,item8,325051,590
012006192158-63823,item9,325446,1990

网友

2楼 · 编辑于 2024-04-19 06:09:06

在问题中的代码中，我认为它将通过缩进行来实现您想要的功能

 current_purchase = purchase

所以它在else块内

然而，这种任务——迭代集合并按键分组——可以通过使用itertools.groupby函数来简化。给定一个已排序的集合，它将为您进行分组。operator.itemgetter函数可用于减少从行字典中获取值所需的代码量

import csv
import itertools
import operator
import json

output = {'purchases': []}

reader = csv.DictReader(buf)

# Sort the rows by `id` - if the data is not guaranteed to be sorted.
# If the order id guaranteed, pass `reader` to itertools.groupby.
keyfunc = operator.itemgetter('id')
rows = sorted(reader, key=keyfunc)

# Make a function to build the item dictionaries.
item_keys = ('id', 'name')
item_values = operator.itemgetter(*item_keys)
additional_keys = ('product_code', 'purchase_price')
additional_values = operator.itemgetter(*additional_keys)


def build_item(purchase):
    item = dict(zip(item_keys, item_values(purchase)))
    item['additionalFields'] = dict(zip(additional_keys, additional_values(purchase)))
    return item


for _, purchases in itertools.groupby(rows, keyfunc):
    # Get the first row, because we need some of the data to build purchaseBody.
    purchase = next(purchases)
    # Initialise the items dict with data from the first purchase, and add the rest.
    items = [build_item(purchase)]
    items.extend(build_item(purchase) for p in purchases)
    purchaseBody = {
        'id': purchase['id'],
        'user': {'email': purchase['email']},
        'total': sum(float(item['additionalFields']['purchase_price']) for item in items),
        'createdAt': '2020-08-02',
        'items': items,
    }
    output['purchases'].append(purchaseBody)

with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile, ensure_ascii=False)
    jsonfile.write('\n')

预期产量

生成预期输出的代码

最终输出：`dict_list`

`test.csv`

相关问题更多 >

编程相关推荐

热门问题

热门文章

检查CSV中的下一行是否与当前行中的值具有相同的ID

预期产量

生成预期输出的代码

最终输出：dict_list

test.csv

相关问题 更多 >

编程相关推荐

热门问题

热门文章

最终输出：`dict_list`

`test.csv`

相关问题更多 >