检查CSV中的下一行是否与当前行中的值具有相同的ID

2024-04-19 06:09:06 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在做一个项目,从CSV读取购买数据,并为API负载输出JSON。有多行具有相同的订单ID,因为它们中的每一行都有一个单独的项目,我希望在创建购买负载之前将这些项目合并到一个数组中

order id             name    product_code   purchase_price
012006251700-68811   item1   321618         1380
012006251700-68811   item1   321618         690
012006241026-13750   item2   329452         1490
012006221101-40527   item3   326353         1990
012006221101-40527   item4   321625         1490
012006192158-63823   item5   323098         1990
012006192158-63823   item6   320923         590
012006192158-63823   item7   325051         590
012006192158-63823   item8   325446         1990

我已经能够从CSV导入行,并且正在检查当前购买的ID值,但是还不能得到我想要的结果

下面的代码应该检查下一行的ID值是否相同,如果相同,则只将项目详细信息添加到项目数组中

如果下一行没有相同的ID,else语句将把完整的购买添加到purchases数组中

import csv
import json

output = {'purchases': []}
items = {'items': []}
purchaseBody = {}
current_purchase = None

with open('tester - tester.csv') as csv_file:
    for purchase in csv.DictReader(csv_file):
        if current_purchase is not None and purchase['id'] == current_purchase['id']:
            items['items'].append({'id': purchase['id'],
                                  'name': purchase['name'],
                                  'product_code': purchase['product_code']
                                  'purchase_price': purchase['purchase_price'],
                                  })
                 
        else:

            purchaseBody = {
                'id': purchase['id'],
                'user': {'email': purchase['email']},
                'total': purchase['total'],
                'createdAt': purchase['createdAt']
                }
            items['items'].append({'id': purchase['id'],
                                  'name': purchase['name'],
                                  'product_code': purchase['product_code']
                                  'purchase_price': purchase['purchase_price'],
                                  })
            output['purchases'].append(purchaseBody)
            items = {'items': []}   
            purchaseBody.update(items)


        current_purchase = purchase


with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile, ensure_ascii=False)
    jsonfile.write('\n')

所需的输出应类似于以下内容:

{
    
    "purchases": [{
    
        "id": "purchase id",
    
        "user": {
    
            "email": "email"
    
        },
    
        "items": [{
    
                "id": "id1",
    
                "name": "name1",
    
                "additionalFields": {
    
                    "product_code": "product_code1",
    
                    "purchase_price": "purchase_price1"
    
                }
    
            },
    
            {
    
                "id": "id2",
    
                "name": "name2",
    
                "additionalFields": {
    
                    "product_code": "product_code2",
    
                    "purchase_price": "purchase_price2"
    
                }
    
            }
    
        ],
    
        "total": "total",
    
        "createdAt": "createdAt"
    
    }]
    
}

Tags: csv项目nameidemailcodeitemscurrent
2条回答
  • 考虑使用^{}
  • 使用^{}选择组
    • 当对单个列执行.groupby时,该组作为str返回,如果对多个列执行.groupby,则返回tuple
    • o_idstr,表示用于groupby的值
    • o_id必须是listtuple才能zipgroupby_list一起创建dict
    • d是每个groupby组的数据帧
  • 使用^{}遍历每个组的行
    • 返回由第一个_表示的index,因为不需要它
    • 返回data,从中删除groupby_list中的标签,然后使用^{}将余数转换为dict,并将其附加到listatt_list
    • 循环遍历组的所有行后,将items_list作为值分配给group['items']
  • 迭代每个组后,将dictgroup附加到dict_list
  • dict_list可以通过以下方式转换回数据帧:
    • df = pd.json_normalize(dict_list, 'items', meta=groupby_list)

预期产量

{'items': [{'name': 'item6', 'product_code': '323098', 'purchase_price': 1990},
           {'name': 'item7', 'product_code': '3209233', 'purchase_price': 590}],
 'orderId': '012006192158-63823'}

生成预期输出的代码

import pandas as pd
import json

# read in the file
df = pd.read_csv('test.csv')

dict_list = list()
groupby_list = ['order id']

for o_id, d in df.groupby(groupby_list):
    if type(o_id) != tuple:
        o_id = [o_id]
    group = dict(zip(groupby_list, o_id))
    items_list = list()
    for _, data in d.iterrows():
        data = data.drop(labels=groupby_list)
        items_list.append(data.to_dict())
    group['items'] = items_list
    dict_list.append(group)

# save to a file
with open('test.json', 'w') as f:
    json.dump(dict_list, f, ensure_ascii=False)
    jsonfile.write('\n')

最终输出:dict_list

[{
        'items': [{
                'name': 'item6',
                'product_code': 323098,
                'purchase_price': 1990
            }, {
                'name': 'item7',
                'product_code': 320923,
                'purchase_price': 590
            }, {
                'name': 'item8',
                'product_code': 325051,
                'purchase_price': 590
            }, {
                'name': 'item9',
                'product_code': 325446,
                'purchase_price': 1990
            }
        ],
        'order id': '012006192158-63823'
    }, {
        'items': [{
                'name': 'item4',
                'product_code': 326353,
                'purchase_price': 1990
            }, {
                'name': 'item5',
                'product_code': 321625,
                'purchase_price': 1490
            }
        ],
        'order id': '012006221101-40527'
    }, {
        'items': [{
                'name': 'item3',
                'product_code': 329452,
                'purchase_price': 1490
            }
        ],
        'order id': '012006241026-13750'
    }, {
        'items': [{
                'name': 'item1',
                'product_code': 321618,
                'purchase_price': 1380
            }, {
                'name': 'item2',
                'product_code': 321618,
                'purchase_price': 690
            }
        ],
        'order id': '012006251700-68811'
    }
]

test.csv

order id,name,product_code,purchase_price
012006251700-68811,item1,321618,1380
012006251700-68811,item2,321618,690
012006241026-13750,item3,329452,1490
012006221101-40527,item4,326353,1990
012006221101-40527,item5,321625,1490
012006192158-63823,item6,323098,1990
012006192158-63823,item7,320923,590
012006192158-63823,item8,325051,590
012006192158-63823,item9,325446,1990

在问题中的代码中,我认为它将通过缩进行来实现您想要的功能

 current_purchase = purchase

所以它在else块内

然而,这种任务——迭代集合并按键分组——可以通过使用itertools.groupby函数来简化。给定一个已排序的集合,它将为您进行分组。operator.itemgetter函数可用于减少从行字典中获取值所需的代码量

import csv
import itertools
import operator
import json

output = {'purchases': []}

reader = csv.DictReader(buf)

# Sort the rows by `id` - if the data is not guaranteed to be sorted.
# If the order id guaranteed, pass `reader` to itertools.groupby.
keyfunc = operator.itemgetter('id')
rows = sorted(reader, key=keyfunc)

# Make a function to build the item dictionaries.
item_keys = ('id', 'name')
item_values = operator.itemgetter(*item_keys)
additional_keys = ('product_code', 'purchase_price')
additional_values = operator.itemgetter(*additional_keys)


def build_item(purchase):
    item = dict(zip(item_keys, item_values(purchase)))
    item['additionalFields'] = dict(zip(additional_keys, additional_values(purchase)))
    return item


for _, purchases in itertools.groupby(rows, keyfunc):
    # Get the first row, because we need some of the data to build purchaseBody.
    purchase = next(purchases)
    # Initialise the items dict with data from the first purchase, and add the rest.
    items = [build_item(purchase)]
    items.extend(build_item(purchase) for p in purchases)
    purchaseBody = {
        'id': purchase['id'],
        'user': {'email': purchase['email']},
        'total': sum(float(item['additionalFields']['purchase_price']) for item in items),
        'createdAt': '2020-08-02',
        'items': items,
    }
    output['purchases'].append(purchaseBody)

with open('file.json', 'w') as jsonfile:
    json.dump(output, jsonfile, ensure_ascii=False)
    jsonfile.write('\n')

相关问题 更多 >