Convert a CSV to a dictionary, selecting specific columns and removing duplicate values

Asked 2025-04-13 16:42

My input CSV file looks roughly like this:

Flower;Price;Customer;Color
Rose;20;Sam;Red, Yellow, Pink, Red
Orchids;50;Jim;White, Purple, White
Tulips;15;Genny;Red, White, Yellow, Yellow

First, I want to drop the Price and Customer columns and keep only Flower and Color. To do that, I am using:

data = pd.read_csv(filename, usecols=['Flower','Color'], sep=';')

Now I want to remove the duplicate values from the Color column and keep the remaining values ordered (an ordered set).

Here are some things I tried, none of which worked:

data.set_index("Name").T.to_dict('sorted(set)')

data.unique().agg(set).to_dict()

The output I expect looks like this:

{
"Rose": [Pink, Red, Yellow],
"Orchids":[Purple, White],
"Tulips":[Red, White, Yellow]
}

Please tell me what I am missing. Thanks for your help!

Tags: data processing, dictionary, data cleaning, data transformation, csv, column selection, deduplication, ordered set

4 Answers

You can do this with core Python, like so:

lines = """Flower;Price;Customer;Color
Rose;20;Sam;Red, Yellow, Pink, Red
Orchids;50;Jim;White, Purple, White
Tulips;15;Genny;Red, White, Yellow, Yellow"""

result = {}
for line in lines.split('\n')[1:]:  # skip the header row
    parts = line.split(';')
    # Strip whitespace, drop duplicates with a set, then sort for a stable order
    result[parts[0]] = sorted({color.strip() for color in parts[-1].split(',')})

Result:

{
    "Rose": [
        "Pink",
        "Red",
        "Yellow"
    ],
    "Orchids": [
        "Purple",
        "White"
    ],
    "Tulips": [
        "Red",
        "White",
        "Yellow"
    ]
}
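The "ordered set" in the question could also be read as first-seen order rather than alphabetical order. In that case `dict.fromkeys` is one way to drop repeats while keeping the original order. This is a minimal sketch on the same sample text; the helper name `unique_in_order` and the variable `first_seen` are made up for illustration.

```python
lines = """Flower;Price;Customer;Color
Rose;20;Sam;Red, Yellow, Pink, Red
Orchids;50;Jim;White, Purple, White
Tulips;15;Genny;Red, White, Yellow, Yellow"""


def unique_in_order(cell):
    """Split a comma-separated cell and drop repeats, keeping first-seen order."""
    colors = (c.strip() for c in cell.split(','))
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(colors))


first_seen = {}
for line in lines.split('\n')[1:]:  # skip the header row
    parts = line.split(';')
    first_seen[parts[0]] = unique_in_order(parts[-1])

print(first_seen)
```

Wrapping the call in `sorted(...)` instead gives the alphabetical lists shown in the question.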

Answered 2025-04-13 by Python大师

Using pandas for this does not make much sense, since you do not actually need a DataFrame.

The csv module from the standard library is enough:

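import csv
import re

# DictReader needs an open file (or any iterable of lines), not a path string
with open(filename, newline='') as f:
    out = {row['Flower']: sorted(set(re.split(', *', row['Color'].strip())))
           for row in csv.DictReader(f, delimiter=';')}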

Output:

{'Rose': ['Pink', 'Red', 'Yellow'],
 'Orchids': ['Purple', 'White'],
 'Tulips': ['Red', 'White', 'Yellow']}
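To try the csv approach without a file on disk, the sample can be fed in through `io.StringIO`, because `csv.DictReader` accepts any iterable of lines. This is a rough sketch, with `sample` as a stand-in for the real file and the comprehension unrolled into a loop for readability.

```python
import csv
import io
import re

# Stand-in for the real file so the example runs without touching disk.
sample = io.StringIO(
    "Flower;Price;Customer;Color\n"
    "Rose;20;Sam;Red, Yellow, Pink, Red\n"
    "Orchids;50;Jim;White, Purple, White\n"
    "Tulips;15;Genny;Red, White, Yellow, Yellow\n"
)

out = {}
for row in csv.DictReader(sample, delimiter=';'):
    # Split on a comma followed by optional whitespace, then dedupe and sort.
    colors = re.split(r',\s*', row['Color'].strip())
    out[row['Flower']] = sorted(set(colors))

print(out)
```

The Price and Customer columns are still parsed but simply never read, so no separate column-selection step is needed.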

Answered 2025-04-13 by Python大师

Code

out = {key: sorted(set(colors.split(', '))) for key, colors in data.values}

Output:

{'Rose': ['Pink', 'Red', 'Yellow'],
 'Orchids': ['Purple', 'White'],
 'Tulips': ['Red', 'White', 'Yellow']}

Example code

import pandas as pd
import io

txt = '''Flower;Price;Customer;Color
Rose;20;Sam;Red, Yellow, Pink, Red
Orchids;50;Jim;White, Purple, White
Tulips;15;Genny;Red, White, Yellow, Yellow'''

data = pd.read_csv(io.StringIO(txt), usecols=['Flower','Color'], sep=';')

data

    Flower  Color
0   Rose    Red, Yellow, Pink, Red
1   Orchids White, Purple, White
2   Tulips  Red, White, Yellow, Yellow
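For reference, the attempts in the question fail for separate reasons: there is no column called "Name" to index on, 'sorted(set)' is not a valid `orient` for `to_dict`, and `unique()` is a Series method, not a DataFrame method. If you would rather stay within pandas methods, a chained version might look like the sketch below. It assumes pandas is installed and reuses the sample text above; splitting on a bare comma and stripping afterwards also tolerates uneven spacing.

```python
import io

import pandas as pd

txt = '''Flower;Price;Customer;Color
Rose;20;Sam;Red, Yellow, Pink, Red
Orchids;50;Jim;White, Purple, White
Tulips;15;Genny;Red, White, Yellow, Yellow'''

data = pd.read_csv(io.StringIO(txt), usecols=['Flower', 'Color'], sep=';')

out = (
    data.set_index('Flower')['Color']   # Series of color strings keyed by flower
        .str.split(',')                 # each cell becomes a list of raw pieces
        .apply(lambda colors: sorted({c.strip() for c in colors}))  # dedupe and sort
        .to_dict()
)

print(out)
```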

Answered 2025-04-13 by Python大师
