How to get the unique values of a DataFrame column when some cells contain Python lists

import pandas as pd

df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'],
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

Output:

           colors   names
0           green   Terry
1           green     Nor
2          purple  Franck
3  [yellow , red]    Pete
4          orange   Agnes

df = df[~df.colors.str.contains(',', na=False)]  # Nothing happens
df = df[~df.colors.str.contains('[', na=False)]  # Output: error: unterminated character set at position 0
df = df[~df.colors.str.contains(']', na=False)]  # Nothing happens

3 Answers

User

#1 · edited 2024-05-29 01:41:27

Assuming every value in the dataframe matters, here is the "unpack the list" technique I use a lot:

import re

def unlock_list_from_string(string, delim=','):
    """
    Lists are stored as strings (in CSV files), e.g. '[1,2,3]'.
    This function turns such a string back into a list.
    """
    # leave non-string values (e.g. real lists) untouched
    if not isinstance(string, str):
        return string

    # remove brackets (raw string so the backslashes reach the regex engine)
    clean_string = re.sub(r'\[|\]', '', string)
    unlocked_string = clean_string.split(delim)
    unlocked_list = [x.strip() for x in unlocked_string]
    return unlocked_list

all_colors_nested = df['colors'].apply(unlock_list_from_string)
# unnest
all_colors = [x for y in all_colors_nested for x in y ]

print(all_colors)
# ['green', 'green', 'purple', 'yellow', 'red', 'orange']
# (when row 3 holds the string '[yellow , red]', as it would after reading a CSV)
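
Since the question asks for unique values, the flattened list can be de-duplicated afterwards. A minimal sketch, assuming the order of first appearance should be kept; all_colors is re-declared here with the values from the output above so the snippet runs on its own:

```python
# Flattened values, copied from the output above
all_colors = ['green', 'green', 'purple', 'yellow', 'red', 'orange']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_colors = list(dict.fromkeys(all_colors))
print(unique_colors)
# ['green', 'purple', 'yellow', 'red', 'orange']
```

If order does not matter, set(all_colors) would do the same job.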

User

#2 · edited 2024-05-29 01:41:27

Let's use type:

df.colors.apply(lambda x : type(x)!=list)
0     True
1     True
2     True
3    False
4     True
Name: colors, dtype: bool
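
That mask can be used directly for boolean indexing. A small sketch, assuming the sample data from the question and that rows holding lists should simply be skipped (is_scalar is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'],
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

# True where the cell is not a list
is_scalar = df.colors.apply(lambda x: type(x) != list)

# Keep only the non-list rows, then take the unique values
unique_colors = df.loc[is_scalar, 'colors'].unique().tolist()
print(unique_colors)
# ['green', 'purple', 'orange']
```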

User

#3 · edited 2024-05-29 01:41:27

Check whether each value is a list with isinstance:

#changed sample data
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow' , 'red'], 'orange'], 
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

df = df[~df.colors.map(lambda x : isinstance(x, list))]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes

Your own solution needs two changes: cast the column to strings, and pass regex=False:

df = df[~df.colors.astype(str).str.contains('[', na=False, regex=False)] 
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes
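
To see why both changes matter, here is a small check, assuming the changed sample data above. The .str accessor returns NaN for cells that are not strings, so with na=False the list row is never flagged, and '[' on its own is not a valid regular expression:

```python
import re
import pandas as pd

colors = pd.Series(['green', 'green', 'purple', ['yellow', 'red'], 'orange'])

# Without astype(str) the list cell yields NaN, which na=False turns into False
no_cast = colors.str.contains(',', na=False)
print(no_cast.tolist())
# [False, False, False, False, False]

# After casting, the list cell becomes the string "['yellow', 'red']"
with_cast = colors.astype(str).str.contains('[', na=False, regex=False)
print(with_cast.tolist())
# [False, False, False, True, False]

# With the default regex=True, '[' is an unterminated character set
try:
    colors.astype(str).str.contains('[', na=False)
    regex_failed = False
except re.error as exc:
    regex_failed = True
    print('regex error:', exc)
```

Escaping the bracket (r'\[') would also work if you prefer to keep regex matching on.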

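Also, if you want all unique values, including the ones inside the lists, with pandas 0.25+ (applied to the unfiltered sample data):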

s = df.colors.map(lambda x : x if isinstance(x, list) else [x]).explode().unique().tolist()
print (s)
['green', 'purple', 'yellow', 'red', 'orange']
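
As far as I know, Series.explode leaves scalar cells unchanged, so on this data the wrapping step is optional. A sketch under that assumption, with the changed sample data re-declared so it runs standalone:

```python
import pandas as pd

colors = pd.Series(['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
                   name='colors')

# Wrap scalars in one-element lists first, as in the answer above
wrapped = colors.map(lambda x: x if isinstance(x, list) else [x]).explode()

# Explode directly; non-list cells pass through as they are
direct = colors.explode()

print(direct.unique().tolist())
# ['green', 'purple', 'yellow', 'red', 'orange']
```

Empty lists turn into NaN in both variants, so add .dropna() before .unique() if the column can contain them.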
