Counting distinct words in a DataFrame in Python

User

Reply #1 · edited 2024-04-24 02:53:30

Here is a solution very similar to @anky_91's:

In [96]: df.col_A.str.replace(r"\s*,\s*", ",", regex=True).str.get_dummies(",").sum()
Out[96]:
angry        2
happy        4
not happy    1
sad          3
dtype: int64
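
To try this without the original data, here is a minimal sketch. The question's DataFrame is not shown in this thread, so the sample rows below are an assumption, chosen only so that the counts match the output above. Note that `str.get_dummies` produces 0/1 indicators per row, so this counts the rows that contain each label; a label repeated inside a single row would be counted once.

```python
import pandas as pd

# Hypothetical sample data (assumption: the real DataFrame is not shown in the thread).
df = pd.DataFrame(
    {"col_A": ["happy, sad", "happy,angry", "sad , happy", "not happy, sad", "angry, happy"]}
)

# Normalize the separators so every label is delimited by a bare comma.
# regex=True is passed explicitly because newer pandas versions default to literal matching.
normalized = df["col_A"].str.replace(r"\s*,\s*", ",", regex=True)

# One 0/1 indicator column per label, then sum each column to count rows per label.
counts = normalized.str.get_dummies(",").sum()
print(counts)
```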

User

Reply #2 · edited 2024-04-24 02:53:30

Let's use melt or stack together with str.split and value_counts:

df['col_A'].str.split(r',\s?', expand=True).melt()['value'].value_counts()

Or

df['col_A'].str.split(r',\s?', expand=True).stack().value_counts()

Output:

happy        4
sad          3
angry        2
not happy    1
dtype: int64
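
A related variant, not from the replies above, swaps melt/stack for `Series.explode`, which turns each element of a list-valued cell into its own row. This is only a sketch: the sample rows are an assumption (the original DataFrame is not shown), and the split pattern `\s*,\s*` is slightly broader than the `,\s?` used above so that spaces before a comma are also dropped.

```python
import pandas as pd

# Hypothetical sample data (assumption: the real DataFrame is not shown in the thread).
df = pd.DataFrame(
    {"col_A": ["happy, sad", "happy,angry", "sad , happy", "not happy, sad", "angry, happy"]}
)

# Split each cell into a list of labels, give each label its own row, then count.
counts = df["col_A"].str.split(r"\s*,\s*").explode().value_counts()
print(counts)
```

Unlike the get_dummies approach, this counts every occurrence, so a label that appears twice in one cell contributes two.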

User

Reply #3 · edited 2024-04-24 02:53:30

A one-liner; no guarantees about efficiency, but it works :)

pd.Series([x.strip() for x in df.col_A.str.split(',').sum()]).value_counts()

Output:

happy        4
sad          3
angry        2
not happy    1

Timing test:

%timeit pd.Series([x.strip() for x in df.col_A.str.split(',').sum()]).value_counts()
1.19 ms ± 35.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.Series(sum([list(map(str.strip, i.split(','))) for i in df['col_A']], [])).value_counts()
1.13 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
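
Both timed versions flatten the lists by summing them, which copies the accumulated list at every step and so scales quadratically with the number of rows. A possible alternative, sketched here with `collections.Counter` from the standard library, counts the labels in a single pass. The sample rows are an assumption (the original DataFrame is not shown), and no timings are claimed for this version.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data (assumption: the real DataFrame is not shown in the thread).
df = pd.DataFrame(
    {"col_A": ["happy, sad", "happy,angry", "sad , happy", "not happy, sad", "angry, happy"]}
)

# Walk every cell once, split on commas, strip whitespace, and tally the labels.
tally = Counter(
    word.strip()
    for cell in df["col_A"]
    for word in cell.split(",")
)

# Convert back to a Series sorted by count, similar to value_counts().
result = pd.Series(tally).sort_values(ascending=False)
print(result)
```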
