How do I use the split function on every row of a DataFrame in Python?

Posted 2024-04-23 20:31:32


I want to count how many times a word is repeated in a review string.

I am reading a CSV file and storing it in a DataFrame with the line below:

reviews = pd.read_csv("amazon_baby.csv")

The code below works when I apply it to a single review.

print reviews["review"][1]
a = reviews["review"][1].split("disappointed")
print a
b = len(a)
print b

The output of the lines above is:

it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2

When I apply the same logic to the whole DataFrame with the line below, I get an error message.

reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1

Error message:

Traceback (most recent call last):
  File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
    reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
  File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'

3 Answers

You can use the .str accessor to call string methods on a Series of strings:

reviews["review"].str.split("disappointed")
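
To make the per-row behaviour concrete, here is a minimal sketch on a small made-up DataFrame (the three sample reviews are hypothetical stand-ins for amazon_baby.csv). It assumes every review is a string; `.str.split` yields one list per row, and `.str.len()` then measures each list rather than the whole Series.

```python
import pandas as pd

# Hypothetical toy data standing in for amazon_baby.csv
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed. highly recommend it.",
        "disappointed at first, then even more disappointed.",
        "works fine.",
    ]
})

# .str.split runs str.split on each row, giving one list of pieces per review
parts = reviews["review"].str.split("disappointed")

# .str.len() is the length of each list; pieces minus 1 is the occurrence count
reviews["disappointed"] = parts.str.len() - 1

print(reviews["disappointed"].tolist())  # [1, 2, 0]
```

Note that the built-in `len()` applied to the split result would return the number of rows in the Series, not a count per review.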

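pandas 0.20.3 has pandas.Series.str.split(), which acts on each string in the Series and performs the split. So you can simply split and then count the pieces in each row; the number of occurrences is one less than the number of pieces:

reviews['review'].str.split('disappointed').str.len() - 1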

pandas.Series.str.split
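
If only the count is needed, a possible alternative to splitting is `Series.str.count`. The sketch below is an illustration rather than part of the original answer: the sample data is made up, and it assumes a missing review should count as zero occurrences. Because `str.count` interprets its argument as a regular expression, the search word is passed through `re.escape`.

```python
import re
import pandas as pd

# Hypothetical toy data; the None row stands in for a missing review
reviews = pd.DataFrame({
    "review": ["not disappointed.", "disappointed, so disappointed", None]
})

# str.count treats the pattern as a regex, so escape the literal word
word = "disappointed"
counts = reviews["review"].str.count(re.escape(word))

# Missing reviews come back as NaN; this sketch treats them as 0 occurrences
reviews["disappointed"] = counts.fillna(0).astype(int)

print(reviews["disappointed"].tolist())  # [1, 2, 0]
```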

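You are trying to split the entire review column of the DataFrame (that is the Series mentioned in the error message). What you want is to apply a function to each row of the DataFrame, which you can do by calling apply on the DataFrame: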

f = lambda x: len(x["review"].split("disappointed")) -1
reviews["disappointed"] = reviews.apply(f, axis=1)
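
The lambda above raises an AttributeError if any review is not a string (for example, a missing value). A hedged variant, assuming non-string reviews should count as zero, applies a small helper to the column itself; `count_word` and the sample data are hypothetical names introduced only for this sketch.

```python
import pandas as pd

# Hypothetical toy data; the None row stands in for a missing review
reviews = pd.DataFrame({
    "review": ["was not disappointed.", None, "disappointed and disappointed"]
})

def count_word(text, word="disappointed"):
    """Count non-overlapping occurrences of word; non-strings count as 0."""
    if not isinstance(text, str):
        return 0
    # For a non-empty word, text.count(word) equals len(text.split(word)) - 1
    return text.count(word)

# Series.apply calls the helper once per review, so axis=1 is not needed
reviews["disappointed"] = reviews["review"].apply(count_word)

print(reviews["disappointed"].tolist())  # [1, 0, 2]
```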
