I want to count how many times a word is repeated in a review string.
I am reading a CSV file and storing it in a pandas DataFrame with the line below:
reviews = pd.read_csv("amazon_baby.csv")
The code in the following lines works when I apply it to a single review.
print reviews["review"][1]
a = reviews["review"][1].split("disappointed")
print a
b = len(a)
print b
The output of the lines above is:
it came early and was not disappointed. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.
['it came early and was not ', '. i love planet wise bags and now my wipe holder. it keps my osocozy wipes moist and does not leak. highly recommend it.']
2
When I apply the same logic to the whole DataFrame using the line below, I get an error message.
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
Error message:
Traceback (most recent call last):
File "C:/Users/gouta/PycharmProjects/MLCourse1/Classifier.py", line 12, in <module>
reviews['disappointed'] = len(reviews["review"].split("disappointed"))-1
File "C:\Users\gouta\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2360, in __getattr__
(type(self).__name__, name))
AttributeError: 'Series' object has no attribute 'split'
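You can use .str to call string methods on a Series of strings: pandas 0.20.3 has pandas.Series.str.split(), which operates on each string in the Series and performs the split. So you can simply split each review and then count the resulting pieces; the number of occurrences is the number of pieces minus one.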
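A minimal sketch of that idea, assuming a small made-up DataFrame in place of amazon_baby.csv (the sample reviews and the disappointed_count column name are illustrative, not from the original data). It also shows Series.str.count, which counts matches directly without splitting:

```python
import pandas as pd

# Hypothetical stand-in for the "review" column of amazon_baby.csv.
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed. highly recommend it.",
        "disappointed at first, and still disappointed a week later.",
        "works great",
    ]
})

# Split every review on the word; n pieces means n - 1 occurrences.
pieces = reviews["review"].str.split("disappointed")
reviews["disappointed"] = pieces.str.len() - 1

# Alternative: count the matches directly. The pattern is treated as a
# regular expression, so escape special characters for other words.
reviews["disappointed_count"] = reviews["review"].str.count("disappointed")
```

Both columns hold the same counts here. If the real column contains missing reviews, the .str methods should pass them through as NaN, so filling them first (for example with fillna("")) may be needed before converting the result to integers.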
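You are trying to split the entire review column of the DataFrame (this is the Series mentioned in the error message). What you want to do is apply a function to each row of the DataFrame, which you can do by calling apply on the DataFrame.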
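The code that originally followed this answer is not preserved here, so the following is only a sketch of the row-wise approach it describes. The sample data and the helper name count_word are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical stand-in for amazon_baby.csv, including one missing review.
reviews = pd.DataFrame({
    "review": [
        "it came early and was not disappointed. highly recommend it.",
        "disappointed at first, and still disappointed a week later.",
        "works great",
        None,
    ]
})


def count_word(text, word="disappointed"):
    """Return how often `word` occurs in `text`; non-strings count as 0."""
    if not isinstance(text, str):
        return 0
    return len(text.split(word)) - 1


# axis=1 passes each row to the function, so row["review"] is a single
# string (or a missing value) rather than the whole Series.
reviews["disappointed"] = reviews.apply(
    lambda row: count_word(row["review"]), axis=1
)
```

Because only one column is needed, reviews["review"].apply(count_word) would give the same result without building a row object for every record.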