I am trying to use pandas to create a list/array of all the words in the "review/text" field of the following text file:
product/productId: B001E4KFG0 review/userId: A3SGXH7AUHU8GW review/profileName: delmartian review/helpfulness: 1/1 review/score: 5.0 review/time: 1303862400 review/summary: Good Quality Dog Food review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
product/productId: B00813GRG4 review/userId: A1D87F6ZCVE5NK review/profileName: dll pa review/helpfulness: 0/0 review/score: 1.0 review/time: 1346976000 review/summary: Not as Advertised review/text: Product arrived labeled as Jumbo Salted Peanuts...
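(The text file foods.txt is available at: http://snap.stanford.edu/data/web-FineFoods.html)
My end goal is to identify all the unique words that appear in the review/text field.
I wrote the following code: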
import pandas as pd
f=open("foods.txt","r")
df=pd.read_csv(f,names=['product/productId','review/userId','review/profileName','review/helpfulness','review/score','review/time','review/summary'])
selected = df[ df['review/summary'] ]
print(selected)
selected.to_csv('result.csv', sep=' ', header=False)
However, I get the following error:
ValueError: cannot index with vector containing NA / NaN values
Any suggestions or comments?
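For the pandas part of the goal, here is a minimal sketch. It assumes the reviews have already been parsed into a DataFrame with a `review/text` column; the two rows below are shortened copies of the sample records above, not the real file. Note that `df['review/text']` on its own selects the column. Wrapping a column again, as in `df[df['review/summary']]`, hands the column's values back to pandas as a row indexer, and an indexer containing NaN is what the error message is complaining about. The tokenization rule (lowercase, runs of letters and apostrophes) is an assumption.

```python
import pandas as pd

# Shortened sample rows from the question; a real DataFrame would be built
# from the parsed foods.txt records.
df = pd.DataFrame({
    "review/summary": ["Good Quality Dog Food", "Not as Advertised"],
    "review/text": [
        "I have bought several of the Vitality canned dog food products",
        "Product arrived labeled as Jumbo Salted Peanuts",
    ],
})

# Select the column directly (no df[df[...]] wrapping), then tokenize.
words = (
    df["review/text"]
    .str.lower()              # treat "Product" and "product" as the same word
    .str.findall(r"[a-z']+")  # assumed word rule: letters/apostrophes only
    .explode()                # one row per word
    .dropna()                 # missing review texts become NaN after explode
    .unique()                 # numpy array of distinct words
)
print(sorted(words))
```

`Series.explode` needs pandas 0.25 or later; on older versions the same result can be had by building a set from the lists returned by `str.findall`.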
I think you have to do it this way to extract all the records from the file and get the review/summary values. You don't need a DataFrame.
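As a hedged illustration of parsing without a DataFrame, here is a sketch that assumes foods.txt stores one `field: value` pair per line with a blank line between reviews (check this against your copy of the file). `iter_records` is a hypothetical helper name, and the file encoding in the usage comment is also an assumption.

```python
from typing import Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per review.

    Assumed layout: one "field: value" pair per line, blank line between reviews.
    """
    record: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            # A blank line closes the current record.
            if record:
                yield record
                record = {}
            continue
        # Split on the first colon only; values such as review text
        # may contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    if record:
        # The last record may not be followed by a blank line.
        yield record


# Usage (encoding is an assumption; latin-1 decodes any byte sequence):
# with open("foods.txt", encoding="latin-1") as f:
#     summaries = [r.get("review/summary", "") for r in iter_records(f)]
```

Because `iter_records` is a generator, the whole file never has to be held in memory at once.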
The output would be:
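I think the scope of your question also includes writing a new file.
You can open a file and write each dictionary as one line. That line will contain all the details of the record. I'll leave that part for you to work out.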
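One way to do the part left to the reader, sketched under the assumption that JSON Lines (one JSON object per line) is an acceptable output format. `write_records` is a hypothetical name; it accepts any iterable of dicts, such as the records produced by a line-by-line parser.

```python
import json
from typing import Dict, Iterable


def write_records(records: Iterable[Dict[str, str]], path: str) -> int:
    """Write each record dict as one JSON object per line; return the count."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps escapes newlines inside values, so each record
            # stays on exactly one line of the output file.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

JSON is used here rather than `str(record)` because each line can be read back with `json.loads` without any custom parsing.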
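I looked at the link provided by S.Ghoshal and came up with the following:
Now all the words from the lines starting with review/text are dumped into a file. Next, I need to create a list of all the unique words.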
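For that next step, a minimal sketch, assuming the dumped file is plain UTF-8 text and that words should be compared case-insensitively with punctuation dropped (so "stew." and "Stew" count once). `unique_words` is a hypothetical name, and the regex `[a-z']+` ignores digits and non-ASCII letters, which may or may not suit your data.

```python
import re
from typing import List


def unique_words(path: str) -> List[str]:
    """Return the sorted unique words in a text file (assumed word rule below)."""
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Lowercase, then keep runs of letters/apostrophes as words.
            seen.update(re.findall(r"[a-z']+", line.lower()))
    return sorted(seen)
```

A set gives constant-time membership checks, so this stays fast even when the dumped file holds millions of words.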
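CSV stands for comma-separated values. I don't see any commas in your file.
It looks like a broken dictionary (each entry is missing its separating comma):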
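If the records really are flattened onto one line, as in the samples in the question, one hedged option is to split on the field names instead of on commas. This sketch assumes every field name has the form `product/...` or `review/...` followed by a colon, and that review text never contains such a token itself. `FIELD` and `parse_flat_record` are hypothetical names.

```python
import re
from typing import Dict

# Assumed field-name pattern, e.g. "review/summary:" or "product/productId:".
FIELD = re.compile(r"\b((?:product|review)/\w+):\s*")


def parse_flat_record(line: str) -> Dict[str, str]:
    """Split one flattened record (all fields on one line) into a dict."""
    matches = list(FIELD.finditer(line))
    record: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # Each value runs from the end of this field name to the start
        # of the next field name (or to the end of the line).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        record[m.group(1)] = line[m.end():end].strip()
    return record
```

Applied to the second sample record, this yields eight keys, with `review/summary` mapped to `Not as Advertised`.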