AttributeError: 'list' object has no attribute 'split' when I try to split a row from a CSV file
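I have a csv file with 10 rows of text, all in a single column. For each row, I want to remove some common, uninformative words (also called stop words) and then save the result back to the same csv file, just without the stop words.
This is my code: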
def remove_stopwords(filename):
    new_text_list=[]
    cr = csv.reader(open(filename,"rU").readlines()[1:])
    cachedStopWords = stopwords.words("english")
    for row in cr:
        text = ' '.join([word for word in row.split() if word not in cachedStopWords])
        print text
        new_text_list.append(text)
But I keep getting this error:
AttributeError: 'list' object has no attribute 'split'
It looks like the rows of my csv file cannot be split with .split because each one is a list. How can I fix this?
This is what my csv file looks like:
Text
I am very pleased with the your software for contractors. It is tailored quite neatly for the construction industry.
We have two different companies, one is real estate management and one is health and nutrition services. It works great for both.
The example above shows the first three lines of my csv file. When I run this code:
cr = csv.reader(open(filename,"rU").readlines()[1:])
print cr[2]
I get:
['We have two different companies, one is real estate management and one is health and nutrition services. It works great for both.']
Thanks,
1 Answer
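Your data file is not really CSV data: the words are separated by spaces, not commas, so you don't need the csv module. (csv.reader yields each row as a list of fields, which is why row.split() raises the AttributeError.) Just read the file line by line and use row = line.split() to split each line on whitespace.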
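from nltk.corpus import stopwords

def remove_stopwords(filename):
    new_text_list = []
    cachedStopWords = set(stopwords.words("english"))
    # plain "r" mode: the "U" flag was removed in Python 3.11
    with open(filename, "r") as f:
        next(f)  # skip the header line
        for line in f:
            row = line.split()  # split the line on whitespace
            text = ' '.join([word for word in row
                             if word not in cachedStopWords])
            print(text)
            new_text_list.append(text)
    return new_text_list  # hand the cleaned lines back to the caller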
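The question also asks to save the cleaned text back to the same csv file, which the function above does not do. One possible variation, if the file should stay a one-column CSV (for example because some rows are quoted), is to keep csv.reader and split the first field of each row. The following is only a minimal sketch under assumptions: the name remove_stopwords_csv and the stop_words parameter are hypothetical, and the stop words are passed in so the sketch runs without NLTK; in practice you could pass set(stopwords.words("english")).

```python
import csv

def remove_stopwords_csv(filename, stop_words):
    """Strip stop words from the first column of a CSV file, in place."""
    # Read the whole file first, since it is overwritten below
    with open(filename, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return []
    header, body = rows[0], rows[1:]

    cleaned = []
    for row in body:
        if not row:  # skip blank lines
            continue
        # row is a list of fields, so split the string in column 0
        words = row[0].split()
        cleaned.append(' '.join(w for w in words if w not in stop_words))

    # Write the header and the cleaned rows back to the same file
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows([text] for text in cleaned)
    return cleaned
```

The comparison is case-sensitive, so a word like "We" is removed only if it appears in stop_words in exactly that form; lowercasing each word before the membership test is one way around that.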
By the way, checking whether an element is in a set is an O(1) operation on average, while checking a list is O(n). So it is worth making cachedStopWords a set:
cachedStopWords = set(stopwords.words("english"))
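To see the difference, a rough timing comparison can be sketched with timeit. The word list here is synthetic and its size (10,000 entries) is an arbitrary assumption chosen to make the gap visible; real stop-word lists are usually much shorter, so the gain per lookup would be smaller, and the exact timings depend on the machine.

```python
import timeit

# Synthetic vocabulary (an assumption for illustration, not real stop words)
words_list = [f"word{i}" for i in range(10_000)]
words_set = set(words_list)
probe = "word9999"  # last element: worst case for the linear list scan

# Time 1,000 membership tests against each container
list_time = timeit.timeit(lambda: probe in words_list, number=1_000)
set_time = timeit.timeit(lambda: probe in words_set, number=1_000)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```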