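How can I avoid printing repeated words within each line in Python?
I'm still getting the hang of Python and could use a little help. My program works with two csv files, one called "testclaims" and the other "notinlist". With writer3, I have the program write every word of each line to a new csv file, one word per line. For example, if a line in testclaims is: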
The boy fell and the boy got hurt
then the output is:
The
boy
fell
and
the
boy
got
hurt
But I don't want it to print the same word more than once for a given line. I'd like the output to be:
The
boy
fell
and
the
got
hurt
I've been at this for a while, experimenting with Counter and word frequencies, but I just can't get it to work. Any help would be great! Here is my code:
    import csv

    with open("testclaims.csv") as file1, open("masterlist.csv") as file2, \
            open("stopwords.csv") as file3, \
            open("output.csv", "wb+") as file4, open("output2.csv", "wb+") as file5:
        writer = csv.writer(file4)
        writer2 = csv.writer(file5)
        key_words = [word.strip() for word in file2.readlines()]
        stop_words = [word.strip() for word in file3.readlines()]
        internal_stop_words = [
            ' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ', 'ers ', ' for ',
            ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',
            ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',
            ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',
            ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',
            ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ',
        ]
        for row in file1:
            row = row.strip()
            row = row.lower()
            # Replace every internal stop word with a single space.
            for stopword in internal_stop_words:
                if stopword in row:
                    row = row.replace(stopword, " ")
            # Record each keyword that occurs in the row.
            for key in key_words:
                if key in row:
                    writer.writerow([key, row])
            # writer3 is not defined in this snippet; it is presumably a
            # csv.writer for the "notinlist" output file.
            for word in row.split():  # This part here!
                writer3.writerow([word])
            # Rows that contain no keyword at all go to the second file.
            if not any(key in row for key in key_words):
                writer2.writerow([row])
2 Answers
1
Let's do something simple with an ordered dictionary (OrderedDict)...
>>> import collections
>>> print("\n".join(collections.OrderedDict.fromkeys("The boy fell and the boy got hurt".split())))
The
boy
fell
and
the
got
hurt
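To connect this back to the question's loop, here is a minimal sketch of the same idea writing to a csv writer. It assumes Python 3.7 or later, where a plain dict preserves insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. The helper names unique_words and write_unique_words are made up for illustration, and an in-memory buffer replaces the asker's real output file.

```python
import csv
import io


def unique_words(line):
    """Return the words of `line` in first-seen order, without repeats."""
    # dict.fromkeys keeps only the first occurrence of each key, and plain
    # dicts preserve insertion order from Python 3.7 onward.
    return list(dict.fromkeys(line.split()))


def write_unique_words(lines, writer):
    """Write each distinct word of every line as its own one-column CSV row."""
    for line in lines:
        for word in unique_words(line):
            writer.writerow([word])


# In-memory stand-in for the output file (an assumption for this demo).
buffer = io.StringIO()
write_unique_words(["The boy fell and the boy got hurt"], csv.writer(buffer))
print(buffer.getvalue())
```

The comparison is case-sensitive, so "The" and "the" count as different words, which matches the output the question asks for. If the row has already been lowercased, as in the question's loop, the two collapse into one.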
1
Use set()
    row = 'The boy fell and the boy got hurt'
    s = set()
    for word in row.split():
        if word not in s:
            s.add(word)
            # print word
            writer3.writerow([word])
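When this runs over many rows, the set has to be created anew for each row. Otherwise a word seen in an earlier row would be suppressed in every later one. The sketch below wraps the set-based approach in a generator so each call starts with an empty set. The name iter_unique_words and the ignore_case option are illustrative additions, not part of the answer above.

```python
def iter_unique_words(line, ignore_case=False):
    """Yield the words of `line` in order, skipping ones already seen."""
    seen = set()  # created per call, so duplicates are tracked per line only
    for word in line.split():
        # With ignore_case, "The" and "the" are treated as the same word.
        key = word.lower() if ignore_case else word
        if key not in seen:
            seen.add(key)
            yield word


rows = ["The boy fell and the boy got hurt", "the boy ran"]
for row in rows:
    print(list(iter_unique_words(row)))
```

Inside the question's loop, this would be used as `for word in iter_unique_words(row): writer3.writerow([word])`, assuming writer3 is a csv.writer defined elsewhere.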