How do I avoid writing duplicate words from the same row in Python?

Asked 2025-04-18 14:47

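I'm still getting the hang of Python and need a little help. My program works with two csv files, one called "testclaims" and the other "notinlist". With writer3, the program writes every word of each row to a new csv file, one word per line. For example, if a row in testclaims is: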

The boy fell and the boy got hurt

then the output is:

The
boy
fell
and
the
boy
got
hurt

But I don't want it to write the same word more than once for the same row. I want the output to be:

The
boy
fell
and
the
got
hurt

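I've been at this for a while and played around with Counter and word frequencies, but I just can't get it to work. Any help would be great! Here is my code: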

import csv

# Python 3: csv output files are opened in text mode with newline=""
with open("testclaims.csv") as file1, open("masterlist.csv") as file2, \
        open("stopwords.csv") as file3, \
        open("output.csv", "w", newline="") as file4, \
        open("output2.csv", "w", newline="") as file5, \
        open("output3.csv", "w", newline="") as file6:  # output3.csv: assumed name, writer3 was never defined
    writer = csv.writer(file4)    # rows that contain a key word
    writer2 = csv.writer(file5)   # rows that contain no key word
    writer3 = csv.writer(file6)   # one word per line
    key_words = [word.strip() for word in file2.readlines()]
    stop_words = [word.strip() for word in file3.readlines()]
    internal_stop_words = [' a ', ' an ', ' and ', 'as ', ' at ', ' be ', 'ed ',
                           'ers ', ' for ',
                           ' he ', ' if ', ' in ', ' is ', ' it ', ' of ', ' on ', ' to ', 'her ', 'hers ',
                           ' do ', ' did ', ' a ', ' b ', ' c ', ' d ', ' e ', ' f ', ' g ', ' h ', ' i ',
                           ' j ', ' k ', ' l ', ' m ', 'n ', ' n', ' nc ', ' o ', ' p ', ' q ', ' r ', ' s ',
                           ' t ', ' u ', ' v ', ' w ', ' x ', ' y ', 'z ', ',', '"', 'ers ', ' th ', ' gc ',
                           ' so ', ' ot ', ' ft ', ' ow ', ' ir ', ' ho ', ' er ']
    for row in file1:
        row = row.strip().lower()

        # Replace each space-padded stop word with a single space.
        for stopword in internal_stop_words:
            if stopword in row:
                row = row.replace(stopword, " ")

        # Record every key word that occurs in the row.
        for key in key_words:
            if key in row:
                writer.writerow([key, row])

        for word in row.split():  # This part here!
            writer3.writerow([word])

        if not any(key in row for key in key_words):
            writer2.writerow([row])
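The Counter approach mentioned above can work, assuming Python 3.7 or later: Counter is a dict subclass, so its keys keep first-insertion order, and iterating over it yields each word once, at the position where it first appeared. The counts themselves are simply ignored. A minimal sketch; the function name unique_words_counter is hypothetical.

```python
from collections import Counter


def unique_words_counter(row):
    """Return the words of row once each, in order of first appearance.

    Relies on Counter keeping insertion order (Python 3.7+).
    """
    counts = Counter(row.split())
    # Iterating over a Counter yields its keys; the counts are not used.
    return list(counts)


if __name__ == "__main__":
    for word in unique_words_counter("The boy fell and the boy got hurt"):
        print(word)
```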

2 Answers


Let's do something simple with an OrderedDict...

>>> import collections
>>> print("\n".join(collections.OrderedDict.fromkeys("The boy fell and the boy got hurt".split()).keys()))
The
boy
fell
and
the
got
hurt
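On Python 3.7 and later a plain dict also keeps insertion order, so dict.fromkeys can stand in for OrderedDict.fromkeys. A minimal sketch under that assumption; the helper name dedupe_row is hypothetical, and writer3 is recreated here as a csv.writer over sys.stdout only so the example runs on its own.

```python
import csv
import sys


def dedupe_row(row):
    """Return the words of row once each, keeping first-occurrence order.

    dict keys are unique and, since Python 3.7, keep insertion order,
    so dict.fromkeys behaves like OrderedDict.fromkeys here.
    """
    return list(dict.fromkeys(row.split()))


if __name__ == "__main__":
    # Stand-in for the question's writer3 (assumption): write to stdout.
    writer3 = csv.writer(sys.stdout)
    for word in dedupe_row("The boy fell and the boy got hurt"):
        writer3.writerow([word])
```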

Use a set():

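import csv
import sys

writer3 = csv.writer(sys.stdout)  # stand-in for writer3 from the question

row = 'The boy fell and the boy got hurt'

s = set()  # words already written for this row

for word in row.split():
    if word not in s:
        s.add(word)
        # print(word)
        writer3.writerow([word])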
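Inside the question's loop, the set has to be created anew for every row, so that a word is skipped only when it repeats within the same row, not because it appeared on an earlier row. A minimal sketch under that assumption: the file reading is replaced by an in-memory list of rows, and the writer targets an io.StringIO buffer so the result can be inspected. The name write_unique_words is hypothetical.

```python
import csv
import io


def write_unique_words(rows, writer):
    """Write each distinct word of every row on its own CSV line.

    A fresh set is created per row, so a word repeated on a later row
    is written again; only repeats within the same row are skipped.
    """
    for row in rows:
        seen = set()  # reset for every row
        for word in row.split():
            if word not in seen:
                seen.add(word)
                writer.writerow([word])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_unique_words(["the boy fell and the boy got hurt", "the boy ran"],
                       csv.writer(buffer, lineterminator="\n"))
    print(buffer.getvalue())
```

Because the question's code lowercases every row before splitting, "The" and "the" already count as the same word there; the sample rows above are lowercase for the same reason.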
