Open a file, split each line into a list of words, and append each word to a list only if it is not already in that list

Posted 2024-04-20 07:44:10


What is the best way to do the following? The sample document (hello.txt) contains:

>>> repr(hello.txt) #show object representations 

Hello there! This is a sample text. \n Ten plus ten is twenty. \n Twenty times two is forty \n 

>>> print(hello.txt) 

Hello There. This is a sample text 
Ten plus ten is twenty 
Twenty times two is forty 

To do: open a file, split each line into a list of words, then check whether each word on each line is already in the result list, and append it if it is not.

open_file = open('hello.txt')
lst = list() #create empty list 

for line in open_file:     
    line = line.rstrip()   #strip white space at the end of each line 
    words = line.split()   #split string into a list of words 

    for word in words:
        if word not in words:
            #Missing code here; tried 'if word not in words', but then it produces an empty list 
            lst.append(word) 

lst.sort()
print(lst)

Output of the code above when the `if` check is left out:

['Hello', 'Ten', 'There', 'This', 'Twenty', 'a', 'forty', 'is', 'is', 'is', 'plus', 'sample', 'ten', 'text', 'times', 'twenty', 'two']

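The string "is" appears three times, but it should appear only once. I have been trying to work out how to write code that checks each word on each line to see whether it is already in the list, and appends it only if it is not.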

Desired output:

['Hello', 'Ten', 'There', 'This', 'Twenty', 'a', 'forty', 'is', 'plus', 'sample', 'ten', 'text', 'times', 'twenty', 'two']

3 answers

Solution:

open_file = open('hello.txt') #open file 

lst = list() #create empty list 


for line in open_file:  #1st for loop strip white space and split string into list of words 
    line = line.rstrip()
    words = line.split()
    for word in words:  #nested for loop, check if the word is in list and if not append it to the list
        if word not in lst:
            lst.append(word)

lst.sort() #sort the list of words: alphabetically 
print(lst) #print the list of words

Your mistake is in these two lines:

for word in words:
     if word not in words:

Perhaps you meant:

for word in words:
     if word not in lst:
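
To see why the original check never appends anything, here is a minimal sketch that runs both conditions on the sample text held in memory instead of a file. The names `sample_lines`, `collect`, and `use_lst` are made up for illustration and do not come from the question.

```python
# Sample lines stand in for hello.txt so the sketch runs without a file.
sample_lines = [
    "Hello There. This is a sample text",
    "Ten plus ten is twenty",
    "Twenty times two is forty",
]

def collect(lines, use_lst):
    """Collect words, testing membership against lst or against words."""
    lst = []
    for line in lines:
        words = line.rstrip().split()
        for word in words:
            # Every word is drawn from `words`, so `word not in words`
            # can never be true and nothing is ever appended.
            target = lst if use_lst else words
            if word not in target:
                lst.append(word)
    return sorted(lst)

buggy = collect(sample_lines, use_lst=False)
fixed = collect(sample_lines, use_lst=True)
print(buggy)  # the original condition: always an empty list
print(fixed)  # the corrected condition: each word once
```

Checking against `lst` works because `lst` only holds words that have already been seen, across all lines read so far.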

For what it's worth, here is how I would write the whole program:

import string
result = sorted(set(
    word.strip(string.punctuation)
    for line in open('hello.txt')
    for word in line.split()))
print(result)

Sets are ideal for tracking unique membership.
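
As a rough sketch of how the set-based version behaves, the snippet below applies the same idea to an in-memory string, so it runs without hello.txt. `io.StringIO` stands in for the open file, and the helper name `unique_words` is hypothetical.

```python
import io
import string

def unique_words(lines):
    """Return the sorted unique words in lines, with edge punctuation stripped."""
    return sorted({
        word.strip(string.punctuation)
        for line in lines
        for word in line.split()
    })

# io.StringIO stands in for open('hello.txt'); iterating it yields lines.
text = (
    "Hello there! This is a sample text.\n"
    "Ten plus ten is twenty.\n"
    "Twenty times two is forty\n"
)
result = unique_words(io.StringIO(text))
print(result)
```

The comparison is case-sensitive, so "Ten" and "ten" count as different words, and capitalized words sort before lowercase ones. A token made up entirely of punctuation would be reduced to an empty string, which this sketch does not filter out.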

Contents of hello.txt:

Hello there! This is a sample text. 
 Ten plus ten is twenty. 
 Twenty times two is forty 

Code:

result = set()

with open('hello.txt', 'r') as myfile:
    for line in myfile:
        result.update(line.strip().split())  # add every word on this line to the set

for item in result:
    print(item)

Result (a set is unordered, so the order may differ from run to run):

text.
Twenty
This
ten
a
sample
times
twenty.
Hello
is
there!
two
forty
plus
Ten

You could also change your code to say `if word not in lst` instead of `if word not in words`, and it would work.

If you want to sort a set... well, sets are unordered, but you can sort the output with `sorted(result)`.
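
A small sketch of that last point: `sorted()` accepts any iterable, including a set, and returns a new list. The sample set here is illustrative only, and the case-insensitive variant is an optional extra that none of the answers mention.

```python
result = {"is", "Ten", "ten", "Hello", "a"}

# sorted() returns a new list; the set itself stays unordered.
ordered = sorted(result)
print(ordered)  # ['Hello', 'Ten', 'a', 'is', 'ten']

# Optional: ignore case when ordering, falling back on the original
# string so that words differing only in case keep a stable order.
ordered_ci = sorted(result, key=lambda w: (w.lower(), w))
print(ordered_ci)  # ['a', 'Hello', 'is', 'Ten', 'ten']
```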
