How to store target words in Python

Posted 2024-04-19 02:24:41


I have a question about how to store target words in a list.

I have a text file:

apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table

I want to read this file and store only the nouns, but I'm struggling with the Python code.

words = []
file = open("text.txt","r")
for f in file.readlines():
    words.append(f.split(" "))

I don't know how to split each line on spaces and drop the compound words that contain "_". The result I'm after looks like this:

list = [apple, tree, banana, juice, dinner, time...]

3 answers

This code stores only the words that do not contain an underscore, and puts them all in one flat list rather than a nested list:

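words = []
# A context manager closes the file automatically
with open("text.txt", "r") as file:
    for line in file:
        # split() with no argument splits on any whitespace and drops the trailing newline
        words += [w for w in line.split() if '_' not in w]
print(words)

import re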

file = ["apple tree apple_tree apple_tree_tree apple_tree_ _",
"banana juice banana_juice",
"dinner time dinner_time",
"divorce lawyer divorce_lawyer",
"breakfast table breakfast_table"]

#approach 1 - list comprehensions
words=[]
for f in file:
    words += [x for x in f.split(" ") if '_' not in x]

print(words)

#approach 2 - regular expressions
words=[]
for f in file:
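    # Replace each token containing "_" with a space so neighbouring words are not glued together
    f = re.sub(r"\S*_\S*", " ", f)
    # split() with no argument also discards the extra spaces left behind
    words += f.split()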

print(words)

Both approaches work. IMO the first is better (regular expressions can be expensive), and it is also more Pythonic.
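
If you want to check the cost claim on your own machine rather than take my word for it, here is a rough sketch using the standard timeit module. The function names with_comprehension and with_regex and the constant COMPOUND are made up for this example, and the regex variant uses a precompiled pattern that replaces each underscore token with a space, which is an assumption on my part and not necessarily the exact pattern above. Timings will vary by machine and input size, so treat the numbers as indicative only.

```python
import re
import timeit

lines = [
    "apple tree apple_tree",
    "banana juice banana_juice",
    "dinner time dinner_time",
    "divorce lawyer divorce_lawyer",
    "breakfast table breakfast_table",
]

def with_comprehension(lines):
    # Keep only the tokens that contain no underscore
    return [w for line in lines for w in line.split() if "_" not in w]

# Assumed pattern: any run of non-space characters that contains an underscore
COMPOUND = re.compile(r"\S*_\S*")

def with_regex(lines):
    words = []
    for line in lines:
        # Replace each compound word with a space, then split on whitespace
        words += COMPOUND.sub(" ", line).split()
    return words

# The two functions must agree before comparing their speed makes sense
assert with_comprehension(lines) == with_regex(lines)

for fn in (with_comprehension, with_regex):
    seconds = timeit.timeit(lambda: fn(lines), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s for 10,000 runs")
```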

Try this code. It works well.

Split the whole string and add to the list only the values that are not compound words (i.e. the words that do not contain "_").

Code:

temp = """apple tree apple_tree
banana juice banana_juice
dinner time dinner_time
divorce lawyer divorce_lawyer
breakfast table breakfast_table"""

new_arr = [i for i in temp.split() if '_' not in i]
print(new_arr)

Output:

['apple', 'tree', 'banana', 'juice', 'dinner', 'time', 'divorce', 'lawyer', 'breakfast', 'table']
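
The snippet above works on a string literal, while the question starts from a file. A minimal sketch of the same idea applied to a file follows, assuming the file is small enough to read into memory in one go. The helper name load_simple_words is hypothetical, and the sample data is written to a temporary directory only so that the example runs on its own; in practice you would pass the path to your own text.txt.

```python
import tempfile
from pathlib import Path

def load_simple_words(path):
    # Hypothetical helper: read the whole file, split on any whitespace
    # (spaces and newlines alike), and keep tokens without an underscore
    text = Path(path).read_text(encoding="utf-8")
    return [word for word in text.split() if "_" not in word]

# A shortened copy of the data from the question, used only for this demo
sample = (
    "apple tree apple_tree\n"
    "banana juice banana_juice\n"
    "dinner time dinner_time\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "text.txt"
    path.write_text(sample, encoding="utf-8")
    words = load_simple_words(path)

print(words)
# → ['apple', 'tree', 'banana', 'juice', 'dinner', 'time']
```

Because split() with no argument treats newlines as whitespace, there is no need to loop over the lines separately.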
