Find every possible word in a text

Posted 2024-04-29 09:17:06


I have a text like this:

text = "renoncent au développement. Au lieu de cela,elles s'attaquent à la jugulaire: investir dans un bien immobilier en exploitation qui génère des bénéfices.Avant d'investir, donnée s'est comportée en tant que grand promoteur. Pour déterminer si un projet 'offre potentiel' de profit réaliste,  pesez les antécédents de la et l'équilibre risque récompense potentiel de tout nouveau projet majeur. Souvent, qui cherche une approche intermédiaire formera un partenariat ou une coentreprise avec une entreprise qui est déjà sur le terrain et qui réalise des profits."

I want to get a list containing every word in this text.


3 answers

You can add the words to a set so that there are no duplicates, and strip the commas if you don't want them:

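words = set()
# Split on spaces and drop commas from each token before adding it
for word in text.split(" "):
    words.add(word.replace(',', ''))
# Double spaces or a lone comma leave an empty string behind; discard it
words.discard('')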
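Note that the text in the question contains fused tokens such as "cela,elles" and "bénéfices.Avant", which splitting on spaces alone will not separate. A minimal sketch of one alternative, assuming it is acceptable to treat every run of word characters as a word (so "s'attaquent" becomes "s" and "attaquent"); the function name `extract_words` and the shortened `sample` string are illustrative, not part of the original answer:

```python
import re

def extract_words(text):
    # \w+ matches runs of letters, digits and underscores;
    # for str patterns in Python 3 this is Unicode-aware, so accented letters count
    return set(re.findall(r"\w+", text.lower()))

# Shortened excerpt of the question's text, for illustration only
sample = "Au lieu de cela,elles s'attaquent à la jugulaire. bénéfices.Avant d'investir"
words = extract_words(sample)
print(sorted(words))
```

If apostrophes should stay inside a word, the pattern would need to be adjusted; that choice depends on what counts as a "word" for your purposes.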

This is not the most efficient approach, but you can use a list:

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."

def get_unique_words(text):
    # lowercase the text so that 'Union' and 'union' count as the same word
    lower_text = text.lower()
    # split the string on the space character
    split_text = lower_text.split(' ')

    # empty list to populate unique words
    results_list = []
    # iterate over the list
    for word in split_text:
        # check to see if value is already in results lists
        if word not in results_list:
            # append the word if it is unique
            results_list.append(word)
    return results_list

results = get_unique_words(text)

print(results)

This prints:

['conscious', 'of', 'its', 'spiritual', 'and', 'moral', 'heritage,', 'the', 'union', 'is', 'founded', 'on', 'indivisible,', 'universal', 'values', 'human', 'dignity,', 'freedom,', 'equality', 'solidarity;', 'it', 'based', 'principles', 'democracy', 'rule', 'law.', 'places', 'individual', 'at', 'heart', 'activities,', 'by', 'establishing', 'citizenship', 'creating', 'an', 'area', 'security', 'justice.']
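As the output shows, punctuation stays attached to some words ('heritage,', 'law.'), and checking `word not in results_list` scans the whole list each time. A minimal sketch of a variant that addresses both, assuming that stripping leading and trailing ASCII punctuation is what you want; the name `unique_words_in_order` and the shortened `sample` are illustrative:

```python
import string

def unique_words_in_order(text):
    seen = set()      # fast membership checks
    ordered = []      # keeps first-occurrence order
    for token in text.lower().split():
        # remove punctuation from both ends of the token only
        word = token.strip(string.punctuation)
        if word and word not in seen:
            seen.add(word)
            ordered.append(word)
    return ordered

sample = "The Union is founded on the indivisible, universal values of human dignity, freedom."
result = unique_words_in_order(sample)
print(result)
```

Calling `split()` with no argument also collapses repeated whitespace, so double spaces do not produce empty strings.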

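You can strip the "," as you add each word to the list. You can also use the OrderedDict class from the collections module to remove duplicates while keeping the original order: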

text = "Conscious of its spiritual and moral heritage, the Union is founded on the indivisible, universal values of human dignity, freedom, equality and solidarity; it is based on the principles of democracy and the rule of law. It places the individual at the heart of its activities, by establishing the citizenship of the Union and by creating an area of freedom, security and justice."
from collections import OrderedDict

words = []
for word in text.split(" "):
    words.append(word.strip(","))  # remove ',' from both ends of the word
unique_words = list(OrderedDict.fromkeys(words))  # remove duplicates, keep order
print(unique_words)
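Since Python 3.7, a plain `dict` is guaranteed to preserve insertion order, so the same deduplication works without OrderedDict. A minimal sketch, assuming Python 3.7 or later and that you also want to strip other punctuation such as ';' and '.'; the `sample` string is a shortened illustration:

```python
import string

sample = "freedom, security and justice; freedom, equality and solidarity."

# strip punctuation from both ends of every whitespace-separated token
cleaned = (token.strip(string.punctuation) for token in sample.split())
# dict keys are unique and keep first-insertion order
unique = list(dict.fromkeys(word for word in cleaned if word))
print(unique)
```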
