Searching a list for multiple words using Python

Published 2024-04-26 07:50:36


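I am currently working on my first Python project. The goal is to summarize a web page by finding and printing the sentences that contain particular words from a word list I generated. For example, the following (large) list contains "business key terms" that I generated by running cewl on business websites: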

business_list = ['business', 'marketing', 'market', 'price', 'management', 'terms', 'product', 'research', 'organisation', 'external', 'operations', 'organisations', 'tools', 'people', 'sales', 'growth', 'quality', 'resources', 'revenue', 'account', 'value', 'process', 'level', 'stakeholders', 'structure', 'company', 'accounts', 'development', 'personal', 'corporate', 'functions', 'products', 'activity', 'demand', 'share', 'services', 'communication', 'period', 'example', 'total', 'decision', 'companies', 'service', 'working', 'businesses', 'amount', 'number', 'scale', 'means', 'needs', 'customers', 'competition', 'brand', 'image', 'strategies', 'consumer', 'based', 'policy', 'increase', 'could', 'industry', 'manufacture', 'assets', 'social', 'sector', 'strategy', 'markets', 'information', 'benefits', 'selling', 'decisions', 'performance', 'training', 'customer', 'purchase', 'person', 'rates', 'examples', 'strategic', 'determine', 'matrix', 'focus', 'goals', 'individual', 'potential', 'managers', 'important', 'achieve', 'influence', 'impact', 'definition', 'employees', 'knowledge', 'economies', 'skills', 'buying', 'competitive', 'specific', 'ability', 'provide', 'activities', 'improve', 'productivity', 'action', 'power', 'capital', 'related', 'target', 'critical', 'stage', 'opportunities', 'section', 'system', 'review', 'effective', 'stock', 'technology', 'relationship', 'plans', 'opportunity', 'leader', 'niche', 'success', 'stages', 'manager', 'venture', 'trends', 'media', 'state', 'negotiation', 'network', 'successful', 'teams', 'offer', 'generate', 'contract', 'systems', 'manage', 'relevant', 'published', 'criteria', 'sellers', 'offers', 'seller', 'campaigns', 'economy', 'buyers', 'everyone', 'medium', 'valuable', 'model', 'enterprise', 'partnerships', 'buyer', 'compensation', 'partners', 'leaders', 'build', 'commission', 'engage', 'clients', 'partner', 'quota', 'focused', 'modern', 'career', 'executive', 'qualified', 'tactics', 'supplier', 'investors', 
'entrepreneurs', 'financing', 'commercial', 'finances', 'entrepreneurial', 'entrepreneur', 'reports', 'interview', 'ansoff']

The program below copies all of the text from a URL I specify and organizes it into a list whose elements are the individual sentences:

from bs4 import BeautifulSoup
import urllib.request as ul

url = input("Enter URL: ")
html = ul.urlopen(url).read()

soup = BeautifulSoup(html, 'lxml')
for script in soup(["script", "style"]):
    script.decompose()
strips = list(soup.stripped_strings)
# Joining list to form single text
text = " ".join(strips)
text = text.lower()
# Treat '?', '!', ':' and ';' as sentence terminators by replacing them with '.'
for ch in "?!:;":
    text = text.replace(ch, ".")
# Splitting text by sentences
sentences = text.split(".")

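My current goal is to have the program print every sentence that contains one (or more) of the key terms above, but so far I have only managed to search for one word at a time: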

# Word to search for in the text
word_search = input("Enter word: ")
word_search = word_search.lower()
sentences_with_word = []
for x in sentences:
    if x.count(word_search) > 0:
        sentences_with_word.append(x)
# Separating sentences into separate lines
sentence_text = "\n\n".join(sentences_with_word)
print(sentence_text)

Could someone show me how to do this for the whole list at once? Thanks.

Edit

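As MachineLearner suggested, here is an example of the output for a single word. If I use Wikipedia's page on marketing as the URL and enter the word "marketing" as the input for "word_search", this is part of the resulting output (the full output is almost 600 lines long):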

marketing mix the marketing mix is a foundational tool used to guide decision making in marketing

 the marketing mix represents the basic tools which marketers can use to bring their products or services to market

 they are the foundation of managerial marketing and the marketing plan typically devotes a section to the marketing mix

 the 4ps [ edit ] the traditional marketing mix refers to four broad levels of marketing decision

1 answer
User
#1 · Posted 2024-04-26 07:50:36

Use a nested loop to check each sentence against every word in the list:

words = business_list  # the keyword list from the question
output = []
for sentence in sentences:
    for word in words:
        if sentence.count(word) > 0:
            output.append(sentence)
            # Break out of the inner loop here; otherwise a sentence
            # that contains several keywords is appended once per match.
            break
# Print the matching sentences on separate lines
print("\n\n".join(output))
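
Note that count() matches substrings, so the keyword 'market' also selects sentences that only contain 'marketing'. If whole-word matches are wanted instead, one option is a single regular expression built from the list. The following is a minimal sketch rather than a definitive solution: the function name sentences_with_keywords, its whole_words parameter, and the sample data are hypothetical and stand in for the sentences and business_list variables from the question.

```python
import re


def sentences_with_keywords(sentences, keywords, whole_words=True):
    """Return the sentences that contain at least one keyword, in original order."""
    if not keywords:
        # An empty alternation would match every sentence, so bail out early.
        return []
    if whole_words:
        # \b anchors each keyword at word boundaries, so 'market'
        # does not match inside 'marketing'.
        alternation = "|".join(re.escape(k) for k in keywords)
        pattern = re.compile(r"\b(?:" + alternation + r")\b")
        return [s for s in sentences if pattern.search(s)]
    # Substring matching, equivalent to the count() > 0 test.
    return [s for s in sentences if any(k in s for k in keywords)]


# Hypothetical sample data standing in for the scraped sentences.
sample_sentences = [
    "the marketing mix is a foundational tool",
    " the weather was pleasant",
    " prices rose while sales fell",
]
sample_keywords = ["market", "sales"]

print(sentences_with_keywords(sample_sentences, sample_keywords))
print(sentences_with_keywords(sample_sentences, sample_keywords, whole_words=False))
```

With whole_words=True only the third sample sentence is returned, because 'market' appears in the first one only as part of 'marketing'. With whole_words=False the first and third are both returned, and each sentence appears at most once because any() stops at the first matching keyword.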
