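I'm currently working on my first Python project. The goal is to summarize the information on a web page by searching for, and printing, the sentences that contain particular words from a word list I've generated. For example, the following (large) list contains "business key terms" that I generated with cewl on a business website: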
business_list = ['business', 'marketing', 'market', 'price', 'management', 'terms', 'product', 'research', 'organisation', 'external', 'operations', 'organisations', 'tools', 'people', 'sales', 'growth', 'quality', 'resources', 'revenue', 'account', 'value', 'process', 'level', 'stakeholders', 'structure', 'company', 'accounts', 'development', 'personal', 'corporate', 'functions', 'products', 'activity', 'demand', 'share', 'services', 'communication', 'period', 'example', 'total', 'decision', 'companies', 'service', 'working', 'businesses', 'amount', 'number', 'scale', 'means', 'needs', 'customers', 'competition', 'brand', 'image', 'strategies', 'consumer', 'based', 'policy', 'increase', 'could', 'industry', 'manufacture', 'assets', 'social', 'sector', 'strategy', 'markets', 'information', 'benefits', 'selling', 'decisions', 'performance', 'training', 'customer', 'purchase', 'person', 'rates', 'examples', 'strategic', 'determine', 'matrix', 'focus', 'goals', 'individual', 'potential', 'managers', 'important', 'achieve', 'influence', 'impact', 'definition', 'employees', 'knowledge', 'economies', 'skills', 'buying', 'competitive', 'specific', 'ability', 'provide', 'activities', 'improve', 'productivity', 'action', 'power', 'capital', 'related', 'target', 'critical', 'stage', 'opportunities', 'section', 'system', 'review', 'effective', 'stock', 'technology', 'relationship', 'plans', 'opportunity', 'leader', 'niche', 'success', 'stages', 'manager', 'venture', 'trends', 'media', 'state', 'negotiation', 'network', 'successful', 'teams', 'offer', 'generate', 'contract', 'systems', 'manage', 'relevant', 'published', 'criteria', 'sellers', 'offers', 'seller', 'campaigns', 'economy', 'buyers', 'everyone', 'medium', 'valuable', 'model', 'enterprise', 'partnerships', 'buyer', 'compensation', 'partners', 'leaders', 'build', 'commission', 'engage', 'clients', 'partner', 'quota', 'focused', 'modern', 'career', 'executive', 'qualified', 'tactics', 'supplier', 'investors', 
'entrepreneurs', 'financing', 'commercial', 'finances', 'entrepreneurial', 'entrepreneur', 'reports', 'interview', 'ansoff']
The program below copies all the text from a URL I specify and organizes it into a list whose elements are the individual sentences:
from bs4 import BeautifulSoup
import urllib.request as ul
url = input("Enter URL: ")
html = ul.urlopen(url).read()
soup = BeautifulSoup(html, 'lxml')
# Removing <script> and <style> elements so only visible text remains
for script in soup(["script", "style"]):
    script.decompose()
strips = list(soup.stripped_strings)
# Joining list to form single text
text = " ".join(strips)
text = text.lower()
# Treating '?', '!', ':' and ';' as sentence endings, like '.'
for mark in "?!:;":
    text = text.replace(mark, ".")
# Splitting text by sentences
sentences = text.split(".")
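As an offline illustration of the splitting step (no URL fetch), here is a minimal sketch that performs the replace-then-split in a single `re.split` call. The function name split_sentences and the sample string are hypothetical; unlike the code above, it also strips surrounding spaces and drops empty pieces, which is an added assumption about what counts as a sentence.

```python
import re

def split_sentences(text):
    """Lower-case text and split it on '.', '?', '!', ':' and ';'."""
    text = text.lower()
    # One regex split does the work of replacing the marks with '.' and splitting.
    parts = re.split(r"[.?!:;]", text)
    # Added behaviour: trim spaces and discard empty or whitespace-only pieces.
    return [p.strip() for p in parts if p.strip()]

sample = "Marketing is key! Is price important? Yes: it is."
print(split_sentences(sample))
```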
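My current goal is to make the program print every sentence that contains one (or more) of the key terms above, but so far I've only managed to search for one word at a time: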
# Word to search for in the text
word_search = input("Enter word: ")
word_search = word_search.lower()
sentences_with_word = []
for x in sentences:
    if word_search in x:
        sentences_with_word.append(x)
# Separating sentences into separate lines
sentence_text = "\n\n".join(sentences_with_word)
print(sentence_text)
Could someone show me how to do this for the whole list at once? Thanks.
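One common way to search for the whole list at once is the built-in `any()`: keep a sentence if any keyword occurs in it. This is a minimal sketch, not code from the thread; the function name sentences_with_any and the short sample lists are hypothetical stand-ins for the scraped sentences and business_list. It keeps the question's substring matching.

```python
def sentences_with_any(sentences, keywords):
    """Return every sentence that contains at least one keyword (substring match)."""
    # Lower-case the keywords once so they match the lower-cased page text.
    keywords = [k.lower() for k in keywords]
    # any() stops at the first keyword found, so each sentence is kept only once.
    return [s for s in sentences if any(k in s for k in keywords)]

# Hypothetical sample data standing in for the scraped sentences and business_list.
sentences = ["the marketing mix is a tool", "the weather was nice", "price and product matter"]
business_list = ["marketing", "price", "product"]
matches = sentences_with_any(sentences, business_list)
print("\n\n".join(matches))
```

Because `any()` short-circuits, a sentence containing several keywords is still printed only once.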
Edit
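As MachineLearner suggested, here is an example of the output for a single word. If I use Wikipedia's page on marketing as the URL and enter the word "marketing" as the input for word_search, this is part of the output it produces (the full output is almost 600 lines long):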
marketing mix the marketing mix is a foundational tool used to guide decision making in marketing
the marketing mix represents the basic tools which marketers can use to bring their products or services to market
they are the foundation of managerial marketing and the marketing plan typically devotes a section to the marketing mix
the 4ps [ edit ] the traditional marketing mix refers to four broad levels of marketing decision
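Note that `in` is a substring test: with the full business_list, the keyword "market" also matches every sentence containing "marketing", and "price" would match "priceless". If whole-word matches are wanted instead (an assumption about the goal, not something the question states), one option is a single compiled regex with `\b` word boundaries. The name build_pattern and the sample data are hypothetical.

```python
import re

def build_pattern(keywords):
    """Compile one regex that matches any keyword only as a whole word."""
    # re.escape guards against keywords that contain regex metacharacters.
    alternatives = "|".join(re.escape(k.lower()) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b")

pattern = build_pattern(["market", "price"])
sentences = ["the marketing mix is a tool", "the market price rose", "priceless advice"]
print([s for s in sentences if pattern.search(s)])
```

Compiling the pattern once also avoids re-scanning each sentence separately for every keyword.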
Use a double loop to check for the multiple words contained in the list:
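The answer's own code is not preserved here. A minimal sketch of the double-loop idea, assuming the goal is also to see which keywords each sentence matched: the outer loop walks the sentences and the inner loop walks the keywords. The name keyword_hits and the sample data are hypothetical.

```python
def keyword_hits(sentences, keywords):
    """Return (sentence, matched_keywords) pairs using a plain double loop."""
    hits = []
    for sentence in sentences:      # outer loop: every sentence
        found = []
        for word in keywords:       # inner loop: every keyword
            if word in sentence:    # substring match, as in the question
                found.append(word)
        if found:
            hits.append((sentence, found))
    return hits

sentences = ["the marketing mix is a tool", "the weather was nice", "price and product matter"]
business_list = ["marketing", "market", "price", "product"]
for sentence, words in keyword_hits(sentences, business_list):
    print(sentence, "->", ", ".join(words))
```

A list of pairs is used rather than a dict so that a sentence repeated on the page is not silently collapsed into one entry.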