Python regular expression for web content

import re
import urllib.request

def urlRead(url):
    """Gets and returns the content of the chosen URL"""
    webpage = urllib.request.urlopen(url)
    page_contents = webpage.read()
    return page_contents

def getPrices(content):
    # Note: this pattern matches only the literal text "£435"
    content = re.findall(r'£435', content.decode())
    print(content)

def main():
    page_contents = ''
    url = input('Please enter in the kayak url!: ')
    content = urlRead(url)
    getPrices(content)

if __name__ == '__main__':
    main()
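The pattern r'£435' in the question only finds that exact string. To match any pound amount in plain text, a character-class pattern can be used instead. The sketch below is an illustration under assumptions: it assumes prices are written like "£435", "£1,299" or "£12.50", and the names PRICE_RE and get_prices are hypothetical, not part of the original code. Formats outside that shape will be missed.

```python
import re

# Hypothetical pattern (assumed price formats): "£", an optional space, then
# either digits grouped with thousands commas or a plain run of digits,
# optionally followed by two-digit pence. All groups are non-capturing, so
# findall() returns the whole matched text.
PRICE_RE = re.compile(r"£\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def get_prices(text):
    """Return every pound amount found in `text`, as strings."""
    return PRICE_RE.findall(text)


if __name__ == "__main__":
    print(get_prices("Flights from £435, or £1,299 return; fees £12.50"))
```

In the question's code this would stand in for the literal pattern, e.g. print(get_prices(content.decode())). The comma-grouped alternative is tried first so that "£1,299" is matched whole, while "£4350" falls through to the plain-digits alternative.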

1 answer

User

#1 · Posted 2024-04-19 21:50:24

As @Mr Lister said, you shouldn't try to parse HTML with regular expressions if you can avoid it. Beautiful Soup is an HTML parsing library that can help you do what you need:

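import urllib.request
from bs4 import BeautifulSoup

response = urllib.request.urlopen('https://www.google.com/finance?q=NYSE%3AAAPL')
html = response.read()
soup = BeautifulSoup(html, "lxml")
# Current price: nested spans inside the element with id="price-panel"
aaplPrice = soup.find(id='price-panel').div.span.span.text
# Change: second <span>, keeping only the text between the parentheses
aaplVar = soup.find(id='price-panel').div.div.span.find_all('span')[1].string.split('(')[1].split(')')[0]
aapl = aaplPrice + ' ' + aaplVar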
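For a self-contained illustration of the parser-based approach that does not depend on a live page whose layout may change, here is a minimal sketch using the standard library's html.parser in place of Beautiful Soup. It assumes, hypothetically, that prices are rendered inside elements with class="price"; that class name and the names PriceExtractor and extract_prices are illustrative and not taken from Kayak or any real site.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PriceExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class.

    Assumption: prices look like <span class="price">£435</span>.
    """

    def __init__(self, target_class="price"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.prices = []    # completed matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside the current match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The matching element just closed: store its joined text.
            self.prices.append("".join(self.buffer).strip())
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_prices(html, target_class="price"):
    """Return the text of all elements carrying target_class, in document order."""
    parser = PriceExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.prices


if __name__ == "__main__":
    sample = '<div><span class="price">£435</span><span class="other">£9</span></div>'
    print(extract_prices(sample))
```

Because the parser tracks element boundaries, markup split across tags (for example <b>£</b>10) still comes back as one price, which is the kind of case a single regular expression over raw HTML handles poorly. With Beautiful Soup installed, the equivalent lookup is a one-liner along the lines of soup.find_all(class_="price").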
