Unable to print data retrieved with the regex module (strictly this module only) in Python?

Posted 2024-04-26 03:03:59


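I am using Python's 're' module to scrape a web page. The loop runs 4 iterations, and each iteration returns an empty list, i.e. [], whereas the output should be the share price for the requested stock symbol. Printing the regex variable shows nothing wrong with the pattern itself. The source code is included below.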

import urllib
import re

symbolslist = ["appl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
        url ="http://in.finance.yahoo.com/q?s=" +symbolslist[i] +"&ql=1"
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        regex ='<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
        pattern = re.compile(regex)
        print regex
        price = re.findall(pattern,htmltext)
        print "price of ",symbolslist[i],"is",price
        i+=1

There are no syntax or indentation errors. The output is as follows:

<span id="yfs_l84_appl">(.+?)</span>
price of  appl is []
<span id="yfs_l84_spy">(.+?)</span>
price of  spy is []
<span id="yfs_l84_goog">(.+?)</span>
price of  goog is []
<span id="yfs_l84_nflx">(.+?)</span>
price of  nflx is []

The lists are empty: no stock price is printed.

The page being scraped is https://in.finance.yahoo.com/q?s=NFLX&ql=0


1 Answer
#1 · Posted 2024-04-26 03:03:59

As an alternative, you may find it easier to use the yahoo_finance library, as follows:

from yahoo_finance import Share

for symbol in ["appl", "spy", "goog", "nflx"]:
    yahoo = Share(symbol)
    print 'Price of {} is {}'.format(symbol, yahoo.get_price())

This gives the following output:

Price of appl is 96.11
Price of spy is 186.63
Price of goog is 682.40
Price of nflx is 87.40

Trying to parse HTML with regular expressions is never a good idea.
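
To see why, here is a small offline sketch (written for Python 3, unlike the Python 2 snippets in this thread). The HTML string is made up for illustration and is not the real Yahoo markup, so it does not explain what the live page actually contains. It only shows that a pattern tied to one exact tag layout returns an empty list, just like in the question, as soon as the markup differs slightly.

```python
import re

# Hypothetical markup, NOT copied from Yahoo: the span carries an extra
# attribute before the id, which is enough to defeat a literal pattern.
html = '<span class="price" id="yfs_l84_nflx">87.40</span>'

# Pattern in the style of the question: assumes the tag is exactly
# '<span id="...">' with nothing else inside it.
strict = re.compile(r'<span id="yfs_l84_nflx">(.+?)</span>')

# Looser pattern: tolerates other attributes and any middle part of the id.
loose = re.compile(r'<span[^>]*\bid="yfs_[^"]*_nflx"[^>]*>(.+?)</span>')

strict_result = strict.findall(html)
loose_result = loose.findall(html)

print(strict_result)  # []
print(loose_result)   # ['87.40']
```

The looser pattern works on this one string, but it would still break on single-quoted attributes, nested tags, or a renamed id, which is why a real HTML parser is usually the safer choice.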


Another approach is to extract the information with BeautifulSoup first:

from bs4 import BeautifulSoup
import requests
import re

for symbol in ["appl", "spy", "goog", "nflx"]:
    url = 'http://finance.yahoo.com/q?s={}'.format(symbol)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")

    data = soup.find('span', attrs= {'id' : re.compile(r'yfs_.*?_{}'.format(symbol.lower()))})
    print 'Price of {} is {}'.format(symbol, data.text)
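
One thing to watch for in the snippet above: soup.find returns None when no element matches, so data.text would raise an AttributeError for a symbol that is not on the page. The following is a minimal sketch of the same "match the span by an id pattern" idea with that case handled. It is written for Python 3 and uses only the standard library's html.parser so that it runs without network access or third-party packages. The SpanById class, the find_price helper, and the sample HTML are all hypothetical names and data introduced for this illustration.

```python
import re
from html.parser import HTMLParser


class SpanById(HTMLParser):
    """Collect the text of <span> elements whose id matches a pattern."""

    def __init__(self, id_pattern):
        super().__init__()
        self.id_pattern = id_pattern
        self.inside = False  # True while inside a matching <span>
        self.values = []     # text of each matching span, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            span_id = dict(attrs).get("id") or ""
            if self.id_pattern.search(span_id):
                self.inside = True
                self.values.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing </span> ends the capture,
        # so nested spans are not handled.
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.values[-1] += data


def find_price(html, symbol):
    """Return the text of the first matching span, or None if none matches."""
    pattern = re.compile(r"yfs_.*?_{}$".format(re.escape(symbol.lower())))
    parser = SpanById(pattern)
    parser.feed(html)
    parser.close()
    return parser.values[0] if parser.values else None


# Hypothetical page fragment, used only for illustration.
sample = '<div><span class="x" id="yfs_l84_nflx">87.40</span></div>'

print(find_price(sample, "NFLX"))  # 87.40
print(find_price(sample, "goog"))  # None
```

Returning None instead of raising lets the caller decide what to print for a symbol that was not found.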
