Multiple captures in one line when scraping stock data from Yahoo with regular expressions


Asked on 2025-04-18 12:12

import urllib
import re

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for i in range(len(stocks_symbols)):
    htmlfile = urllib.urlopen("https://finance.yahoo.com/q?s=" + stocks_symbols[i])
    htmltext = htmlfile.read(htmlfile)
    regex = '<span id="yfs_l84_' + stocks_symbols[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)

    regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
    pattern1 = re.compile(regex1)
    name1 = re.findall(pattern1, htmltext)
    print "Price of", stocks_symbols[i].upper(), name1, "is", price[0]

I think the problem is in regex1:

regex1 = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'

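I tried reading the documentation but couldn't figure it out.

In this program, I want to scrape the stock name and the stock price for each symbol in a list of stock symbols.

I think what I'm doing is putting two (.+?) groups into one variable, which doesn't seem right.

Output: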

Traceback (most recent call last):
  File "C:\Py\stock\stocks.py", line 14, in <module>
    pattern1 = re.compile(regex1)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\canopy-1.4.0.1938.win-x86\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
error: nothing to repeat


3 Answers

Here is an example using requests, lxml, and CSS selectors:

import requests
import lxml.html  # the cssselect() method also requires the cssselect package

stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for symbol in stocks_symbols:

    r = requests.get("https://finance.yahoo.com/q?s=" + symbol)
    html = lxml.html.fromstring(r.text)

    price = html.cssselect('span#yfs_l84_' + symbol)
    print('%s: %s' % (symbol.upper(), price[0].text))

    # There is no `h2` whose `id` starts with "yui_3_9_1_9_",
    # so I can't test this part of the code.

    #names = html.cssselect('h2[id^="yui_3_9_1_9_"]')
    #for x in names:
    #    print(x.text, x.attrib['id'][len('yui_3_9_1_9_'):])

Result:

AAPL: 94.03
SPY: 198.20
GOOG: 584.73
NFLX: 472.35
MSFT: 41.80

Answered on 2025-04-18 by Python大师


You can use BeautifulSoup to extract the price:

import requests
from bs4 import BeautifulSoup
stocks_symbols = ['aapl', 'spy', 'goog', 'nflx', 'msft']

for stock in stocks_symbols:
    htmlfile = requests.get("https://finance.yahoo.com/q?s={}".format(stock))
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(htmlfile.content, "html.parser")
    price = [x.text for x in soup.find_all("span", id="yfs_l84_{}".format(stock))]
    print ("Price of {}  is {}".format(stock.upper(), price[0]))

Output:

Price of AAPL  is 94.03
Price of SPY  is 198.20
Price of GOOG  is 584.73
Price of NFLX  is 472.35
Price of MSFT  is 41.80

Answered on 2025-04-18 by Python大师


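^ matches the start of the string, so the ? that follows it has nothing to repeat, which is not valid in a regular expression. If you change your regex to regex1 = '<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>', it will work. Note that you also had an extra parenthesis.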
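To make the point about the two (.+?) groups concrete, here is a minimal sketch. The sample_html string is made up for illustration and is not taken from the real Yahoo page, so the actual markup may differ. It shows that the pattern from the question is rejected at compile time, and that the corrected pattern with two groups makes re.findall return one tuple per match rather than a single string.

```python
import re

# Hypothetical markup standing in for the Yahoo page; the real page may differ.
sample_html = '<h2 id="yui_3_9_1_9_1406">Apple Inc. (AAPL)</h2>'

# The pattern from the question: "^?" and the extra ")" make it invalid.
broken_regex = '<h2 id="yui_3_9_1_9_(.^?))">(.+?)</h2>'
try:
    re.compile(broken_regex)
    broken_failed = False
except re.error as exc:
    broken_failed = True
    print("Broken pattern rejected:", exc)

# Corrected pattern with two capture groups: the id suffix and the heading text.
fixed_pattern = re.compile(r'<h2 id="yui_3_9_1_9_(.+?)">(.+?)</h2>')

# With more than one group, findall returns a list of tuples, one per match.
matches = fixed_pattern.findall(sample_html)
for id_suffix, name in matches:
    print("id suffix:", id_suffix, "| name:", name)
```

So having two groups in one pattern is fine; you just unpack each tuple instead of indexing a plain string.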

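Also, there are better ways to get stock information from Yahoo. You can query many tables (including stock information) through YQL, and there is also a YQL Console where you can try out your queries.

The results come back as JSON or XML, formats that several Python libraries make very easy to work with.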
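As a rough illustration of why JSON is convenient here, the sketch below parses a small payload with the standard json module. The payload and its field names (quotes, symbol, name, price) are invented for this example and are not the real YQL response format.

```python
import json

# Made-up payload for illustration only; it is not a real YQL response,
# and the field names are assumptions.
payload = '{"quotes": [{"symbol": "AAPL", "name": "Apple Inc.", "price": "94.03"}]}'

data = json.loads(payload)

# Build one output line per quote, using plain dictionary access.
lines = []
for quote in data["quotes"]:
    lines.append("Price of {} ({}) is {}".format(
        quote["symbol"], quote["name"], quote["price"]))

for line in lines:
    print(line)
```

No regular expressions are needed once the data arrives in a structured format.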

Answered on 2025-04-18 by Python大师
