Yahoo Finance Python web scraper: key statistics and financial statements

Posted 2024-05-13 09:00:22


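I am fairly new to programming, and this is my first project after reading various guides. I am trying to pull data from the Yahoo Finance Key Statistics page and the financial statements (i.e. http://finance.yahoo.com/q/ks?s=GOOG+Key+Statistics). The links to the financial statements are at the bottom of the Key Statistics page. The code for the key statistics function seems to work.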

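For the statement function, however, the entry variable used in pattern3 does not pick up negative values. The problem is especially noticeable on the cash flow statement. For negative values, the entry should look like this: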

entry = '<td align="right">(.+?)</td>'

Am I approaching this the right way? Is there a simple way to get all the values from a financial statement and put them in a list?

My code in Python 2.7:

^{pr2}$

1 answer
User
#1 · Posted 2024-05-13 09:00:22

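I'm not convinced that the approach you're using to extract the information is the most robust one, but I modified your code to capture what you need. I updated the regular expression to check for parentheses and added a section at the end that strips the &nbsp; entities from the entries.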

    import urllib
    import re

    keystat = '<td class="yfnc_tabledata1">(.+?)</td>'
    date = r'<th scope="col" style="border-top:2px solid #000;text-align:right; font-weight:bold">(.+?)</th>'  # obtain the dates; only works for the income statement
    total = r'<strong>(.+?)&nbsp;&nbsp;</strong>'  # obtain data for any totals from statements
    entry = r'<td align="right">(\(?.+?\)?)</td>'  # obtain data for any statement entries that are not totals


    def keystatfunc(symbol):
        url = 'http://finance.yahoo.com/q/ks?s=' + symbol + '+Key+Statistics'
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        regex = '<span id="yfs_j10_' + symbol + '">(.+?)</span>'
        pattern = re.compile(regex)
        pattern2 = re.compile(keystat)
        marketcap = re.findall(pattern, htmltext)
        keystats = re.findall(pattern2, htmltext)
        return marketcap + keystats[1:31]  # list with all the data on the key statistics page


    def statement(symbol, period, statementtype):  # period: "quarterly" or "annual"; statementtype: "is", "bs", or "cf" (income statement, balance sheet, cash flow statement)
        if period == "quarterly" and statementtype == "bs":
            url = 'http://finance.yahoo.com/q/bs?s=' + symbol
        elif period == "annual" and statementtype == "bs":
            url = 'http://finance.yahoo.com/q/bs?s=' + symbol + '&annual'
        elif period == "quarterly" and statementtype == "is":
            url = 'http://finance.yahoo.com/q/is?s=' + symbol  # quarterly pages take no '&annual' suffix
        elif period == "annual" and statementtype == "is":
            url = 'http://finance.yahoo.com/q/is?s=' + symbol + '&annual'
        elif period == "quarterly" and statementtype == "cf":
            url = 'http://finance.yahoo.com/q/cf?s=' + symbol  # quarterly pages take no '&annual' suffix
        elif period == "annual" and statementtype == "cf":
            url = 'http://finance.yahoo.com/q/cf?s=' + symbol + '&annual'
        else:
            raise ValueError('unsupported period or statementtype')  # avoid using an undefined url below
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        pattern = re.compile(date)
        pattern2 = re.compile(total)
        pattern3 = re.compile(entry)
        dates = re.findall(pattern, htmltext)
        totals = re.findall(pattern2, htmltext)
        entries = re.findall(pattern3, htmltext)
        entriesFixed = []
        for e in entries:
            entriesFixed.append(e.replace('&nbsp;',''))
        return (dates + totals + entriesFixed)



    print keystatfunc("goog")
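
The entries returned above are still strings, so a value such as (5,678) has to be turned into a negative number before it can be used in calculations. Below is a minimal sketch of one way to do that. It reuses the entry pattern from the code above, but SAMPLE_HTML, to_number and parse_entries are hypothetical names made up for this illustration, and the HTML fragment is hand-written rather than real Yahoo output. It also assumes the cells hold whole numbers with comma separators, which may not be true for every row.

```python
import re

# Same pattern as `entry` in the answer, compiled once.
ENTRY = re.compile(r'<td align="right">(\(?.+?\)?)</td>')

# Hypothetical fragment imitating a cash flow row; not real Yahoo output.
SAMPLE_HTML = (
    '<tr><td>Depreciation</td>'
    '<td align="right">1,234&nbsp;&nbsp;</td>'
    '<td align="right">(5,678)</td>'
    '<td align="right">-</td></tr>'
)


def to_number(cell):
    """Convert '1,234' or '(5,678)' to an int; return None if not numeric."""
    text = cell.replace('&nbsp;', '').replace(',', '').strip()
    # Accounting notation: a value wrapped in parentheses is negative.
    negative = text.startswith('(') and text.endswith(')')
    if negative:
        text = text[1:-1]
    if not text.isdigit():
        return None  # placeholders such as '-' carry no value
    value = int(text)
    return -value if negative else value


def parse_entries(html):
    """Return every right-aligned cell in html as a number (or None)."""
    return [to_number(cell) for cell in ENTRY.findall(html)]


print(parse_entries(SAMPLE_HTML))
```

In the statement function, the same conversion could be applied to entriesFixed before returning it, so that the caller gets numbers instead of strings.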
