Scraping text between nondescript tags

<div class="yui-u first yfi-start-content"><div class="yfi_quote_summary"><div id="yfi_quote_summary_data" class="rtq_table"><table id="table1"><tr><th scope="row" width="48%">Prev Close:</th><td class="yfnc_tabledata1">565.07</td></tr>

Traceback (most recent call last):
    next_FirstTable_tag = FirstTable_tag.findNextSibling
AttributeError: 'NoneType' object has no attribute 'findNextSibling'
<<< Process finished. (Exit code 1)

Edit 2

From here on, this post follows the solution that one of the answerers is helping to develop:

import re
import sys
import urllib.request

from bs4 import BeautifulSoup

url = "http://finance.yahoo.com/q?s=GOOG"
urlRunner = urllib.request.urlopen(url)
data = urlRunner.read()

soup = BeautifulSoup(data)

tables = soup.findAll("table", id = re.compile('^table'))

result = {}

for table in tables:
    for th, td in zip(table.findAll("th"), table.findAll("td")):
        result[th.text] = td.text

print(result)

Result:

{'52wk Range:': '502.80 - 604.83', 'Market Cap:': '381.04B', 'Next Earnings Date:': 'N/A', 'P/E (ttm):': '29.52', 'Avg Vol (3m):': '1,701,610', 'EPS (ttm):': '19.09', '1y Target Est:': 'N/A', 'Volume:': '561,384', 'Ask:': '563.98 x 100', 'Div & Yield:': 'N/A (N/A) ', 'Bid:': '563.56 x 100', 'Beta:': '1.144', 'Open:': '568.00', "Day's Range:": '562.53 - 569.77', 'Prev Close:': '566.37'}

2 answers

User

#1 · edited 2024-05-17 01:18:14

from bs4 import BeautifulSoup
import re
import urllib2

url = "http://finance.yahoo.com/q?s=GOOG"
html = urllib2.urlopen(url).read()
bs = BeautifulSoup(html)

# Find the two tables whose IDs start with "table".
tables = bs.findAll("table", id=re.compile('^table'))

result = {}

#Iterate the tables.
for table in tables:
    #Iterate both th and td in order.
    for th, td in zip(table.findAll("th"), table.findAll("td")):
        result[th.text] = td.text

print result

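1) What determines the order of the results? Dictionaries do not preserve order (in Python versions before 3.7), so the entries come out in arbitrary order. If you need a fixed order, use an OrderedDict or a list of tuples. The data is scraped from the left column top to bottom, then from the right column top to bottom.
2) I believe the data is now in a dictionary? If I want to reuse this data later and plug certain data points into different functions, how should I do that? Also, how can I reorganize the names and values and display them more readably (for example, a multi-line list where each line starts with the description, followed by a space and a dash, then the value)? Once I have reorganized the result, should it be stored in a tuple or in something else?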
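A minimal sketch of one way to reuse and display the scraped values, assuming `result` is a dictionary like the one printed in Edit 2. The helper names `to_float` and `format_rows` are hypothetical and exist only for this illustration; the sample values are copied from the output shown above.

```python
# Sample of the scraped dictionary (values copied from the output in Edit 2).
result = {
    'Prev Close:': '566.37',
    'Open:': '568.00',
    'Volume:': '561,384',
    'Beta:': '1.144',
}


def to_float(text):
    """Convert a scraped string such as '561,384' to a float (hypothetical helper)."""
    return float(text.replace(',', ''))


def format_rows(data):
    """Return one 'name - value' line per entry, names padded to equal width."""
    width = max(len(name) for name in data)
    return [f'{name:<{width}} - {value}' for name, value in data.items()]


# Individual data points can be looked up by name and passed to any function.
change = to_float(result['Open:']) - to_float(result['Prev Close:'])
print(f'Change since previous close: {change:.2f}')

# Display every entry as "description - value", one per line.
for line in format_rows(result):
    print(line)
```

With this approach the data can probably stay in the dictionary; the formatted lines are built only when something needs to be displayed, so no separate tuple is required for storage.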


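As for ordering, that gets into basic programming questions; reading up on the different container types in Python will give you a better understanding of them.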
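The two ordered containers mentioned above can be sketched as follows. The `rows` list and its order are assumptions made for illustration (the values are copied from the output shown in the question); in the real scraper the pairs would come from the `zip` over `th` and `td` elements.

```python
from collections import OrderedDict

# (label, value) pairs in an example order, assumed for illustration.
rows = [
    ('Prev Close:', '566.37'),
    ('Open:', '568.00'),
    ('Bid:', '563.56 x 100'),
    ('Ask:', '563.98 x 100'),
]

# Option 1: a list of tuples keeps the order and tolerates duplicate labels.
for label, value in rows:
    print(label, value)

# Option 2: an OrderedDict keeps insertion order and still allows lookup by label.
ordered = OrderedDict(rows)
print(list(ordered.keys()))
print(ordered['Bid:'])
```

Since Python 3.7 a plain `dict` also preserves insertion order, so `OrderedDict` matters mainly on older interpreters or when its extra methods are needed.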

User

#2 · edited 2024-05-17 01:18:14

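This is based on what I think you want, but without a proper data sample it is impossible to say for sure. I cannot guess how the data is structured. From your description the data sounds irregular, which cannot be seen in your example.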

from bs4 import BeautifulSoup
from itertools import izip

html = """<div class="yui-u first yfi-start-content">
    <div class="yfi_quote_summary">
        <div id="yfi_quote_summary_data" class="rtq_table">
            <table id="table1">
                <tr>
                    <th scope="row" width="48%">Target Point:</th>
                    <td class="yfnc_tabledata1">200.22</td>
                </tr>
                <tr>
                    <th scope="row" width="48%">Target Point:</th>
                    <td class="yfnc_tabledata1">200.22</td>
                </tr>
                <tr>
                    <th scope="row" width="48%">Target Point:</th>
                    <td class="yfnc_tabledata1">200.22</td>
                </tr>
            </table>
        </div>
    </div>
</div>"""

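bs = BeautifulSoup(html)

# Pair each <th> label with the <td> value at the same position, in document order.
ths = bs.findAll("th")
tds = bs.findAll("td")
elements = izip(ths, tds)

# A list of tuples keeps the rows in page order.
result = []

for x, y in elements:
    result.append((x.text, y.text))

print result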
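If the data really is irregular, pairing all `th` elements with all `td` elements can shift values onto the wrong labels as soon as one row lacks a cell. A minimal sketch of pairing cells row by row instead is given below. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; the class name `RowPairParser` is hypothetical, and the row with the missing `td` is invented for illustration.

```python
from html.parser import HTMLParser


class RowPairParser(HTMLParser):
    """Collect one (th text, td text) tuple per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (label, value) tuples
        self._cell = None    # 'th' or 'td' while inside such a cell
        self._current = {}   # cell texts of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._current = {}
        elif tag in ('th', 'td'):
            self._cell = tag
            self._current[tag] = ''

    def handle_data(self, data):
        # Only text inside a th/td cell is collected.
        if self._cell is not None:
            self._current[self._cell] += data

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self._cell = None
        elif tag == 'tr':
            # A missing cell becomes None instead of shifting later rows.
            self.rows.append((self._current.get('th'), self._current.get('td')))


# The second row has no <td>; this irregularity is made up for the example.
sample = """<table id="table1">
<tr><th scope="row">Prev Close:</th><td class="yfnc_tabledata1">565.07</td></tr>
<tr><th scope="row">Open:</th></tr>
<tr><th scope="row">Bid:</th><td class="yfnc_tabledata1">563.56 x 100</td></tr>
</table>"""

parser = RowPairParser()
parser.feed(sample)
parser.close()
print(parser.rows)
```

The same row-by-row idea should carry over to BeautifulSoup by iterating over each `tr` and looking up its `th` and `td` inside the loop.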

Edit:

Here is a solution that uses the Yahoo API; please consider using it instead:

[code block not preserved in the source]

This prints:

565.07
561.78

The following data fields are available for a stock:

AfterHoursChangeRealtime
AnnualizedGain
Ask
AskRealtime
AverageDailyVolume
Bid
BidRealtime
BookValue
Change
Change_PercentChange
ChangeFromFiftydayMovingAverage
ChangeFromTwoHundreddayMovingAverage
ChangeFromYearHigh
ChangeFromYearLow
ChangeinPercent
ChangePercentRealtime
ChangeRealtime
Commission
Currency
DaysHigh
DaysLow
DaysRange
DaysRangeRealtime
DaysValueChange
DaysValueChangeRealtime
DividendPayDate
DividendShare
DividendYield
EarningsShare
EBITDA
EPSEstimateCurrentYear
EPSEstimateNextQuarter
EPSEstimateNextYear
ErrorIndicationreturnedforsymbolchangedinvalid
ExDividendDate
FiftydayMovingAverage
HighLimit
HoldingsGain
HoldingsGainPercent
HoldingsGainPercentRealtime
HoldingsGainRealtime
HoldingsValue
HoldingsValueRealtime
LastTradeDate
LastTradePriceOnly
LastTradeRealtimeWithTime
LastTradeTime
LastTradeWithTime
LowLimit
MarketCapitalization
MarketCapRealtime
MoreInfo
Name
Notes
OneyrTargetPrice
Open
OrderBookRealtime
PEGRatio
PERatio
PERatioRealtime
PercebtChangeFromYearHigh
PercentChange
PercentChangeFromFiftydayMovingAverage
PercentChangeFromTwoHundreddayMovingAverage
PercentChangeFromYearLow
PreviousClose
PriceBook
PriceEPSEstimateCurrentYear
PriceEPSEstimateNextYear
PricePaid
PriceSales
SharesOwned
ShortRatio
StockExchange
symbol
Symbol
TickerTrend
TradeDate
TwoHundreddayMovingAverage
Volume
YearHigh
YearLow
YearRange
