Getting a table from a web page with Beautiful Soup and Pandas

    from io import StringIO

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # url_addr = "https://www.uspto.gov/web/offices/ac/ido/oeip/taf/mclsstc/mcls1.htm"
    url_addr = "https://www.cefconnect.com/closed-end-funds-daily-pricing"

    html_text = requests.get(url_addr).content
    # Name the parser explicitly so the result does not depend on what is installed.
    bs_obj = BeautifulSoup(html_text, "html.parser")
    tables = bs_obj.find_all("table")

    dfs = []
    for table in tables:
        # Recent pandas versions expect a file-like object rather than a raw HTML string.
        df = pd.read_html(StringIO(str(table)))[0]
        dfs.append(df)
        print(df)

1 Answer

User

#1 · Posted 2024-04-19 17:35:22

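The second URL fills in its table with JavaScript. If you fetch the page with wget, or look at the Network tab in Google Chrome, you will see that this is the table as it is initially sent (that is, this is what Beautiful Soup sees):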

        <div id="data-container" class="row-fluid">
            <div class="span12">                    
                <table class="cefconnect-table-1 daily-pricing table table-striped table-condensed" id="daily-pricing" width="100%" cellpadding="5" cellspacing="0" border="0" summary="">
                    <thead>
                        <tr>
                            <th class="ticker">Ticker</th>
                            <th class="fund-name">Fund Name</th>
                            <th class="strategy">Strategy</th>
                            <th class="closing-price">Closing<br />Price</th>
                            <th class="price-change">Price<br />Change</th>
                            <th class="nav">NAV</th>
                            <th class="premium-discount">Premium/<br />Discount</th>
                            <th class="distribution-rate">Distribution<br />Rate<sup>&dagger;</sup></th>
                            <th class="distribution-rate-on-nav">Distribution<br />Rate on NAV</th>
                            <th class="return-on-nav">1 Yr Rtn<br />on NAV</th>
                        </tr>
                    </thead>
                    <tbody></tbody>
                </table>
            </div>
        </div>
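
You can confirm this from Python without a browser. The sketch below runs Beautiful Soup over an abbreviated copy of the markup above (only three of the header cells are kept, and the helper name `summarize_table` is mine, not part of any library) and counts the rows in the table body:

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the markup the server sends before any JavaScript runs.
INITIAL_HTML = """
<table id="daily-pricing">
  <thead>
    <tr>
      <th class="ticker">Ticker</th>
      <th class="fund-name">Fund Name</th>
      <th class="nav">NAV</th>
    </tr>
  </thead>
  <tbody></tbody>
</table>
"""


def summarize_table(html, table_id):
    """Return the header texts and the number of body rows of one table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    headers = [th.get_text(" ", strip=True) for th in table.find_all("th")]
    body = table.find("tbody")
    rows = body.find_all("tr") if body else []
    return headers, len(rows)


headers, n_rows = summarize_table(INITIAL_HTML, "daily-pricing")
print(headers)  # the column names are present
print(n_rows)   # but there are no data rows to parse
```

The headers come through, but the body has zero rows, so there is nothing to build a data frame from at this stage.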

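Some JavaScript then fills in the table. You have two options here: either use a headless browser (PhantomJS, Selenium, and so on; there are plenty of options that are relatively easy to use) and let the JavaScript run before you parse the page, or try to work out how to call the API that the page itself uses to load the data.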
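
A rough sketch of the headless-browser route with Selenium follows. It rests on several assumptions: Selenium 4 and a local Chrome are installed, the rows end up as `tr` elements inside `#daily-pricing tbody` once the script has run, and a 30-second wait is enough. The function names `fetch_rendered_html` and `table_to_records` are just for illustration. I have not run this against the live site, so treat it as a starting point:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, row_selector, timeout=30):
    """Load url in headless Chrome and return the HTML once row_selector matches."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has added at least one data row.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, row_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def table_to_records(html, table_id):
    """Turn a rendered table into a list of {header: cell text} dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    headers = [
        th.get_text(" ", strip=True) for th in table.find("thead").find_all("th")
    ]
    records = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


# Intended use (needs network access and Chrome, so it is not executed here):
# html = fetch_rendered_html(
#     "https://www.cefconnect.com/closed-end-funds-daily-pricing",
#     "#daily-pricing tbody tr",
# )
# records = table_to_records(html, "daily-pricing")
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)` if you want a data frame. Keeping the parsing separate from the browser step also means you can test it on saved HTML.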

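Another option, which I always like to mention, is to contact the site's owner and work out an arrangement for getting the data in a more direct way.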
