Scraping all tables from a web page with Python BS4

Published 2024-04-29 15:11:57


I am trying to fetch the tables from a web page with bs4 and convert them to CSV files with pandas.

The page has two tables. I can get the first one, but for the second table only the headers are scraped.

I have tried the code below.

from urllib2 import Request, urlopen
from bs4 import BeautifulSoup
from scrapelib import table_to_2d
import pandas as pd

ehurl = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'
hd = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1;WOW64;rv:46.0) Gecko/46.0 Firefox/46.0'}

raq = Request(ehurl, headers=hd)
resp = urlopen(raq)
eh_page = resp.read()

soup = BeautifulSoup(eh_page, "html.parser")

# write each table to its own CSV file
i = 1
for qeros in soup.findAll("table"):
    x = table_to_2d(qeros)
    df = pd.DataFrame(x)
    df.to_csv("fpi" + str(i) + ".csv", sep=",", header=False, index=False)
    i += 1

The function table_to_2d is taken from https://stackoverflow.com/a/48451104/2724299.
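
For comparison, pandas.read_html can sometimes do the whole job in a couple of lines. A minimal sketch, assuming Python 3 with requests, pandas and lxml (or html5lib) available; note that it only sees rows nested inside an actual table tag, which does not seem to be the case for everything on this page:

from io import StringIO

import pandas as pd
import requests

url = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'
html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text

# read_html returns one DataFrame per <table> it manages to parse
for i, df in enumerate(pd.read_html(StringIO(html)), start=1):
    df.to_csv('fpi%d.csv' % i, index=False)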


Tags: csv, to, from, https, import, web page, pandas, request
2 Answers

I'm not sure what format you want your CSV files in, but you could try the following to write the tables to CSV files:

from bs4 import BeautifulSoup
from requests import get
from csv import writer

url = 'https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx'

r = get(url)
soup = BeautifulSoup(r.text, 'lxml')


# get all tables
tables = soup.find_all('table')

# loop over each table
for num, table in enumerate(tables, start=1):

    # create filename
    filename = 'table-%d.csv' % num

    # open file for writing
    with open(filename, 'w') as f:

        # store rows here
        data = []

        # create csv writer object
        csv_writer = writer(f)

        # go through each row
        rows = table.find_all('tr')
        for row in rows:

            # write headers if any
            headers = row.find_all('th')
            if headers:
                csv_writer.writerow([header.text.strip() for header in headers])

            # write column items
            columns = row.find_all('td')
            csv_writer.writerow([column.text.strip() for column in columns])

Here is table-1.csv:


table-2.csv:

Daily Trends in FPI Derivative Trades on 07-Aug-2018

Reporting Date,Derivative Products,Buy,Sell

Open Interest at theend of the date

No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore,No. of Contracts,Amount in Crore

07-Aug-2018,Index Futures,16899.00,1560.45,17802.00,1706.72,298303.00,26117.55
Index Options,505226.00,51512.43,526331.00,53460.93,654904.00,58508.63
Stock Futures,165411.00,11454.08,158928.00,11105.55,1108615.00,82830.85
Stock Options,84583.00,6297.87,86777.00,6441.33,108437.00,8272.44
Interest Rate Futures,0.00,0.00,0.00,0.00,2530.00,47.60
The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 07-Aug-2018 and constitutes  FPIs/FIIs trading / position of the previous trading day.
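
Since the question mentions pandas, the generated files can be read back into DataFrames afterwards. A minimal sketch, assuming the table-2.csv written above; its rows have varying numbers of fields, so a fixed column count (8, the widest data row) is forced:

import pandas as pd

# shorter rows are padded with NaN, and blank lines are skipped by default
df2 = pd.read_csv('table-2.csv', header=None, names=range(8))
print(df2)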

For the second table, the actual tr, th, and td elements are not nested under a table tag. Scraping all the tr, th, and td tags therefore yields the required data, and applying itertools.groupby recovers the original table structure.

import requests, itertools
from bs4 import BeautifulSoup as soup

d = soup(requests.get('https://www.fpi.nsdl.co.in/web/Reports/Latest.aspx').text, 'html.parser')
# for every tr, take the th texts if present, otherwise the td texts
table_data = [[j.text for j in (lambda x: i.find_all('td') if not x else x)(i.find_all('th'))] for i in d.find_all('tr')]
# split the flat list of rows at each 'Daily Trends ...' title row
final_table = [list(b) for _, b in itertools.groupby(table_data, key=lambda x: x[0].startswith('Daily Trends'))]
# re-attach each title group to the rows that follow it
table1, table2 = [final_table[i] + final_table[i + 1] for i in range(0, len(final_table), 2)]

Output:

table1


table2

[['Daily Trends in FPI Derivative Trades on 08-Aug-2018'], ['Reporting Date', 'Derivative Products', 'Buy', 'Sell', 'Open Interest at the'], ['Open Interest at the'], ['No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore', 'No. of Contracts', 'Amount in Crore'], ['08-Aug-2018', 'Index Futures', '18797.00', '1732.24', '16696.00', '1600.94', '303684.00', '26636.51'], ['Index Options', '495820.00', '50403.69', '512765.00', '52075.29', '673371.00', '60394.18'], ['Stock Futures', '176472.00', '11999.53', '178301.00', '12020.70', '1116162.00', '83275.79'], ['Stock Options', '98471.00', '6949.88', '101906.00', '7204.18', '116286.00', '8824.33'], ['Interest Rate Futures', '0.00', '0.00', '0.00', '0.00', '2530.00', '47.57'], ['The above report is compiled on the basis of reports submitted to depositories by NSE and BSE on 08-Aug-2018 and constitutes  FPIs/FIIs trading / position of the previous trading day.']]
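
To end up with the CSV files the question asked for, table1 and table2 (lists of rows of unequal length) can be handed straight to pandas, as in the original attempt. A minimal follow-up sketch, not part of the original answer:

import pandas as pd

# ragged rows are fine here: pandas pads the shorter rows with NaN
for n, rows in enumerate((table1, table2), start=1):
    pd.DataFrame(rows).to_csv('fpi%d.csv' % n, header=False, index=False)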
