Scraping an HTML table with BeautifulSoup

Published 2024-04-19 00:26:19

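I am trying to scrape a table from an SEC 10-K filing. I think most of it works, except for the part where pandas converts the table into a DataFrame. I am new to DataFrames, so I suspect I am making a mistake with the indexing. Please help; I get the following error: "IndexError: index 2 is out of bounds for axis 0 with size 2"

This is the program I am using: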

import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.sec.gov/Archives/edgar/data/1022344/000155837017000934/spg-20161231x10k.htm#Item8FinancialStatementsandSupplementary'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'lxml')
table = soup.find_all('table')[0]
new_table = pd.DataFrame(columns=range(0,2), index = [0])
row_marker = 0
for row in table.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    for column in columns:
        new_table.iat[row_marker,column_marker] = column.get_text()
        column_marker += 1

new_table

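If the DataFrame problem cannot be solved, please suggest alternative approaches, such as writing the data to CSV/Excel. Any suggestions for extracting multiple tables at once would also be very helpful.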
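
One possible direction, offered as a sketch rather than a definitive fix: the frame in the question is created with two columns and a single row, and `row_marker` is never incremented, so `iat` is likely to fail as soon as a row has a third cell. Collecting each row into a plain list and building the DataFrame afterwards avoids pre-sizing the frame at all. The helper name `tables_to_frames` and the `sample_html` string below are hypothetical and stand in for the downloaded filing; `html.parser` is used here in place of `lxml` only so that the example needs no extra parser package.

```python
import pandas as pd
from bs4 import BeautifulSoup


def tables_to_frames(html_doc):
    """Return one DataFrame per <table> element found in html_doc."""
    soup = BeautifulSoup(html_doc, "html.parser")
    frames = []
    for table in soup.find_all("table"):
        rows = []
        for row in table.find_all("tr"):
            # Keep the text of every cell; rows may differ in length.
            cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
            if cells:
                rows.append(cells)
        # pandas pads shorter rows with missing values.
        frames.append(pd.DataFrame(rows))
    return frames


# Hypothetical stand-in for r.text from the question.
sample_html = """
<table>
  <tr><td>Revenue</td><td>100</td><td>120</td></tr>
  <tr><td>Expenses</td><td>80</td></tr>
</table>
<table>
  <tr><td>Assets</td><td>500</td></tr>
</table>
"""

frames = tables_to_frames(sample_html)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = frames[0].to_csv(index=False, header=False)
```

Passing a file name such as `frames[0].to_csv("table_0.csv", index=False)` would write the table to disk instead, and looping over `frames` covers the case of several tables at once. pandas also provides `read_html`, which returns a list of DataFrames, one per table, and may be worth trying before hand-written parsing.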

0 answers

No answers yet
