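I want to scrape the tables on the following site and store them in pandas DataFrames: https://www.acf.hhs.gov/orr/resource/ffy-2012-13-state-of-colorado-orr-funded-programs
However, the third table on the page comes back as an empty DataFrame, with all of the table's data stored as tuples in the column headers: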
Empty DataFrame
Columns: [(Service Providers, State of Colorado), (Cuban - Haitian Program, $0), (Refugee Preventive Health Program, $150,000.00), (Refugee School Impact, $450,000), (Services to Older Refugees Program, $0), (Targeted Assistance - Discretionary, $0), (Total FY, $600,000)]
Index: []
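Is there a way to "flatten" the tuple headers into header + value, and then append the result to a DataFrame made up of all four tables? My code is below. It has worked on other, similar pages, but it keeps breaking because of how this table is formatted. Thanks!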
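import io
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.acf.hhs.gov/programs/orr/resource/ffy-2011-12-state-of-colorado-orr-funded-programs'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')  # not used by read_html below
year = url.split('ffy-')[1].split('-orr')[0]
# read_html parses every <table> on the page into its own DataFrame
df_list = pd.read_html(io.StringIO(page.text))
for df in df_list:
    df['URL'] = url
    df['YEAR'] = year
# DataFrame.append was removed in pandas 2.0; concatenate the list once instead
funds_df = pd.concat(df_list, ignore_index=True)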
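A minimal sketch of the flattening step, assuming the broken table always comes back the way the output above shows it: no rows, and every column label a 2-tuple of (label, value). The helper name `flatten_header_only_table` is hypothetical. It uses the first element of each tuple as the column name and the second as the single data row; any other DataFrame is returned unchanged.

```python
import pandas as pd


def flatten_header_only_table(df: pd.DataFrame) -> pd.DataFrame:
    """Turn a header-only table (no rows, 2-tuple column labels) into one data row.

    Hypothetical helper: the first element of each tuple becomes the column
    name and the second element becomes that column's value.
    """
    labels = list(df.columns)  # iterating a MultiIndex yields tuples
    is_header_only = (
        df.empty
        and len(labels) > 0
        and all(isinstance(c, tuple) and len(c) == 2 for c in labels)
    )
    if not is_header_only:
        # Normal tables pass through untouched.
        return df
    names = [c[0] for c in labels]
    values = [c[1] for c in labels]
    # Build a one-row DataFrame: header row -> column names, second row -> values.
    return pd.DataFrame([values], columns=names)
```

Applied to every table before combining, e.g. `funds_df = pd.concat([flatten_header_only_table(d) for d in df_list], ignore_index=True)`, the third table becomes a single row that can be concatenated with the other three. The values stay strings such as `$150,000.00`; converting them to numbers would be a separate step.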
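You don't need beautifulsoup or requests here: pandas.read_html finds every <table> element on the page and creates a list of DataFrames (dfl), one per table:
dfl[0]
dfl[1]
dfl[2]
dfl[3]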
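A sketch of that approach that also tags and combines the tables, assuming lxml (or bs4 plus html5lib) is installed so that `pd.read_html` can parse the page. The function names `tag_and_combine` and `load_tables` are hypothetical, and pulling only the `2011-12` part out of the URL with a regex is an assumption here; the question's `split` keeps the state name as well.

```python
import re

import pandas as pd


def tag_and_combine(dfl: list[pd.DataFrame], url: str) -> pd.DataFrame:
    """Hypothetical helper: add URL/YEAR columns to each table and stack them."""
    # Assumed URL shape: "...ffy-YYYY-YY-..."; YEAR is None if it doesn't match.
    match = re.search(r"ffy-(\d{4}-\d{2})", url)
    year = match.group(1) if match else None
    tagged = [df.assign(URL=url, YEAR=year) for df in dfl]
    # One concat at the end instead of a DataFrame.append loop.
    return pd.concat(tagged, ignore_index=True)


def load_tables(url: str) -> pd.DataFrame:
    """Hypothetical helper: fetch the page and combine all of its tables."""
    # pd.read_html accepts a URL directly and returns one DataFrame per <table>.
    dfl = pd.read_html(url)
    return tag_and_combine(dfl, url)
```

Calling `load_tables(url)` on the Colorado page would still give the third table its tuple headers, so that table needs flattening before or after this step.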