How to extract specific columns from a Wikipedia table using Python/BeautifulSoup

Posted 2024-04-19 12:50:41

I have been really stumped on this for a while.

Link to the table: https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons

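I want to extract the data from the columns highlighted in red below:

[image: screenshot of the Wikipedia table with the target columns highlighted in red]

and then put it into a DataFrame like this:

[image: screenshot of the desired DataFrame]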

Here is my code:

import urllib.request
url = "https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons"
page = urllib.request.urlopen(url)
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
# print(soup.prettify())


my_table = soup.find('table', {'class':'wikitable sortable'})

season = []
data = []
for row in my_table.find_all('tr'):
    s = row.find('th')
    season.append(s)
    d = row.find('td')
    data.append(d)


import pandas as pd
c = {'Season': season, 'Data': data}
df = pd.DataFrame(c)

df

This is my output. I have no idea how to get the simple five-column table shown above. Thanks.

[image: screenshot of the current output]


1 Answer
User
#1 · Posted 2024-04-19 12:50:41

You're almost there, although you don't actually need BeautifulSoup for this; pandas alone is enough.

Try this:

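from io import StringIO
import pandas as pd
import requests

url = "https://en.wikipedia.org/wiki/List_of_Manchester_United_F.C._seasons"
resp = requests.get(url)

# read_html returns one DataFrame per <table>; StringIO avoids the literal-string deprecation warning
tables = pd.read_html(StringIO(resp.text))

# tables[2] is the seasons table here; pick Season, P, W, D, L by column position
target = tables[2].iloc[:, [0, 2, 3, 4, 5]]
target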

Output:

    Season      P       W       D       L        
    Season      League  League  League  League   
0   1886–87     NaN     NaN     NaN     NaN      
1   1888–89[9]  12      8       2       2        
2   1889–90     22      9       2       11       

And so on; you can take it from there.
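As one possible next step, the two-level header and the footnote markers such as `[9]` in the output could be cleaned up roughly as follows. This is only a sketch: the `target` frame here is a hypothetical stand-in rebuilt by hand from the three rows shown in the output, so that the snippet runs without fetching the page, and the name `flat` is an illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in for `target`, rebuilt from the rows shown in the output
columns = pd.MultiIndex.from_tuples([
    ("Season", "Season"),
    ("P", "League"),
    ("W", "League"),
    ("D", "League"),
    ("L", "League"),
])
target = pd.DataFrame(
    [
        ["1886–87", None, None, None, None],
        ["1888–89[9]", 12, 8, 2, 2],
        ["1889–90", 22, 9, 2, 11],
    ],
    columns=columns,
)

# Keep only the top level of the two-level header (Season, P, W, D, L)
flat = target.copy()
flat.columns = flat.columns.get_level_values(0)

# Strip bracketed footnote markers such as "[9]" from the season labels
flat["Season"] = flat["Season"].str.replace(r"\[.*?\]", "", regex=True)

print(flat)
```

With the real data, the same two steps would be applied to the `target` frame produced by `pd.read_html` instead of the hand-built one.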
