Splitting a table into multiple DataFrames

Posted 2024-05-15 22:36:54


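I am currently getting this table with BeautifulSoup and want to split it into multiple DataFrames, with a new DataFrame starting each time a green title row appears.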

Here is the web page: http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3

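This is what I have so far. I can't work out how to split it, because I am used to pages where each group is already a separate table.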

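from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')

# take the last table on the page that has id="green"
table = soup.find_all("table", attrs={'id': "green"})[-1]

# newer pandas versions expect a file-like object rather than a raw HTML string
df = pd.read_html(StringIO(str(table)))[0]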

Output:

               Year quarter  ...                  Set on
    Distance: 331 m / 362 y  ... Distance: 331 m / 362 y
0                  2020 2nd  ...             15 JUN 2020
1                  2020 1st  ...             23 JAN 2020
2                  2019 4th  ...              6 OCT 2019
3                  2019 3rd  ...              1 SEP 2019
4                  2019 2nd  ...             28 APR 2019
..                      ...  ...                     ...
319                2002 3rd  ...              5 SEP 2002
320                2002 2nd  ...              6 JUN 2002
321                2001 4th  ...             18 OCT 2001
322                2001 3rd  ...             16 AUG 2001
323                2001 2nd  ...             14 JUN 2001

[324 rows x 7 columns]



1 answer

Answer 1 · posted 2024-05-15 22:36:54

This script splits the table into several DataFrames:

import requests
from bs4 import BeautifulSoup
import pandas as pd


url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')

table = soup.find_all("table", attrs={'id': "green"})[-1]

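trs, dfs, all_data = table.select('tr'), [], []
# the first row holds the column names
header = [th.get_text(strip=True) for th in trs[0].select('th')]

# trs[1] is the first green "Distance" title row, so start at trs[2]
for tr in trs[2:]:
    if tr.td:
        # ordinary data row: collect its cell texts
        all_data.append([td.get_text(strip=True) for td in tr.select('td')])
    else:
        # a title row (no <td>) closes the current group
        dfs.append(pd.DataFrame(all_data, columns=header))
        all_data = []
# the last group has no title row after it
dfs.append(pd.DataFrame(all_data, columns=header))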

# print all DataFrames in list:
for df in dfs:
    print(df)
    print('-' * 160)

Prints:

   Year quarter running dif.dogs average time avg win time best time        Set by       Set on
0      2020 2nd              226        19.63        19.18     18.79     Data Base  15 JUN 2020
1      2020 1st              255        19.68        19.14     18.58     Wazza Who  23 JAN 2020
..          ...              ...          ...          ...       ...           ...          ...
39     2010 3rd              286        19.85        19.34     18.90  Royal Surfer  15 SEP 2010
40     2010 2nd               92        20.01        19.57     19.28      Paw Form  16 JUN 2010

[41 rows x 7 columns]
                                                                                
   Year quarter running dif.dogs average time avg win time best time           Set by       Set on
0      2020 2nd              217        23.40        22.79     22.25     Canya Cruise   3 JUN 2020
1      2020 1st              285        23.35        22.85     22.47     Dawn's Dream  22 JAN 2020
..          ...              ...          ...          ...       ...              ...          ...
65     2004 1st                3        23.54        23.25     23.25    Seismic Shock   9 JAN 2004
66     2003 4th               16        23.67        23.33     23.29  Far Away Places  17 OCT 2003

[67 rows x 7 columns]
                                                                                
   Year quarter running dif.dogs average time avg win time best time     Set by       Set on
0      2020 2nd              264        30.68        30.13     29.56  Oh Mickey  23 APR 2020
1      2020 1st              224        30.70        30.12     29.41  Sennachie  10 JAN 2020
..          ...              ...          ...          ...       ...        ...          ...
76     2001 2nd               13        30.50        30.37     30.16      Korda  27 APR 2001
77     2001 1st                3        30.72        30.72     30.55   Fly Fast   0 MAR 2001

[78 rows x 7 columns]
                                                                                
   Year quarter running dif.dogs average time avg win time best time            Set by       Set on
0      2020 2nd               76        35.71        35.14     34.65  Frieda Las Vegas  28 MAY 2020
1      2020 1st               76        35.77        35.21     34.72  Velocity Bettina  23 JAN 2020
..          ...              ...          ...          ...       ...               ...          ...
73     2001 2nd                1        35.49        35.49     35.49     Kissin Bobbie  24 MAY 2001
74     2001 1st                1        36.10        36.10     36.10    Brampton Blues  23 MAR 2001

[75 rows x 7 columns]
                                                                                
   Year quarter running dif.dogs average time avg win time best time             Set by       Set on
0      2020 2nd               33        42.73        42.08     41.62            Rasheda  28 MAY 2020
1      2020 1st               16        42.38        41.93     41.83      What About It  20 FEB 2020
..          ...              ...          ...          ...       ...                ...          ...
57     2001 3rd                2        42.57        42.53     42.53  Universal Tears *  16 AUG 2001
58     2001 2nd                4        42.24        42.27     42.15    Hotshow Vintage  14 JUN 2001

[59 rows x 7 columns]
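
If the table has already been loaded into one flat DataFrame (as in the question), a pandas-only alternative is to mark the title rows and group on their running count. The sketch below is not the approach used in the script above. It runs on a small made-up frame (`flat` and its values are hypothetical), and it assumes every group starts with a title row whose text begins with "Distance:". The question's output suggests that `read_html` folds the first title row into the header, so the real frame would need that first group handled separately.

```python
import pandas as pd

# Hypothetical flat frame: each title row repeats the same text in every column.
flat = pd.DataFrame(
    [
        ["Distance: 331 m / 362 y", "Distance: 331 m / 362 y"],
        ["2020 2nd", "18.79"],
        ["2020 1st", "18.58"],
        ["Distance: 395 m / 432 y", "Distance: 395 m / 432 y"],
        ["2020 2nd", "22.25"],
    ],
    columns=["Year quarter", "best time"],
)

# True on title rows; the cumulative sum gives every row the number of its group.
is_title = flat["Year quarter"].str.startswith("Distance:")
group_id = is_title.cumsum()

frames = {}
for _, block in flat.groupby(group_id):
    title = block.iloc[0, 0]                               # text of the title row
    frames[title] = block.iloc[1:].reset_index(drop=True)  # data rows only

for title, frame in frames.items():
    print(title)
    print(frame)
```

Keying the result by the title text keeps the distance attached to each DataFrame without adding an extra column.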
                                                                                

EDIT: To get the Distance column as well, you can do this:

import requests
from bs4 import BeautifulSoup
import pandas as pd


url = "http://www.greyhound-data.com/d?page=stadia&st=1011&land=au&stadiummode=3"
req = requests.get(url).text
soup = BeautifulSoup(req, 'lxml')

table = soup.find_all("table", attrs={'id': "green"})[-1]

trs, dfs, all_data, distance = table.select('tr'), [], [], ''
header = ['Distance'] + [th.get_text(strip=True) for th in trs[0].select('th')]

for tr in trs[1:]:
    if tr.td:
        # data row: prefix it with the current distance title
        all_data.append([distance] + [td.get_text(strip=True) for td in tr.select('td')])
    else:
        # title row: remember the new distance and close the previous group
        distance = tr.th.get_text(strip=True)
        if all_data:
            dfs.append(pd.DataFrame(all_data, columns=header))
            all_data = []

# the last group has no title row after it
dfs.append(pd.DataFrame(all_data, columns=header))

# print all DataFrames in list:
for df in dfs:
    print(df)
    print('-' * 160)

Prints:

                   Distance Year quarter running dif.dogs average time avg win time best time        Set by       Set on
0   Distance: 331 m / 362 y     2020 2nd              226        19.63        19.18     18.79     Data Base  15 JUN 2020
1   Distance: 331 m / 362 y     2020 1st              255        19.68        19.14     18.58     Wazza Who  23 JAN 2020
..                      ...          ...              ...          ...          ...       ...           ...          ...
39  Distance: 331 m / 362 y     2010 3rd              286        19.85        19.34     18.90  Royal Surfer  15 SEP 2010
40  Distance: 331 m / 362 y     2010 2nd               92        20.01        19.57     19.28      Paw Form  16 JUN 2010

[41 rows x 8 columns]
                                                                                
                   Distance Year quarter running dif.dogs average time avg win time best time           Set by       Set on
0   Distance: 395 m / 432 y     2020 2nd              217        23.40        22.79     22.25     Canya Cruise   3 JUN 2020
1   Distance: 395 m / 432 y     2020 1st              285        23.35        22.85     22.47     Dawn's Dream  22 JAN 2020
..                      ...          ...              ...          ...          ...       ...              ...          ...
65  Distance: 395 m / 432 y     2004 1st                3        23.54        23.25     23.25    Seismic Shock   9 JAN 2004
66  Distance: 395 m / 432 y     2003 4th               16        23.67        23.33     23.29  Far Away Places  17 OCT 2003

[67 rows x 8 columns]
                                                                                
                   Distance Year quarter running dif.dogs average time avg win time best time     Set by       Set on
0   Distance: 520 m / 569 y     2020 2nd              264        30.68        30.13     29.56  Oh Mickey  23 APR 2020
1   Distance: 520 m / 569 y     2020 1st              224        30.70        30.12     29.41  Sennachie  10 JAN 2020
..                      ...          ...              ...          ...          ...       ...        ...          ...
76  Distance: 520 m / 569 y     2001 2nd               13        30.50        30.37     30.16      Korda  27 APR 2001
77  Distance: 520 m / 569 y     2001 1st                3        30.72        30.72     30.55   Fly Fast   0 MAR 2001

[78 rows x 8 columns]
                                                                                
                   Distance Year quarter running dif.dogs average time avg win time best time            Set by       Set on
0   Distance: 600 m / 656 y     2020 2nd               76        35.71        35.14     34.65  Frieda Las Vegas  28 MAY 2020
1   Distance: 600 m / 656 y     2020 1st               76        35.77        35.21     34.72  Velocity Bettina  23 JAN 2020
..                      ...          ...              ...          ...          ...       ...               ...          ...
73  Distance: 600 m / 656 y     2001 2nd                1        35.49        35.49     35.49     Kissin Bobbie  24 MAY 2001
74  Distance: 600 m / 656 y     2001 1st                1        36.10        36.10     36.10    Brampton Blues  23 MAR 2001

[75 rows x 8 columns]
                                                                                
                   Distance Year quarter running dif.dogs average time avg win time best time             Set by       Set on
0   Distance: 710 m / 776 y     2020 2nd               33        42.73        42.08     41.62            Rasheda  28 MAY 2020
1   Distance: 710 m / 776 y     2020 1st               16        42.38        41.93     41.83      What About It  20 FEB 2020
..                      ...          ...              ...          ...          ...       ...                ...          ...
57  Distance: 710 m / 776 y     2001 3rd                2        42.57        42.53     42.53  Universal Tears *  16 AUG 2001
58  Distance: 710 m / 776 y     2001 2nd                4        42.24        42.27     42.15    Hotshow Vintage  14 JUN 2001

[59 rows x 8 columns]
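
Every cell collected this way is still a string. A possible follow-up step, sketched here on two tiny hand-made frames rather than the real `dfs` (the `metres` column name is an assumption for illustration), is to stack the frames and convert the columns to proper types. Note that the scraped data contains at least one date that is not a real calendar day ("0 MAR 2001"), so the date conversion below coerces such values to NaT instead of raising.

```python
import pandas as pd

# Hypothetical stand-ins for two of the per-distance frames built above.
dfs = [
    pd.DataFrame({
        "Distance": ["Distance: 331 m / 362 y"] * 2,
        "best time": ["18.79", "18.58"],
        "Set on": ["15 JUN 2020", "23 JAN 2020"],
    }),
    pd.DataFrame({
        "Distance": ["Distance: 520 m / 569 y"] * 2,
        "best time": ["29.56", "30.55"],
        "Set on": ["23 APR 2020", "0 MAR 2001"],
    }),
]

# Stack all groups into one frame; the Distance column keeps them apart.
combined = pd.concat(dfs, ignore_index=True)

# Pull the distance in metres out of "Distance: 331 m / 362 y".
combined["metres"] = (
    combined["Distance"].str.extract(r"(\d+)\s*m", expand=False).astype(int)
)

# Convert text to numbers and dates; unparseable values become NaN / NaT.
combined["best time"] = pd.to_numeric(combined["best time"], errors="coerce")
combined["Set on"] = pd.to_datetime(
    combined["Set on"], format="%d %b %Y", errors="coerce"
)

print(combined.dtypes)
print(combined)
```

With numeric and datetime columns in place, the combined frame can be filtered or grouped by `metres` directly, and it can still be split again with `groupby("Distance")` if separate DataFrames are needed.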
                                                                                
