Creating a Pandas DataFrame from web scraping results

Posted 2024-04-28 20:39:32


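I am trying to scrape a table from ESPN and load the data into a pandas DataFrame so that I can export it to Excel. I have most of the scraping done, but I am stuck on how to send each 'td' tag to its own DataFrame cell inside the for loop (code below). Any ideas? Thanks!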

import requests
import urllib.request
from bs4 import BeautifulSoup
import re
import os
import csv
import pandas as pd

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("http://www.espn.com/nba/statistics/player/_/stat/scoring-per-game/sort/avgPoints/qualified/false")

regex = re.compile("^[e-o]")

for record in soup.findAll('tr', {"class":regex}):
    for data in record.findAll('td'):
        print(data)

1 Answer
User
#1 · Posted 2024-04-28 20:39:32

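I have actually been scraping sports websites recently for a daily fantasy sports algorithm I am building for a class. Here is the script I wrote. Maybe this approach will work for you: build a dictionary, then convert it to a DataFrame.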

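    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    # {0} (yr) and {1} (mode) are unfilled placeholders; call url.format(...) before requesting
    url = "http://www.footballdb.com/stats/stats.html?lg=NFL&yr={0}&type=reg&mode={1}&limit=all"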

    result = requests.get(url)
    c = result.content

    # Set as Beautiful Soup Object
    soup = BeautifulSoup(c, "html.parser")

    # Go to the section of interest
    tables = soup.find("table",{'class':'statistics'})

    data = {}
    headers = {}
    for i, header in enumerate(tables.findAll('th')):
        data[i] = {}
        headers[i] = str(header.get_text())

    table = tables.find('tbody')
    for r, row in enumerate(table.select('tr')):
        for i, cell in enumerate(row.select('td')):
            try:
                data[i][r] = str(cell.get_text())
            except UnicodeEncodeError:
                # strip_non_ascii is a helper that is not shown in this answer
                stat = strip_non_ascii(cell.get_text())
                data[i][r] = stat

    for i, name in enumerate(tables.select('tbody .left .hidden-xs a')):
        data[0][i] = str(name.get_text())

    df = pd.DataFrame(data=data)
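
For the ESPN table in the question, the same idea can also be done row by row: collect the text of each 'td' into a list per 'tr', then hand the list of rows to pd.DataFrame. The following is only a minimal sketch, not the original poster's code. SAMPLE_HTML, the class names colhead / oddrow / evenrow, and the helper table_to_dataframe are assumptions made up for illustration so the example runs offline; the real page markup may differ.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumed markup, not real ESPN HTML).
SAMPLE_HTML = """
<table>
  <tr class="colhead"><td>RK</td><td>PLAYER</td><td>PTS</td></tr>
  <tr class="oddrow"><td>1</td><td>Player A</td><td>30.1</td></tr>
  <tr class="evenrow"><td>2</td><td>Player B</td><td>28.4</td></tr>
</table>
"""


def table_to_dataframe(html, row_class_pattern, header_class="colhead"):
    """Build a DataFrame with one row per matching <tr> and one column per <td>."""
    soup = BeautifulSoup(html, "html.parser")

    # Use the first header row, if one exists, for the column names.
    header_row = soup.find("tr", class_=header_class)
    columns = None
    if header_row is not None:
        columns = [cell.get_text(strip=True) for cell in header_row.find_all("td")]

    # Each data row becomes a list of cell strings.
    rows = []
    for record in soup.find_all("tr", class_=row_class_pattern):
        rows.append([cell.get_text(strip=True) for cell in record.find_all("td")])

    return pd.DataFrame(rows, columns=columns)


# Same regex as in the question: class names starting with a letter from e to o.
df = table_to_dataframe(SAMPLE_HTML, re.compile("^[e-o]"))
print(df)

# Exporting to Excel needs an Excel writer engine such as openpyxl installed:
# df.to_excel("nba_scoring.xlsx", index=False)
```

Every cell arrives as a string, so numeric columns would still need converting (for example with pd.to_numeric) before doing arithmetic on them.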
