I'm trying to scrape data

Posted 2024-06-07 15:51:19


I'm trying to use Python to scrape the play-by-play table from basketball-reference (example) into a CSV file.

When I run this code, the table is truncated and many cells are missing. I'm new to programming, and any help would be greatly appreciated.

from bs4 import BeautifulSoup
from urllib2 import urlopen
import csv

bref = "http://www.basketball-reference.com"
print "Enter game code:"
game = raw_input("> ")

def make_soup(url):
    return BeautifulSoup(urlopen(url), "lxml")

def get_pbp(pbp):
    soup = make_soup(bref + "/boxscores/pbp/" + game + ".html")
    table = soup.find("table", "no_highlight stats_table")
    rows = [row.find_all("td") for row in table.find_all("tr")]

    data = []
    for row in rows:
        values = []
        for value in row:
            if value.string is None:
                values.append(u"")
            else:
                values.append(value.string.replace(u"\xa0", u""))
        data.append(values)
    return data

if __name__ == '__main__':

    print "Writing data for game " + game

    with open(game + '.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerows(get_pbp(game))

    print game + " has been successfully scraped."

1 answer
User
#1 · Posted 2024-06-07 15:51:19

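You need to skip empty cells. Reading cell.text instead of cell.string also matters: .string is None whenever a cell contains more than one child node (for example, text mixed with links), which is likely why those cells came out blank in your output: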

table = soup.find("table", class_="no_highlight stats_table")
rows = [[cell.text.replace(u"\xa0", u"").strip() for cell in row.find_all("td") if cell.text.strip()]
        for row in table.find_all("tr")[2:]]

with open(game + '.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(rows)
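
To make the difference visible without hitting the site, here is a minimal Python 3 sketch that runs the same extraction on an inline HTML sample. The sample markup, the player names, and the helper names extract_rows and rows_to_csv are all made up for illustration; the real page layout may differ. Unlike the snippet above, it keeps empty cells by default so the columns stay aligned, and only drops them when skip_empty=True is passed.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not copied from the real site).
SAMPLE_HTML = """
<table class="no_highlight stats_table">
  <tr><th>Time</th><th>Away</th><th>Score</th><th>Home</th></tr>
  <tr><td>12:00.0</td><td>Jump ball: <a href="#">A. Player</a> vs. <a href="#">B. Player</a></td><td>0-0</td><td>&nbsp;</td></tr>
  <tr><td>11:40.0</td><td>&nbsp;</td><td>0-2</td><td><a href="#">C. Player</a> makes 2-pt shot</td></tr>
</table>
"""


def extract_rows(html, skip_empty=False):
    """Return one list of cell texts per table row that contains <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="no_highlight stats_table")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:
            continue  # header rows only contain <th>, so skip them
        # get_text() joins the text of nested tags; .string would be None
        # for cells that mix plain text with <a> links.
        values = [td.get_text().replace("\xa0", " ").strip() for td in cells]
        if skip_empty:
            values = [v for v in values if v]
        rows.append(values)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text in memory (no file is written)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

When writing to a real file in Python 3, open it with open(game + '.csv', 'w', newline='') so the csv module controls line endings itself.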
