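Below is a web scraper that successfully extracts roster information from a team's website and exports it to a CSV file. As you can see, each team's website follows a similar URL pattern: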
http://m.redsox.mlb.com/roster/
http://m.yankees.mlb.com/roster/
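Because only the team name changes, the pattern can be captured in a small helper. This is a sketch: `roster_url` and `TEAM_SLUGS` are hypothetical names, and only the two team slugs from the question are used. A list is used so the order of the generated URLs is fixed (iterating a set of strings gives no guaranteed order).

```python
# Build the mobile roster URL for each team slug, following the
# http://m.<team>.mlb.com/roster/ pattern shown above.
TEAM_SLUGS = ['redsox', 'yankees']  # hypothetical name; a list keeps a fixed order


def roster_url(team: str) -> str:
    """Return the roster page URL for one team slug."""
    return 'http://m.{}.mlb.com/roster/'.format(team)


urls = [roster_url(team) for team in TEAM_SLUGS]
for url in urls:
    print(url)
```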
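I'm trying to create a loop that goes through each team's website, scrapes every player's roster information, and writes it to a CSV file. At the start of the code I create a set of team names, `team_list`, and format each name into the URL to request a page. The requests work, but the output CSV only ends up with the data from the last team in `team_list`. Does anyone know how to modify this code so that it handles every page in `team_list`? Thanks in advance!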
import requests
import csv
from bs4 import BeautifulSoup

team_list = {'yankees', 'redsox'}

for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')
    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()
    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta', property='og:site_name')['content']] * len(names)
    with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
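One likely cause, independent of whether `team_list` is a set or a list: `open(..., 'w')` truncates the file every time it is called, and in the code above the `with open(...)` block runs once per team, so each iteration overwrites the rows written by the previous one. The stand-alone sketch below reproduces that with placeholder rows and no network access; `write_per_iteration`, `write_once`, and `read_rows` are hypothetical helper names, and the player rows are made-up data.

```python
import csv
import os
import tempfile


def write_per_iteration(path, batches):
    # Mirrors the question: the file is reopened in 'w' mode on every pass,
    # which truncates whatever the previous pass wrote.
    for batch in batches:
        with open(path, 'w', newline='') as fp:
            csv.writer(fp).writerows(batch)


def write_once(path, batches):
    # Open the file a single time, then write every batch into it.
    with open(path, 'w', newline='') as fp:
        writer = csv.writer(fp)
        for batch in batches:
            writer.writerows(batch)


def read_rows(path):
    with open(path, newline='') as fp:
        return list(csv.reader(fp))


# Placeholder data: one made-up row per team.
batches = [[['Player A', 'yankees']], [['Player B', 'redsox']]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    write_per_iteration(path, batches)
    per_iteration = read_rows(path)
    write_once(path, batches)
    once = read_rows(path)

print(per_iteration)  # only the last batch survives
print(once)           # both batches are present
```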
I believe that replacing your set with a list should solve the problem.
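A minimal sketch of that suggestion, which also moves the `with open(...)` block outside the loop so the header is written once and every team's rows accumulate in the same file. To keep it runnable offline, the per-team scraping is passed in as a function: `write_all_rosters`, `scrape_team`, and `fake_scrape` are hypothetical names, and in the real script `scrape_team` would hold the `requests.get`/BeautifulSoup parsing from the question and return `list(zip(names, ids, number, handedness, height, weight, DOB, team))`.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, List, Sequence

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']


def write_all_rosters(
    teams: Iterable[str],
    scrape_team: Callable[[str], List[Sequence[str]]],
    path: str,
) -> int:
    """Write every team's rows into one CSV file and return the row count."""
    count = 0
    # Open the file once, before the loop, so each team's rows are added
    # to the same file instead of truncating it on every iteration.
    with open(path, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(HEADER)  # header is written a single time
        for team in teams:
            rows = scrape_team(team)
            writer.writerows(rows)
            count += len(rows)
    return count


def fake_scrape(team: str) -> List[Sequence[str]]:
    # Hypothetical stand-in for the requests/BeautifulSoup code in the
    # question; it returns one placeholder row per team.
    return [('Player ' + team, '0', '0', 'R/R', '0', '0', '1/1/1990', team)]


team_list = ['yankees', 'redsox']  # a list, as the answer suggests

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'MLB_Active_Roster.csv')
    n = write_all_rosters(team_list, fake_scrape, path)
    with open(path, newline='') as fp:
        rows_written = list(csv.reader(fp))

print(n)                # 2
print(rows_written[0])  # the header row
```

Separately, the question's loop body rebinds `team` with `team = [soup.find(...)] * len(names)`. That does not break the `for` loop, which fetches its next value from the iterator, but giving the list its own name (for example `team_names`) makes the code easier to follow.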