Iterating over a BeautifulSoup ResultSet in Python

Posted 2024-04-18 00:36:18


I am scraping data from a website (http://sports.yahoo.com/nfl/players/8800/) using urllib2 and BeautifulSoup. My current code is:

import re
import urllib2

from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []

# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))

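The soup.find_all(...'2008') call matches three elements on this page, and all three show up when I print that result on its own. However, the for loop appears to run only once. How can I make sure the loop runs three times?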


1 answer
User
#1 · Posted 2024-04-18 00:36:18

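As far as I understand, you need to extend the lists defined before the loop instead of reassigning them on each iteration: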

rushing = []
passing = []
receiving = []

for elem in soup.find_all('th', text=re.compile('2008')):
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))])
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))])
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))])

print(passing)
print(rushing)
print(receiving)

This prints the three lists, each now holding the cell text from every matching row.
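
To illustrate why the original loop seemed to run only once, here is a minimal sketch that needs no network access or BeautifulSoup. The `rows` data is hypothetical and merely stands in for the three matched table rows; it is not taken from the Yahoo page. The loop does run three times in both cases, but plain assignment rebinds the name on each pass, so only the last result survives.

```python
# Hypothetical stand-in for the three rows matched by the loop;
# each inner list plays the role of one find_all('td', ...) result.
rows = [["10", "11"], ["20", "21"], ["30", "31"]]

# Rebinding: the name points at a new list on every pass,
# so only the last row's values remain after the loop.
overwritten = []
iterations = 0
for cells in rows:
    overwritten = cells
    iterations += 1

# Extending: each pass appends to the single list created before the loop.
accumulated = []
for cells in rows:
    accumulated.extend(cells)

print(iterations)   # 3
print(overwritten)  # ['30', '31']
print(accumulated)  # ['10', '11', '20', '21', '30', '31']
```

The same reasoning applies to `passing`, `rushing`, and `receiving` in the question: calling `extend` keeps the results from all three rows, while `=` keeps only those from the last one.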
