Looping over elements with BeautifulSoup

Posted 2024-05-15 15:29:47


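I am trying to store some data scraped from a website. The data I need is the text inside certain elements, which I then want to save to a CSV file for later querying.

In the code below, I find every reference to the class "vip". I then want to loop over these, strip the unnecessary HTML, and keep only the text. Finally, the text should be encoded as UTF-8 for the CSV.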

# parse the page and store in var soup
soup = BeautifulSoup(page, 'html.parser')

# find the title
title_box = soup.findAll('a', attrs={'class': 'vip'})

print title_box

# loop through each iteration
for each in title_box:
    if each.find('title_box'):
        title = title_box.text.strip().encode('utf-8')

# print the result
print title

However, whenever I print the result of "title", I get an error (the error output itself is not preserved here).

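As far as I can tell, title is out of scope. How do I get the data out of the loop and into the print call?

For context, this is just one of the results of print title_box: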

<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>]

3 Answers

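The steps are as follows:

  1. The line title_box = soup.findAll('a', attrs={'class': 'vip'}) finds every HTML element with the tag "a" and then filters those by the required class, vip.
  2. if each.find('title_box'): can never succeed, because there is no HTML tag named title_box.
  3. You can use

    for each in title_box: print(each.text.strip().encode('utf-8'))

There is no need for any of the conditional statements from the excerpt quoted above.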
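The three steps above can be sketched as follows. This is a minimal sketch, assuming Python 3 with the bs4 package installed; the page string is a hypothetical stand-in for the downloaded HTML, and find_all is used as the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two "vip" links and one other link.
page = """
<a class="vip" href="#1"> KITCHENAID CLASSIC MIXER 5K45SS </a>
<a class="other" href="#2">Not a listing</a>
<a class="vip" href="#3">Second listing</a>
"""

soup = BeautifulSoup(page, "html.parser")

# Step 1: select every <a> element whose class is "vip".
title_box = soup.find_all("a", attrs={"class": "vip"})

# Step 3: loop over the results directly; no extra condition is needed.
titles = [each.text.strip() for each in title_box]

for title in titles:
    print(title)
```

Collecting the text into a list inside the loop also addresses the scoping concern in the question: every title is kept, rather than only whatever the last iteration happened to assign.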

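As I said in the comments, each.find('title_box') will not get you anything, because there is no title_box tag.

Since you want a elements whose class attribute is vip, the condition to check would be: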

if 'vip' in each['class']:

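Also, by the time this line has run:

title_box = soup.findAll('a', attrs={'class': 'vip'})

the title_box list is already populated with a elements whose class attribute is vip. So you do not need to check the same condition again inside the for loop.

This is the code you should try: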
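The points above can be checked with a small experiment. This is a sketch under assumptions: Python 3, bs4 installed, and a made-up two-link HTML string in place of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one "vip" link and one link with a different class.
html = '<a class="vip" href="#">Mixer</a><a class="plain" href="#">Other</a>'
soup = BeautifulSoup(html, "html.parser")
title_box = soup.find_all("a", attrs={"class": "vip"})
first = title_box[0]

# find() looks for a descendant tag with the given name.
# No <title_box> tag exists, so the result is None and the if-branch never runs.
missing = first.find("title_box")

# class is a multi-valued attribute, so bs4 exposes it as a list of strings.
classes = first["class"]

# Every element in title_box already has the class, so re-checking is redundant.
all_vip = all("vip" in each["class"] for each in title_box)

print(missing, classes, all_vip)
```

Because the if-branch in the question never runs, title is never assigned, which is consistent with the asker's impression that the variable is "out of scope".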

for each in title_box:
    title = each.text.strip().encode('utf-8')
    print title

Of course, you do not have to assign the text to a variable at all; you can print it directly:

print each.text.strip().encode('utf-8')

I made an HTML file containing five copies of the a element and named it 'temp.htm':

<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>
<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>
<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>
<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>
<a class="vip" href="http://www.ebay.co.uk/itm/KITCHENAID-CLASSIC-MIXER-5K45SS-ATTACHMENTS-AND-INSTRUCTIONS-/302468759209?hash=item466c8afea9:g:2PIAAOSwCi9Zvk2D" title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS">KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>

I then ran code against this file to get the text in these links (the snippet itself is not preserved here).
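The answerer's script is not shown, so the following is only a guess at its shape, not the original code. It assumes Python 3 and bs4, writes a temp.htm into a temporary folder so the example is self-contained, and uses a shortened, hypothetical version of the link (the href and title are abbreviated).

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Shortened, hypothetical version of the listing link (href and title abbreviated).
link = ('<a class="vip" href="http://www.ebay.co.uk/itm/302468759209" '
        'title="Click this link to access KITCHENAID CLASSIC MIXER 5K45SS">'
        'KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS</a>\n')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "temp.htm")

    # Create the test file with five copies of the element.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(link * 5)

    # BeautifulSoup accepts an open file handle as well as a string.
    with open(path, encoding="utf-8") as handle:
        soup = BeautifulSoup(handle, "html.parser")

# Extract the visible text of each matching link.
texts = [a.text.strip() for a in soup.find_all("a", attrs={"class": "vip"})]

for text in texts:
    print(text)
```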

You may also need to encode these texts before storing them in your csv file.
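As a rough illustration of the CSV step, here is a sketch that assumes Python 3, where the usual approach is to keep the titles as str and let the file object handle UTF-8 rather than calling .encode() on each value. The file name titles.csv, the "title" header, and the second sample title are hypothetical.

```python
import csv
import os
import tempfile

# Sample titles; the second one is made up to include a non-ASCII character.
titles = [
    "KITCHENAID CLASSIC MIXER 5K45SS - ATTACHMENTS AND INSTRUCTIONS",
    "Café grinder",
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "titles.csv")

    # newline="" is what the csv module expects; encoding="utf-8" does the encoding.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["title"])
        for title in titles:
            writer.writerow([title])

    # Read the file back to confirm the round trip.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(rows)
```

In Python 2, which the code in this thread targets, the csv module works with byte strings, which is why the thread calls .encode('utf-8') on each title before writing.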
