BeautifulSoup - Finding a logo

2 votes

3 answers

1588 views

Asked on 2025-05-01 01:11

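I'm writing an automation program in Python 3 with BeautifulSoup that tries to identify a website's logo. As a first step, I look for images whose name contains "logo", and that works reasonably well. I'd like to extend this to images whose name may contain "image" instead, or that sit inside a link whose class, ID, or another attribute is "logo", or even deeper, such as a link inside a div with the "logo" class. For example: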

<div id="logo">
    <a href="http://www.mexgrocer.com/">
        <img src="http://ep.yimg.com/ca/I/mex-grocer_2269_22595" width="122" height="72" border="0" hspace="0" vspace="0" alt="Mexican Food">
    </a>
</div>

My current code is:

img = soup.find("img",src=re.compile(r'logo',re.I))

How can I extend the search to the attributes of all parent tags?


3 Answers

You can use the find_all(tag, attribute) method, for example:

from bs4 import BeautifulSoup
soup = BeautifulSoup(f, "html.parser")  # f: an open HTML file or a string of HTML

var = soup.find_all("font", color="#990000")  # all <font color="#990000"></font>
var2 = soup.find_all("a", class_="LinkIndex")  # all <a class="LinkIndex"></a>
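For anyone who wants to try these filters without a real page, here is a minimal sketch. The `html` string is made-up sample markup (an assumption for illustration, not taken from the question), and `red_fonts` / `index_links` are just illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only to illustrate the filters.
html = """
<font color="#990000">red text</font>
<font color="#000000">black text</font>
<a class="LinkIndex" href="/index">index</a>
<a class="other" href="/other">other</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword arguments filter on attribute values; class_ filters on the CSS class.
red_fonts = soup.find_all("font", color="#990000")
index_links = soup.find_all("a", class_="LinkIndex")

print([tag.get_text() for tag in red_fonts])
print([tag["href"] for tag in index_links])
```

Only the first `<font>` and the first `<a>` should match here, since the filters compare against the exact attribute values.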

Answered on 2025-05-01 by Python大师


The answer to this question needs to be updated to:

from bs4 import BeautifulSoup
from urllib.request import urlopen


def getLogoSrc(url):
    # Fetch the page at the given URL and parse it
    soup = BeautifulSoup(urlopen(url).read(), "html.parser")
    # Print the src of every <img> whose id is "logo"
    for x in soup.find_all(id='logo'):
        if x.name == 'img' and x.has_attr('src'):
            print(x['src'])
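Note that in the question's markup the id="logo" is on the div, not on the img, so a check like x.name == 'img' would skip it. A possible variation, sketched below, also looks for images nested inside the matched element. The function name `logo_srcs_in_containers` is hypothetical, and the HTML is the snippet from the question (with the layout attributes trimmed).

```python
from bs4 import BeautifulSoup

# Markup from the question: the id is on the <div>, not on the <img>.
html = """
<div id="logo">
    <a href="http://www.mexgrocer.com/">
        <img src="http://ep.yimg.com/ca/I/mex-grocer_2269_22595" alt="Mexican Food">
    </a>
</div>
"""


def logo_srcs_in_containers(soup):
    """Return src values of <img> tags that are, or sit inside, an id="logo" element."""
    srcs = []
    for container in soup.find_all(id="logo"):
        # The matched element may itself be the image, or may wrap one or more images.
        images = [container] if container.name == "img" else container.find_all("img")
        for img in images:
            if img.get("src"):
                srcs.append(img["src"])
    return srcs


soup = BeautifulSoup(html, "html.parser")
print(logo_srcs_in_containers(soup))
```

Calling find_all("img") on the container searches all of its descendants, so the link between the div and the image does not get in the way.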

Answered on 2025-05-01 by Python大师


find_all finds every matching tag in the whole document. You can try something like this:

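# Python 2 version (urllib2 and the print statement do not exist in Python 3)
from bs4 import BeautifulSoup
import urllib2
soup = BeautifulSoup(urllib2.urlopen('your_url').read(), 'html.parser')
for x in soup.find_all(id='logo'):
    try:
        if x.name == 'img':
            print x['src']
    except KeyError:
        pass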

To search by class name instead, use class_='logo' (with the trailing underscore, because class is a reserved word in Python).
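To cover the "parent tags" part of the question, one option is to walk each image's ancestors and check their attributes too. The following is only a sketch under assumptions: `mentions_logo` and `find_logo_images` are hypothetical helper names, and the second and third images in the sample markup are invented for illustration.

```python
import re
from bs4 import BeautifulSoup

LOGO = re.compile(r"logo", re.I)


def mentions_logo(tag):
    """True if any attribute value of this tag contains 'logo' (case-insensitive)."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class come back as lists.
        text = " ".join(value) if isinstance(value, list) else str(value)
        if LOGO.search(text):
            return True
    return False


def find_logo_images(soup):
    """Return <img> tags where the image itself or any ancestor mentions 'logo'."""
    hits = []
    for img in soup.find_all("img"):
        if mentions_logo(img) or any(mentions_logo(p) for p in img.parents):
            hits.append(img)
    return hits


# First block is the question's markup; the other two images are made up.
html = """
<div id="logo"><a href="http://www.mexgrocer.com/"><img src="http://ep.yimg.com/ca/I/mex-grocer_2269_22595" alt="Mexican Food"></a></div>
<div class="content"><img src="/images/banner.png"></div>
<img src="/static/site-logo.png">
"""

soup = BeautifulSoup(html, "html.parser")
for img in find_logo_images(soup):
    print(img["src"])
```

This checks every attribute (id, class, src, href, alt, and so on), so it can produce false positives, for example a link whose href merely contains "logo". Restricting the check to id and class is an easy way to tighten it.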

Answered on 2025-05-01 by Python大师
