Why do I keep getting "'NoneType' object has no attribute 'a'" in my Django app?

Posted 2024-04-16 20:42:19

I don't understand why I am getting

'NoneType' object has no attribute 'a' 

This is the HTML structure I am scraping in my Django app:

^{pr2}$

If I do this

html = requests.get(vlad_url)
soup = BeautifulSoup(html.text, 'html.parser')
divs = soup.find('section', 'videos')

img = divs.find('img').get('src')
text = divs.strong.a.text
link = divs.a.get('href')

context = {
    "ref": link,
    "src": img,
    "txt": text,
}

in my view, and this in my template

{{ref}}
{{src}}
{{txt}}

I get a single result. But when I try to loop like this

def get_vlad(url):
    html = requests.get(url, headers=headers)
    soup = BeautifulSoup(html.text, 'html.parser')
    divs = soup.findAll('section', 'box')

    entries = [{'text': div.strong.a.text,
                'link': div.a.get('href'),
                'img': div.find('img').get('src')
                } for div in divs]
    return entries

I get a very strange NoneType error, because the element does exist. It is also strange because I have another loop very similar to this one

def get_data(uri):
    html = requests.get(uri, headers=headers)
    soup = BeautifulSoup(html.text, "html.parser")
    divs = soup.findAll('div', 'thumbnail')
    entries = [{'text': div.text,
                'href': div.find('a').get('href'),
                'src': div.find('img').get('src')
                } for div in divs][:6]
    return entries

and this is the HTML structure it works on

 <div class="col-xs-12 col-md-4" id="split">
      <div class="thumbnail thumb">

             <h6 id="date">May 6, 2016</h6>

            <img src="http://www.paraguayhits.com/wp-content/uploads/2015/11/Almighty-Ft.-N%CC%83engo-Flow-Por-Si-Roncan-660x330.jpg" class="img-responsive post">


        <div style="border-bottom: thin solid lightslategray; padding-bottom: 15px;"></div>

        <div class="caption" id="cap">
            <a href="/blog/almighty-por-si-roncan-ft-nengo-flow-official-video/">
                <h5 class="post-title" id="title">Almighty - Por Si Roncan (ft. Ñengo Flow) [Official Video]</h5>
            </a>





            <p>
                <a href="/blog/76/delete/" class="btn" role="button">delete</a>
                <a href="/blog/almighty-por-si-roncan-ft-nengo-flow-official-video/edit/" class="btn" role="button">edit</a>
            </p>

        </div>
    </div>

What is the difference between the two? How can I loop over the results?


1 answer
User
#1 · Posted 2024-04-16 20:42:19

The HTML is broken: the section tags are a mess. I have had success using html5lib with bs4 to parse badly broken HTML:

In [21]: h = """<section class="videos"
   ....: <section class="box">
   ....: <a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu" class="video-box">
   ....:     <img src="http://hw-static.exampl.net/.jpg" width="222" height="125" alt="">
   ....: </a>
   ....: <strong class="title"><a href="/videos/video.php?v=wshhH0xVL2LP4hFb0liu">Teen "Allegedly" </a></strong>
   ....: <div>
   ....:     <span class="views">11,323</span>
   ....:     <span class="comments"><a href="http://www.example.net/v" data-disqus-identifier="94137">44</a></span>
   ....: </div>"""

In [22]: from bs4 import BeautifulSoup

In [23]: soup = BeautifulSoup(h, 'html5lib')

In [24]: divs = soup.select_one('section.videos')

In [25]: img = divs.find('img').get('src')

In [26]: text = divs.strong.a.text

In [27]: link = divs.a.get('href')

In [28]: img
Out[28]: u'http://hw-static.exampl.net/.jpg'

In [29]: text
Out[29]: u'Teen "Allegedly" '

In [30]: link
Out[30]: u'/videos/video.php?v=wshhH0xVL2LP4hFb0liu'
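For context, the error in the question is what bs4 raises whenever one step of an attribute chain finds nothing: `tag.strong` returns `None` if the parsed tree has no `<strong>` under that tag, and the following `.a` then fails. The sketch below reproduces this with a made-up fragment (the `fragment` string is an assumption for illustration, not the asker's page) and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a box with a link and an image but no <strong>.
fragment = '<section class="box"><a href="/x"><img src="a.jpg"></a></section>'
box = BeautifulSoup(fragment, "html.parser").find("section", "box")

# Attribute-style navigation returns None when no matching descendant exists.
missing = box.strong

try:
    box.strong.a  # same chain as div.strong.a in the question
except AttributeError as exc:
    message = str(exc)

print(missing)
print(message)
```

So if the parser closes or nests the broken `<section>` tags differently from what the page source suggests, some of the matched sections may simply not contain the `<strong>` you expect.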

The correctly formed HTML:

^{pr2}$
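To loop over every box without one incomplete block aborting the whole list, a defensive version of the asker's loop might look like the sketch below. The function name `extract_entries` and its `parser` argument are hypothetical. It defaults to the built-in `html.parser` so it runs without extra packages; passing `"html5lib"` (which has to be installed separately) would follow the approach shown in this answer for broken markup.

```python
from bs4 import BeautifulSoup


def extract_entries(markup, parser="html.parser"):
    """Collect text/link/img from each <section class="box">, skipping incomplete ones."""
    soup = BeautifulSoup(markup, parser)
    entries = []
    for box in soup.find_all("section", class_="box"):
        strong = box.find("strong")
        title_link = strong.find("a") if strong is not None else None
        first_link = box.find("a")
        img = box.find("img")
        # Skip boxes where any expected element is missing instead of raising.
        if title_link is None or first_link is None or img is None:
            continue
        entries.append({
            "text": title_link.get_text(strip=True),
            "link": first_link.get("href"),
            "img": img.get("src"),
        })
    return entries


# Made-up sample: the second box has no <strong>, so it is skipped.
sample = (
    '<section class="videos">'
    '<section class="box"><a href="/v/1"><img src="a.jpg"></a>'
    '<strong class="title"><a href="/v/1">First</a></strong></section>'
    '<section class="box"><a href="/v/2"><img src="b.jpg"></a></section>'
    '</section>'
)
entries = extract_entries(sample)
```

In a Django view the returned list could be placed in the context and iterated with `{% for entry in entries %}` in the template.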
