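I'm using Python 3.4 and BeautifulSoup 4 to pull some data from an RSS XML feed.
Everything seems to work, but sometimes it doesn't behave as expected: for at least one item in the list, it does not get all of the data in the <description> tag.
For example, this is the item that is giving me problems: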
<item>
<title>Google’s first DeepMind AI health project is missing something</title>
<link>http://thenextweb.com/google/2016/02/25/googles-first-deepmind-ai-health-project-is-missing-something/</link>
<comments>http://thenextweb.com/google/2016/02/25/googles-first-deepmind-ai-health-project-is-missing-something/#respond</comments>
<pubDate>Thu, 25 Feb 2016 11:36:56 +0000</pubDate>
<dc:creator><![CDATA[Kirsty Styles]]></dc:creator>
<category><![CDATA[Google]]></category>
<category><![CDATA[Insider]]></category>
<category><![CDATA[Deepmind]]></category>
<category><![CDATA[doctor]]></category>
<category><![CDATA[healthcare]]></category>
<category><![CDATA[NHS]]></category>
<category><![CDATA[UK]]></category>
<guid isPermaLink="false">http://thenextweb.com/?p=957096</guid>
<description><![CDATA[<img width="520" height="245" src="http://cdn1.tnwcdn.com/wp-content/blogs.dir/1/files/2014/04/doctor-crop-520x245.jpg" alt="Doctors Seek Higher Fees From Health Insurers" title="Google's first DeepMind AI health project is missing something" data-id="750745" /><br />Having been down at Google’s DeepMind office earlier this week its man vs AI machine gaming competition preview, I was tipped off that a potentially-more-serious healthcare announcement would follow soon. That it has, but contrary to what the company’s remit might suggest, this project doesn’t actually contain any artificial intelligence at launch. “To date, no machine learning has been involved in these projects,” the company said. “While there is obvious potential in applying machine learning to these kinds of complex challenges, any decision to do so will led by clinicians.” DeepMind has announced an acquisition in the shape of an Imperial College London… <br><br><a href="http://thenextweb.com/google/2016/02/25/googles-first-deepmind-ai-health-project-is-missing-something/?utm_source=social&utm_medium=feed&utm_campaign=profeed">This story continues</a> at The Next Web]]></description>
<wfw:commentRss>http://thenextweb.com/google/2016/02/25/googles-first-deepmind-ai-health-project-is-missing-something/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
<enclosure url="http://cdn1.tnwcdn.com/wp-content/blogs.dir/1/files/2014/04/doctor-crop-520x245.jpg" type="image/jpeg" length="0" />
</item>
I parse the data with this code:
^{pr2}$
When I run the script everything works, except that this particular item fails.
The commented-out lines in the code are meant to split off the img and add a <p> tag to organize the content.
The result I get from this item is:
’s DeepMind office earlier this week its man vs AI machine gaming competition preview, I was tipped off that a potentially-more-serious healthcare announcement would follow soon. That it has, but contrary to what the company’s remit might suggest, this project doesn’t actually contain any artificial intelligence at launch. “To date, no machine learning has been involved in these projects,” the company said. “While there is obvious potential in applying machine learning to these kinds of complex challenges, any decision to do so will led by clinicians.” DeepMind has announced an acquisition in the shape of an Imperial College London… <br><br><a href="http://thenextweb.com/google/2016/02/25/googles-first-deepmind-ai-health-project-is-missing-something/?utm_source=social&utm_medium=feed&utm_campaign=profeed">This story continues</a> at The Next Web
I don't know what is happening.
If anyone can help me, or point me to a way to extract the <img> tag, I would be very grateful.
Why not search for the description tag inside the for loop, like this:
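A minimal sketch of that idea follows. It is not the answerer's original code, and it swaps BeautifulSoup for the standard library (xml.etree.ElementTree to read each item, html.parser to pick out the image) so that it runs without extra packages. The feed in SAMPLE_FEED is a shortened, hypothetical stand-in for the item above, and the names ImgCollector, split_description and parse_feed are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical, shortened feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Example item</title>
      <description><![CDATA[<img width="520" height="245" src="http://example.com/doctor.jpg" alt="Example" /><br />Google’s first project. <a href="http://example.com/story">This story continues</a> at Example]]></description>
    </item>
  </channel>
</rss>"""


class ImgCollector(HTMLParser):
    """Collects <img> attributes and the plain text of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.images = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            self.images.append(dict(attrs))

    def handle_data(self, data):
        self.text_parts.append(data)


def split_description(html_fragment):
    """Return (list of img attribute dicts, remaining text) for one description."""
    collector = ImgCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.images, "".join(collector.text_parts).strip()


def parse_feed(xml_text):
    """Yield one dict per <item>, looking up <description> inside the loop."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        # The XML parser already unwraps the CDATA section into plain text.
        description = item.findtext("description", default="")
        images, text = split_description(description)
        yield {
            "title": item.findtext("title", default=""),
            "images": images,
            "text": text,
        }


if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(entry["title"])
        print(entry["images"])
        print(entry["text"])
```

Because the description is read as a whole string first and only then parsed as HTML, the image and the text are separated without splitting the string by hand. Note that a real feed using prefixes such as dc: or wfw: must declare those namespaces on its root element, otherwise ElementTree will refuse to parse it.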