I am trying to extract the rating from https://www.truthorfiction.com/are-americans-annually-healthcare-undocumented/, specifically the "ratingValue" and "alternateName" fields in this part of the HTML:
<script type=application/ld+json>{
"@context": "http://schema.org",
"@type": "ClaimReview",
"datePublished": "2019-01-03 ",
"url": "https://www.truthorfiction.com/are-americans-annually-healthcare-undocumented/",
"author": {
"@type": "Organization",
"url": "https://www.truthorfiction.com/",
"image": "https://dn.truthorfiction.com/wp-content/uploads/2018/10/25032229/truth-or-fiction-logo-tagline.png",
"sameAs": "https://twitter.com/whatstruecom"
},
"claimReviewed": "More Americans die every year from a lack of affordable healthcare than by terrorism or at the hands of undocumented immigrants.",
"reviewRating": {
"@type": "Rating",
"ratingValue": -1,
"worstRating":-1,
"bestRating": -1,
"alternateName": "True"
},
"itemReviewed": {
"@type": "CreativeWork",
"author": {
"@type": "Person",
"name": "Person",
"jobTitle": "",
"image": "",
"sameAs": [
""
]
},
"datePublished": "",
"name": ""
}
}</script>
I tried to do this with the following code:
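import json
import urllib3
from bs4 import BeautifulSoup
# Connection pool used by the http.request() call below
http = urllib3.PoolManager()
slink = 'https://www.truthorfiction.com/are-americans-annually-healthcare-undocumented/'
response = http.request('GET', slink)
# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(response.data, 'html.parser')
# find() returns only the FIRST matching <script type="application/ld+json"> tag
tmp = json.loads(soup.find('script', type='application/ld+json').text)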
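However, tmp holds a dict for an application/ld+json block that comes before the rating I want to extract. How can I loop through the scripts, or otherwise get to the part of the page where the rating is stored?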
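You need to access the nested elements by key, e.g. tmp['reviewRating']['ratingValue'] and tmp['reviewRating']['alternateName'].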
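A minimal sketch of key access, assuming you already have the ClaimReview block as a string. The JSON below is a trimmed copy of the block shown in the question, hard-coded so the example runs without fetching the page.

```python
import json

# Trimmed copy of the ClaimReview JSON-LD from the question
# (only the fields needed here are kept).
raw = """{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": -1,
    "worstRating": -1,
    "bestRating": -1,
    "alternateName": "True"
  }
}"""

tmp = json.loads(raw)

# The rating fields sit one level down, under "reviewRating",
# so index into the dict one key at a time.
rating_value = tmp["reviewRating"]["ratingValue"]
alternate_name = tmp["reviewRating"]["alternateName"]

print(rating_value, alternate_name)  # -1 True
```

Note that "alternateName" is the string "True", not a Python boolean.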
Or:
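The page has two <script type=application/ld+json> tags. You can take the second one by index from the find_all() result, or loop over them and check whether each one contains the string you are looking for (such as "reviewRating").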
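A sketch of both options, using only the standard library so it runs offline. The scripts list stands in for the text of the two ld+json tags; with BeautifulSoup you could build it with [tag.string for tag in soup.find_all('script', type='application/ld+json')]. The helper name find_claim_review and the contents of the first sample object are hypothetical placeholders, not what the real page contains.

```python
import json


def find_claim_review(script_texts):
    """Return the first JSON-LD object whose text contains "reviewRating".

    Hypothetical helper: script_texts is assumed to be the text of each
    <script type="application/ld+json"> tag, in page order.
    """
    for text in script_texts:
        # Cheap substring check first, then parse only the matching block.
        if text and '"reviewRating"' in text:
            return json.loads(text)
    return None  # no block contained a rating


# Placeholder data: the first object is made up; the second mirrors the question.
scripts = [
    '{"@context": "http://schema.org", "@type": "WebSite", "name": "example"}',
    '{"@context": "http://schema.org", "@type": "ClaimReview", '
    '"reviewRating": {"@type": "Rating", "ratingValue": -1, "alternateName": "True"}}',
]

# Option 1: take the second ld+json block by index (breaks if the page order changes).
by_index = json.loads(scripts[1])

# Option 2: loop and search for the block that holds the rating.
by_search = find_claim_review(scripts)

print(by_search["reviewRating"]["alternateName"])
```

Searching is more robust than a fixed index, because it keeps working if the site adds or reorders ld+json blocks.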