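I'm writing a program that iterates over a recipe website, The Woks of Life, extracts each recipe, and stores it in a CSV file. I've already managed to extract and store the recipe links, but I'm having trouble extracting the elements on each page. The page I'm testing with is https://thewoksoflife.com/baked-white-pepper-chicken-wings/. The elements I want are the name, cooking time, ingredients, calories, instructions, and so on.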
import requests
from bs4 import BeautifulSoup

def parse_recipe(link):
    # hardcoded link for now until I get it working
    page = requests.get("https://thewoksoflife.com/baked-white-pepper-chicken-wings/")
    soup = BeautifulSoup(page.content, 'html.parser')
    for i in soup.find_all("script", {"class": "yoast-schema-graph yoast-schema-graph--main"}):
        print(i.get("name"))  # should print "Baked White Pepper Chicken Wings" but prints "None"
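Why i.get("name") returns None: in BeautifulSoup, Tag.get() looks up an HTML attribute of the tag itself (here only class and type exist), not a key inside the JSON text the tag contains. A minimal offline sketch, using a small hypothetical, trimmed-down copy of the script tag instead of the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the page's JSON-LD <script> tag.
html = """
<script class="yoast-schema-graph yoast-schema-graph--main" type="application/ld+json">
{"@context": "https://schema.org", "@graph": [{"@type": "WebPage", "name": "Baked White Pepper Chicken Wings | The Woks of Life"}]}
</script>
"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("script", {"class": "yoast-schema-graph yoast-schema-graph--main"})

# .get() reads attributes of the <script> element itself...
print(tag.get("type"))  # application/ld+json
print(tag.get("name"))  # None: the element has no "name" attribute
# ...while the JSON is the element's text content, available as tag.string.
print(tag.string.strip()[:40])
```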
For reference, when I print(i), I get:
<script class="yoast-schema-graph yoast-schema-graph--main" type="application/ld+json">
{"@context":"https://schema.org","@graph":
[{"@type":"Organization","@id":"https://thewoksoflife.com/#organization","name":"The Woks of Life","url":"https://thewoksoflife.com/","sameAs":["https://www.facebook.com/thewoksoflife","https://twitter.com/thewoksoflife"],"logo":{"@type":"ImageObject","@id":"https://thewoksoflife.com/#logo","url":"https://thewoksoflife.com/wp-content/uploads/2019/05/Temporary-Logo-e1556728319201.png","width":365,"height":364,"caption":"The Woks of Life"},"image":{"@id":"https://thewoksoflife.com/#logo"}},
{"@type":"WebSite","@id":"https://thewoksoflife.com/#website","url":"https://thewoksoflife.com/","name":"The Woks of Life","description":"a culinary genealogy","publisher":{"@id":"https://thewoksoflife.com/#organization"},"potentialAction":{"@type":"SearchAction","target":"https://thewoksoflife.com/?s={search_term_string}","query-input":"required name=search_term_string"}},
{"@type":"ImageObject","@id":"https://thewoksoflife.com/baked-white-pepper-chicken-wings/#primaryimage","url":"https://thewoksoflife.com/wp-content/uploads/2019/11/white-pepper-chicken-wings-9.jpg","width":600,"height":836,"caption":"Crispy Baked White Pepper Chicken Wings, thewoksoflife.com"},
{"@type":"WebPage","@id":"https://thewoksoflife.com/baked-white-pepper-chicken-wings/#webpage","url":"https://thewoksoflife.com/baked-white-pepper-chicken-wings/","inLanguage":"en-US","name":"Baked White Pepper Chicken Wings | The Woks of Life", ... # continues onwards
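The printed text is one JSON object whose "@graph" key holds a list of nodes (Organization, WebSite, ImageObject, WebPage, ...). The "name" being looked for belongs to the WebPage node, so after parsing, one way to reach it is to pick the node by its "@type". A sketch on a hypothetical trimmed copy of the output above; it assumes each "@type" here is a plain string and appears only once:

```python
import json

# Trimmed copy of the JSON printed above (most keys omitted).
raw = """{"@context": "https://schema.org", "@graph": [
  {"@type": "Organization", "name": "The Woks of Life"},
  {"@type": "WebSite", "name": "The Woks of Life"},
  {"@type": "WebPage", "name": "Baked White Pepper Chicken Wings | The Woks of Life"}
]}"""

data = json.loads(raw)

# Index the graph nodes by "@type" so each one can be looked up directly.
# (Assumes string, unique @type values; JSON-LD also allows a list of types.)
nodes = {node["@type"]: node for node in data["@graph"]}
print(nodes["WebPage"]["name"])  # Baked White Pepper Chicken Wings | The Woks of Life
```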
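I'm trying to access the "name" near the end of the snippet above (along with other elements that are similarly out of reach), but I can't. Any help would be greatly appreciated.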
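The data is in JSON format, so once you have found the <script> tag, you can parse its text with the json module (json.loads) and read the values from the resulting dictionaries and lists.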
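A minimal sketch of that approach, extended toward the fields asked about. It assumes, without having verified it against this page, that the recipe appears as a JSON-LD node whose "@type" is "Recipe" (recipe plugins often add one), and it reads the standard schema.org Recipe properties name, cookTime, totalTime, recipeIngredient, recipeInstructions and nutrition.calories. Because the Recipe node may sit in a different <script type="application/ld+json"> tag than the Yoast one, the sketch collects nodes from all such tags. The helper names find_node, extract_recipe and save_recipes are hypothetical.

```python
import csv
import json

import requests
from bs4 import BeautifulSoup


def find_node(graph, wanted_type):
    """Return the first JSON-LD node whose "@type" equals wanted_type (or lists it)."""
    for node in graph:
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if wanted_type in types:
            return node
    return None


def extract_recipe(html):
    """Collect every JSON-LD node on the page and pull fields from the Recipe node, if any."""
    soup = BeautifulSoup(html, "html.parser")
    graph = []
    for tag in soup.find_all("script", {"type": "application/ld+json"}):
        if not tag.string:
            continue  # skip empty script tags
        data = json.loads(tag.string)
        # A JSON-LD block may be a list, an object with an "@graph" list, or a single node.
        if isinstance(data, list):
            graph.extend(data)
        elif "@graph" in data:
            graph.extend(data["@graph"])
        else:
            graph.append(data)

    recipe = find_node(graph, "Recipe") or {}  # assumption: a Recipe node exists
    steps = recipe.get("recipeInstructions", [])
    if isinstance(steps, str):
        steps = [steps]
    return {
        "name": recipe.get("name"),
        "cook_time": recipe.get("cookTime"),  # schema.org uses ISO 8601 durations
        "total_time": recipe.get("totalTime"),
        "calories": recipe.get("nutrition", {}).get("calories"),
        "ingredients": "; ".join(recipe.get("recipeIngredient", [])),
        # HowToStep objects keep their text under "text"; plain strings pass through.
        # HowToSection groups are not unpacked in this sketch.
        "instructions": " | ".join(
            s.get("text", "") if isinstance(s, dict) else str(s) for s in steps
        ),
    }


def save_recipes(links, path="recipes.csv"):
    """Fetch each recipe link and write one CSV row per page."""
    fields = ["name", "cook_time", "total_time", "calories", "ingredients", "instructions"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for link in links:
            page = requests.get(link, timeout=30)
            page.raise_for_status()
            writer.writerow(extract_recipe(page.text))
```

Usage: call save_recipes(links) with the list of links you already collected. If a page has no Recipe node, every field comes back as None or an empty string, which is a sign that the data has to be scraped from the visible HTML instead.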