Trouble indexing into recipe data with BeautifulSoup

Posted 2024-06-16 14:54:43


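I'm writing a program that iterates over a recipe website, The Woks of Life, extracts each recipe, and stores it in a CSV file. I've managed to extract the recipe links for storage, but I'm having trouble extracting elements from the page itself. The site link is https://thewoksoflife.com/baked-white-pepper-chicken-wings/. The elements I'm trying to reach are the name, cook time, ingredients, calories, instructions, and so on.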

def parse_recipe(link):
    #hardcoded link for now until i get it working
    page = requests.get("https://thewoksoflife.com/baked-white-pepper-chicken-wings/")
    soup = BeautifulSoup(page.content, 'html.parser')
    for i in soup.findAll("script", {"class": "yoast-schema-graph yoast-schema-graph--main"}):
        print(i.get("name")) #should print "Baked White Pepper Chicken Wings" but prints "None"

For reference, when I print(i), I get:

<script class="yoast-schema-graph yoast-schema-graph--main" type="application/ld+json"> 
   {"@context":"https://schema.org","@graph": 
   [{"@type":"Organization","@id":"https://thewoksoflife.com/#organization","name":"The Woks of 
    Life","url":"https://thewoksoflife.com/","sameAs": 
   ["https://www.facebook.com/thewoksoflife","https://twitter.com/thewoksoflife"],"logo": 
{"@type":"ImageObject","@id":"https://thewoksoflife.com/#logo","url":"https://thewoksoflife.com/wp- 
content/uploads/2019/05/Temporary-Logo-e1556728319201.png","width":365,"height":364,"caption":"The 
Woks of Life"},"image":{"@id":"https://thewoksoflife.com/#logo"}}{"@type":"WebSite","@id":"https://thewoksoflife.com/#website","url":"https://thewoksoflife.com/","name": 
   "The Woks of Life","description":"a culinary genealogy","publisher": 
   {"@id":"https://thewoksoflife.com/#organization"},"potentialAction": 
   {"@type":"SearchAction","target":"https://thewoksoflife.com/?s={search_term_string}","query- 
   input":"required name=search_term_string"}}, 
   {"@type":"ImageObject","@id":"https://thewoksoflife.com/baked-white-pepper-chicken- 
   wings/#primaryimage","url":"https://thewoksoflife.com/wp-content/uploads/2019/11/white-pepper- 
   chicken-wings-9.jpg","width":600,"height":836,"caption":"Crispy Baked White Pepper Chicken Wings, 
   thewoksoflife.com"},{"@type":"WebPage","@id":"https://thewoksoflife.com/baked-white-pepper- 
   chicken-wings/#webpage","url":"https://thewoksoflife.com/baked-white-pepper-chicken- 
   wings/","inLanguage":"en-US","name":"Baked White Pepper Chicken Wings | The Woks of 
   Life", .................. #continues onwards

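I'm trying to access the "name" at the end of the snippet above (and other elements that are similarly out of reach), but I can't get to it. Any help would be greatly appreciated.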


1 answer
User
#1 · Posted 2024-06-16 14:54:43

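i.get("name") looks up an HTML attribute called name on the <script> tag itself, and that tag only has class and type attributes, which is why it prints None. The data you want is JSON stored inside the tag, so once you have found the <script> tag you can parse its contents with the json module. For example: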

import json
import requests
from bs4 import BeautifulSoup

url = 'https://thewoksoflife.com/baked-white-pepper-chicken-wings/'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# The second class name contains a double hyphen: yoast-schema-graph--main
data = json.loads(soup.select_one('script.yoast-schema-graph.yoast-schema-graph--main').text)
# print(json.dumps(data, indent=4))  # <-- uncomment this to print all data

recipe = next((g for g in data['@graph'] if g.get('@type', '') == 'Recipe'), None)
if recipe:
    print('Name        =', recipe['name'])
    print('Cook Time   =', recipe['cookTime'])
    print('Ingredients =', recipe['recipeIngredient'])
    # ... etc.

Prints:

Name        = Baked White Pepper Chicken Wings
Cook Time   = PT40M
Ingredients = ['3 pounds whole chicken wings ((about 14 wings))', '1-2 tablespoons white pepper powder ((divided))', '2 teaspoons salt ((divided))', '1 teaspoon Sichuan peppercorn powder ((optional))', '2 teaspoons vegetable oil ((plus more for brushing))', '1/2 cup all purpose flour', '1/4 cup cornstarch']
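Since the stated goal is a CSV file, here is a minimal sketch of how the extracted recipe dict might be flattened into one row. It runs on a hard-coded, abridged sample rather than the live page. The helper names (duration_to_minutes, recipe_to_row), the column names, and the " | " separator are all assumptions made for illustration; they are not part of the original answer. The cook time arrives as an ISO 8601 duration (PT40M), and the helper below only handles the simple hours-and-minutes form.

```python
import csv
import io
import re

# Hypothetical, abridged sample shaped like the 'Recipe' node printed above;
# on a real page this dict would come from json.loads() on the <script> text.
recipe = {
    "@type": "Recipe",
    "name": "Baked White Pepper Chicken Wings",
    "cookTime": "PT40M",
    "recipeIngredient": [
        "3 pounds whole chicken wings ((about 14 wings))",
        "1/2 cup all purpose flour",
        "1/4 cup cornstarch",
    ],
}


def duration_to_minutes(value):
    """Convert a simple ISO 8601 duration such as 'PT1H30M' to minutes.

    Only hours and minutes are handled; anything else returns None.
    """
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", value or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def recipe_to_row(recipe):
    """Flatten the fields of interest into a dict for csv.DictWriter."""
    return {
        "name": recipe.get("name", ""),
        "cook_minutes": duration_to_minutes(recipe.get("cookTime")),
        # Join the ingredient list into one cell; the separator is arbitrary.
        "ingredients": " | ".join(recipe.get("recipeIngredient", [])),
    }


row = recipe_to_row(recipe)

# Write to an in-memory buffer here; a real script would use
# open("recipes.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Using .get() with defaults keeps the row builder from raising KeyError on pages where a field such as cookTime is missing, which seems likely when looping over many recipe links.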
