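scrape-schema-recipe: Python package - PyPI

Extracts cooking recipes from HTML structured data in the https://schema.org/Recipe format.

Detailed description of the scrape-schema-recipe Python project

scrape-schema-recipe

Scrapes recipes from HTML https://schema.org/Recipe data (Microdata/JSON-LD) into Python dictionaries.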

Installation

pip install scrape-schema-recipe

Requirements

Python version 3.5+

This library relies heavily on extruct.

Other requirements:

isodate (>=0.5.1)
requests
validators (>=12.4)

Online Example

>>> import scrape_schema_recipe
>>> url = 'https://www.foodnetwork.com/recipes/alton-brown/honey-mustard-dressing-recipe-1939031'
>>> recipe_list = scrape_schema_recipe.scrape_url(url, python_objects=True)
>>> len(recipe_list)
1
>>> recipe = recipe_list[0]

# Name of the recipe
>>> recipe['name']
'Honey Mustard Dressing'

# List of the Ingredients
>>> recipe['recipeIngredient']
['5 tablespoons medium body honey (sourwood is nice)',
 '3 tablespoons smooth Dijon mustard',
 '2 tablespoons rice wine vinegar']

# List of the Instructions
>>> recipe['recipeInstructions']
['Combine all ingredients in a bowl and whisk until smooth. Serve as a dressing or a dip.']

# Author
>>> recipe['author']
[{'@type': 'Person',
  'name': 'Alton Brown',
  'url': 'https://www.foodnetwork.com/profiles/talent/alton-brown'}]

'@type': 'Person' marks a https://schema.org/Person object.

# Preparation Time
>>> recipe['prepTime']
datetime.timedelta(0, 300)

# The library pendulum can give you something a little easier to read.
>>> import pendulum

# for pendulum version 1.0
>>> pendulum.Interval.instanceof(recipe['prepTime'])
<Interval [5 minutes]>

# for version 2.0 of pendulum
>>> pendulum.Duration(seconds=recipe['prepTime'].total_seconds())
<Duration [5 minutes]>

If python_objects is set to False, the duration is returned as an ISO 8601 string instead: 'PT5M'.

See pendulum's library website.
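To make the 'PT5M' string form concrete, here is a minimal standard-library sketch that turns simple ISO 8601 durations into datetime.timedelta objects. It is an illustration only, not the code this library runs; the helper name parse_simple_duration is hypothetical, and it deliberately rejects year and month designators because they have no fixed length.

```python
import re
from datetime import timedelta

# Weeks, days, hours, minutes and seconds only. Years and months are not
# matched, so strings such as 'P1Y' or 'P2M' are rejected.
_DURATION_RE = re.compile(
    r"P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_simple_duration(text):
    """Convert a simple ISO 8601 duration string into a timedelta."""
    match = _DURATION_RE.fullmatch(text)
    if match is None or text.endswith("T"):
        raise ValueError("unsupported duration: {!r}".format(text))
    # Keep only the designators that were actually present.
    parts = {key: float(value)
             for key, value in match.groupdict().items()
             if value is not None}
    if not parts:
        raise ValueError("empty duration: {!r}".format(text))
    return timedelta(**parts)


prep_time = parse_simple_duration("PT5M")
print(prep_time.total_seconds())
```

For real data, a full ISO 8601 parser is the safer choice, since recipe pages may use designators this sketch does not cover.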

# Publication date
>>> recipe['datePublished']
datetime.datetime(2016, 11, 13, 21, 5, 50, 518000, tzinfo=<FixedOffset '-05:00'>)
>>> str(recipe['datePublished'])
'2016-11-13 21:05:50.518000-05:00'

# Identifying this is http://schema.org/Recipe data (in LD-JSON format)
>>> recipe['@context'], recipe['@type']
('http://schema.org', 'Recipe')

# Content's URL
>>> recipe['url']
'https://www.foodnetwork.com/recipes/alton-brown/honey-mustard-dressing-recipe-1939031'

# all the keys in this dictionary
>>> recipe.keys()
dict_keys(['recipeYield', 'totalTime', 'dateModified', 'url', '@context',
           'name', 'publisher', 'prepTime', 'datePublished', 'recipeIngredient',
           '@type', 'recipeInstructions', 'author', 'mainEntityOfPage',
           'aggregateRating', 'recipeCategory', 'image', 'headline', 'review'])
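One practical consequence of python_objects=True is that the resulting dictionaries hold datetime and timedelta values, which json.dumps cannot serialize on its own. The sketch below shows one way to handle that with a default hook. The helper json_default and the sample dictionary are hypothetical, hand-written for illustration; they are not part of scrape-schema-recipe.

```python
import datetime
import json


def json_default(value):
    """Fallback encoder for the Python objects a converted recipe may hold."""
    # datetime.datetime is a subclass of datetime.date, so this covers both.
    if isinstance(value, datetime.date):
        return value.isoformat()
    if isinstance(value, datetime.timedelta):
        return value.total_seconds()
    raise TypeError("cannot serialize {}".format(type(value).__name__))


# Hand-written sample shaped like a converted recipe dictionary.
sample = {
    "name": "Honey Mustard Dressing",
    "prepTime": datetime.timedelta(0, 300),
    "datePublished": datetime.date(2016, 11, 13),
}

encoded = json.dumps(sample, default=json_default, sort_keys=True)
print(encoded)
```

Encoding durations as seconds is a design choice of this sketch; leaving python_objects at False keeps the original ISO 8601 strings, which are already JSON-friendly.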

Example from a File (alternative representation)

It also works with a locally saved HTML example file.

>>> filelocation = 'test_data/google-recipe-example.html'
>>> recipe_list = scrape_schema_recipe.scrape(filelocation, python_objects=True)
>>> recipe = recipe_list[0]
>>> recipe['name']
'Party Coffee Cake'
>>> recipe['datePublished']
datetime.date(2018, 3, 10)

# Recipe Instructions using the HowToStep
>>> recipe['recipeInstructions']
[{'@type': 'HowToStep',
  'text': 'Preheat the oven to 350 degrees F. Grease and flour a 9x9 inch pan.'},
 {'@type': 'HowToStep',
  'text': 'In a large bowl, combine flour, sugar, baking powder, and salt.'},
 {'@type': 'HowToStep',
  'text': 'Mix in the butter, eggs, and milk.'},
 {'@type': 'HowToStep',
  'text': 'Spread into the prepared pan.'},
 {'@type': 'HowToStep',
  'text': 'Bake for 30 to 35 minutes, or until firm.'},
 {'@type': 'HowToStep',
  'text': 'Allow to cool.'}]
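The two examples show recipeInstructions in two shapes: a list of plain strings in the online example and a list of HowToStep dictionaries here. Code that consumes both may want to normalize them. The following is a small sketch of that idea; the helper name instruction_texts is hypothetical and not provided by the library, and it assumes each HowToStep carries its wording under the 'text' key as in the output above.

```python
def instruction_texts(recipe):
    """Return the instruction steps of a recipe dict as a list of strings."""
    steps = recipe.get('recipeInstructions', [])
    if isinstance(steps, str):
        # Some pages may give a single block of text instead of a list.
        steps = [steps]
    texts = []
    for step in steps:
        if isinstance(step, str):
            texts.append(step)
        elif isinstance(step, dict) and 'text' in step:
            texts.append(step['text'])
        # Anything else is skipped rather than guessed at.
    return texts


plain = {'recipeInstructions': ['Combine all ingredients in a bowl.']}
how_to = {'recipeInstructions': [
    {'@type': 'HowToStep', 'text': 'Mix in the butter, eggs, and milk.'},
    {'@type': 'HowToStep', 'text': 'Allow to cool.'},
]}

print(instruction_texts(plain))
print(instruction_texts(how_to))
```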

What Happens When Things Go Wrong

If the site has no recipes in http://schema.org/Recipe format, an empty list is returned:

>>> url = 'https://www.google.com'
>>> recipe_list = scrape_schema_recipe.scrape(url, python_objects=True)
>>> len(recipe_list)
0

Some websites will cause an HTTPError.

You may be able to avoid a 403 Forbidden error by supplying an alternative user agent through the user_agent_str parameter.
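The retry idea can be sketched as a small wrapper that tries several user agent strings in turn. This is an illustration under stated assumptions: the helper scrape_with_user_agents is hypothetical, the scraping function is passed in as an argument so the sketch runs without network access, and the exception class to catch is left to the caller because the exact HTTPError type raised is not stated here.

```python
def scrape_with_user_agents(scrape_func, url, user_agents, error_type=Exception):
    """Call scrape_func(url, user_agent_str=ua) for each ua until one succeeds.

    scrape_func is expected to accept a user_agent_str keyword argument,
    as scrape_url() and scrape() are documented to do.
    """
    if not user_agents:
        raise ValueError("user_agents must contain at least one entry")
    last_error = None
    for user_agent in user_agents:
        try:
            return scrape_func(url, user_agent_str=user_agent)
        except error_type as exc:
            last_error = exc  # remember the failure and try the next agent
    raise last_error


# Stand-in for a site that rejects the default user agent (hypothetical).
def fake_scrape(url, user_agent_str=None):
    if user_agent_str is None:
        raise RuntimeError("403 Forbidden")
    return [{'name': 'Example Recipe', 'url': url}]


recipes = scrape_with_user_agents(
    fake_scrape, 'https://example.com/recipe',
    [None, 'Mozilla/5.0 (compatible; ExampleBot/1.0)'],
    error_type=RuntimeError,
)
print(len(recipes))
```

In real use, scrape_schema_recipe.scrape_url would take the place of fake_scrape, and error_type would be whichever HTTPError class the failing call actually raised.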

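Functions

load() - load HTML schema.org/Recipe structured data from a file or file-like object
loads() - load HTML schema.org/Recipe structured data from a string
scrape_url() - scrape a URL for HTML schema.org/Recipe structured data
scrape() - load HTML schema.org/Recipe structured data from a file, file-like object, string, or URL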

    Parameters
    ----------
    location : string or file-like object
        A url, filename, or text_string of HTML, or a file-like object.

    python_objects : bool, list, or tuple  (optional)
        when True it translates certain data types into python objects
          dates into datetime.date, datetimes into datetime.datetime,
          durations into datetime.timedelta.
        when set to a list or tuple only converts types specified to
          python objects:
            * when set to either [datetime.date] or [datetime.datetime], either will
              convert dates.
            * when set to [datetime.timedelta] durations will be converted
        when False no conversion is performed
        (defaults to False)

    nonstandard_attrs : bool, optional
        when True it adds nonstandard (for schema.org/Recipe) attributes to the
        resulting dictionaries, that are outside the specification such as:
            '_format' is either 'json-ld' or 'microdata' (how schema.org/Recipe was encoded into HTML)
            '_source_url' is the source url, when 'url' has already been defined as another value
        (defaults to False)

    migrate_old_schema : bool, optional
        when True it migrates the schema from older version to current version
        (defaults to True)

    user_agent_str : string, optional  ***only for scrape_url() and scrape()***
        override the user agent string with this value.
        (defaults to None)

    Returns
    -------
    list
        a list of dictionaries in the style of schema.org/Recipe JSON-LD
        no results - an empty list will be returned

This documentation is also available through help() in the Python console.
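The python_objects rules in the parameter list above can be restated as a small decision function. This is only a sketch that mirrors the documented behavior; wants_conversion is a hypothetical name, and the library's own implementation may be organized differently.

```python
import datetime


def wants_conversion(python_objects, kind):
    """Decide whether values of the given kind should be converted.

    kind is 'dates' (dates and datetimes) or 'durations'.
    """
    if kind not in ('dates', 'durations'):
        raise ValueError("kind must be 'dates' or 'durations'")
    if isinstance(python_objects, bool):
        # True converts everything, False converts nothing.
        return python_objects
    if isinstance(python_objects, (list, tuple)):
        if kind == 'dates':
            # Either date type in the list switches on date conversion.
            return (datetime.date in python_objects
                    or datetime.datetime in python_objects)
        return datetime.timedelta in python_objects
    raise TypeError("python_objects must be a bool, list, or tuple")


print(wants_conversion([datetime.timedelta], 'durations'))
print(wants_conversion([datetime.timedelta], 'dates'))
```

Under these rules, passing python_objects=[datetime.timedelta] to scrape() would be expected to convert prepTime while leaving datePublished as a string.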

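Example Function

The example_output() function gives quick access to data for prototyping and debugging. It accepts the same parameters as load(), except that its first parameter is name.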

>>> from scrape_schema_recipe import example_names, example_output
>>> example_names
('irish-coffee', 'google', 'tart', 'tea-cake', 'truffles')
>>> recipes = example_output('truffles')
>>> recipes[0]['name']
'Rum & Tonka Bean Dark Chocolate Truffles'

Files

License: Apache 2.0, see LICENSE

Test data attribution and licensing: ATTRIBUTION.md

Development

The unit tests can be run with:

schema-recipe-scraper$ python3 test_scrape.py

mypy is used for static type checking.

From the project directory:

 schema-recipe-scraper$ mypy schema_recipe_scraper/scrape.py

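If you run mypy from another directory, add the --ignore-missing-imports flag, thus $ mypy --ignore-missing-imports scrape.py

The --ignore-missing-imports flag is used because most libraries do not include static typing information in their own code or in typeshed.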

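Reference Documentation

Here are some references for how schema.org/Recipe should be structured:

https://schema.org/Recipe - official specification
Recipe Google Search Guide - material teaching developers how to use the schema (with emphasis on how structured data affects search results)

Other Similar Python Libraries

recipe_scrapers - a library that scrapes recipes from HTML tags using BeautifulSoup. It has a driver for each supported website. It is a good fallback when schema-recipe-scraper cannot scrape a site.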


scrape-schema-recipe 0.1.1
