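ImageResolver Python package - PyPI

Find the most significant image in an article.

ImageResolver Python project description

A Python clone of ImageResolver for finding significant images in HTML content. See the excellent JS version: https://github.com/mauricesvay/ImageResolver

Usage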

import sys

import imageresolver

try:
    # Register resolvers in the order they should be tried.
    i = imageresolver.ImageResolver()
    i.register(imageresolver.FileExtensionResolver())
    i.register(imageresolver.ImgurPageResolver())
    i.register(imageresolver.WebpageResolver(load_images=True, parser='lxml', blacklist='easylist.txt'))
    url = sys.argv[1]

    # resolve() returns the image URL, or None if no resolver matched.
    print(i.resolve(url))
except Exception:
    print("An error occurred")

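Differences from the JavaScript version

Methods return values instead of calling callbacks
WebpageResolver has many new options (see below)
Some debugging features were added
Exceptions are raised instead of calling an error callback

WebpageResolver additions

Rule syntax is now based on Adblock Plus filters (https://adblockplus.org/en/filters)
New rules can be added without writing a parser
Image sources can be blacklisted and whitelisted
When fetching image info, as little of the image as possible is loaded. The download stops once the dimensions are found or a configurable limit is reached.
The original rules from the JS version are still implemented. (See options)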
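
To illustrate what a filter-based blacklist looks like, here is a minimal sketch that matches URLs against a small subset of the Adblock Plus syntax (`||` domain anchors, `*` wildcards, `^` separators, `!` comments). It is not the parser this package uses, which is derived from abpy; the function names `compile_filter` and `is_blacklisted` are hypothetical.

```python
import re


def compile_filter(rule):
    """Translate a small subset of Adblock Plus filter syntax into a regex.

    Illustrative only: options ($image, ...), exception rules (@@) and
    element-hiding rules are not handled.
    """
    rule = rule.strip()
    if not rule or rule.startswith("!"):  # blank line or comment
        return None
    domain_anchor = rule.startswith("||")
    if domain_anchor:
        rule = rule[2:]
    # Escape everything, then restore the two special characters we support.
    pattern = re.escape(rule)
    pattern = pattern.replace(r"\*", ".*")                # wildcard
    pattern = pattern.replace(r"\^", r"(?:[^\w\-.%]|$)")  # separator
    if domain_anchor:
        # "||" anchors the rule at the start of the host or of a subdomain.
        pattern = r"^[a-z][a-z0-9+.\-]*://(?:[^/?#]*\.)?" + pattern
    return re.compile(pattern, re.IGNORECASE)


def is_blacklisted(url, rules):
    """Return True if any compiled rule matches the URL."""
    return any(rule.search(url) for rule in rules)


lines = ["! ad servers", "||ads.example.com^", "/banners/*", ""]
rules = [r for r in (compile_filter(line) for line in lines) if r is not None]
```

With rules like these, `is_blacklisted("http://ads.example.com/img.png", rules)` is true while an ordinary photo URL on another host is not, so new rules are just new lines in a text file.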

ImageResolver() methods

__init__(**kwargs)

Keyword options

max_read_size - set to the maximum number of bytes to read when looking for the width and height of an image. Default 10240
chunk_size - set to the chunk size to read. Default 1024
read_all - set to read the entire image and then detect its info. This option overrides max_read_size. Default False
debug - set to enable debugging output (logger="ImageResolver"). Default False
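
The interplay of chunk_size and max_read_size can be sketched as a loop that reads a little at a time and stops as soon as the dimensions are known. This is only an illustration under stated assumptions: it handles PNG alone, and `png_size` and `read_dimensions` are hypothetical helpers, not functions of this package.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) once data holds the PNG IHDR header, else None."""
    if len(data) >= 24 and data[:8] == PNG_SIGNATURE and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    return None


def read_dimensions(stream, chunk_size=1024, max_read_size=10240):
    """Read chunk by chunk; stop when the size is found or the limit is hit.

    Returns (size_or_None, bytes_read).
    """
    buffer = b""
    while len(buffer) < max_read_size:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        buffer += chunk
        size = png_size(buffer)
        if size is not None:
            return size, len(buffer)
    return None, len(buffer)


# A fake 640x480 PNG: a valid header followed by padding that is never read.
fake_png = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">II", 640, 480) + b"\x00" * 5000)
result = read_dimensions(io.BytesIO(fake_png), chunk_size=16)
```

Here only 32 of the roughly 5000 bytes are consumed before the dimensions are known, which is the point of keeping the read limit small.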

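fetch(string url)

Fetches the URL and returns the response data.

fetch_image_info(string url)

Fetches the image URL and inspects the resulting image. Returns a tuple consisting of the detected file extension, the image's width, and its height.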
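
The shape of that return value can be illustrated with a small header inspector. This is a sketch only, assuming GIF and PNG input; `image_info` is a hypothetical name, and the package's own detection (taken from the bfg-pages project) covers more than this.

```python
import struct


def image_info(data):
    """Return (extension, width, height) for GIF or PNG bytes, else None."""
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return ("gif", width, height)
    # PNG: 8-byte signature, IHDR chunk, big-endian 32-bit width and height.
    if (len(data) >= 24 and data[:8] == b"\x89PNG\r\n\x1a\n"
            and data[12:16] == b"IHDR"):
        width, height = struct.unpack(">II", data[16:24])
        return ("png", width, height)
    return None


gif_header = b"GIF89a" + struct.pack("<HH", 320, 200)
png_header = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
              + struct.pack(">II", 1024, 768))
```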

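register(instance filter)

Registers a filter used to check images. The filter argument must be an instance of a class with a resolve() method. resolve() must accept a string URL and must return a URL or None.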
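
A custom filter only has to satisfy that contract. The class below is a hypothetical example (it is not shipped with the package) that accepts inline data: URIs carrying an image and declines everything else.

```python
class DataUriResolver(object):
    """Hypothetical filter: accepts inline 'data:image/...' URIs."""

    def resolve(self, url):
        # Return the URL when this filter recognises it, None otherwise.
        if url.startswith("data:image/"):
            return url
        return None


resolver = DataUriResolver()
```

An instance would then be passed to register() like the built-in resolvers, for example `i.register(DataUriResolver())`.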

resolve(string url)

Loops through each registered filter until one of them resolves the URL. Returns None if no URL can be found.
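
The first-match behaviour described above can be sketched as follows. `FirstMatchResolver` and the two stand-in filters are hypothetical names used for illustration; the sketch mirrors the documented behaviour rather than the package's actual source.

```python
class FirstMatchResolver(object):
    """Tries each registered filter in order and returns the first hit."""

    def __init__(self):
        self.filters = []

    def register(self, image_filter):
        self.filters.append(image_filter)

    def resolve(self, url):
        for image_filter in self.filters:
            result = image_filter.resolve(url)
            if result is not None:  # first filter that resolves wins
                return result
        return None


class SuffixFilter(object):
    """Stand-in filter: accepts URLs ending in '.png'."""

    def resolve(self, url):
        return url if url.endswith(".png") else None


class RewriteFilter(object):
    """Stand-in filter: maps any 'page:' URL to a fixed image URL."""

    def resolve(self, url):
        return "http://example.com/fallback.png" if url.startswith("page:") else None


chain = FirstMatchResolver()
chain.register(SuffixFilter())
chain.register(RewriteFilter())
```

Registration order matters: cheap filters such as an extension check go first so that expensive ones only run when needed.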

FileExtensionResolver() methods

resolve(string url)

Returns the URL if its extension matches a possible image.
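
An extension check of this kind might look like the sketch below. The set of extensions and the function name `resolve_by_extension` are assumptions for illustration; the package's own list may differ.

```python
import posixpath
from urllib.parse import urlparse

# Assumed list of image extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def resolve_by_extension(url):
    """Return the URL if its path ends in a known image extension, else None."""
    path = urlparse(url).path  # ignores the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return url if extension in IMAGE_EXTENSIONS else None
```

Parsing the URL first keeps a query string such as `?size=large` from hiding the extension.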

ImgurPageResolver() methods

resolve(string url)

Returns the Imgur image URL if the URL matches the pattern of an Imgur page.
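
As a rough illustration of pattern-based resolution, the sketch below maps an Imgur page URL to a direct image URL. Both the regular expression and the `i.imgur.com/<id>.jpg` target are assumptions about how such URLs are laid out, and `resolve_imgur_page` is a hypothetical name; the package's actual pattern may differ.

```python
import re

# Assumed page pattern: imgur.com/<id> or imgur.com/gallery/<id>.
IMGUR_PAGE = re.compile(
    r"^https?://(?:www\.)?imgur\.com/(?:gallery/)?([A-Za-z0-9]+)/?$"
)


def resolve_imgur_page(url):
    """Return a direct image URL for an Imgur page URL, else None."""
    match = IMGUR_PAGE.match(url)
    if match is None:
        return None
    # Assumed direct-image layout; the real extension may not be .jpg.
    return "http://i.imgur.com/%s.jpg" % match.group(1)
```

Album URLs of the form `imgur.com/a/<id>` deliberately do not match, since they point to a set of images rather than a single one.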

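WebpageResolver() methods

The workhorse of this module. Our own usage mainly revolves around this filter, so it is where most of the functionality has been completed and tested.

__init__(**kwargs)

Initializes the class with options: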

load_image - set to true to load the first 1k of images whose size is not set in HTML. Default False
use_js_ruleset - set to true to use the original rules from the JavaScript version. Default False
use_adblock_filters - set to false to disable Adblock filters. Default True
parser - set to a BeautifulSoup-compatible parser (lxml is recommended). Default html.parser
blacklist - set to a file containing Adblock Plus style filters used to lower an image’s score. Default blacklist.txt
whitelist - set to a file containing Adblock Plus style filters used to raise an image’s score. Default whitelist.txt
significant_surface - amount of surface (width x height) an image needs in order to receive additional score
boost_jpeg - add (int) boost score to JPEG files. Default 1
boost_gif - add (int) boost score to GIF files. Default 0
boost_png - add (int) boost score to PNG files. Default 0
skip_fetch_errors - skip exceptions raised by fetch_image_info(). Exceptions are logged and the image is skipped. Default True

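BeautifulSoup's default parser is html.parser, which is built into Python. We strongly recommend installing lxml and passing parser="lxml" to WebpageResolver(). In our tests we found it to be faster and more accurate.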
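
How the scoring options above might combine can be sketched as a toy scoring function. Everything here is an assumption made for illustration: the function name `score_image`, the point values, and the significant_surface threshold of 100000 are not taken from the package. Only the boost defaults (JPEG 1, GIF 0, PNG 0) come from the option list.

```python
# Boost defaults as documented in the option list above.
DEFAULT_BOOSTS = {"jpg": 1, "gif": 0, "png": 0}


def score_image(ext, width, height, significant_surface=100000,
                boosts=None, blacklisted=False, whitelisted=False):
    """Toy score: higher means more likely to be the article's main image."""
    boosts = DEFAULT_BOOSTS if boosts is None else boosts
    score = 0
    if width * height >= significant_surface:  # large enough to matter
        score += 1
    score += boosts.get(ext, 0)                # per-format boost
    if blacklisted:                            # matched a blacklist filter
        score -= 1
    if whitelisted:                            # matched a whitelist filter
        score += 1
    return score


candidates = [("jpg", 800, 600, False), ("gif", 50, 50, True)]
best = max(candidates, key=lambda c: score_image(c[0], c[1], c[2], blacklisted=c[3]))
```

In this sketch the large JPEG wins over the small blacklisted GIF, which is the kind of outcome the options are meant to steer.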

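Logging

Configure a logger with the name "ImageResolver". Skipped exceptions are logged to this logger at the error level, and debug output is logged there as well when debugging is enabled.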
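
A minimal configuration using the standard logging module could look like this. The handler and format string are arbitrary choices for the example; only the logger name comes from the description above.

```python
import logging

# The logger name is the one the package documents; the rest is illustrative.
logger = logging.getLogger("ImageResolver")
logger.setLevel(logging.DEBUG)  # use logging.ERROR to see skipped errors only

handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)
```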

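Exceptions

ImageInfoException

Raised if the image cannot be read, or if its type, width, or height comes back undefined. By default this exception is skipped and logged; pass skip_fetch_errors=False to WebpageResolver to have it raised instead.

HTTPException

Raised if the image cannot be loaded from the URL. By default this exception is skipped and logged; pass skip_fetch_errors=False to WebpageResolver to have it raised instead.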
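
The skip-and-log behaviour can be sketched like this. The exception class below is a local stand-in with the same name as the package's exception, and `collect_info` and `fake_fetch` are hypothetical helpers, so the example runs without the package installed.

```python
import logging


class ImageInfoException(Exception):
    """Local stand-in for the package's exception of the same name."""


logger = logging.getLogger("ImageResolver")


def collect_info(urls, fetch_image_info, skip_fetch_errors=True):
    """Gather image info per URL, skipping or re-raising fetch errors."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch_image_info(url)
        except ImageInfoException as exc:
            if not skip_fetch_errors:
                raise  # surface the error to the caller
            logger.error("skipping %s: %s", url, exc)
    return results


def fake_fetch(url):
    """Stub fetcher: fails for any URL containing 'bad'."""
    if "bad" in url:
        raise ImageInfoException("could not read image")
    return ("png", 10, 10)
```

With the default setting a single unreadable image does not abort the whole page; with skip_fetch_errors=False the caller decides how to handle it.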

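To do

The following resolvers are still missing:

ImgurAlbumsResolver()
FlickrResolver()
OpengraphResolver()
InstagramResolver()

I have no plans to implement a 9gag resolver.

Better caching needs to be implemented. The plan is to include a configurable caching method so that images seen across sessions can be cached for better performance.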

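Author

Chris Brown

Bugs

Probably. If you find one, please send us an email or a patch.

Copyright / Acknowledgements

The original idea and basic setup came from Maurice Svay: https://github.com/mauricesvay/ImageResolver

Image detection comes from the bfg-pages project: https://code.google.com/p/bfg-pages/

Reading of Adblock Plus filters is derived from https://github.com/wildgarden/abpy

License

Some of the source libraries are licensed under the BSD license. To avoid license confusion, we have chosen to release this software under BSD as well. The easylist.txt provided by Adblock Plus is licensed under the GPL and should be updated regularly anyway. For these reasons we have chosen not to include the file in the package. You can pass it to WebpageResolver as the "blacklist" or "whitelist" argument.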

ImageResolver 0.3.0
