Python crawlerdetect package - PyPI

crawlerdetect is a Python class for detecting bots, crawlers, and spiders from the user agent.

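crawlerdetect: Python project description

About crawlerdetect

crawlerdetect is a Python port of the PHP class CrawlerDetect.

It detects bots, crawlers, and spiders from the user agent and other HTTP headers. It can currently detect over 1,000 bots, spiders, and crawlers.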
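To make the idea concrete, here is a minimal, self-contained sketch of user-agent matching with regular expressions. It is not the library's implementation: the pattern list and the function names `detect_crawler` and `is_crawler` are hypothetical, and the real project maintains a far larger set of patterns. The sketch joins the patterns into one case-insensitive alternation and returns the matched text, which is roughly the job that `isCrawler()` and `getMatches()` do together.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny pattern list; the real project keeps a
# much larger list of regexes (see providers/crawlers.py).
CRAWLER_PATTERNS = [
    r"Googlebot",
    r"bingbot",
    r"Sosospider",
    r"Yahoo Ad monitoring",
]

# One case-insensitive alternation, compiled once when the module loads.
CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)


def detect_crawler(user_agent: str) -> Optional[str]:
    """Return the text matched by a crawler pattern, or None for non-bots."""
    match = CRAWLER_RE.search(user_agent)
    return match.group(0) if match else None


def is_crawler(user_agent: str) -> bool:
    """True if any crawler pattern matches the user agent."""
    return detect_crawler(user_agent) is not None


print(detect_crawler("Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"))  # Sosospider
print(is_crawler("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"))  # False
```

Compiling a single alternation keeps per-request work to one `search` call, which matters when the check runs on every incoming request.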

Installation

Run `pip install crawlerdetect`.

Usage

Variant 1

```python
from crawlerdetect import CrawlerDetect

crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# True if a crawler user agent is detected
```

Variant 2

```python
from crawlerdetect import CrawlerDetect

crawler_detect = CrawlerDetect(user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)')
crawler_detect.isCrawler()
# True if a crawler user agent is detected
```

Variant 3

```python
from crawlerdetect import CrawlerDetect

crawler_detect = CrawlerDetect(headers={
    'DOCUMENT_ROOT': '/home/test/public_html',
    'GATEWAY_INTERFACE': 'CGI/1.1',
    'HTTP_ACCEPT': '*/*',
    'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
    'HTTP_CACHE_CONTROL': 'no-cache',
    'HTTP_CONNECTION': 'Keep-Alive',
    'HTTP_FROM': 'googlebot(at)googlebot.com',
    'HTTP_HOST': 'www.test.com',
    'HTTP_PRAGMA': 'no-cache',
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36',
    'PATH': '/bin:/usr/bin',
    'QUERY_STRING': 'order=closingDate',
    'REDIRECT_STATUS': '200',
    'REMOTE_ADDR': '127.0.0.1',
    'REMOTE_PORT': '3360',
    'REQUEST_METHOD': 'GET',
    'REQUEST_URI': '/?test=testing',
    'SCRIPT_FILENAME': '/home/test/public_html/index.php',
    'SCRIPT_NAME': '/index.php',
    'SERVER_ADDR': '127.0.0.1',
    'SERVER_ADMIN': 'webmaster@test.com',
    'SERVER_NAME': 'www.test.com',
    'SERVER_PORT': '80',
    'SERVER_PROTOCOL': 'HTTP/1.1',
    'SERVER_SIGNATURE': '',
    'SERVER_SOFTWARE': 'Apache',
    'UNIQUE_ID': 'Vx6MENRxerBUSDEQgFLAAAAAS',
    'PHP_SELF': '/index.php',
    'REQUEST_TIME_FLOAT': 1461619728.0705,
    'REQUEST_TIME': 1461619728,
})
crawler_detect.isCrawler()
# True if a crawler user agent is detected
```
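In Variant 3 the `HTTP_USER_AGENT` value is an ordinary Chrome string, so a crawler verdict presumably comes from other headers, such as `HTTP_FROM`, also being inspected. The following is a minimal sketch of that idea, not the library's code: the header subset `UA_HEADER_KEYS` and the helper `combined_user_agent` are hypothetical, and the library's actual header list may differ. It joins every present, non-empty user-agent-like header into one string that a pattern matcher can then search.

```python
from typing import Mapping

# Hypothetical subset of CGI/WSGI-style header keys that may carry a
# user-agent-like value; the library's real list may differ.
UA_HEADER_KEYS = (
    "HTTP_USER_AGENT",
    "HTTP_X_OPERAMINI_PHONE_UA",
    "HTTP_X_DEVICE_USER_AGENT",
    "HTTP_X_ORIGINAL_USER_AGENT",
    "HTTP_FROM",
)


def combined_user_agent(headers: Mapping[str, object]) -> str:
    """Join every present, non-empty UA-like header value into one string."""
    parts = [str(headers[key]) for key in UA_HEADER_KEYS if headers.get(key)]
    return " ".join(parts)


# Usage with a trimmed-down version of the Variant 3 headers.
headers = {
    "HTTP_HOST": "www.test.com",
    "HTTP_USER_AGENT": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) Chrome/28.0.1500.71",
    "HTTP_FROM": "googlebot(at)googlebot.com",
}
ua = combined_user_agent(headers)
print(ua)  # Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) Chrome/28.0.1500.71 googlebot(at)googlebot.com
print("googlebot" in ua.lower())  # True
```

Because WSGI `environ` dictionaries use the same `HTTP_*` key names, a helper like this could in principle be called as `combined_user_agent(environ)` inside a WSGI application.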

Output the name of the matched bot (if any)

```python
from crawlerdetect import CrawlerDetect

crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# True if a crawler user agent is detected
crawler_detect.getMatches()
# Sosospider
```

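Contributing

If you find a bot, spider, or crawler user agent that crawlerdetect fails to detect, please submit a pull request that adds a regex pattern to the pattern list in providers/crawlers.py and adds the failing user agent to tests/crawlers.txt.

Failing that, just open an issue with the user agent you found and we'll take it from there :)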
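Before submitting a pull request that adds a new pattern, it can help to check locally that every sample user agent is actually matched. Below is a minimal sketch of such a check under stated assumptions: the helper name `undetected`, the pattern list, and the bot name `ExampleNewBot` are all hypothetical, and this is not the project's actual test suite.

```python
import re
from typing import Iterable, List

# Hypothetical stand-in for the pattern list in providers/crawlers.py;
# "ExampleNewBot" is an invented bot name used only for illustration.
CRAWLER_PATTERNS = [r"Sosospider", r"Yahoo Ad monitoring", r"ExampleNewBot"]


def undetected(user_agents: Iterable[str], patterns: List[str]) -> List[str]:
    """Return the non-blank user agents that none of the patterns match.

    `patterns` must be non-empty: an empty alternation matches every string.
    """
    combined = re.compile("|".join(patterns), re.IGNORECASE)
    stripped = (line.strip() for line in user_agents)
    return [ua for ua in stripped if ua and not combined.search(ua)]


# Lines as they might be read from a file such as tests/crawlers.txt.
sample_lines = [
    "Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)\n",
    "Mozilla/5.0 (compatible; ExampleNewBot/1.0)\n",
    "\n",
]
print(undetected(sample_lines, CRAWLER_PATTERNS[:2]))  # ['Mozilla/5.0 (compatible; ExampleNewBot/1.0)']
print(undetected(sample_lines, CRAWLER_PATTERNS))  # []
```

Since a file object iterates over its lines, the same helper could be fed `open("tests/crawlers.txt")` directly; an empty result means every listed user agent is covered.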

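ES6 library

To use this library with Node.js or any ES6-based application, check out es6-crawler-detect.

.NET library

To use this library in a .NET Standard-based project (including .NET Core), check out NetCrawlerDetect.

Nette extension

To use this library with the Nette framework, check out NetteCrawlerDetect.

Ruby gem

To use this library with Ruby on Rails or any Ruby-based application, check out the crawler_detect gem.

Parts of this class are based on the excellent MobileDetect.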

crawlerdetect 0.1.4