scrapy-promise Python package - PyPI

Promise-style workflow for Scrapy

Detailed project description of scrapy-promise

Scrapy Promise

Promise API for making Scrapy requests.

Usage and examples

    from scrapy_promise import fetch

Promise here works like Promise in JavaScript. If you are new to Promises, a great starting point is MDN's Promise API reference and its Using Promises guide.

Creating and making requests

fetch() accepts all the arguments that scrapy.http.Request accepts, except callback and {}

^{pr2}$

fetch() returns a Promise object, which is an iterator/generator. You can return it directly from start_requests, or yield from it in an existing callback.
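The two ways of handing an iterator to the engine can be seen without Scrapy at all. The following is a toy sketch using only the standard library: fake_fetch is a hypothetical stand-in generator, not scrapy_promise.fetch, and the strings it yields merely stand in for Request objects.

```python
from typing import Iterator, List


def fake_fetch(url: str) -> Iterator[str]:
    """Hypothetical stand-in for fetch(): a generator that yields one 'request'."""
    yield f"REQUEST {url}"


def start_requests_style(urls: List[str]) -> Iterator[str]:
    """Return the iterator directly and let the caller drain it."""
    return fake_fetch(urls[0])


def callback_style(urls: List[str]) -> Iterator[str]:
    """An existing generator callback delegates to the iterator with `yield from`."""
    yield "ITEM from the current page"
    for url in urls:
        # Everything fake_fetch yields is re-yielded to whoever drains this callback.
        yield from fake_fetch(url)


drained = list(callback_style(["https://httpbin.org/ip", "https://httpbin.org/get"]))
returned = list(start_requests_style(["https://httpbin.org/ip"]))
```

In both styles the consumer simply iterates; `yield from` only matters when the callback also yields other things of its own.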

Adding handlers

If you only call fetch() and yield from it, all it does is store the response once the request is done:

    request = fetch('https://httpbin.org/ip')
    yield from request

    # When the request is done
    >>> request.is_fulfilled
    True
    >>> request.get()
    <200 https://httpbin.org/ip>

fetch() returns a Promise object. Call its .then() method with a callable, and the Promise will call it once there is a response.

.then() returns another Promise, which you can yield from:

    def on_fulfill(response: TextResponse):
        # You can yield items from your handler
        # just like you would in a Scrapy callback
        yield Item(response)

    >>> yield from fetch(...).then(on_fulfill)

You can also attach an error handler with .catch(). It will receive either a Twisted Failure or an exception:

    def on_reject(exc: Union[Failure, Exception]):
        if isinstance(exc, Failure):
            exc = exc.value
        ...

    >>> yield from fetch(...).then(on_fulfill).catch(on_reject)
    # will catch both exceptions during the request
    # and exceptions raised in on_fulfill

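Branching and chaining

Because .then() and .catch() return another Promise, you can chain additional handlers.

Each subsequent handler receives the return value of the previous handler. This differs from ordinary Scrapy callbacks, where a returned value is not passed on to another callback: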

    yield from (
        fetch('https://httpbin.org/ip')
        .then(parse_json)   # returns dict
        .then(create_item)  # will be passed the dict from the previous handler
        .catch(lambda exc: logging.getLogger().error(exc))
    )
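The value-passing rule can be illustrated with a small synchronous model. This is a toy sketch, not the scrapy-promise or notcallback implementation: ToyChain and its run() method are hypothetical names, and for brevity then() and catch() return the same object rather than a new Promise.

```python
import json
from typing import Any, Callable, List, Tuple


class ToyChain:
    """Toy, synchronous model of a then/catch chain."""

    def __init__(self, producer: Callable[[], Any]) -> None:
        self._producer = producer
        self._steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def then(self, handler: Callable[[Any], Any]) -> "ToyChain":
        self._steps.append(("then", handler))
        return self

    def catch(self, handler: Callable[[Any], Any]) -> "ToyChain":
        self._steps.append(("catch", handler))
        return self

    def run(self) -> Any:
        try:
            value, failed = self._producer(), False
        except Exception as exc:
            value, failed = exc, True
        for kind, handler in self._steps:
            # "then" handlers run only after success, "catch" handlers only after failure.
            if (kind == "then") == failed:
                continue
            try:
                # The handler's return value becomes the input of the next handler.
                value, failed = handler(value), False
            except Exception as exc:
                value, failed = exc, True
        return value


def parse_json(body: str) -> dict:
    return json.loads(body)


def create_item(data: dict) -> dict:
    return {"ip": data["origin"]}


def describe_error(exc: Exception) -> str:
    return f"error: {type(exc).__name__}"


ok = ToyChain(lambda: '{"origin": "203.0.113.7"}').then(parse_json).then(create_item).catch(describe_error).run()
bad = ToyChain(lambda: "not json").then(parse_json).then(create_item).catch(describe_error).run()
```

In the failing case parse_json raises, create_item is skipped, and the error handler at the end of the chain receives the exception.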

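Dynamic chaining: if you return another fetch() request from a handler, that request will be scheduled, and the next handler will be called with the Response of this new request. This lets you schedule multiple requests in sequence.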

    yield from (
        fetch('https://httpbin.org/ip')
        # A second Request is created from the response of the first one and is scheduled.
        .then(lambda response: fetch(json.loads(response.text)['origin']))
        .then(lambda response: (yield Item(response)))
        .catch(lambda exc: logging.getLogger().error(exc))
    )

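Note that only the request you return is connected to the subsequent handlers. A Request yielded in the middle of a handler is scheduled directly by Scrapy.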
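The idea that a returned request is resolved before the next handler runs can be modelled in a few lines. The sketch below is a toy under stated assumptions: ToyFetch, FAKE_WEB, settle and run_sequence are all hypothetical names, the "web" is an in-memory dictionary, and nothing here reflects how the library schedules requests internally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical in-memory "web": URL -> response body.
FAKE_WEB: Dict[str, str] = {
    "https://example.test/start": "https://example.test/next",
    "https://example.test/next": "final payload",
}


@dataclass
class ToyFetch:
    """Stand-in for a pending request."""
    url: str


def settle(value: Any, log: List[str]) -> Any:
    """Keep 'performing' requests until a plain value remains."""
    while isinstance(value, ToyFetch):
        log.append(value.url)
        value = FAKE_WEB[value.url]
    return value


def run_sequence(first: ToyFetch, handlers: List[Callable[[Any], Any]], log: List[str]) -> Any:
    value = settle(first, log)
    for handler in handlers:
        # If the handler returns a ToyFetch, the next handler sees its response instead.
        value = settle(handler(value), log)
    return value


requested: List[str] = []
result = run_sequence(
    ToyFetch("https://example.test/start"),
    [lambda body: ToyFetch(body), lambda body: body.upper()],
    requested,
)
```

The second handler never sees the ToyFetch object itself, only the body that the second request produced.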

You can also attach multiple handlers to a single request, and they will be evaluated in the order in which they were declared:

    resource = fetch(...)
    resource.then(save_token)
    resource.then(parse_html).catch(log_error)
    resource.then(next_page).catch(stop_spider)
    yield from resource
    # Evaluating any Promise in a chain/branch causes
    # the entire Promise tree to be evaluated.
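Branching differs from chaining in that every branch starts from the same value. A minimal toy sketch of that fan-out, assuming a hypothetical ToyResource class with attach() and evaluate() methods (these names are illustrative and are not part of scrapy-promise):

```python
from typing import Any, Callable, List


class ToyResource:
    """Toy model: one settled value, several independently attached handlers."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self._handlers: List[Callable[[Any], Any]] = []

    def attach(self, handler: Callable[[Any], Any]) -> None:
        # Each call starts a separate branch; nothing is chained here.
        self._handlers.append(handler)

    def evaluate(self) -> List[Any]:
        # Every branch sees the same value, in the order it was attached.
        return [handler(self.value) for handler in self._handlers]


resource = ToyResource("<html>token=abc</html>")
resource.attach(lambda body: body.split("token=")[1].split("<")[0])
resource.attach(len)
resource.attach(str.upper)
branch_results = resource.evaluate()
```

No branch receives another branch's result; only handlers chained onto the same branch pass values along.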

Promise aggregation functions

Promise provides several aggregation functions that give finer control over how requests are scheduled.

    from notcallback import Promise  # dependency

Promise.all() is fulfilled only when all requests succeed, and rejects as soon as one of the requests fails. If all requests succeed, the handler receives a list of responses:

    def parse_pages(responses: Tuple[TextResponse]):
        for r in responses:
            ...

    yield from Promise.all(*[fetch(url) for url in urls]).then(parse_pages)

Promise.race() settles as soon as one of the requests is fulfilled or rejected.

    def select_fastest_cdn():
        yield from (
            Promise.race(*[fetch(url, method='HEAD') for url in cdn_list])
            .then(crawl_server)
        )

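Promise.all_settled() waits until all requests have finished and is always fulfilled, whether or not they succeeded. The handler receives a list of Promises, whose values (responses) can be accessed with the .get() method: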

    def report(promises: Tuple[Promise]):
        for promise in promises:
            result = promise.get()
            if isinstance(result, Response):
                log.info(f'Crawled {result.url}')
            else:
                log.warn(f'Encountered error {result}')

    yield from Promise.all_settled(*[fetch(u) for u in urls]).then(report)

Promise.any() is fulfilled by the first request that succeeds, and rejects if none of the requests succeed:

    def download(response):
        ...

    yield from (
        Promise.any(*[fetch(u) for u in urls])
        .then(download)
        .catch(lambda exc: log.warn('No valid URL!'))
    )
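For readers who know asyncio better than Promises, the four aggregation behaviours have rough standard-library analogues. The sketch below uses asyncio only, not notcallback or Scrapy; fake_request and first_success are hypothetical helpers, and the mapping is an analogy rather than a description of how Promise is implemented.

```python
import asyncio
from typing import Any, Dict, List


async def fake_request(name: str, delay: float, ok: bool = True) -> str:
    """Hypothetical stand-in for a request: sleeps, then succeeds or fails."""
    await asyncio.sleep(delay)
    if not ok:
        raise RuntimeError(f"{name} failed")
    return name


async def first_success(coros: List[Any]) -> str:
    """any-style: the first request to succeed wins; raise if none succeed."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        for fut in asyncio.as_completed(tasks):
            try:
                return await fut
            except Exception:
                continue  # a failure alone does not settle the aggregate
        raise RuntimeError("no request succeeded")
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


async def demo() -> Dict[str, Any]:
    results: Dict[str, Any] = {}
    # all-style: every result, in argument order; the first failure would propagate.
    results["all"] = await asyncio.gather(fake_request("a", 0.02), fake_request("b", 0.01))
    # all_settled-style: failures come back as values instead of being raised.
    settled = await asyncio.gather(
        fake_request("a", 0.01),
        fake_request("b", 0.01, ok=False),
        return_exceptions=True,
    )
    results["all_settled"] = [r if isinstance(r, str) else type(r).__name__ for r in settled]
    # race-style: whichever finishes first, whether it succeeded or failed.
    tasks = [
        asyncio.ensure_future(fake_request("slow", 0.05)),
        asyncio.ensure_future(fake_request("fast", 0.01)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    results["race"] = done.pop().result()
    # any-style: the early failure is skipped, the later success wins.
    results["any"] = await first_success(
        [fake_request("broken", 0.01, ok=False), fake_request("mirror", 0.03)]
    )
    return results


outcome = asyncio.run(demo())
```

The key distinction to carry back to the Promise functions is between race, which settles on the first result of any kind, and any, which keeps waiting past failures for the first success.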

For more information on the Promise API, see notcallback.

See also

Other ways to schedule requests in a callback:


scrapy-promise 0.0.6

