How to get the Scrapy failure URLs? (Answers)

1 answer

Anonymous, 1 day ago

The answers from @Talvalin and @alecxe helped me a lot, but they don't seem to capture downloader events that never produce a response object (for example, <code>twisted.internet.error.TimeoutError</code> and <code>twisted.web.http.PotentialDataLoss</code>). These errors show up in the stats dump at the end of the run, but without any meta information.

As I found out <a href="https://groups.google.com/forum/#!forum/scrapy-users" rel="nofollow noreferrer">here</a>, these errors are tracked by the <a href="https://github.com/scrapy/scrapy/blob/master/scrapy/downloadermiddlewares/stats.py" rel="nofollow noreferrer">stats.py</a> middleware. They are caught in the <code>process_exception</code> method of the <code>DownloaderStats</code> class, specifically via the <code>ex_class</code> variable, which holds the exception's class name and is used to increment a per-error-type counter; the counts are then dumped at the end of the run.

To match these errors with information from the corresponding request object, you can attach a unique id to each request (via <code>request.meta</code>) and then pull it into the <code>process_exception</code> method of <code>stats.py</code>:

<pre><code># 'id' is a placeholder for whichever request.meta key holds your unique id
self.stats.set_value('downloader/my_errs/{0}'.format(request.meta['id']), ex_class)
</code></pre>

This creates a unique stats key for every downloader-level error that has no accompanying response. You can then save the modified <code>stats.py</code> under another name (e.g. <code>my_stats.py</code>), add it to the downloader middlewares (with the correct priority), and disable the stock <code>stats.py</code>:

<pre><code>DOWNLOADER_MIDDLEWARES = {
    'myproject.my_stats.MyDownloaderStats': 850,
    'scrapy.downloadermiddlewares.stats.DownloaderStats': None,
}
</code></pre>

The output at the end of the run looks like this (here the meta info maps each request URL to a group id and a member id separated by a slash, like <code>'0/14'</code>):

<pre><code>{'downloader/exception_count': 3,
 'downloader/exception_type_count/twisted.web.http.PotentialDataLoss': 3,
 'downloader/my_errs/0/1': 'twisted.web.http.PotentialDataLoss',
 'downloader/my_errs/0/38': 'twisted.web.http.PotentialDataLoss',
 'downloader/my_errs/0/86': 'twisted.web.http.PotentialDataLoss',
 'downloader/request_bytes': 47583,
 'downloader/request_count': 133,
 'downloader/request_method_count/GET': 133,
 'downloader/response_bytes': 3416996,
 'downloader/response_count': 130,
 'downloader/response_status_count/200': 95,
 'downloader/response_status_count/301': 24,
 'downloader/response_status_count/302': 8,
 'downloader/response_status_count/500': 3,
 'finish_reason': 'finished',
 ...}
</code></pre>

<a href="https://stackoverflow.com/a/11071745/1599229">This answer</a> deals with non-downloader errors.
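The matching idea can be tried in isolation with the minimal sketch below, which runs without Scrapy installed. `StatsStub`, `RequestStub` and `MyDownloaderStats` are hypothetical stand-ins, not Scrapy classes: the stub only mimics the `inc_value`/`set_value`/`get_stats` calls of Scrapy's stats collector, and `process_exception` mirrors the per-type counting described above plus the extra per-request key. The meta key `'id'` and the fallback to the request URL when no id is set are assumptions.

```python
"""Scrapy-free sketch of recording which request hit which downloader error.

StatsStub, RequestStub and MyDownloaderStats are hypothetical stand-ins,
not Scrapy classes; only the method names mirror Scrapy's stats collector.
"""


class StatsStub:
    """Mimics inc_value/set_value/get_stats of a stats collector."""

    def __init__(self):
        self._stats = {}

    def inc_value(self, key, count=1, start=0, spider=None):
        self._stats[key] = self._stats.get(key, start) + count

    def set_value(self, key, value, spider=None):
        self._stats[key] = value

    def get_stats(self, spider=None):
        return self._stats


class RequestStub:
    """Models only the url and meta attributes of a request."""

    def __init__(self, url, meta=None):
        self.url = url
        self.meta = dict(meta or {})


def exception_class_name(exc):
    # Module plus class name, like the ex_class string in the stock middleware
    cls = type(exc)
    return "{0}.{1}".format(cls.__module__, cls.__name__)


class MyDownloaderStats:
    def __init__(self, stats, id_key="id"):
        self.stats = stats
        self.id_key = id_key  # assumed request.meta key holding the unique id

    def process_exception(self, request, exception, spider):
        ex_class = exception_class_name(exception)
        # Same per-type counting as the stock middleware ...
        self.stats.inc_value("downloader/exception_count", spider=spider)
        self.stats.inc_value(
            "downloader/exception_type_count/{0}".format(ex_class), spider=spider
        )
        # ... plus one key per failed request (falls back to the URL if no id)
        req_id = request.meta.get(self.id_key, request.url)
        self.stats.set_value(
            "downloader/my_errs/{0}".format(req_id), ex_class, spider=spider
        )
        return None  # let other middlewares / errbacks see the exception


if __name__ == "__main__":
    stats = StatsStub()
    mw = MyDownloaderStats(stats)
    # Python's built-in TimeoutError stands in for twisted's here
    for rid in ("0/1", "0/38"):
        req = RequestStub("http://example.com/" + rid, meta={"id": rid})
        mw.process_exception(req, TimeoutError(), spider=None)
    for key, value in sorted(stats.get_stats().items()):
        print(key, value)
```

In a real project it may be simpler to subclass `DownloaderStats` from `scrapy.downloadermiddlewares.stats`, call `super().process_exception(request, exception, spider)` and then add the single `set_value` call, instead of copying the whole file; check the method signature against your Scrapy version.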
