Python scrapy-selenium package - PyPI

Detailed description of the scrapy-selenium Python project

# Scrapy with selenium
[![PyPI](https://img.shields.io/pypi/v/scrapy-selenium.svg)](https://pypi.python.org/pypi/scrapy-selenium) [![Build Status](https://travis-ci.org/clemfromspace/scrapy-selenium.svg?branch=master)](https://travis-ci.org/clemfromspace/scrapy-selenium) [![Test Coverage](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/test_coverage)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/test_coverage) [![Maintainability](https://api.codeclimate.com/v1/badges/5c737098dc38a835ff96/maintainability)](https://codeclimate.com/github/clemfromspace/scrapy-selenium/maintainability)

Scrapy middleware to handle JavaScript pages using Selenium.

You should use **Python >= 3.6**.
You will also need one of the Selenium [compatible browsers](http://www.seleniumhq.org/about/platforms.jsp).

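1. Add the browser to use, the path to the driver executable, and the arguments to pass to the executable to the Scrapy settings:
```python
from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # '--headless' if using chrome instead of firefox
```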

Optionally, set the path to the browser executable:
```python
from shutil import which

SELENIUM_BROWSER_EXECUTABLE_PATH = which('firefox')
```
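The comment in step 1 notes that Firefox and Chrome spell the headless flag differently. If you switch browsers often, a small helper can keep the browser-related settings consistent. This is a minimal sketch under stated assumptions, not part of scrapy-selenium: `build_selenium_settings` and the `BROWSERS` table are hypothetical, and only the setting names are ones scrapy-selenium reads. `shutil.which()` returns `None` when the driver is not on `PATH`, so the result is worth checking before a crawl.

```python
from shutil import which

# Hypothetical lookup table (not part of scrapy-selenium):
# browser name -> (driver executable, headless flag for that browser).
BROWSERS = {
    'firefox': ('geckodriver', '-headless'),
    'chrome': ('chromedriver', '--headless'),
}


def build_selenium_settings(browser='firefox'):
    """Return scrapy-selenium settings for `browser` as a plain dict."""
    if browser not in BROWSERS:
        raise ValueError('unsupported browser: {!r}'.format(browser))
    driver_executable, headless_flag = BROWSERS[browser]
    return {
        'SELENIUM_DRIVER_NAME': browser,
        # which() returns None if the executable is not found on PATH
        'SELENIUM_DRIVER_EXECUTABLE_PATH': which(driver_executable),
        'SELENIUM_DRIVER_ARGUMENTS': [headless_flag],
    }
```

For example, the returned dict could be merged into a spider's `custom_settings`.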

2. Add the `SeleniumMiddleware` to the downloader middlewares:
```python
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
```
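The value `800` is the middleware's position in Scrapy's downloader chain. A minimal, stdlib-only sketch of what that position means, assuming (as with any downloader middleware that takes over the download) that `SeleniumMiddleware.process_request()` returns a response for each `SeleniumRequest`. Only a few built-in middlewares are listed, with order values from Scrapy's `DOWNLOADER_MIDDLEWARES_BASE` (check them against your Scrapy version); `process_request_order` and `hooks_until_response` are illustrative names, not Scrapy APIs.

```python
# Illustrative sketch of Scrapy's downloader-middleware ordering (stdlib only).
# A few entries from Scrapy's DOWNLOADER_MIDDLEWARES_BASE; check the values
# against the Scrapy version you use.
BASE = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': 500,
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
    'scrapy.downloadermiddlewares.stats.DownloaderStats': 850,
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': 900,
}
PROJECT = {'scrapy_selenium.SeleniumMiddleware': 800}


def process_request_order(base, project):
    """Middleware paths in the order their process_request() is called.

    The project dict overrides the base dict, entries set to None are
    disabled, and the rest run from the lowest order value to the highest.
    """
    merged = dict(base)
    merged.update(project)
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


def hooks_until_response(order, responder):
    """process_request() hooks that run when `responder` returns a Response.

    Returning a Response from process_request() stops Scrapy from calling the
    remaining process_request() methods and from downloading the request.
    """
    return order[:order.index(responder) + 1]
```

With these numbers, `DownloaderStats` (850) and `HttpCacheMiddleware` (900) would not get `process_request()` for a `SeleniumRequest`, although Scrapy still passes the returned response through every middleware's `process_response()`.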
## Usage
Use `scrapy_selenium.SeleniumRequest` instead of Scrapy's built-in `Request`, as below:
```python
from scrapy_selenium import SeleniumRequest

yield SeleniumRequest(url, self.parse_result)
```
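The request will be handled by Selenium, and it will have an additional `meta` key named `driver` containing the Selenium driver that processed the request.
```python
def parse_result(self, response):
    print(response.request.meta['driver'].title)
```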
For more information about the available driver methods and attributes, refer to the [selenium python documentation](http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webdriver).

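The `selector` response attribute works as usual (but contains the HTML processed by the Selenium driver).
```python
def parse_result(self, response):
    # text() selects the text inside <title>; .get() returns the first match or None
    print(response.selector.xpath('//title/text()').get())
```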

`scrapy_selenium.SeleniumRequest` accepts 4 additional arguments:

#### `wait_time` / `wait_until`

When used, Selenium will perform an [explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) before returning the response to the spider.
```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element with id 'someid' to become clickable
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    wait_time=10,
    wait_until=EC.element_to_be_clickable((By.ID, 'someid'))
)
```
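An explicit wait is a polling loop: a condition is checked repeatedly until it returns something truthy or the timeout runs out. Selenium's `WebDriverWait` does this for you (it polls every 0.5 s by default and raises `TimeoutException`). The stdlib-only sketch below just illustrates the idea; `wait_until` and `WaitTimeout` are hypothetical names, `timeout` plays the role of `wait_time`, and `condition` plays the role of the `wait_until` argument. Unlike Selenium, which passes the driver to the condition, this version takes a no-argument callable, and `clock`/`sleep` are injectable so it can be tested without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy once the timeout has passed."""


def wait_until(condition, timeout, poll_interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Illustrative only: Selenium's WebDriverWait does the real work and passes
    the driver to the condition; this version takes a no-argument callable.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result  # e.g. the element that became clickable
        if clock() >= deadline:
            raise WaitTimeout('condition not met within {} s'.format(timeout))
        sleep(poll_interval)
```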

#### `screenshot`
When used, Selenium will take a screenshot of the page, and the binary data of the captured .png will be added to the response `meta`:
```python
yield SeleniumRequest(
    url=url,
    callback=self.parse_result,
    screenshot=True
)

def parse_result(self, response):
    with open('image.png', 'wb') as image_file:
        image_file.write(response.meta['screenshot'])
```
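Because `response.meta['screenshot']` holds raw bytes, it can be sanity-checked before it is written to disk. A minimal sketch, assuming you would rather fail loudly than save a broken file: `save_screenshot` is a hypothetical helper, not part of scrapy-selenium, and the only fact it relies on is that every PNG file begins with the same 8-byte signature.

```python
from pathlib import Path

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # first 8 bytes of every PNG file


def save_screenshot(data, path):
    """Write screenshot bytes to `path` after checking they look like a PNG.

    Hypothetical helper, e.g. called from a callback as
    save_screenshot(response.meta['screenshot'], 'image.png').
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError('expected bytes, got ' + type(data).__name__)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError('data does not start with the PNG signature')
    target = Path(path)
    target.write_bytes(data)  # only reached when the signature matches
    return target
```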
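#### `script`
When used, Selenium will execute custom JavaScript code (here, scrolling to the bottom of the page).
```python
yield SeleniumRequest(
    url,
    self.parse_result,
    script='window.scrollTo(0, document.body.scrollHeight);',
)
```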

scrapy-selenium 0.0.7