A simple distributed web crawler
simplified-scrapy
simplified scrapy, a simple web crawler
Requirements
- Python 2.7, 3.0+
- Works on Linux, Windows, Mac OSX, BSD
Run
from simplified_scrapy.simplified_main import SimplifiedMain
SimplifiedMain.startThread()
Demo
A custom spider class must extend the Spider class.
Below is an example that collects data:
from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain


class DemoSpider(Spider):
    name = 'demo-spider'
    start_urls = ['http://quotes.toscrape.com/']
    allowed_domains = ['www.toscrape.com']

    def extract(self, url, html, models, modelNames):
        doc = SimplifiedDoc(html)
        # Collect the page's links; url is a dict whose "url" entry is the page address
        lstA = doc.listA(url=url["url"])
        # Hand the links back to the crawler under "Urls"; this demo stores no data
        return [{"Urls": lstA, "Data": None}]


SimplifiedMain.startThread(DemoSpider())
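To make the extract step more concrete, here is a minimal, stdlib-only sketch (Python 3) of what such a callback conceptually does: collect the links on a page, resolve them against the page URL, keep only those on allowed domains, and return the same [{"Urls": ..., "Data": ...}] shape as the demo. This is not simplified_scrapy's code. LinkCollector, extract_links, is_allowed and the standalone extract function are hypothetical names, and the subdomain-matching rule in is_allowed is an assumption, not documented library behaviour.

```python
# Hypothetical, stdlib-only sketch; NOT simplified_scrapy's implementation.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links like "/page/2/" into absolute URLs
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def is_allowed(link, allowed_domains):
    # Assumption: a host matches if it equals an allowed domain or is a subdomain of it
    host = urlparse(link).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def extract(url, html, allowed_domains):
    # Same return shape as the demo: a list of {"Urls": ..., "Data": ...} dicts
    links = [link for link in extract_links(html, url) if is_allowed(link, allowed_domains)]
    return [{"Urls": links, "Data": None}]


page = '<a href="/page/2/">Next</a> <a href="https://example.com/">Off-site</a>'
result = extract("http://quotes.toscrape.com/", page, ["toscrape.com"])
print(result)
```

Under this assumed matching rule, toscrape.com admits quotes.toscrape.com, whereas an exact-match rule would not; check how the library actually treats allowed_domains before relying on either behaviour.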
Install with pip
pip install simplified-scrapy
Legal issues
In particular, please note:
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
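Translated into plain language:
If your use of this software amounts to copyright infringement, or you use it for any other illegal purpose, the authors accept no responsibility.
We only publish the code here; how you use it is up to you.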