skyscraper: Python package - PyPI

A lightweight YAML-based web crawler

Detailed description of the skyscraper Python project

skyscraper is a lightweight web crawler driven by YAML configuration files.

Installation

```shell
pip install skyscraper
```

Usage

Each crawler is defined in its own YAML (.yml) file, for example:

```yaml
# the name of the crawler
name: Python 3.x docs
# the number of parallel thread workers
threads: 3

# the start URL
params:
  start_url: https://docs.python.org/3/index.html

# how and where the results are saved
results:
  type: Json
  file: "python.json"

# on each URL labeled "result", fields are extracted using
# this scheme
result_extractor:
  fields:
  - name: title
    rules:
      select: h1
      text: yes
      single: true

# the first page is labeled "start", and every URL extracted in a step
# receives the label given in that extract rule. In this example, the
# result links are extracted directly from the first page
steps:
- name: start
  label: start
  extract:
  - type: ahrefs
    label: result
    rules:
      select: a.biglink
```
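The label-based flow described in the config comments can be illustrated with a small, self-contained Python sketch. This is not skyscraper's implementation and uses none of its API. It assumes that `select: a.biglink` means "every `<a>` element with class `biglink`" and that `select: h1` with `text` and `single` means "the text of the first `<h1>`". It reads pages from a hypothetical in-memory `PAGES` dict instead of fetching them, and it runs on a single thread, whereas `threads: 3` configures three parallel workers in the real crawler.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class _Collector(HTMLParser):
    """Collects hrefs of <a class="biglink"> and the text of every <h1>."""

    def __init__(self):
        super().__init__()
        self.biglinks = []
        self.h1_texts = []
        self._in_h1 = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Rough equivalent of the CSS selector "a.biglink"
        if tag == "a" and "biglink" in (attrs.get("class") or "").split():
            if attrs.get("href"):
                self.biglinks.append(attrs["href"])
        elif tag == "h1":
            self._in_h1 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.h1_texts.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._in_h1:
            self._buf.append(data)


def crawl(pages, start_url):
    """Breadth-first crawl that routes each URL by its label ("start" or "result")."""
    results = []
    queue = deque([(start_url, "start")])
    seen = {start_url}
    while queue:
        url, label = queue.popleft()
        parser = _Collector()
        parser.feed(pages[url])
        if label == "start":
            # Step "start": extract a.biglink hrefs and label them "result"
            for href in parser.biglinks:
                absolute = urljoin(url, href)
                if absolute not in seen:
                    seen.add(absolute)
                    queue.append((absolute, "result"))
        elif label == "result":
            # result_extractor: field "title" = text of the single (first) h1
            title = parser.h1_texts[0] if parser.h1_texts else None
            results.append({"url": url, "title": title})
    return results


# Hypothetical in-memory pages standing in for real HTTP responses
START = "https://docs.python.org/3/index.html"
PAGES = {
    START: '<a class="biglink" href="whatsnew/index.html">What\'s new</a>'
           '<a href="other.html">not a biglink</a>',
    "https://docs.python.org/3/whatsnew/index.html": "<h1>What's New in Python</h1>",
}

results = crawl(PAGES, START)

if __name__ == "__main__":
    # Roughly what a "Json" results writer might emit
    print(json.dumps(results, indent=2))
```

Only the link inside `a.biglink` is followed; the plain `<a>` on the start page is ignored, and the `title` field comes from the linked page rather than the start page.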

To run the crawler, execute:

```shell
skyscraper run examples/python_docs.yaml
```


skyscraper 0.0.5
