scrapyu
UserAgentMiddleware

    # settings.py
    USERAGENT_TYPE = 'firefox'
    DOWNLOADER_MIDDLEWARES = {
        'scrapyu.UserAgentMiddleware': 543,
    }
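For readers unfamiliar with downloader middlewares, the following is a minimal sketch of how a setting such as `USERAGENT_TYPE` could drive the `User-Agent` header. It is not scrapyu's implementation: the class name `SketchUserAgentMiddleware`, the `USER_AGENTS` table, and the user-agent strings are hypothetical, and the assumption that the setting names a browser family is a guess. Only the `from_crawler` and `process_request` hooks are standard Scrapy conventions.

```python
import random

# Hypothetical user-agent pools keyed by browser family. The strings are
# illustrative and are not taken from scrapyu.
USER_AGENTS = {
    "firefox": [
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ],
    "chrome": [
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    ],
}


class SketchUserAgentMiddleware:
    """Downloader middleware that sets a User-Agent from the configured family."""

    def __init__(self, ua_type="firefox"):
        self.pool = USER_AGENTS[ua_type]

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler; settings are read here.
        return cls(crawler.settings.get("USERAGENT_TYPE", "firefox"))

    def process_request(self, request, spider):
        # Overwrite the header so every request carries a UA from the pool.
        request.headers["User-Agent"] = random.choice(self.pool)
        return None  # None tells Scrapy to keep processing the request
```

Picking a fresh string per request is one possible design; a middleware could equally fix one string per crawl.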
MarkdownPipeline

    # items.py
    import scrapy


    class MarkdownItem(scrapy.Item):
        html = scrapy.Field()
        filename = scrapy.Field()
FirefoxCookiesMiddleware

    # settings.py
    GECKODRIVER_PATH = 'geckodriver'
    DOWNLOADER_MIDDLEWARES = {
        'scrapyu.FirefoxCookiesMiddleware': 543,
    }
MongoDBPipeline

    # settings.py
    MONGODB_URI = 'mongodb://localhost:27017'
    # or
    # MONGODB_HOST = 'localhost'
    # MONGODB_PORT = 27017
    MONGODB_DATABASE = 'scrapyu'
    MONGODB_COLLECTION = 'items'
    MONGODB_BUFFER_LENGTH = 100
    MONGODB_UNIQUE_KEY = 'title name'  # use only if no buffer
    # or
    # MONGODB_UNIQUE_KEY = ['title', 'name']
    # MONGODB_UNIQUE_KEY = ('title', 'name')
    ITEM_PIPELINES = {
        'scrapyu.MongoDBPipeline': 300,
    }
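The settings above suggest two write modes: buffered bulk inserts, or per-item writes keyed on a unique field set when no buffer is used. The sketch below illustrates one way that could work. It is an assumption about the behaviour, not scrapyu's code: `parse_unique_key` and `SketchMongoWriter` are hypothetical names, and `collection` is expected to be any object exposing pymongo's `insert_one`, `insert_many`, and `update_one` methods.

```python
def parse_unique_key(value):
    """Normalise 'title name', ['title', 'name'] or ('title', 'name') to a list."""
    if isinstance(value, str):
        return value.split()
    return list(value)


class SketchMongoWriter:
    """Hypothetical pipeline core: buffer items, or upsert on a unique key."""

    def __init__(self, collection, buffer_length=0, unique_key=None):
        self.collection = collection
        self.buffer_length = buffer_length
        self.unique_key = parse_unique_key(unique_key) if unique_key else []
        self.buffer = []

    def process_item(self, item, spider=None):
        doc = dict(item)
        if self.buffer_length:
            # Buffered mode: collect documents and write them in one batch.
            self.buffer.append(doc)
            if len(self.buffer) >= self.buffer_length:
                self.flush()
        elif self.unique_key:
            # Unbuffered mode with a unique key: update the match or insert.
            query = {field: doc.get(field) for field in self.unique_key}
            self.collection.update_one(query, {"$set": doc}, upsert=True)
        else:
            self.collection.insert_one(doc)
        return item

    def flush(self):
        if self.buffer:
            self.collection.insert_many(self.buffer)
            self.buffer = []

    def close_spider(self, spider=None):
        # Write whatever is left when the crawl ends.
        self.flush()
```

Under this reading, the unique key is ignored in buffered mode because a plain bulk insert cannot upsert, which would match the "use only if no buffer" comment.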
RedisDupeFilter

    # settings.py
    DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
    REDIS_DUPE_HOST = 'localhost'
    REDIS_DUPE_PORT = 6379
    REDIS_DUPE_DATABASE = 0
    REDIS_DUPE_PASSWORD = 'password'
    REDIS_DUPE_KEY = 'requests'
    REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
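As a rough illustration of the idea behind a Redis-backed duplicate filter, the sketch below stores request fingerprints in a Redis set and skips deduplication for URLs matching an ignore pattern. This is a guess at the semantics of `REDIS_DUPE_KEY` and `REDIS_DUPE_IGNORE_URL`, not scrapyu's implementation. `SketchRedisDupeFilter` and the fingerprint scheme are hypothetical; `client` is assumed to be any object with a redis-py style `sadd(key, value)` that returns the number of newly added members.

```python
import hashlib
import re


class SketchRedisDupeFilter:
    """Hypothetical dupe filter: fingerprints live in one Redis set."""

    def __init__(self, client, key="requests", ignore_url=None):
        self.client = client
        self.key = key
        self.ignore = re.compile(ignore_url) if ignore_url else None

    def request_seen(self, request):
        # Assumption: URLs matching the ignore pattern are never deduplicated.
        if self.ignore and self.ignore.match(request.url):
            return False
        raw = f"{request.method} {request.url}".encode("utf-8")
        fingerprint = hashlib.sha1(raw).hexdigest()
        # sadd returns 0 when the member already exists, i.e. a duplicate.
        return self.client.sadd(self.key, fingerprint) == 0
```

Keeping the set in Redis rather than in process memory lets several crawler processes, or successive runs, share one record of visited requests.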
genspider

    scrapyu genspider -l

Output:

    Available templates:
      single
      single_splash

Generate a single-file spider:

    scrapyu genspider python www.python.org -t single

Generate a single-file spider with Splash integration:

    scrapyu genspider python www.python.org -t single_splash