Python shell fails to run Scrapy
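I'm running the Python.org 2.7 release on 64-bit Windows Vista so that I can use Scrapy. I have some code that works fine from the command line (apart from the command line not recognizing non-Unicode characters). But when I try to run the same script through Python's IDLE, I get the following error message: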
Warning (from warnings module):
  File "C:\Python27\mrscrap\mrscrap\spiders\test.py", line 24
    class MySpider(BaseSpider):
ScrapyDeprecationWarning: __main__.MySpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
The code that produces this error is:
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re

class MySpider(BaseSpider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for titles in titles:
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)
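First, why does this work from the command line but fail in IDLE? Second, when I followed the warning's advice and replaced every BaseSpider in the code with Spider, the code did run in Python's IDLE, but nothing happened: no errors, no log output, no warnings, nothing at all.
Can anyone tell me why the modified code produces no output in Python IDLE?
Thanks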
1 Answer
Add the line from scrapy.cmdline import execute to your imports.
Then put execute(['scrapy','crawl','wiki']) in your code and run the script again.
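from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
import re
from scrapy.cmdline import execute

class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["wikipedia.org"]
    start_urls = ["http://en.wikipedia.org/wiki/Asia"]

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for title in titles:
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2)

# Start the crawl only when this file is run directly (e.g. with F5 in IDLE).
# "scrapy crawl" imports this module again to find the spider; without this
# guard, that import would call execute() a second time.
if __name__ == "__main__":
    # Equivalent to typing "scrapy crawl wiki" inside the Scrapy project.
    execute(['scrapy','crawl','wiki'])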
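As for why the original script printed nothing in IDLE: running a file there only executes its module-level statements. The class statement does run, and that is exactly when the BaseSpider deprecation warning fires (its text says it warns on the first subclass). However, parse() is a callback that only Scrapy's engine calls after downloading a start URL, and nothing in the original file started that engine. The sketch below is a pure-Python toy model of this behaviour, not Scrapy code. ToyBaseSpider, ToyResponse, ToyDeprecationWarning and toy_engine are hypothetical stand-ins, and a dictionary of fake page text replaces real downloads.

```python
import warnings


class ToyDeprecationWarning(Warning):
    """Hypothetical warning category (stands in for Scrapy's own)."""


class ToyResponse:
    """Hypothetical stand-in for a downloaded page."""

    def __init__(self, url, text):
        self.url = url
        self.text = text


class ToyBaseSpider:
    """Hypothetical base class that warns as soon as a subclass is defined."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        warnings.warn(cls.__name__ + " inherits from a deprecated base class",
                      ToyDeprecationWarning, stacklevel=2)


def toy_engine(spider_cls, pages):
    """Hypothetical engine: the only place where parse() is ever called."""
    spider = spider_cls()
    results = []
    for url in spider_cls.start_urls:
        # A real crawler would download the URL; here we look it up in a dict.
        results.extend(spider.parse(ToyResponse(url, pages[url])))
    return results


parse_calls = []  # records every URL that parse() actually handles

# Executing the class statement triggers the warning immediately.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class MySpider(ToyBaseSpider):
        name = "wiki"
        start_urls = ["http://en.wikipedia.org/wiki/Asia"]

        def parse(self, response):
            parse_calls.append(response.url)
            return [len(response.text)]  # placeholder for real extraction

# Warning already emitted, but parse() has not run yet.
print(len(caught), parse_calls)  # prints: 1 []

# parse() executes only once something drives the spider.
results = toy_engine(MySpider, {"http://en.wikipedia.org/wiki/Asia": "<p>Asia</p>"})
print(parse_calls, results)  # prints: ['http://en.wikipedia.org/wiki/Asia'] [11]
```

In the real setup, execute(['scrapy','crawl','wiki']) plays the role of toy_engine. It starts Scrapy's crawler, which requests each URL in start_urls and passes each response to parse().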