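I am trying to build a web crawler by following YouTube tutorials and similar resources.
I have run into a problem: one of my classes does not accept arguments (my other classes do accept arguments, and their structure is more or less the same).
This is the class (crawler.py). I am using __init__, which takes 3 parameters here: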
class Crawler:
    # class variables are shared among all crawler instances
    project_name = ''
    home_url = ''
    site_domain = ''
    # use set to speed up read/write process
    file_queue = ''
    queue = set()
    # use set to speed up read/write process
    file_crawled = ''
    crawled = set()

    def __init__(self, project_name, home_url, site_domain):
        Crawler.project_name = project_name
        Crawler.home_url = home_url
        Crawler.site_domain = site_domain
        Crawler.file_queue = Crawler.project_name + '/links_on_queue.txt'
        Crawler.file_crawled = Crawler.project_name + '/links_crawled.txt'
        self.starter(Crawler.site_domain)
        self.crawl_page('first_crawler', Crawler.home_url)
This is where it is called, on the last line (main.py):
import threading
from queue import Queue
# import from files
from crawler import Crawler
from domain_finder import *
from general_crawler_functions import *
# like multiple group members doing different parts, the program
# creates multiple threads that work simultaneously
PROJECT_NAME = 'Demoblaze'
HOME_URL = 'https://www.demoblaze.com/'
DOMAIN_NAME = get_domain(HOME_URL)
FILE_QUEUE = PROJECT_NAME + '/links_on_queue.txt'
FILE_CRAWLED = PROJECT_NAME + '/links_crawled.txt'
THREAD_COUNT = 4
# queue of threads
queue = Queue()
Crawler(PROJECT_NAME, HOME_URL, DOMAIN_NAME)
The error says that the class takes no arguments, but 3 were given.
In case it helps, I am using PyCharm Community on Windows.
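For context, Python generally raises this kind of TypeError when the class being called has no __init__ of its own that accepts arguments. Below is a minimal sketch that reproduces the symptom. EmptyCrawler is a hypothetical stand-in (not the class from crawler.py), the domain string is an assumed value, and the exact wording of the message differs between Python versions:

```python
class EmptyCrawler:
    """Hypothetical stand-in: a class that defines no __init__ of its own."""
    project_name = ''


def try_to_build():
    """Call the class with three arguments and return the error text, if any."""
    try:
        # 'demoblaze.com' is an assumed value for the domain argument.
        EmptyCrawler('Demoblaze', 'https://www.demoblaze.com/', 'demoblaze.com')
    except TypeError as error:
        return str(error)
    return None


message = try_to_build()
print(message)
```

If the real Crawler produces the same kind of message, the class that main.py calls is probably not the one shown above with its three-parameter __init__.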
In main.py, you have imported Crawler from Crawler.
You should import the crawler class that you implemented yourself. Use
instead of
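One way to check which Crawler was actually imported is to ask Python where the class comes from and whether it defines its own __init__. The sketch below uses only the standard inspect module. describe_class, WithInit and WithoutInit are hypothetical names introduced for illustration; in main.py one would call describe_class(Crawler) right after the import:

```python
import inspect


def describe_class(cls):
    """Report where a class was defined and what its __init__ looks like."""
    try:
        source_file = inspect.getsourcefile(cls)
    except (TypeError, OSError):
        # Built-in classes or classes without a source file end up here.
        source_file = None
    try:
        init_signature = str(inspect.signature(cls.__init__))
    except (TypeError, ValueError):
        init_signature = None
    return {
        'module': cls.__module__,
        'file': source_file,
        # True only if __init__ is written in the class body itself.
        'defines_own_init': '__init__' in vars(cls),
        'init_signature': init_signature,
    }


class WithInit:
    """Hypothetical class whose __init__ takes three parameters."""
    def __init__(self, project_name, home_url, site_domain):
        self.project_name = project_name


class WithoutInit:
    """Hypothetical class that relies on the default object.__init__."""


info_ok = describe_class(WithInit)
info_bad = describe_class(WithoutInit)
print(info_ok['defines_own_init'], info_bad['defines_own_init'])
```

If 'file' points somewhere other than your own crawler.py, or 'defines_own_init' is False, a different class is being picked up, or __init__ is misspelled or not indented inside the class body.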