Inheriting methods from Bs4 in a class

Posted 2021-11-29 23:32:19

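I am trying to get started with OOP and decided to rewrite a script in that style. A web page has a box of links that I want to save, so I wrote the following code: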

class webpage(BeautifulSoup):

    def __init__(self, link, html, links):

        self.link = link
        driver = webdriver.PhantomJS()
        driver.get(link)
        self.html = driver.page_source
        self.links = []

    def forty_pages(self):

        soup = BeautifulSoup(html, 'html.parser')
        link_box = soup.find('div', {'id': 'sliderBottom'})
        rest = link_box.find_all('a')
        forty_links = []

        for i in rest:
            try:
                link = i.get('href')
                forty_links.append(link)
            except:
                pass
        self.links.append(x for x in forty_links)

test = webpage(link=root)
test.forty_pages()

The problem is that it says:

TypeError: module.__init__() takes at most 2 arguments (3 given)
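
For context, this message is what Python reports when the base in a `class` statement is a module rather than a class, which suggests that the name `BeautifulSoup` here refers to a module (for example via `import BeautifulSoup` or `import bs4 as BeautifulSoup`; the import line is not shown, so this is an assumption). A minimal sketch that reproduces the same kind of error, using the standard-library `json` module purely as a stand-in:

```python
import json   # stand-in for whichever module the name BeautifulSoup is bound to
import types

# A module is an instance of types.ModuleType, not a class.
is_module = isinstance(json, types.ModuleType)

try:
    # Python tries to build the class with the module's type as metaclass,
    # passing (name, bases, namespace): three arguments where at most two fit.
    class Broken(json):
        pass
    subclass_error = None
except TypeError as exc:
    subclass_error = exc

print(is_module)
print(subclass_error)
```

The exact wording of the message differs between Python versions, but the exception type is `TypeError` in each case.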

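self.html should be filled in automatically once the driver returns the string containing the HTML data. Can anyone explain? EDIT: I was told that composition is not needed, but I can't call the Bs4 module from within the class, so I'm stuck on how to implement this... For example: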

class rightmove_page(object):

    def __init__(self, link):
        self.link = link

    def forty_pages(self):
        driver = webdriver.PhantomJS()
        html = driver.get(self.link)
        soup = BeautifulSoup(html, 'html.parser')
        print(soup)

This gives the error:

Traceback (most recent call last):
  File "/home/sn/Documents/Projects/House_Prices/class_pased.py", line 21, in <module>
    test.forty_pages()
  File "/home/sn/Documents/Projects/House_Prices/class_pased.py", line 17, in forty_pages
    soup = BeautifulSoup(html, 'html.parser')
TypeError: 'module' object is not callable
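
The "'module' object is not callable" message points the same way: the name `BeautifulSoup` appears to be bound to a module, not to the class. Below is a minimal sketch of one way the class could look, assuming the `bs4` package is installed and the class is imported with `from bs4 import BeautifulSoup`. The class name `RightmovePage`, the `html` parameter, and the sample HTML are illustrative assumptions, not the asker's code. The HTML is passed in as a string so the sketch runs without a browser; with Selenium it would come from `driver.page_source`, since `driver.get()` returns `None`.

```python
from bs4 import BeautifulSoup  # import the class itself, not the module


class RightmovePage:
    """Holds a link and its HTML, and parses it with a BeautifulSoup object."""

    def __init__(self, link, html):
        self.link = link
        # Assumption: with Selenium this would be driver.page_source,
        # read after calling driver.get(link).
        self.html = html
        self.links = []

    def forty_pages(self):
        # Use self.html; a bare `html` is not defined inside the method.
        soup = BeautifulSoup(self.html, 'html.parser')
        link_box = soup.find('div', {'id': 'sliderBottom'})
        if link_box is None:
            return self.links
        for anchor in link_box.find_all('a'):
            href = anchor.get('href')  # returns None when there is no href
            if href is not None:
                self.links.append(href)
        return self.links


# Hypothetical sample markup standing in for the real page source.
sample_html = (
    '<div id="sliderBottom">'
    '<a href="/p1">1</a><a href="/p2">2</a><a>no href</a>'
    '</div>'
)
page = RightmovePage('http://example.com', sample_html)
found = page.forty_pages()
```

Appending each `href` directly also avoids a separate issue in the first version, where `self.links.append(x for x in forty_links)` stores a single generator object instead of the individual links.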