Python: no list of links

import re
import urllib2

import requests
from BeautifulSoup import BeautifulSoup
from lxml import html

#Open site
html_page = urllib2.urlopen("http://www.sitetoscrape.ch/somesite.aspx")

#Inform BeautifulSoup
soup = BeautifulSoup(html_page)

#Search for the specific links
for link in soup.findAll('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
    #print found links
    print link.get('href')
    #complete links
    complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
    #print complete links
    print complete_links

#
#EVERYTHING WORKS FINE TO THIS POINT
#

page = requests.get(complete_links)
tree = html.fromstring(page.text)

#Details
name = tree.xpath('//dl[@class="services"]')

for i in name:
    print i.text_content()

1 answer

User

#1 · Posted 2024-04-25 03:57:35

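I think what you need is a list of links rather than the single link held in complete_links. As @Pynchia and @lemonhead said, you are overwriting complete_links on every iteration of the first for loop, so only the last link is left when the loop ends.

You need two changes:

Append each link to a list, then loop over that list and scrape each link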

# [...] Same code here

link_list = []  # collects every complete link found by the loop below
for link in soup.findAll('a', href=re.compile('/d/part/of/thelink/ineed.aspx')):
    print link.get('href')
    complete_links = 'http://www.sitetoscrape.ch' + link.get('href')
    print complete_links
    link_list.append(complete_links)  # append new link to the list

Scrape each accumulated link in a second loop

for link in link_list:
    page = requests.get(link)
    tree = html.fromstring(page.text)

    #Details
    name = tree.xpath('//dl[@class="services"]')

    for i in name:
        print i.text_content()
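
The difference between the two patterns can be seen without touching the network. The following is only a minimal sketch, written for Python 3 rather than the Python 2 used in the question; the `hrefs` list and `BASE_URL` name are made up for illustration and stand in for the values that `link.get('href')` would return.

```python
# Hypothetical stand-ins (not from the original code): the site prefix and
# a few href values such as link.get('href') might return.
BASE_URL = 'http://www.sitetoscrape.ch'
hrefs = ['/d/part/a.aspx', '/d/part/b.aspx', '/d/part/c.aspx']

# Pattern from the question: the variable is reassigned on every
# iteration, so only the last URL is left after the loop.
for href in hrefs:
    complete_links = BASE_URL + href

# Pattern from the answer: every URL is appended to a list,
# so all of them are available to a second loop.
link_list = []
for href in hrefs:
    link_list.append(BASE_URL + href)

print(complete_links)  # a single string
print(link_list)       # one entry per href
```

With the first pattern, a later `requests.get(complete_links)` can only ever fetch one page; with the second, a loop over `link_list` visits each page in turn.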

P.S.: I recommend the Scrapy framework for tasks like this.
