Web scraping an HTML list with Python

Posted 2024-04-28 20:30:24


I'm trying to scrape an HTML list with Python and return the list's links.

Here is the HTML:

<ul class="sub-menu">
<li class="menu-item-362992" id="menu-item-362993"><a href="https://www.test.com/">test</a></li>
<li class ="menu-item-362994" id="menu-item-362995"><a href="https://www.test2.com/">test2</a></li>
<li class ="menu-item-362995" id="menu-item-362996"><a href="https://www.test3.com/">test3</a></li>
</ul>

How do I extract the href of each link?

quotes = []

table = table.find('ul', attrs={'class': 'sub-menu'})

for row in table.find_all('li', attrs={'class'}):
    quote = {}
    quote['url'] = row.a['href']
    quotes.append(quote)

for i in quotes:
   print(i)

How can I return every li class without specifying each individual id?


3 answers

Try this:

quotes = []

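# assumes 'table' already refers to the <ul class="sub-menu"> element;
# with no attrs filter, find_all returns every <li> inside it
for row in table.find_all('li'):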
    quote = {}
    quote['url'] = row.a['href']
    quotes.append(quote)
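A self-contained version of this answer, as a sketch: it assumes the HTML from the question is held in a string (the name `html` is illustrative) and parses it with the built-in `html.parser` backend. If you only want `<li>` elements that actually carry a `class` attribute, BeautifulSoup also accepts `class_=True`, which matches the attribute whatever its value, so no individual id or class name has to be spelled out.

```python
from bs4 import BeautifulSoup

# The HTML from the question (the variable name is illustrative)
html = """
<ul class="sub-menu">
<li class="menu-item-362992" id="menu-item-362993"><a href="https://www.test.com/">test</a></li>
<li class ="menu-item-362994" id="menu-item-362995"><a href="https://www.test2.com/">test2</a></li>
<li class ="menu-item-362995" id="menu-item-362996"><a href="https://www.test3.com/">test3</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("ul", attrs={"class": "sub-menu"})

quotes = []
# class_=True matches every <li> that has a class attribute, whatever its value
for row in table.find_all("li", class_=True):
    quotes.append({"url": row.a["href"]})

for q in quotes:
    print(q)
```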

IIUC, you could try this:

quotes = []

table = table.find('ul', attrs={'class': 'sub-menu'})

for row in table.find_all('li'):
    quote = {}
    quote[row.attrs['class'][0]] = row.a['href']
    quotes.append(quote)
# Same as (list comprehension version):
#table = table.find('ul', attrs={'class': 'sub-menu'})

#quotes=[{row.attrs['class'][0]:row.a['href']} for row in table.find_all('li')]

Output:

quotes
[{'menu-item-362992': 'https://www.test.com/'}, {'menu-item-362994': 'https://www.test2.com/'}, {'menu-item-362995': 'https://www.test3.com/'}]
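The code above assumes every `<li>` has a `class` and contains an `<a>`; otherwise `row.attrs['class']` raises `KeyError`, and `row.a['href']` fails because `row.a` is `None`. A more defensive sketch, assuming you simply want to skip such items, uses `Tag.get`, which returns `None` when an attribute is missing. The helper name `class_to_href` and the sample HTML are hypothetical, and it returns one dict instead of a list of single-key dicts, which is easier to look up by class.

```python
from bs4 import BeautifulSoup

def class_to_href(html):
    """Hypothetical helper: map each sub-menu <li>'s first class to its link's href."""
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("ul", attrs={"class": "sub-menu"})
    result = {}
    for row in menu.find_all("li"):
        classes = row.get("class")        # list of class names, or None if absent
        link = row.find("a", href=True)   # first <a> that has an href, or None
        if classes and link:
            result[classes[0]] = link["href"]
    return result

# Hypothetical sample: one complete item, one without a link, one without a class
sample = """
<ul class="sub-menu">
<li class="menu-item-362992"><a href="https://www.test.com/">test</a></li>
<li class="menu-item-362994">no link here</li>
<li><a href="https://www.test3.com/">no class here</a></li>
</ul>
"""

print(class_to_href(sample))
```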

Or, if you want separate lists:

table = table.find('ul', attrs={'class': 'sub-menu'})

quotes=[{'url':row.a['href']} for row in table.find_all('li')]

ilclassnames=[row.attrs['class'][0] for row in table.find_all('li')]

Output:

quotes
[{'url': 'https://www.test.com/'}, {'url': 'https://www.test2.com/'}, {'url': 'https://www.test3.com/'}]

ilclassnames
['menu-item-362992', 'menu-item-362994', 'menu-item-362995']
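Because both lists are built from the same `find_all('li')` results, they are in the same order and can be paired back up with `zip`. A small sketch, assuming the question's HTML is in a string named `html` (an illustrative name); note that if two `<li>` elements shared a class name, the later one would overwrite the earlier one in the dict.

```python
from bs4 import BeautifulSoup

# The HTML from the question (the variable name is illustrative)
html = """
<ul class="sub-menu">
<li class="menu-item-362992" id="menu-item-362993"><a href="https://www.test.com/">test</a></li>
<li class ="menu-item-362994" id="menu-item-362995"><a href="https://www.test2.com/">test2</a></li>
<li class ="menu-item-362995" id="menu-item-362996"><a href="https://www.test3.com/">test3</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("ul", attrs={"class": "sub-menu"})
rows = table.find_all("li")  # collect the <li> elements once and reuse them

quotes = [{"url": row.a["href"]} for row in rows]
ilclassnames = [row.attrs["class"][0] for row in rows]

# Pair each class name with the URL at the same position
pairs = dict(zip(ilclassnames, (q["url"] for q in quotes)))
print(pairs)
```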

This is just a way to find them; I'm not sure how you want to label or list them.

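from bs4 import BeautifulSoup

# htmlresponse holds the page's HTML as a string; the 'lxml' parser needs the lxml package installed
soup = BeautifulSoup(htmlresponse, 'lxml')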
links = soup.select('ul.sub-menu li a')
for link in links:
    print('url:', link.get('href'))
liclass = soup.select('ul.sub-menu li')
for lc in liclass:
    print(lc.get('class'))
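If you want each class printed next to its URL instead of in two separate passes, one option is to iterate over the `<li>` elements with `select` and look up the link inside each one with `select_one`. A sketch assuming the question's HTML is in a string named `html` (illustrative); `html.parser` is used here instead of `lxml` only so that no extra parser package is required.

```python
from bs4 import BeautifulSoup

# The HTML from the question (the variable name is illustrative)
html = """
<ul class="sub-menu">
<li class="menu-item-362992" id="menu-item-362993"><a href="https://www.test.com/">test</a></li>
<li class ="menu-item-362994" id="menu-item-362995"><a href="https://www.test2.com/">test2</a></li>
<li class ="menu-item-362995" id="menu-item-362996"><a href="https://www.test3.com/">test3</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
# '>' restricts the match to <li> elements that are direct children of the menu
for li in soup.select("ul.sub-menu > li"):
    link = li.select_one("a[href]")  # first <a> with an href inside this <li>, or None
    if link is not None:
        results.append((li.get("class"), link.get("href")))

for classes, href in results:
    print("class:", classes, "url:", href)
```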
