Web scraping with Beautiful Soup 4 in Python

<table class="table table-condensed table-hover tenlaces tablesorter">
  <thead>
    <tr>
      <th class="al">Language</th>
      <th class="ac">Link</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td class="tdidioma"><span class="flag flag_0">0</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE0"><i class="icon-play"></i>  Ver</a></td>
    </tr>
    <tr>
      <td class="tdidioma"><span class="flag flag_1">1</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE1"><i class="icon-play"></i>  Ver</a></td>
    </tr>
    <tr>
      <td class="tdidioma"><span class="flag flag_2">2</span></td>
      <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE2"><i class="icon-play"></i>  Ver</a></td>
    </tr>
  </tbody>
</table>

3 Answers

User

#1 · edited 2024-04-24 15:08:39

Try something like this:

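# Assumes `soup` is a BeautifulSoup object built from the HTML above
result = None
for row in soup.tbody.find_all('tr'):
    lang, link = row.find_all('td')  # first cell: language, second cell: link
    if lang.string == '1':
        result = link.a['href']
print(result)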
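For anyone who wants to run the idea end to end, here is a minimal self-contained sketch. It assumes a trimmed-down copy of the table with made-up `href` values (`LINK0`, `LINK1`), and the helper name `find_link_for_language` is hypothetical, not something from the thread.

```python
from bs4 import BeautifulSoup

# Trimmed sample in the shape of the question's table; the hrefs are placeholders.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Language</th><th>Link</th></tr></thead>
  <tbody>
    <tr><td class="tdidioma"><span class="flag flag_0">0</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK0">Ver</a></td></tr>
    <tr><td class="tdidioma"><span class="flag flag_1">1</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK1">Ver</a></td></tr>
  </tbody>
</table>
"""


def find_link_for_language(html, wanted):
    """Return the href of the first body row whose language cell equals `wanted`, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.tbody.find_all("tr"):
        lang, link = row.find_all("td")
        # get_text(strip=True) tolerates whitespace around the number
        if lang.get_text(strip=True) == wanted:
            return link.a["href"]
    return None


print(find_link_for_language(SAMPLE_HTML, "1"))
```

Returning as soon as a row matches is a design choice of this sketch; the loop above keeps going and ends up with the last matching row instead.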

User

#2 · edited 2024-04-24 15:08:39

Try using the soup like this; you may need some exception handling.

trs = soup.select('tr')  # trs is a list of bs4.element.Tag elements

Now iterate over the list:

[code block missing from the source]
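A minimal sketch of what such a loop might look like, with the exception handling the answer hints at. This is an assumption about the intent, not the answerer's code; the sample HTML and its `href` values are placeholders.

```python
from bs4 import BeautifulSoup

# Placeholder table in the shape of the question's HTML.
html = """
<table>
  <thead><tr><th>Language</th><th>Link</th></tr></thead>
  <tbody>
    <tr><td class="tdidioma"><span class="flag flag_0">0</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK0">Ver</a></td></tr>
    <tr><td class="tdidioma"><span class="flag flag_1">1</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK1">Ver</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
trs = soup.select("tr")  # also includes the header row, which has only <th> cells

found = []
for tr in trs:
    try:
        # ValueError if the row does not contain exactly two <td> cells
        lang_cell, link_cell = tr.find_all("td")
        # TypeError if the cell has no <a>, KeyError if the <a> has no href
        href = link_cell.a["href"]
    except (ValueError, TypeError, KeyError):
        continue  # skip the header row and any malformed rows
    found.append((lang_cell.get_text(strip=True), href))

print(found)
```

Because `select('tr')` matches every row in the table, the header row has to be skipped somehow; catching the unpacking error is one way, and selecting `tbody tr` would be another.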

User

#3 · edited 2024-04-24 15:08:39

I assume you want to check whether the URL contains 1 and, if it does, save it. Is that what you want?

You can try the following code:

from bs4 import BeautifulSoup

# YOUR_TEXT_HERE stands for the HTML string to parse
soup = BeautifulSoup(YOUR_TEXT_HERE, 'html.parser')
tbody_soup = soup.find('tbody')
links = tbody_soup.find_all('a')
links_to_save = []

for item in links:
    print(item.attrs['href'])  # prints the url
    print(item.get_text())  # prints the text of the link
    print(item.attrs)  # prints a dictionary with all the attributes

    # check if 1 is in url?
    if '1' in item.attrs['href']:
        links_to_save.append(item.attrs['href'])

print(links_to_save)
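Note that a substring test such as `'1' in href` also matches URLs that merely contain a 1 somewhere (for example in `10` or `21`). If the goal is to pick the row by its language number, one possible alternative is to key the links by the text of the flag cell. The sketch below assumes the class names from the question's table and uses placeholder `href` values; `links_by_lang` is a name made up for illustration.

```python
from bs4 import BeautifulSoup

# Placeholder table in the shape of the question's HTML.
html = """
<table>
  <tbody>
    <tr><td class="tdidioma"><span class="flag flag_0">0</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK0">Ver</a></td></tr>
    <tr><td class="tdidioma"><span class="flag flag_1">1</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK1">Ver</a></td></tr>
    <tr><td class="tdidioma"><span class="flag flag_2">2</span></td>
        <td class="tdenlace"><a class="enlace_link" href="LINK2">Ver</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Map each row's language number to its link, using CSS selectors.
links_by_lang = {}
for row in soup.select("tbody tr"):
    flag = row.select_one("td.tdidioma span.flag")
    anchor = row.select_one("td.tdenlace a.enlace_link")
    # Skip rows that lack either the flag or a link with an href
    if flag is not None and anchor is not None and anchor.has_attr("href"):
        links_by_lang[flag.get_text(strip=True)] = anchor["href"]

print(links_by_lang.get("1"))
```

Using `dict.get` means a missing language yields `None` rather than raising `KeyError`.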
