Looping in Python to get the title tag from URLs

Posted: 2024-05-12 20:09:06


import urllib.request as urllib2
from bs4 import BeautifulSoup

a = "https://player.vimeo.com/video/1234"

soup = BeautifulSoup(urllib2.urlopen(a), "html.parser")  # name a parser explicitly to avoid bs4's warning
print(a + " : " + soup.title.string)

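I want to get the titles of the URLs in a loop, incrementing the number at the end of the URL on each iteration.

That is, I get the title of https://player.vimeo.com/video/1234, then of https://player.vimeo.com/video/1235, and so on.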


2 answers

If you have more URLs, add them to lst and you will get all of their titles. You can try the following script:

import urllib.request as urllib2
from bs4 import BeautifulSoup

lst = ["https://player.vimeo.com/video/1234","https://player.vimeo.com/video/1235"]
title = []
for a in lst:
    # 'lxml' needs the lxml package installed; 'html.parser' works with the standard library only
    soup = BeautifulSoup(urllib2.urlopen(a), 'lxml')
    title.append(soup.title.string)

print(title)

The output will be:

['Diving catch from Chris Bodenner on Vimeo', 'Hit with box from Chris Bodenner on Vimeo']
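Since the question is about incrementing the video number, lst does not have to be typed out by hand. A minimal sketch, where build_urls is a hypothetical helper and the ID range 1234 to 1239 is only an example; the resulting lst can be passed to the loop above unchanged:

```python
# Base URL taken from the question; the numeric video ID is appended to it.
BASE_URL = "https://player.vimeo.com/video/"


def build_urls(first_id, last_id):
    """Hypothetical helper: one URL per ID from first_id to last_id, both inclusive."""
    # range() excludes its stop value, hence the + 1
    return [f"{BASE_URL}{video_id}" for video_id in range(first_id, last_id + 1)]


lst = build_urls(1234, 1239)
print(lst[0])   # https://player.vimeo.com/video/1234
print(lst[-1])  # https://player.vimeo.com/video/1239
```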

To print each URL next to its title as the loop runs:

import urllib.request as urllib2
from bs4 import BeautifulSoup

lst = ["https://player.vimeo.com/video/1234","https://player.vimeo.com/video/1235"]
title = []
for a in lst:
    soup = BeautifulSoup(urllib2.urlopen(a), 'lxml')
    title.append(soup.title.string)
    print(a + " : " + soup.title.string)

The output will be:

https://player.vimeo.com/video/1234 : Diving catch from Chris Bodenner on Vimeo
https://player.vimeo.com/video/1235 : Hit with box from Chris Bodenner on Vimeo
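Both scripts above assume every page has a <title> tag containing plain text. If it does not, soup.title is None (so .string raises AttributeError), or soup.title.string is None (so the string concatenation raises TypeError). A minimal sketch of a guarded helper; title_from_html is a hypothetical name, and "html.parser" is used so no extra parser package is needed:

```python
from bs4 import BeautifulSoup


def title_from_html(markup):
    """Hypothetical helper: return the stripped <title> text, or None if there is none."""
    soup = BeautifulSoup(markup, "html.parser")
    # soup.title is None when the page has no <title> tag; .string is None when
    # the tag is empty or contains nested markup, so check both before stripping.
    if soup.title is None or soup.title.string is None:
        return None
    return soup.title.string.strip()


print(title_from_html("<html><head><title> Demo </title></head></html>"))  # Demo
print(title_from_html("<html><body>no title here</body></html>"))          # None
```

In the loops above, title_from_html(urllib2.urlopen(a)) could replace soup.title.string, since BeautifulSoup also accepts the response object directly.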

You can do it like this:

import urllib.request as urllib2
from bs4 import BeautifulSoup

start_idx, end_idx = 1234, 1245

for idx in range(start_idx, end_idx):  # end_idx itself is excluded
    a = f"https://player.vimeo.com/video/{idx}"
    soup = BeautifulSoup(urllib2.urlopen(a), "html.parser")
    print(f"for url: {a}, title: {soup.title.string}")

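Set start_idx and end_idx as needed. Note that range() stops before end_idx, so the code above fetches IDs 1234 through 1244.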

Also, you may need to handle the HTTPError that urlopen raises when access to some URLs is forbidden.
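A minimal sketch of that error handling, assuming you would rather skip a forbidden or missing video than stop the whole loop. fetch_title and its opener parameter are hypothetical; opener defaults to urllib.request.urlopen and exists only so the function can be tried without network access:

```python
import urllib.error
import urllib.request

from bs4 import BeautifulSoup


def fetch_title(url, opener=urllib.request.urlopen):
    """Hypothetical helper: return the page title, or None on an HTTP error status."""
    try:
        with opener(url) as response:
            soup = BeautifulSoup(response, "html.parser")
    except urllib.error.HTTPError as err:
        # e.g. 403 Forbidden or 404 Not Found for a private or missing video ID
        print(f"skipping {url}: HTTP {err.code}")
        return None
    # Guard against pages without a usable <title> as well
    if soup.title is None or soup.title.string is None:
        return None
    return soup.title.string.strip()
```

Inside the range loop above, print(f"for url: {a}, title: {fetch_title(a)}") would then print None for IDs that return an HTTP error instead of raising. Other failures, such as a dropped connection, raise urllib.error.URLError, which this sketch does not catch.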
