How do I extract the video ID from a YouTube link in Python?
I know this is easy to do in PHP with the parse_url and parse_str functions:
$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);
But how do I do this in Python? I can use urlparse, but what do I do after that?
14 Answers
15
This is a Python 3 version of Mikhail Kashkin's solution, with a few extra cases added.
from urllib.parse import urlparse, parse_qs
from contextlib import suppress

# noinspection PyTypeChecker
def get_yt_id(url, ignore_playlist=False):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com', 'music.youtube.com'}:
        if not ignore_playlist:
            # use case: get playlist id not current video in playlist
            with suppress(KeyError):
                return parse_qs(query.query)['list'][0]
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # returns None for invalid YouTube url
# unit test
import pytest

@pytest.mark.parametrize(
    'url,expected_id',
    (
        ('https://youtu.be/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch?v=Dlxu28sQfkE&feature=youtu.be', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/embed/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/v/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/playlist?list=PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4', 'PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4'),
    ),
)
def test_yt_id(url, expected_id):
    assert get_yt_id(url) == expected_id
65
I wrote a YouTube video ID parser that does not use regular expressions:
import urlparse  # Python 2 module; in Python 3 it is urllib.parse

def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse.urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = urlparse.parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None
48
Python has a library for parsing URLs.
import urlparse  # Python 2 module; in Python 3 it is urllib.parse
url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]
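For readers on Python 3, the same approach might look roughly like the sketch below. It assumes the ID is always carried in the `v` query parameter, as in the question's URL, and the helper name `extract_video_id` is made up for illustration. Unlike the snippet above, it returns None instead of raising KeyError when `v` is missing.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the `v` query parameter, or None if it is absent."""
    url_data = urlparse(url)
    # parse_qs maps each query key to a list of values, e.g. {'v': ['...'], 'NR': ['1']}
    query = parse_qs(url_data.query)
    values = query.get("v")
    return values[0] if values else None


print(extract_video_id("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1"))  # z_AbfPXTKms
```

Short links such as youtu.be/... keep the ID in the path rather than the query string, so they would need separate handling, as the other answers show.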