如何在Python中从YouTube链接中提取视频ID?

26 投票
14 回答
27421 浏览
提问于 2025-04-16 08:02

我知道可以很简单地使用PHP的 parse_urlparse_str 函数来完成这个任务:

$subject = "http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1";
$url = parse_url($subject);
parse_str($url['query'], $query);
var_dump($query);

但是用Python怎么做呢?我可以使用 urlparse,但接下来该怎么做呢?

14 个回答

15

这是Mikhail Kashkin的解决方案的Python3版本,里面增加了一些场景。

from urllib.parse import urlparse, parse_qs
from contextlib import suppress


# noinspection PyTypeChecker
def get_yt_id(url, ignore_playlist=False):
    # Examples:
    # - http://youtu.be/SA2iWivDJiE
    # - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    # - http://www.youtube.com/embed/SA2iWivDJiE
    # - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com', 'music.youtube.com'}:
        if not ignore_playlist:
        # use case: get playlist id not current video in playlist
            with suppress(KeyError):
                return parse_qs(query.query)['list'][0]
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/watch/': return query.path.split('/')[2]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
   # returns None for invalid YouTube url

# unit test
@pytest.mark.parametrize(
    'url,expected_id',
    (
        ('https://youtu.be/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch?v=Dlxu28sQfkE&feature=youtu.be', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/watch/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/embed/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/v/Dlxu28sQfkE', 'Dlxu28sQfkE'),
        ('https://www.youtube.com/playlist?list=PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4', 'PLRbcUrcJVEmX_eaAsubNOWfE4SlhGqjW4'),
    ),
)
def test_yt_id(url, expected_id):
    assert get_yt_id(url) == expected_id
65

我创建了一个可以解析YouTube视频ID的工具,而且没有使用正则表达式:

import urlparse

def video_id(value):
    """
    Examples:
    - http://youtu.be/SA2iWivDJiE
    - http://www.youtube.com/watch?v=_oPAwA_Udwc&feature=feedu
    - http://www.youtube.com/embed/SA2iWivDJiE
    - http://www.youtube.com/v/SA2iWivDJiE?version=3&hl=en_US
    """
    query = urlparse.urlparse(value)
    if query.hostname == 'youtu.be':
        return query.path[1:]
    if query.hostname in ('www.youtube.com', 'youtube.com'):
        if query.path == '/watch':
            p = urlparse.parse_qs(query.query)
            return p['v'][0]
        if query.path[:7] == '/embed/':
            return query.path.split('/')[2]
        if query.path[:3] == '/v/':
            return query.path.split('/')[2]
    # fail?
    return None
48

Python有一个库可以用来解析网址。

import urlparse
url_data = urlparse.urlparse("http://www.youtube.com/watch?v=z_AbfPXTKms&NR=1")
query = urlparse.parse_qs(url_data.query)
video = query["v"][0]

撰写回答