Python: using sets to reduce complexity

I'm using the url_analysis tools from the Spotify API (wrapper spotipy, with sp.) to process tracks, using the following code:

def loudness_drops(track_ids):

    names = set()
    tids = set()
    tracks_with_drop_name = set()
    tracks_with_drop_id = set()

    for id_ in track_ids:
        track_id = sp.track(id_)['uri']
        tids.add(track_id)
        track_name = sp.track(id_)['name']
        names.add(track_name)
        #get audio features
        features = sp.audio_features(tids)
        #and then audio analysis id
        urls = {x['analysis_url'] for x in features if x}
        print len(urls)
        #fetch analysis data
        for url in urls:
            # print len(urls)
            analysis = sp._get(url)
            #extract loudness sections from analysis
            x = [_['start'] for _ in analysis['segments']]
            print len(x)
            l = [_['loudness_max'] for _ in analysis['segments']]
            print len(l)
            #get max and min values
            min_l = min(l)
            max_l = max(l)
            #normalize stream
            norm_l = [(_ - min_l)/(max_l - min_l) for _ in l]
            #define silence as a value below 0.1
            silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
        #more than one silence means one of them happens in the middle of the track
        if len(silence) > 1:
            tracks_with_drop_name.add(track_name)
            tracks_with_drop_id.add(track_id)
    return tracks_with_drop_id

The code works, but if the number of songs I search for is set to, say, limit=20, the time it takes to process all the audio segments x and l makes the run prohibitively expensive, e.g.:

time.time() prints 452.175742149

QUESTION:

How can I drastically reduce the complexity here?

I've tried using sets instead of lists, but working with set objects prohibits indexing.

EDIT: 10 urls

[u'https://api.spotify.com/v1/audio-analysis/5H40slc7OnTLMbXV6E780Z', u'https://api.spotify.com/v1/audio-analysis/72G49GsqYeWV6QVAqp4vl0', u'https://api.spotify.com/v1/audio-analysis/6jvFK4v3oLMPfm6g030H0g', u'https://api.spotify.com/v1/audio-analysis/351LyEn9dxRxgkl28GwQtl', u'https://api.spotify.com/v1/audio-analysis/4cRnjBH13wSYMOfOF17Ddn', u'https://api.spotify.com/v1/audio-analysis/2To3PTOTGJUtRsK3nQemP4', u'https://api.spotify.com/v1/audio-analysis/4xPRxqV9qCVeKLQ31NxhYz', u'https://api.spotify.com/v1/audio-analysis/1G1MtHxrVngvGWSQ7Fj4Oj', u'https://api.spotify.com/v1/audio-analysis/3du9aoP5vPGW1h70mIoicK', u'https://api.spotify.com/v1/audio-analysis/6VIIBKYJAKMBNQreG33lBF']

1 Answer

This is what I see, without knowing much about Spotify:

for id_ in track_ids:
    # this runs N times, where N = len(track_ids)
    ...
    tids.add(track_id)  # tids contains all track_ids processed until now
    # in the end: len(tids) == N
    ...
    features = sp.audio_features(tids)
    # features contains features of all tracks processed until now
    # in the end, I guess: len(features) == N * num_features_per_track

    urls = {x['analysis_url'] for x in features if x}
    # very probably: len(urls) == len(features)

    for url in urls:
        # for the first track, this processes features of the first track only
        # for the second track, this processes features of 1st and 2nd
        # etc.
        # in the end, this loop repeats N * N * num_features_per_track times

You should not have any url twice. You do, because you keep all the tracks in tids and then, for each track, you process everything in tids, which turns the complexity into O(n²).

In general, when trying to reduce complexity, always look for loops inside loops.
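As a toy illustration (not Spotify-specific, just to make the growth visible), reprocessing a collection that keeps accumulating inside the loop that fills it does 1 + 2 + ... + N units of work, whereas touching only the current item stays at N:

def quadratic(items):
    seen = set()
    work = 0
    for item in items:
        seen.add(item)
        work += len(seen)       # re-touches everything seen so far
    return work                 # N * (N + 1) / 2  ->  O(n^2)

def linear(items):
    work = 0
    for item in items:
        work += 1               # touches only the current item
    return work                 # N  ->  O(n)

print(quadratic(range(20)))     # 210
print(linear(range(20)))        # 20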

I believe that, in this case, this should work, if audio_features expects a set of IDs:

# replace this: features = sp.audio_features(tids)
# with:
features = sp.audio_features({track_id})
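
Putting it together, here is a minimal sketch of the restructured function under that assumption. The sp.track, sp.audio_features and sp._get calls are taken from the question; whether audio_features accepts a single-track collection is the assumption, so adapt as needed:

# Sketch only: assumes sp is the spotipy client from the question and that
# sp.audio_features accepts a collection holding a single track id.
def loudness_drops(track_ids):
    tracks_with_drop_id = set()

    for id_ in track_ids:
        track_id = sp.track(id_)['uri']

        # features of the current track only, so the inner loop no longer
        # revisits every previously processed track
        features = sp.audio_features({track_id})
        urls = {x['analysis_url'] for x in features if x}

        for url in urls:
            analysis = sp._get(url)
            l = [seg['loudness_max'] for seg in analysis['segments']]
            min_l, max_l = min(l), max(l)
            norm_l = [(v - min_l) / (max_l - min_l) for v in l]
            # silence = normalized loudness below 0.1
            silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
            if len(silence) > 1:
                tracks_with_drop_id.add(track_id)

    return tracks_with_drop_id

This keeps the number of API calls proportional to the number of tracks instead of growing quadratically with it.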
