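How can I scrape the Spotify Charts website?
I am trying to scrape the Spotify Charts website to get the weekly top songs for a given city, as shown at this link: https://charts.spotify.com/charts/view/citytoptrack-barcelona-weekly/2024-02-29
I used the following script, hoping to get the top-song information: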
import requests

# Weekly top-tracks chart for Barcelona (same date as the link above)
url = 'https://charts.spotify.com/charts/view/citytoptrack-barcelona-weekly/2024-02-29'
response = requests.get(url)
print(response.text)
However, instead of the song information, I got back this HTML:
<!DOCTYPE html><html lang="en" class="no-touchevents"><head><meta charSet="utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=edge"/><link rel="shortcut icon" href="/spotifycharts.svg"/><link rel="preload" as="font" crossorigin="crossorigin" type="font/woff2" href="https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Book.woff2"/><link rel="preload" as="font" crossorigin="crossorigin" type="font/woff2" href="https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Bold.woff2"/><link rel="preload" as="font" crossorigin="crossorigin" type="font/woff2" href="https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Black.woff2"/><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"/><title>Spotify Charts - Spotify Charts are made by fans</title><meta name="description" content="The new home for Spotify Charts. Dive into artist, genre, city and local pulse charts to see what music is moving fans around the world."/><meta property="og:title" content="Spotify Charts - Spotify Charts are made by fans"/><meta property="og:description" content="The new home for Spotify Charts. 
Dive into artist, genre, city and local pulse charts to see what music is moving fans around the world."/><meta property="og:image" content="https://charts-images.scdn.co/csc_assets/thumbnails/embed_thumbnail_1200x630.png"/><meta name="google-site-verification" content="ruWC_F0SbT2WKsLaxexOhIEBbc8MpAqW2mNLiIIEnMs"/><meta name="next-head-count" content="8"/><link rel="preload" href="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/css/d0a9f291d2c82e84.css" as="style"/><link rel="stylesheet" href="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/css/d0a9f291d2c82e84.css" data-n-g=""/><link rel="preload" href="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/css/703a0d12455970a3.css" as="style"/><link rel="stylesheet" href="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/css/703a0d12455970a3.css" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" nomodule="" src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/polyfills-5cd94c89d3acac5f.js"></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/webpack-9b29c3c96fdf5444.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/framework-dc33c0b5493501f0.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/main-a10220fda36c7b79.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/pages/_app-0ec0d406787e20cc.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/75fc9c18-6268e8a2f6ae8a14.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/698-086819819b79a4da.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/759-dd38244401bcc767.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/660-dc430211873188fc.js" 
defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/341-5c935b2bc4f7debe.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/320-4f7ad9ba7395cc61.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/94-75209f18322b7720.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/46-748776ea9ef77216.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/831-4587d0a6ceefd5a9.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/214-331a529acae1319d.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/chunks/pages/charts/view/%5Balias%5D/%5Bdate%5D-6239ccf34bc9ebed.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/llzNbGcUHoUB60ACr0WBF/_buildManifest.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/llzNbGcUHoUB60ACr0WBF/_ssgManifest.js" defer=""></script><script src="https://mrkt-web.scdn.co/charts-spotify-com/_next/static/llzNbGcUHoUB60ACr0WBF/_middlewareManifest.js" defer=""></script><style data-styled="" data-styled-version="5.3.5">.bucGtk{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;padding-inline-start:0;padding-inline-end:0;margin-block-start:0;margin-block-end:0;overflow-wrap:break-word;list-style-type:none;}/*!sc*/
data-styled.g1[id="List-sc-64p3hb-0"]{content:"bucGtk,"}/*!sc*/
.kKJlmK{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;list-style-type:none;}/*!sc*/
data-styled.g4[id="ListItem-sc-14osqn3-0"]{content:"kKJlmK,"}/*!sc*/
.lfGOlT{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;margin-block-start:0;margin-block-end:0;font-size:0.875rem;line-height:1.25rem;font-weight:400;color:inherit;}/*!sc*/
@media (min-width:768px){.lfGOlT{font-size:0.875rem;line-height:1.25rem;text-transform:none;-webkit-letter-spacing:normal;-moz-letter-spacing:normal;-ms-letter-spacing:normal;letter-spacing:normal;}}/*!sc*/
data-styled.g6[id="Type__TypeElement-goli3j-0"]{content:"lfGOlT,"}/*!sc*/
@media (min-width:768px){.kEOKET{position:fixed;-webkit-block-size:100%;-ms-flex-block-size:100%;block-size:100%;top:0;inline-size:0;left:0;z-index:1030;}[dir='rtl'] .Aside-sc-1wnswz1-0{left:unset;right:0;}}/*!sc*/
.docs-story .Aside-sc-1wnswz1-0{position:relative;}/*!sc*/
@media (min-width:768px){.docs-story .Aside-sc-1wnswz1-0{position:absolute;}}/*!sc*/
data-styled.g12[id="Aside-sc-1wnswz1-0"]{content:"kEOKET,"}/*!sc*/
.gSnYRE{position:-webkit-sticky;position:sticky;top:0;z-index:1030;}/*!sc*/
@media (max-width:767px){.gSnYRE{margin-inline-start:-24px;margin-inline-end:-24px;}}/*!sc*/
@media (min-width:768px){.gSnYRE{margin-inline-start:-32px;margin-inline-end:-32px;}}/*!sc*/
data-styled.g13[id="Banner-sc-1bnzyty-0"]{content:"gSnYRE,"}/*!sc*/
.krZEp{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;position:absolute;top:0;left:0;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:column;-ms-flex-direction:column;flex-direction:column;inline-size:100%;min-block-size:100%;background:var(--background-base,#ffffff);color:var(--text-base,#000000);overflow-wrap:break-word;}/*!sc*/
data-styled.g14[id="Container-c1ixcy-0"]{content:"krZEp,"}/*!sc*/
.jyvkLv{-webkit-flex:1;-ms-flex:1;flex:1;}/*!sc*/
data-styled.g15[id="Content-sc-1n5ckz4-0"]{content:"jyvkLv,"}/*!sc*/
.flXzSu{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-direction:column;-ms-flex-direction:column;flex-direction:column;-webkit-flex:1;-ms-flex:1;flex:1;}/*!sc*/
@media (max-width:767px){.flXzSu{padding-inline-start:24px;padding-inline-end:24px;}}/*!sc*/
@media (min-width:768px){.flXzSu{padding-inline-start:32px;padding-inline-end:32px;margin-inline-start:0;max-inline-size:100%;}}/*!sc*/
data-styled.g16[id="Main-tbtyrr-0"]{content:"flXzSu,"}/*!sc*/
.fdgJxn{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-flex-wrap:wrap;-ms-flex-wrap:wrap;flex-wrap:wrap;min-inline-size:0;padding-block-start:24px;padding-block-end:24px;border-block-start:1px solid var(--decorative-subdued,#dedede);}/*!sc*/
@media (min-width:768px){.fdgJxn{padding-inline-start:8px;padding-inline-end:8px;}}/*!sc*/
.fdgJxn nav{min-inline-size:0;}/*!sc*/
.fdgJxn >:last-child{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:flex-end;-webkit-box-align:flex-end;-ms-flex-align:flex-end;align-items:flex-end;-webkit-box-pack:end;-webkit-justify-content:flex-end;-ms-flex-pack:end;justify-content:flex-end;-webkit-box-flex:1;-webkit-flex-grow:1;-ms-flex-positive:1;flex-grow:1;-webkit-flex-shrink:0;-ms-flex-negative:0;flex-shrink:0;}/*!sc*/
data-styled.g19[id="Container-sc-79vijq-0"]{content:"fdgJxn,"}/*!sc*/
.bVuthO{padding-block-end:0;-webkit-flex-shrink:0;-ms-flex-negative:0;flex-shrink:0;font-size:0.6875rem;line-height:1rem;font-weight:400;}/*!sc*/
@media (min-width:768px){.bVuthO{font-size:0.6875rem;line-height:1rem;text-transform:none;-webkit-letter-spacing:normal;-moz-letter-spacing:normal;-ms-letter-spacing:normal;letter-spacing:normal;}}/*!sc*/
@media (max-width:767px){}/*!sc*/
@media (min-width:768px){}/*!sc*/
data-styled.g20[id="Copyright-sc-15d7gge-0"]{content:"bVuthO,"}/*!sc*/
.eXPRFh{min-inline-size:0;color:var(--text-base,#000000);}/*!sc*/
@media (max-width:767px){.eXPRFh{padding-block-end:4px;padding-block-start:4px;}.eXPRFh:last-child{padding-block-end:0;}}/*!sc*/
@media (min-width:768px){.eXPRFh{padding-inline-end:48px;}.eXPRFh:last-child{-webkit-box-flex:1;-webkit-flex-grow:1;-ms-flex-positive:1;flex-grow:1;padding-inline-end:0;}}/*!sc*/
data-styled.g21[id="Item-sc-16ne99o-0"]{content:"eXPRFh,"}/*!sc*/
.hPPrfs{-webkit-flex-wrap:wrap;-ms-flex-wrap:wrap;flex-wrap:wrap;}/*!sc*/
@media (min-width:768px){.hPPrfs{-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-box-pack:start;-webkit-justify-content:flex-start;-ms-flex-pack:start;justify-content:flex-start;}}/*!sc*/
data-styled.g22[id="List-vnil8g-0"]{content:"hPPrfs,"}/*!sc*/
.bGswfQ{box-sizing:border-box;font-family:var(--font-family,spotify-circular),Helvetica,Arial,sans-serif;-webkit-tap-highlight-color:transparent;color:var(--text-base,#000000);-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;-webkit-text-decoration:none;text-decoration:none;color:var(--text-base,#000000);display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;}/*!sc*/
.bGswfQ[href],.bGswfQ:hover:not([disabled]){-webkit-text-decoration:underline;text-decoration:underline;}/*!sc*/
.bGswfQ[href]:hover{-webkit-text-decoration:underline;text-decoration:underline;color:var(--text-base,#000000);}/*!sc*/
.bGswfQ[href]:focus{outline:none;box-shadow:0 3px 0 0;-webkit-transition:box-shadow 200ms ease-in;transition:box-shadow 200ms ease-in;}/*!sc*/
.bGswfQ[href]:focus.Link-k8gsk-0[href]:focus{-webkit-text-decoration:none;text-decoration:none;}/*!sc*/
.bGswfQ[href]:active{color:var(--text-bright-accent,#117a37);}/*!sc*/
.bGswfQ[disabled],.bGswfQ[href][disabled],.bGswfQ[aria-disabled='true']{color:var(--text-subdued,#6a6a6a);opacity:0.3;cursor:not-allowed;}/*!sc*/
.bGswfQ,.bGswfQ[href]{-webkit-text-decoration:none;text-decoration:none;}/*!sc*/
.bGswfQ[href]:hover,.bGswfQ[href]:hover:focus{-webkit-text-decoration:underline;text-decoration:underline;color:var(--text-subdued,#6a6a6a);}/*!sc*/
.bGswfQ[href]:focus{-webkit-text-decoration:none;text-decoration:none;color:var(--text-base,#000000);}/*!sc*/
.bGswfQ[href]:active{-webkit-text-decoration:underline;text-decoration:underline;color:var(--text-base,#000000);}/*!sc*/
.bGswfQ[disabled],.bGswfQ[href][disabled],.bGswfQ[aria-disabled='true']{-webkit-text-decoration:none;text-decoration:none;}/*!sc*/
data-styled.g23[id="Link-k8gsk-0"]{content:"bGswfQ,"}/*!sc*/
.cMCIWa{display:block;font-size:0.6875rem;line-height:1rem;font-weight:700;}/*!sc*/
@media (min-width:768px){.cMCIWa{font-size:0.6875rem;line-height:1rem;text-transform:none;-webkit-letter-spacing:normal;-moz-letter-spacing:normal;-ms-letter-spacing:normal;letter-spacing:normal;}}/*!sc*/
@media (max-width:767px){}/*!sc*/
@media (min-width:768px){}/*!sc*/
@media all and (-ms-high-contrast:none),(-ms-high-contrast:active){.cMCIWa{font-weight:400;}}/*!sc*/
@supports (-ms-ime-align:auto){.cMCIWa{font-weight:400;}}/*!sc*/
data-styled.g24[id="Link-fe80qw-0"]{content:"cMCIWa,"}/*!sc*/
*{box-sizing:border-box;}/*!sc*/
*::before,*::after{box-sizing:border-box;}/*!sc*/
body{font-family:spotify-circular,Helvetica,Arial,sans-serif;margin:0;}/*!sc*/
html,body{height:100%;}/*!sc*/
data-styled.g135[id="sc-global-cKDjoD1"]{content:"sc-global-cKDjoD1,"}/*!sc*/
@media (min-width:768px){.btVmFl{padding-left:48px;padding-right:48px;}}/*!sc*/
@media (max-width:767px){.btVmFl{padding-left:16px;padding-right:16px;}}/*!sc*/
data-styled.g224[id="ChartsFooter__StyledAppFooter-sc-68p0np-0"]{content:"btVmFl,"}/*!sc*/
@font-face{font-family:spotify-circular;src:url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Book.woff2') format('woff2'), url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Book.woff') format('woff');font-weight:400;font-style:normal;}/*!sc*/
@font-face{font-family:spotify-circular;src:url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Bold.woff2') format('woff2'), url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Bold.woff') format('woff');font-weight:700;font-style:normal;}/*!sc*/
@font-face{font-family:spotify-circular;src:url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Black.woff2') format('woff2'), url('https://encore.scdn.co/1.2.3/CircularSpotify-UI-Latin-OS2v3-Black.woff') format('woff');font-weight:900;font-style:normal;}/*!sc*/
data-styled.g225[id="sc-global-eWgKCs1"]{content:"sc-global-eWgKCs1,"}/*!sc*/
</style></head><body data-qa="spotify-charts-website"><div id="__next" data-reactroot=""><div class="encore-creator-light-theme"><div class="Container-c1ixcy-0 krZEp encore-base-set"><aside class="Aside-sc-1wnswz1-0 kEOKET"></aside><main class="Main-tbtyrr-0 flXzSu"><div class="Banner-sc-1bnzyty-0 gSnYRE"></div><div class="Content-sc-1n5ckz4-0 jyvkLv"></div><div><footer class="Container-sc-79vijq-0 fdgJxn ChartsFooter__StyledAppFooter-sc-68p0np-0 btVmFl"><nav><ul role="list" class="List-sc-64p3hb-0 List-vnil8g-0 bucGtk hPPrfs"><li class="ListItem-sc-14osqn3-0 Item-sc-16ne99o-0 kKJlmK eXPRFh"><a href="https://www.spotify.com/legal" target="_blank" rel="noopener noreferrer" class="Link-k8gsk-0 bGswfQ Link-fe80qw-0 cMCIWa">Legal</a></li><li class="ListItem-sc-14osqn3-0 Item-sc-16ne99o-0 kKJlmK eXPRFh"><a href="https://www.spotify.com/legal/privacy-policy/" target="_blank" rel="noopener noreferrer" class="Link-k8gsk-0 bGswfQ Link-fe80qw-0 cMCIWa">Privacy</a></li><li class="ListItem-sc-14osqn3-0 Item-sc-16ne99o-0 kKJlmK eXPRFh"><a href="https://www.spotify.com/legal/cookies-policy/" target="_blank" rel="noopener noreferrer" class="Link-k8gsk-0 bGswfQ Link-fe80qw-0 cMCIWa">Cookies</a></li><li class="ListItem-sc-14osqn3-0 Item-sc-16ne99o-0 kKJlmK eXPRFh"><a href="https://support.spotify.com/us/artists/article/charts/" target="_blank" rel="noopener noreferrer" class="Link-k8gsk-0 bGswfQ Link-fe80qw-0 cMCIWa">FAQ</a></li></ul></nav><div><small translate="no" class="Type__TypeElement-goli3j-0 lfGOlT Type__TypeElement-goli3j-0 lfGOlT Copyright-sc-15d7gge-0 bVuthO">© <!-- -->2024<!-- --> Spotify AB</small></div></footer></div></main></div></div></div><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{}},"page":"/charts/view/[alias]/[date]","query":{},"buildId":"llzNbGcUHoUB60ACr0WBF","assetPrefix":"https://mrkt-web.scdn.co/charts-spotify-com","nextExport":true,"autoExport":true,"isFallback":false,"scriptLoader":[]}</script></body></html>
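Spotify does not provide an API for a city's weekly top songs, so I think scraping may be my only option. Do you know how I can get the song information instead of this HTML?
I am new to web scraping, so please bear with me if the question is too basic.
Thanks in advance!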
1 Answer
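These charts are only accessible after you log in, so your unauthenticated `requests.get` call (just like a bare curl command) cannot return the chart data.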
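One way to see this from the response itself, as a minimal sketch: the HTML in the question is a Next.js shell whose `__NEXT_DATA__` script carries an empty `pageProps`, so no chart data was rendered into the page, which is consistent with the login requirement. The helper names and the trimmed `SAMPLE_HTML` below are illustrative assumptions and are not part of any Spotify API.

```python
import json
import re
from typing import Optional

# Trimmed stand-in for the HTML in the question: only the __NEXT_DATA__
# script tag is kept (an assumption made to keep the example short).
SAMPLE_HTML = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props":{"pageProps":{}},"page":"/charts/view/[alias]/[date]",'
    '"query":{},"nextExport":true,"autoExport":true}'
    '</script></body></html>'
)

# Matches the JSON payload that Next.js embeds in the page.
NEXT_DATA_RE = re.compile(
    r'<script id="__NEXT_DATA__" type="application/json">(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def has_server_rendered_data(html: str) -> bool:
    """True only if the page shipped a non-empty pageProps payload."""
    data = extract_next_data(html)
    if data is None:
        return False
    page_props = data.get("props", {}).get("pageProps", {})
    return bool(page_props)


if __name__ == "__main__":
    # The shell page from the question carries no data at all.
    print(has_server_rendered_data(SAMPLE_HTML))
```

Running the same check on `response.text` from the question's script should give the same result, since that page also ships `"pageProps":{}`.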
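As an aside, when I logged in to view the charts, I was immediately shown the following notice:
Rules of use
Before using Spotify Charts, please make sure you read our Terms of Use, including the User Guidelines. Here are some (but not all) of the important guidelines to keep in mind.
No bots or scraping: You may not use any automated means to view, access, or collect the information you see on the site.
No copying: Do not reverse engineer or modify the site to create derivative works, and do not copy, redistribute, or publicly display its content.
No circumvention: Do not circumvent any technology we use to protect the site and its content.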