How to extract all links along with their names from a website using Python

Published 2024-03-29 15:06:10

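I wrote a simple program to get the mp3 links from this site, but when I extract the links they come out as raw HTML. I want them to come out as plain links, shown together with their names only.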

The code is as follows:

import requests
from bs4 import BeautifulSoup


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = "https://www.chosic.com/free-music/all/"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

for u in soup.find_all("a"):
    print("Downloading {}".format(u))

It prints:

<a href="https://www.chosic.com/free-music/romantic/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Romantic.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Romantic.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Romantic</div>
<span class="tag-count">68</span>
</a>
<a href="https://www.chosic.com/download-audio/27010/">Dark Forest</a>
<a class="artist-name" href="https://www.chosic.com/free-music/all/?keyword=McFunkypants&amp;artist" rel="nofollow">McFunkypants</a>
<a href="https://creativecommons.org/publicdomain/zero/1.0/" rel="license" target="_blank" title="This track is licensed under Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication"></a>
<a class="download-button track-download" href="https://www.chosic.com/download-audio/27010/">Download</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/games/">Games</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/suspense/">Suspense</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/cinematic/">Cinematic</a>
<a href="https://www.chosic.com/download-audio/27957/">Peacful water stream in forest</a>
<a class="artist-name" href="https://www.chosic.com/free-music/all/?keyword=Chosic&amp;artist" rel="nofollow">Chosic</a>
<a href="https://creativecommons.org/publicdomain/zero/1.0/" rel="license" target="_blank" title="This track is licensed under Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication"></a>
<a class="download-button track-download" href="https://www.chosic.com/download-audio/27957/">Download</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/nature/">Nature</a>
<a href="https://www.chosic.com/download-audio/25499/">A really dark alley</a>
<a class="artist-name" href="https://www.chosic.com/free-music/all/?keyword=Loyalty Freak Music&amp;artist" rel="nofollow">Loyalty Freak Music</a>
<a href="https://creativecommons.org/publicdomain/zero/1.0/" rel="license" target="_blank" title="This track is licensed under Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication"></a>
<a class="download-button track-download" href="https://www.chosic.com/download-audio/25499/">Download</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/horror/">Horror</a>
<a href="https://www.chosic.com/download-audio/25909/">Handel , Largo (from ‘Xerxes’)</a>
<a class="artist-name" href="https://www.chosic.com/free-music/all/?keyword=The London Baroque Orchestra&amp;artist" rel="nofollow">The London Baroque Orchestra</a>
<a href="https://creativecommons.org/publicdomain/mark/1.0/" rel="license" target="_blank" title="This track is licensed under Creative Commons Public Domain Mark 1.0"></a>
<a class="download-button track-download" href="https://www.chosic.com/download-audio/25909/">Download</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/cinematic/">Cinematic</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/sad/">Sad</a>
<a class="tag-cloud-link-names" href="https://www.chosic.com/free-music/classical/">Classical</a>
<a class="link-as-button" href="?attribution=no">View all public domain tracks →</a>
<a href="https://www.chosic.com/free-music/piano/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Piano.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Piano.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Piano</div>
<span class="tag-count">122</span>
</a>
<a href="https://www.chosic.com/free-music/nature/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Nature.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Nature.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Nature</div>
<span class="tag-count">25</span>
</a>
<a href="https://www.chosic.com/free-music/beats/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Beats.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Beats.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Beats</div>
<span class="tag-count">85</span>
</a>
<a href="https://www.chosic.com/free-music/lofi/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Lofi.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Lofi.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Lofi</div>
<span class="tag-count">40</span>
</a>
<a href="https://www.chosic.com/free-music/guitar/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Guitar.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Guitar.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Guitar</div>
<span class="tag-count">97</span>
</a>
<a href="https://www.chosic.com/free-music/games/">
<noscript><img alt="" src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Games.jpg"/></noscript><img alt="" class="lazyload" data-src="https://www.chosic.com/wp-content/uploads/FreeMusicTagsImages/small/Games.jpg" src="data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E"/>
<div class="tag-name">Games</div>
<span class="tag-count">142</span>
</a>
<a aria-label="Action" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/action/" style="font-size: 22pt">Action</a>
<a aria-label="Angry" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/angry/" style="font-size: 22pt">Angry</a>
<a aria-label="Bright" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/bright/" style="font-size: 22pt">Bright</a>
<a aria-label="Calm" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/calm/" style="font-size: 22pt">Calm</a>
<a aria-label="Cute" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/cute/" style="font-size: 22pt">Cute</a>
<a aria-label="Dark" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/dark/" style="font-size: 22pt">Dark</a>
<a aria-label="Dramatic" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/dramatic/" style="font-size: 22pt">Dramatic</a>
<a aria-label="Energetic" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/energetic/" style="font-size: 22pt">Energetic</a>
<a aria-label="Epic" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/epic/" style="font-size: 22pt">Epic</a>
<a aria-label="Fast" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/fast/" style="font-size: 22pt">Fast</a>
<a aria-label="Funny" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/funny/" style="font-size: 22pt">Funny</a>
<a aria-label="Happy" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/happy/" style="font-size: 22pt">Happy</a>
<a aria-label="Horror" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/horror/" style="font-size: 22pt">Horror</a>
<a aria-label="Motivational" class="tag-cloud-link tag-link-1896 tag-link-position-38" href="https://www.chosic.com/free-music/motivational/" style="font-size: 22pt">Motivational</a>

How can I extract all the mp3 links on the page along with their names?

For example:

0- Happy Clappy
    https://www.chosic.com/download-audio/24390/
1- Sweet Dreams
    https://www.chosic.com/download-audio/26757/
2- Inspiring Optimistic Upbeat Energetic Guitar Rhythm
    https://www.chosic.com/download-audio/27120/
3- White Petals
    https://www.chosic.com/download-audio/27279/
4- A Christmas adventure (Part 2)
    https://www.chosic.com/download-audio/28675/
5- Circus Theme (Entry of the Gladiators) – Strings Version
    https://www.chosic.com/download-audio/28668/
6- It feels good to be alive too
    https://www.chosic.com/download-audio/28670/
7- You’re The Champion
    https://www.chosic.com/download-audio/28700/
8- Dark Forest
    https://www.chosic.com/download-audio/27010/
9- Peacful water stream in forest
    https://www.chosic.com/download-audio/27957/
10- A really dark alley
    https://www.chosic.com/download-audio/25499/
11- Handel , Largo (from ‘Xerxes’)
    https://www.chosic.com/download-audio/25909/

Is there a possible solution?


2 Answers

Try this:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = "https://www.chosic.com/free-music/all/"
# Pass the headers so the request actually carries the User-Agent defined above
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

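# Each track title sits in a <div class="trackF-title-inside"> that wraps the link
data = soup.find_all('div', attrs={'class': 'trackF-title-inside'})

s = ''
for i, link in enumerate(data):
    a = link.find('a', href=True)
    # a.text is the visible name, a['href'] is the URL
    s += f"{i}- {a.text}\n\t{a['href']}\n"

# The with block closes the file on exit, so no explicit close() is needed
with open('out.txt', 'w', encoding='utf-8') as f:
    f.write(s)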

Output:

0- Happy Clappy
    https://www.chosic.com/download-audio/24390/
1- Sweet Dreams
    https://www.chosic.com/download-audio/26757/
2- Inspiring Optimistic Upbeat Energetic Guitar Rhythm
    https://www.chosic.com/download-audio/27120/
3- White Petals
    https://www.chosic.com/download-audio/27279/
4- A Christmas adventure (Part 2)
    https://www.chosic.com/download-audio/28675/
5- Circus Theme (Entry of the Gladiators) – Strings Version
    https://www.chosic.com/download-audio/28668/
6- It feels good to be alive too
    https://www.chosic.com/download-audio/28670/
7- You’re The Champion
    https://www.chosic.com/download-audio/28700/
8- Dark Forest
    https://www.chosic.com/download-audio/27010/
9- Peacful water stream in forest
    https://www.chosic.com/download-audio/27957/
10- A really dark alley
    https://www.chosic.com/download-audio/25499/
11- Handel , Largo (from ‘Xerxes’)
    https://www.chosic.com/download-audio/25909/
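
The loop in the question printed HTML because formatting a BeautifulSoup tag as a string yields its full markup; the name and the URL have to be read from the tag explicitly. A minimal offline sketch of that step, assuming an inline HTML fragment modeled on the `trackF-title-inside` class used above (the fragment and the helper name `extract_links` are illustrative, not taken from the live page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the markup targeted above; the real page may differ.
SAMPLE_HTML = """
<div class="trackF-title-inside"><a href="https://www.chosic.com/download-audio/27010/">Dark Forest</a></div>
<div class="trackF-title-inside"><a href="https://www.chosic.com/download-audio/25499/">A really dark alley</a></div>
"""


def extract_links(html):
    """Return (name, url) pairs for every titled track link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for div in soup.find_all("div", class_="trackF-title-inside"):
        a = div.find("a", href=True)
        if a is None:
            continue  # title block without a usable link
        # get_text() gives the visible name; a["href"] gives the URL
        pairs.append((a.get_text(strip=True), a["href"]))
    return pairs


links = extract_links(SAMPLE_HTML)
for i, (name, href) in enumerate(links):
    print(f"{i}- {name}\n\t{href}")
```

Returning pairs rather than a preformatted string keeps the parsing separate from the output, so the same list can be printed, written to a file, or passed on.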


Or:

Maybe this is what you need:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = "https://www.chosic.com/free-music/all/"

# Pass the headers so the request actually carries the User-Agent defined above
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

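# Each track block is a <div class="track-info track">
data = soup.find_all('div', attrs={'class': 'track-info track'})

for i, element in enumerate(data):
    # The direct mp3 URL is stored in the data-url attribute of the waveform div
    for link in element.find_all('div', attrs={'class': 'waveform before'}):
        print(f"{i}- {element.find('a').text}")
        print('\t' + link['data-url'])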

Output:

0- Happy Clappy
    https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_09_-_Happy_Clappy-1.mp3
1- Sweet Dreams
    https://www.chosic.com/wp-content/uploads/2020/11/batchbug-sweet-dreams.mp3
2- Inspiring Optimistic Upbeat Energetic Guitar Rhythm
    https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3
3- White Petals
    https://www.chosic.com/wp-content/uploads/2021/02/keys-of-moon-white-petals.mp3
4- A Christmas adventure (Part 2)
    https://www.chosic.com/wp-content/uploads/2021/08/TRG_Banks_-_09_-_A_Christmas_adventure_Part_2.mp3
5- Circus Theme (Entry of the Gladiators) – Strings Version
    https://www.chosic.com/wp-content/uploads/2021/08/Circus-Theme-Entry-of-the-Gladiators-Strings-Version.mp3
6- It feels good to be alive too
    https://www.chosic.com/wp-content/uploads/2021/08/Loyalty_Freak_Music_-_04_-_It_feels_good_to_be_alive_too.mp3
7- You’re The Champion
    https://www.chosic.com/wp-content/uploads/2021/08/Youre-The-Champion.mp3
8- Dark Forest
    https://www.chosic.com/wp-content/uploads/2021/01/dark-forest.mp3
9- Peacful water stream in forest
    https://www.chosic.com/wp-content/uploads/2021/04/kvgarlic__largestreamoverloginforestmarch.mp3
10- A really dark alley
    https://www.chosic.com/wp-content/uploads/2020/06/Loyalty_Freak_Music_-_07_-_A_really_dark_alley.mp3
11- Handel , Largo (from ‘Xerxes’)
    https://www.chosic.com/wp-content/uploads/2020/07/Hndel-Xerxes-HWV-40.mp3

I think this meets your requirements.
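
If the goal is to save the files, the direct `.mp3` URLs printed above could be fed to a small downloader. The following is only a sketch under assumptions: the helper names `filename_from_url` and `download` are made up for illustration, and the file name is simply taken from the last segment of the URL path.

```python
import os
import posixpath
from urllib.parse import unquote, urlparse

import requests


def filename_from_url(mp3_url):
    """Use the last path segment of the URL as the local file name."""
    path = urlparse(mp3_url).path  # drops any query string or fragment
    return unquote(posixpath.basename(path))


def download(mp3_url, folder=".", headers=None):
    """Stream one file to disk and return the path written (needs network access)."""
    target = os.path.join(folder, filename_from_url(mp3_url))
    with requests.get(mp3_url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(target, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    return target


# Only the naming step runs here; download() is defined but not called.
example = "https://www.chosic.com/wp-content/uploads/2021/01/dark-forest.mp3"
print(filename_from_url(example))
```

Calling `download(link['data-url'], headers=headers)` inside the loop above would then save each track; streaming in chunks avoids holding a whole file in memory.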

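I'm not sure what kind of data you want, because the sample output you provided doesn't match the data on the site. You can of course adjust the parsing logic here to produce different output.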

import requests
from bs4 import BeautifulSoup


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = "https://www.chosic.com/free-music/all/"
# Pass the headers so the request actually carries the User-Agent defined above
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

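required_data = []

# href=True skips anchors without an href; numbering starts at 1 as before
for count, x in enumerate(soup.find_all('a', href=True), start=1):
    description = x.text.strip()
    href = x['href']  # not named url, so the page URL above is not overwritten
    required_data.append((count, description, href))

print(required_data)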
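
Because this collects every anchor on the page, the result also contains tag, artist, and license links, and each track URL appears twice (once with the title, once on the "Download" button). One way to narrow the list of (count, description, URL) tuples down to the tracks is sketched below; the helper name `track_links` is hypothetical, the sample rows are taken from the anchors shown in the question, and the `/download-audio/` filter is an assumption based on those URLs.

```python
def track_links(required_data):
    """Keep one (name, url) pair per download-audio URL."""
    seen = {}
    for _count, description, href in required_data:
        # Assumption: track pages are the ones whose URL contains /download-audio/
        if "/download-audio/" not in href:
            continue
        # Skip the generic "Download" button and empty names; keep the first real title
        if href not in seen and description and description != "Download":
            seen[href] = description
    return [(name, href) for href, name in seen.items()]


# Rows modeled on the anchors printed in the question
sample = [
    (1, "Dark Forest", "https://www.chosic.com/download-audio/27010/"),
    (2, "McFunkypants", "https://www.chosic.com/free-music/all/?keyword=McFunkypants&artist"),
    (3, "", "https://creativecommons.org/publicdomain/zero/1.0/"),
    (4, "Download", "https://www.chosic.com/download-audio/27010/"),
    (5, "Games", "https://www.chosic.com/free-music/games/"),
    (6, "Peacful water stream in forest", "https://www.chosic.com/download-audio/27957/"),
]

for i, (name, href) in enumerate(track_links(sample)):
    print(f"{i}- {name}\n\t{href}")
```

The dictionary keyed by URL handles the de-duplication and, since Python 3.7, preserves the order in which the tracks were first seen.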
