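<p>I'm trying to download a series of classical music MIDI files using Python and the requests library. Unfortunately, I can't seem to download the MIDI files themselves; I only get HTML files. I've searched SO and tried some other solutions, such as <a href="https://stackoverflow.com/questions/44392748/python-crawler-does-not-work-properly">this post</a> and <a href="https://stackoverflow.com/questions/48239112/python3-download-file-from-url-by-button-click">this post</a>, but neither solution worked for me.</p>
<p>Here is the code I wrote:</p>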
<pre><code>from bs4 import BeautifulSoup
import requests
import re
url = 'http://www.midiworld.com/classic.htm'
headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "html.parser")
links = []
for link in soup.find_all("a", href=re.compile("mid$")):
    links.append(link['href'])

def get_filename(links):
    filenames = []
    """
    Will return a list of filenames for the files to be downloaded
    """
    for link in links:
        url = link
        if url.find('/'):
            f_name = url.rsplit('/', 1)[1]
            print(url.rsplit('/', 1)[1])
            filenames.append(f_name)
    return filenames

def download_files(links, filenames):
    for link, filename in zip(links, filenames):
        r = requests.get(url, allow_redirects=True)
        with open(filename, 'wb') as saveMidi:
            saveMidi.write(r.content)

filenames = get_filename(links)
download_files(links, filenames)
</code></pre>
<p>I don't understand why HTML files are being returned. Any ideas on how to download the MIDI files correctly?</p>
<p>I'm not sure why, but this works for me.</p>
<pre><code>from urllib.request import urlopen
x = urlopen(links[0]).read()
with open(filenames[0], "wb") as f:
    f.write(x)
</code></pre>
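<p>One likely explanation for the HTML files: inside the question's <code>download_files</code>, <code>requests.get(url, ...)</code> is passed the module-level <code>url</code> (the listing page) rather than the loop variable <code>link</code>, so every saved file would contain that page's HTML. Below is a minimal sketch under that assumption. The names <code>filename_from_link</code>, <code>fetch</code>, <code>fetch_with_requests</code>, and <code>dest_dir</code> are illustrative and not from the original code, and the sketch also assumes some hrefs may be relative, which may not be the case on this site.</p>

```python
import os
from urllib.parse import urljoin, urlparse

PAGE_URL = "http://www.midiworld.com/classic.htm"  # listing page from the question


def filename_from_link(link):
    """Return the last path component of a link, e.g. 'song.mid'."""
    return os.path.basename(urlparse(link).path)


def download_files(links, fetch, page_url=PAGE_URL, dest_dir="."):
    """Fetch each link (not the listing page) and save it under dest_dir.

    `fetch` is a hypothetical callable: it takes a URL and returns bytes.
    Returns the list of filenames that were written.
    """
    saved = []
    for link in links:
        # Resolve relative hrefs against the page; absolute URLs pass through.
        full_url = urljoin(page_url, link)
        # The key change: request the link itself, not page_url.
        content = fetch(full_url)
        filename = filename_from_link(full_url)
        with open(os.path.join(dest_dir, filename), "wb") as f:
            f.write(content)
        saved.append(filename)
    return saved


def fetch_with_requests(url):
    """One possible `fetch`, built on requests (assumed to be installed)."""
    import requests  # imported lazily so the helpers above work without it

    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail loudly instead of saving an error page
    return r.content
```

<p>With the <code>links</code> list from the question, this could be called as <code>download_files(links, fetch_with_requests)</code>. Passing the fetch function in as an argument is only a design choice that makes the loop easy to try out without hitting the network.</p>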