Accessing data from the internet

Posted 2024-05-13 20:39:10


I want to access this file automatically using Python 3. The URL is https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

When you enter the URL manually in Explorer, it prompts you to download the file, but I want to do this automatically in Python and load the data as a DataFrame.

Running the code below, I get the following URLError:

from urllib.request import urlretrieve
import pandas as pd

# Assign url of file: url
url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Save file locally
urlretrieve(url, 'my-sheet.xls')

# Read file into a DataFrame and print its head
df=pd.read_excel('my-sheet.xls')
print(df.head())

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>


3 Answers

You can do this directly with pandas and its .read_excel method:

import pandas as pd

# Read the 'Data' sheet straight from the URL, skipping the first 5 rows
df = pd.read_excel("https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls", sheet_name='Data', skiprows=5)

df.head(1)

Output

$ curl https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>307 Temporary Redirect</title>
</head><body>
<h1>Temporary Redirect</h1>
<p>The document has moved <a href="https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls">here</a>.</p>
</body></html>

You are simply being redirected. There are many ways to handle this in code, but I would just change the URL to "https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls".
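If you would rather not hard-code the new location, one of those ways is to let the HTTP client follow the redirect for you. Below is a minimal sketch using only the standard library. The helper name `fetch_following_redirects` and the 30-second timeout are my own illustrative choices, not something from the question, and I have not run it against the dax-indices server.

```python
import urllib.request


def fetch_following_redirects(url, timeout=30):
    """Download url and report where the request finally ended up.

    Hypothetical helper for illustration. Returns (final_url, body_bytes).
    """
    # urlopen follows 301/302/303/307 redirects automatically for GET requests.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        final_url = response.geturl()  # URL after all redirects were followed
        body = response.read()         # raw bytes of the final response
    return final_url, body
```

Comparing `final_url` with the URL you requested shows whether a redirect happened. The returned bytes could then be wrapped in `io.BytesIO` and passed to `pd.read_excel`, assuming the server really returns the spreadsheet at the final location.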

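I ran your code in a Jupyter environment and it worked. No error was raised, but the DataFrame contains only NaN values. I checked the xls file you are trying to read, and it does not appear to contain any data.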

There are other ways to retrieve the xls data, for example the approach from "downloading an excel file from the web in python":

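import requests
import pandas as pd

url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Download the file; raise an exception on an HTTP error status
resp = requests.get(url)
resp.raise_for_status()

# Save the raw bytes locally; the with block closes the file for us
with open('my-sheet.xls', 'wb') as output:
    output.write(resp.content)

# Read the saved file into a DataFrame and print its head
df = pd.read_excel('my-sheet.xls')
print(df.head())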
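Since the curl output in the other answer shows the server replying with an HTML page rather than the spreadsheet, it may be worth checking what was actually downloaded before handing it to pandas. Here is a rough sketch, assuming the file is a legacy binary .xls (an OLE2 compound file). The function names are hypothetical, and the check is only a heuristic: a file with an .xls extension could be in some other format.

```python
# Signature that opens every OLE2 compound file, the container used by legacy .xls
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_xls(payload: bytes) -> bool:
    """Return True if payload starts with the OLE2 signature."""
    return payload.startswith(OLE2_MAGIC)


def looks_like_html(payload: bytes) -> bool:
    """Return True if payload appears to be an HTML page, e.g. a redirect notice."""
    head = payload[:256].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

With the code above, calling `looks_like_html(resp.content)` before writing the file would tell you whether you received a redirect page instead of the workbook.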
