Download a .txt file and extract its filename

Posted 2024-06-11 09:15:10


I am trying to download a file in Python from the URL https://marketdata.theocc.com/position-limits?reportType=change

I can load it into a DataFrame with:

df = pd.read_csv('https://marketdata.theocc.com/position-limits?reportType=change')

But what I want is the filename. If you download the file directly from a browser, it is saved as "POSITIONLIMITCHANGE_20201202.txt".

Can anyone recommend an efficient way to do this in Python? Thanks.


2 Answers

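If you are using the requests library, information about the file is in the response headers (a dictionary-like object with case-insensitive keys):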

import requests

response = requests.get('https://marketdata.theocc.com/position-limits?reportType=change')
print(response.headers['content-disposition'])

Output:

attachment; filename=POSITIONLIMITCHANGE_20201202.txt
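
The header value shown above still has to be split into its parts to get the bare filename. A minimal sketch, assuming the standard library's email.message.Message is acceptable for parsing the header; the helper name filename_from_content_disposition is hypothetical and not part of requests:

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg['Content-Disposition'] = header_value
    # get_filename() reads the 'filename' parameter and strips surrounding quotes
    name = msg.get_filename()
    if not name:
        return None
    # keep only the last path component so a hostile header cannot point elsewhere
    return os.path.basename(name.replace('\\', '/')) or None


print(filename_from_content_disposition(
    'attachment; filename=POSITIONLIMITCHANGE_20201202.txt'))
```

With requests, the value to pass in would be response.headers.get('content-disposition'), which is None when the server does not send the header.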

Example Python code that fetches the file from the URL, extracts the filename, saves it to a local file, and loads it into a DataFrame:

import io
import requests
import re
import pandas as pd

url = 'https://marketdata.theocc.com/position-limits?reportType=change'
r = requests.get(url)
# NOTE: filename is found in content-disposition HTTP response header;
# default to '' so re.search() does not raise if the header is missing
s = r.headers.get('content-disposition', '')

# match only safe characters (word characters, dots, hyphens) in filename;
# \w alone would stop at the '.' and drop the extension
# this will prevent accepting paths or drive letters as part of name
m = re.search(r'filename="?([\w.-]+)', s)
if m:
    filename = m.group(1)
else:
    # set default if filename not provided or name has bad characters
    filename = "out.csv"
print("filename:", filename)
text = r.text

# if you want to write out file with filename provided
with open(filename, 'w') as fp:
    fp.write(text)

# to read from string in-memory wrap with io.StringIO()
df = pd.read_csv(io.StringIO(text))
print(list(df.columns))

Output:

filename: POSITIONLIMITCHANGE_20201202.txt
['Equity_Symbol','   ','Start_Date','Start_Pos_Limit','End_Date','End_Pos_Limit','Action']
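
The example above writes r.text through a text-mode file, which re-encodes the data with the platform's default encoding and may translate line endings. A minimal sketch, assuming you want the saved file to be byte-identical to what the server sent; save_bytes and dest_dir are hypothetical names, and the bytes would come from r.content:

```python
from pathlib import Path


def save_bytes(content: bytes, filename: str, dest_dir: str = ".") -> Path:
    """Write raw response bytes to dest_dir/filename and return the path."""
    # Path(...).name drops any directory components from the supplied name
    target = Path(dest_dir) / Path(filename).name
    target.write_bytes(content)
    return target
```

Usage would be along the lines of save_bytes(r.content, filename), after which pd.read_csv can read the saved path directly.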
