How do I load a compressed machine learning dataset from a URL?

Posted 2024-04-25 22:19:52

I'm trying to load the zipped, tab-separated "MHEALTHDATASET" from this URL: https://archive.ics.uci.edu/ml/machine-learning-databases/00319/

Code:

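import re
from shutil import unpack_archive
from tempfile import NamedTemporaryFile
from urllib.request import urlopen

import numpy as np

zipurl = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip'
# download the archive into a named temporary file, then unpack it
with urlopen(zipurl) as zipresp, NamedTemporaryFile() as tfile:
    tfile.write(zipresp.read())
    tfile.seek(0)
    unpack_archive(tfile.name, '/tmp/MHEALTHDATASET.zip', format='zip')
    dataset = np.loadtxt(urlopen(zipurl), dtype=str, delimiter="/t")
    for file in dataset:
        file = re.sub("mHealth_", "", file)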
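Note that the `np.loadtxt(urlopen(zipurl), ...)` line above re-downloads the raw zip bytes rather than reading the extracted text files, and `"/t"` is a literal slash followed by `t`, not a tab. One way to avoid temporary files altogether is to keep the download in memory and read each archive member with the standard `zipfile` module. The following is a minimal sketch, not a definitive loader. It assumes the data members end in `.log` (e.g. `mHealth_subject1.log`) and contain whitespace/tab-separated numbers. The names `ZIP_URL`, `load_logs_from_zip_bytes` and `load_mhealth` are hypothetical.

```python
import io
import re
import zipfile
from urllib.request import urlopen

import numpy as np

# URL taken from the question; it may redirect or move over time.
ZIP_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip"


def load_logs_from_zip_bytes(data: bytes) -> dict:
    """Parse every *.log member of an in-memory zip archive.

    Keys are the member's file name with the "mHealth_" prefix and ".log"
    suffix stripped (e.g. "mHealth_subject1.log" -> "subject1").
    Assumption: data files end in ".log" and hold numeric columns.
    """
    datasets = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if not name.endswith(".log"):
                continue  # skip README files and directory entries
            base = name.rsplit("/", 1)[-1]
            key = re.sub(r"^mHealth_|\.log$", "", base)
            # Wrap the binary member stream so loadtxt sees text lines.
            with io.TextIOWrapper(zf.open(name), encoding="utf-8") as fh:
                # delimiter=None (the default) splits on any whitespace,
                # which covers tab-separated values.
                datasets[key] = np.loadtxt(fh, ndmin=2)
    return datasets


def load_mhealth(url: str = ZIP_URL) -> dict:
    """Download the archive into memory (no temp file) and parse it."""
    with urlopen(url) as resp:
        return load_logs_from_zip_bytes(resp.read())
```

Calling `load_mhealth()` needs network access and returns a dict such as `{"subject1": array(...), ...}`. Storing the arrays in a dict keyed by the stripped name is one way to get what the `re.sub` loop in the question seems to aim for. In that loop, the result is assigned to the loop variable and then discarded.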

Error:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\algorithms\elbow.py", line 17, in <module>
    unpack_archive(tfile.name, '/tmp/MHEALTHDATASET.zip', format='zip')
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\shutil.py", line 1247, in unpack_archive
    func(filename, extract_dir, **dict(format_info[2]))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\shutil.py", line 1151, in _unpack_zipfile
    raise ReadError("%s is not a zip file" % filename)
shutil.ReadError: C:\Users\User\AppData\Local\Temp\tmp_x_c1ejk is not a zip file
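A likely cause, though not confirmed here, is the Windows behavior of `NamedTemporaryFile`. The Python `tempfile` documentation says the name of a still-open named temporary file cannot be used to open it a second time on Windows. In CPython's `shutil`, the zip unpacker first calls `zipfile.is_zipfile(filename)`. That function returns `False` when opening the file raises `OSError`, so a permission problem can surface as "is not a zip file". A further issue is that the second argument to `unpack_archive` is the extraction *directory*, so `'/tmp/MHEALTHDATASET.zip'` would become a folder with that name.

The sketch below keeps the `unpack_archive` approach but writes the archive into a temporary *directory* and closes it before unpacking. It assumes the same `.log` layout as above. `ZIP_URL`, `load_logs_via_temp_dir` and `load_mhealth` are hypothetical names.

```python
import re
import shutil
import tempfile
from pathlib import Path
from urllib.request import urlopen

import numpy as np

# URL taken from the question.
ZIP_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/00319/MHEALTHDATASET.zip"


def load_logs_via_temp_dir(zip_bytes: bytes) -> dict:
    """Write the archive to disk, unpack it, and parse every *.log file."""
    datasets = {}
    # A temporary *directory* avoids reopening a still-open
    # NamedTemporaryFile, which Windows does not allow.
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "MHEALTHDATASET.zip"
        archive.write_bytes(zip_bytes)  # opens, writes and closes the file
        extract_dir = Path(tmp) / "extracted"  # a directory, not a .zip name
        shutil.unpack_archive(str(archive), str(extract_dir), format="zip")
        for log in sorted(extract_dir.rglob("*.log")):
            # "mHealth_subject1" -> "subject1"
            key = re.sub(r"^mHealth_", "", log.stem)
            # Whitespace splitting handles tabs; arrays are loaded into
            # memory before the temporary directory is deleted.
            datasets[key] = np.loadtxt(log, ndmin=2)
    return datasets


def load_mhealth(url: str = ZIP_URL) -> dict:
    """Download the archive and parse it via a temporary directory."""
    with urlopen(url) as resp:
        return load_logs_via_temp_dir(resp.read())
```

If the download itself returns something other than a zip file (for example an HTML page after a redirect), this version would still raise `ReadError`. Checking the first bytes of the response for `b"PK"` is a quick way to rule that out.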