Reading an online .tbl data file in Python

Published 2024-05-15 22:23:27


As the title says, I am trying to read an online data file in .tbl format. Here is the link to the data: https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl

I tried the following code:

cosmos= pd.read_table('https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl')

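Running this command gives no error, but when I call print(cosmos.columns), I do not get a list of separate columns. Instead, Python lumps everything together into a single column name and gives the following output: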

Index(['|            ID|            RA|           DEC|  MAG_AUTO_ACS|       R_PETRO|        R_HALF|    CONC_PETRO|     ASYMMETRY|          GINI|           M20|   Axial Ratio|     AUTOCLASS|   CLASSWEIGHT|'], dtype='object').

My main goal is to get the columns of this table separately and then print cosmos['RA']. Does anyone know how to do this?


Tags: https, tables, data, ra, edu, tbl, cosmos, morph
1 answer
User
#1 · Posted 2024-05-15 22:23:27

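Your file has four header lines, and the header and the data use different delimiters: | in the header, whitespace in the data. You can read the data by passing the skiprows parameter to read_table and then building the column names from the first header line.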

import requests
import pandas as pd

filename = 'cosmos_morph_cassata_1.1.tbl'
url = 'https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/' + filename
n_header = 4

## Download large file to disk, so we can reuse it...
table_file = requests.get(url)
table_file.raise_for_status()  # stop here if the download failed
with open(filename, 'wb') as f:
    f.write(table_file.content)


## Skip the first 4 header rows and use whitespace as delimiter
## (sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas)
cosmos = pd.read_table(filename, skiprows=n_header, header=None, sep=r'\s+')

## create header from first line of file
with open(filename) as f:
    header_line = f.readline()
    ## trim whitespaces and split by '|'
    header_columns = header_line.replace(' ', '').split('|')[1:-1]

cosmos.columns = header_columns
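
If you want to try the approach without downloading the file, here is a minimal sketch of the same steps on a small in-memory sample. The sample text is made up for illustration (three columns, with guessed contents for header lines 2 to 4); it only mimics the layout described above, with four |-delimited header lines followed by whitespace-delimited rows.

```python
import io
import pandas as pd

# Hypothetical sample, NOT the real file: four '|'-delimited header
# lines followed by whitespace-delimited data rows.
sample = (
    "|  ID|        RA|      DEC|\n"
    "| int|    double|   double|\n"
    "|    |       deg|      deg|\n"
    "|null|      null|     null|\n"
    "    1  150.50000   2.50000\n"
    "    2  150.25000   2.25000\n"
)

n_header = 4

# Column names come from the first header line: strip spaces, split on '|',
# and drop the empty strings before the first and after the last '|'.
header_line = sample.splitlines()[0]
header_columns = header_line.replace(' ', '').split('|')[1:-1]

# Skip the header lines and split the data rows on runs of whitespace.
cosmos = pd.read_table(io.StringIO(sample), skiprows=n_header, header=None,
                       sep=r'\s+', names=header_columns)

print(cosmos.columns.tolist())
print(cosmos['RA'])
```

Passing names= to read_table sets the column names in the same call, so there is no separate assignment to cosmos.columns. Because spaces are stripped from the header line, a column such as "Axial Ratio" in the real file would end up named "AxialRatio".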

