Pandas: reading a txt file into a DataFrame goes wrong when a column contains only a single-character value

824334 4141.854 6100.175 11.040 -117.810
841013 2028.294 6221.566 10.913 -178.340
854890 4214.858 6322.255 10.645 -125.390
864353 4326.768 6389.329 10.815 -98.650 ?
864918 3187.398 6392.824 11.050 -91.250
867194 3230.288 6410.404 10.208 -190.380
1794 2926.630 8.900 18.564 -58.970
3041 2902.000 18.400 16.302 -63.770
3171 2912.040 19.660 12.905 -110.350

        0         1         2       3        4
0  824334  4141.854  6100.175  11.040 -117.810
1  841013  2028.294  6221.566  10.913 -178.340
2  854890  4214.858  6322.255  10.645 -125.390
3  864353  4326.768  6389.329  10.815  -98.650
4  864918  3187.398  6392.824  11.050  -91.250
5  867194  3230.288  6410.404  10.208 -190.380
6    1794  2926.630     8.900  18.564  -58.970
7    3041  2902.000    18.400  16.302  -63.770
8    3171  2912.040    19.660  12.905 -110.350

1 answer

User

#1 · Posted 2024-04-25 00:58:22

As @jesrael said in the comments, the cleanest approach is to know the number of columns before calling read_fwf and to pass it through the names parameter.

As I said in the comments, what you posted works on my machine, so there may be something else to check.

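In any case, if the number of columns varies from file to file, you can read each file once before calling read_fwf to find the number of columns, like this (not very efficient, but it gets the job done):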

# Scan the file once and keep the largest number of
# whitespace-separated fields found on any line.
number_of_columns = 0
with open(file) as f:
    for line in f:
        items = line.split()
        number_of_columns = max(number_of_columns, len(items))

Then use it:

import pandas as pd
data = pd.read_fwf(file, dtype=None, header=None, names=range(number_of_columns))

Alternatively, use read_csv with delim_whitespace=True and names:

data = pd.read_csv(file, header=None, delim_whitespace=True, names=range(number_of_columns))
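
The counting step and the read_csv call can be wrapped into one helper. What follows is only a sketch under a few assumptions: `read_ragged` is a hypothetical name, the sample rows are copied from the question and written to a temporary file for the demo, and `sep=r"\s+"` is used in place of `delim_whitespace=True` (it splits on runs of whitespace in the same way, and recent pandas releases deprecate `delim_whitespace`).

```python
import os
import tempfile

import pandas as pd


def read_ragged(path):
    """Read a whitespace-separated file whose lines may differ in field count.

    Hypothetical helper, not part of pandas.
    """
    # First pass: find the widest line, counted in whitespace-separated fields.
    with open(path) as f:
        number_of_columns = max((len(line.split()) for line in f), default=0)
    # Second pass: giving read_csv one name per column lets shorter lines
    # be padded with NaN instead of raising a ParserError.
    return pd.read_csv(path, header=None, sep=r"\s+",
                       names=range(number_of_columns))


# Demo data: a few rows from the question, including the line with a stray "?".
sample = (
    "824334 4141.854 6100.175 11.040 -117.810\n"
    "841013 2028.294 6221.566 10.913 -178.340\n"
    "854890 4214.858 6322.255 10.645 -125.390\n"
    "864353 4326.768 6389.329 10.815 -98.650 ?\n"
    "1794 2926.630 8.900 18.564 -58.970\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write(sample)
    data = read_ragged(path)

print(data)
```

The extra sixth column holds "?" on the one line that has it and NaN everywhere else, so it can be dropped or inspected afterwards as needed.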

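If we do not pass names to read_csv, it raises an error (pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 4, saw 6), because the number of columns is inferred from the first line, and in the text file that causes the problem the first line has no data in the last column.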
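
To see the failure in isolation, the sketch below feeds the first four lines of the question's data to read_csv without names. It assumes the stray "?" sits at the end of line 4, as the error message suggests, and it keeps the sample in memory with `io.StringIO` and uses `sep=r"\s+"` for whitespace splitting; both choices are for the demo only.

```python
import io

import pandas as pd

# First four lines of the question's data; line 4 carries a sixth field ("?").
sample = (
    "824334 4141.854 6100.175 11.040 -117.810\n"
    "841013 2028.294 6221.566 10.913 -178.340\n"
    "854890 4214.858 6322.255 10.645 -125.390\n"
    "864353 4326.768 6389.329 10.815 -98.650 ?\n"
)

try:
    # Without names, the column count (5) is inferred from the first line,
    # so the 6-field line cannot be tokenized.
    pd.read_csv(io.StringIO(sample), header=None, sep=r"\s+")
    message = None
except pd.errors.ParserError as exc:
    message = str(exc)

print(message)
```

The printed message should match the error quoted above: five fields expected, six seen on line 4.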
