Reading a .txt file with missing information as a DataFrame

Posted 2024-04-25 01:42:53


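I am trying to read a .txt file into a pandas DataFrame, but I get several errors and the data does not load. As far as I can tell, the problem lies in how the data is structured.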

file.txt:
    "Mark","Company","Country","Value","1","abcdef","ecu","1000","","","usa","30","","","col","200"....

Viewed as a table, file.txt contains the following:

Mark          Company     Country     Value   ...
   1           abcdef         ecu      1000   ...
                              usa        30   ...
                              col       200   ...
   2           ghijk          jap        10   ...
                              eur       900   ...
                              lki             ...
   3           lmnop          wer        21   ...
                              uye             ...
                              urg       123   ...
   .               .            .         .     .     
   .               .            .         .     .     

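What I need is a DataFrame in which Mark and Company are repeated on every row and missing values are set to 0, like this: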

Mark          Company     Country     Value   ...
   1           abcdef         ecu      1000   ...
   1           abcdef         usa        30   ...
   1           abcdef         col       200   ...
   2           ghijk          jap        10   ...
   2           ghijk          eur       900   ...
   2           ghijk          lki         0   ...
   3           lmnop          wer        21   ...
   3           lmnop          uye         0   ...
   3           lmnop          urg       123   ...
   .               .            .         .     .     
   .               .            .         .     .     

1 answer
User
#1 · Posted 2024-04-25 01:42:53

Update:

import pandas as pd

# `fn` is the path to the input file
df = pd.read_csv(fn,
                 encoding='utf-16',
                 na_values=['NA','NaN','nan','n.a.'],
                 low_memory=False)

# List here ALL columns that must be forward-filled with `ffill()`:
cols = ['Mark','Company name','Cons. code','City']
df[cols] = df[cols].ffill()

# Once all required columns have been forward-filled, `fillna(0)` takes care of the remaining columns
df = df.fillna(0)
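
To try the same forward-fill idea without the original file, here is a minimal sketch. It assumes a comma-separated layout with one record per line; `csv_text` and `key_cols` are hypothetical names, and the sample rows are made up to mirror the table in the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file (assumed layout:
# comma-separated, one record per line, empty fields where data is missing).
csv_text = """Mark,Company,Country,Value
1,abcdef,ecu,1000
,,usa,30
,,col,200
2,ghijk,jap,10
,,eur,900
,,lki,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Columns whose blanks mean "same as the row above": forward-fill them.
key_cols = ["Mark", "Company"]
df[key_cols] = df[key_cols].ffill()

# Everything still missing is a genuinely absent value: replace it with 0.
df = df.fillna(0)

# Mark was read as float because of the blanks; convert it back to int.
df["Mark"] = df["Mark"].astype(int)

print(df)
```

The order matters: forward-filling the key columns first keeps `fillna(0)` from writing zeros into `Mark` and `Company`.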

Old answer:

Your file looks like a fixed-width file, so try using pd.read_fwf() together with DataFrame.ffill().

Suppose we have the following TXT file:

Mark          Company     Country     Value1  Value2 
   1           abcdef         ecu      1000      
                              usa        30      10  
                              col       200      20  
   2           ghijk          jap        10        
                              eur       900      30  
                              lki                40    
   3           lmnop          wer        21        
                              uye               50     
                              urg       123       

Solution:

In [102]: fn = r'D:\temp\.data\002.txt'

In [103]: df = pd.read_fwf(fn)

In [123]: df.loc[:, df.columns.str.contains(r'^Value')] = df.filter(regex=r'^Value').fillna(0)

In [124]: df = df.ffill()

In [125]: df
Out[125]:
   Mark Company Country  Value1  Value2
0   1.0  abcdef     ecu  1000.0     0.0
1   1.0  abcdef     usa    30.0    10.0
2   1.0  abcdef     col   200.0    20.0
3   2.0   ghijk     jap    10.0     0.0
4   2.0   ghijk     eur   900.0    30.0
5   2.0   ghijk     lki     0.0    40.0
6   3.0   lmnop     wer    21.0     0.0
7   3.0   lmnop     uye     0.0    50.0
8   3.0   lmnop     urg   123.0     0.0
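
If the automatic column detection of `pd.read_fwf` does not split your file correctly, the column boundaries can be passed explicitly through `colspecs`. The sketch below is only an illustration under assumptions: the sample text is generated in memory, and the widths in `fmt` and `colspecs` are made up for this example, so they would need to be adapted to the real file.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout: 4 + 7 + 7 + 6 + 6 characters,
# separated by two spaces. Adjust to match the real file.
fmt = "{:>4}  {:<7}  {:<7}  {:>6}  {:>6}\n"
rows = [
    ("Mark", "Company", "Country", "Value1", "Value2"),
    ("1", "abcdef", "ecu", "1000", ""),
    ("", "", "usa", "30", "10"),
    ("", "", "col", "200", "20"),
    ("2", "ghijk", "jap", "10", ""),
    ("", "", "eur", "900", "30"),
    ("", "", "lki", "", "40"),
    ("3", "lmnop", "wer", "21", ""),
    ("", "", "uye", "", "50"),
    ("", "", "urg", "123", ""),
]
text = "".join(fmt.format(*row) for row in rows)

# Half-open [start, end) character ranges, one per column.
colspecs = [(0, 4), (6, 13), (15, 22), (24, 30), (32, 38)]
df = pd.read_fwf(io.StringIO(text), colspecs=colspecs)

# Fill the numeric Value* columns with 0 first, so that the forward fill
# below only affects the identifying columns (Mark, Company).
value_cols = df.filter(regex=r"^Value").columns
df[value_cols] = df[value_cols].fillna(0)
df = df.ffill()

# Optional: Mark was parsed as float because of the blanks.
df["Mark"] = df["Mark"].astype(int)

print(df)
```

Filling the `Value*` columns before calling `ffill()` is what keeps a missing value from inheriting the number in the row above.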
