How to get keys and values in a well-formatted layout, with n/a values at the end, using pandas

Posted 2024-04-19 04:03:48


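I need to sort each record by key in ascending numeric order, and print the keys that are missing from a record last, with the value n/a.

Please suggest a solution, and tell me whether my code needs any changes.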

input.txt:

3=1388|4=1388|5=M|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46
3=1388|4=1388|5=M|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45

Code:

import pandas as pd
import numpy as np 
from operator import itemgetter   
df = pd.read_csv("C:\",index_col=None, names=['text'])
s = df.text.str.split('|')
ds =[dict(w.split('=',1 ) for w in x) for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.replace(np.nan,'n/a', regex=True)
st = p1.stack(level=0,dropna=False)
dfs = [g for i, g in st.groupby(level=0)]
dfs_length = len(dfs)
i = 0
while i < len(dfs):    
    print '\nindex[%d]'%i
    for (_,k),v in dfs[i].iteritems():
        print k,'\t',v
    i = i + 1

Output (what I get):

index[0]
1021      1500      
1584      88929     
16        1468416600.6006
18        1468416600.6006
22        89020     
3         1388      
361       0.73      
362       0.46     
388       157.75    
394       157.75    
4         1388      
400       157.75    
444       20160713  
459       93000546718000
461       7         
474       157.75    
5         M       
55        93000552181000
8         157.75    
854       n         
9         88929     
981       0         

index[1]
1021      0         
1584      89029     
16        1468416601.372
18        1468416601.372
22        89120     
3         1388      
361       0.71      
362       0.45     
388       n/a       
394       157.73    
4         1388      
400       n/a       
444       20160713  
459       93001362639104
461       26142     
474       157.749977558
5         M         
55        93001362849000
8         157.73    
854       p         
9         100       
981       0         

Expected output:

index[0]
3         1388
4         1388
5         M
8         157.75
9         88929
16        1468416600.6006
18        1468416600.6006
22        89020
55        93000552181000
361       0.73
362       0.46
388       157.75
394       157.75
400       157.75
444       20160713
459       93000546718000
461       7
474       157.75
854       n
981       0
1021      1500
1584      88929

index[1]
3         1388 
4         1388 
5         M 
8         157.73 
9         100      
16        1468416601.372
18        1468416601.372
22        89120 
55        93001362849000
361       0.71      
362       0.45  
394       157.73 
444       20160713  
459       93001362639104
461       26142     
474       157.749977558 
854       p   
981       0    
1021      0         
1584      89029          
388       n/a       
400       n/a       

1 answer
User
#1 · Posted 2024-04-19 04:03:48

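You can use stack to create a Series, split it on = with str.split (with expand=True, so the result is a DataFrame), then cast the first column to int, make it the index with set_index, and sort with sort_index: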

import io
import pandas as pd

temp=u"""3=1388|4=1388|5=M|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep='|',index_col=None, header=None)
df1 = df.stack().str.split('=', expand=True)
df1.iloc[:,0] = df1.iloc[:,0].astype(int)
df1 = df1.set_index(0).sort_index()
print (df1)
                    1
0                    
3                1388
4                1388
5                   M
8              157.75
9               88929
16    1468416600.6006
18    1468416600.6006
22              89020
55     93000552181000
361              0.73
388            157.75
394            157.75
400            157.75
444          20160713
459    93000546718000
461                 7
474            157.75
854                 n
981                 0
1021             1500
1584            88929
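
The cast to int matters because the keys come out of the split as strings, and strings sort character by character, which is why the output in the question starts with 1021. A minimal sketch of the difference in plain Python, using a handful of keys from the input (the variable names are only illustrative):

```python
# A few keys from the input, as the strings that str.split produces.
keys = ['1021', '16', '3', '361', '55', '8']

# Default sort compares the strings character by character.
as_text = sorted(keys)

# Sorting with key=int compares the numeric values instead.
as_numbers = sorted(keys, key=int)

print(as_text)     # ['1021', '16', '3', '361', '55', '8']
print(as_numbers)  # ['3', '8', '16', '55', '361', '1021']
```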

Another solution, using sort_values:

df1= df.stack().str.split('=', expand=True)
df1.columns = ['a','b']
df1['a'] = df1['a'].astype(int)
df1 = df1.reset_index(drop=True).sort_values('a')
print (df1)
       a                b
0      3             1388
1      4             1388
2      5                M
3      8           157.75
4      9            88929
19    16  1468416600.6006
20    18  1468416600.6006
15    22            89020
14    55   93000552181000
17   361             0.73
7    388           157.75
8    394           157.75
16   400           157.75
11   444         20160713
12   459   93000546718000
13   461                7
9    474           157.75
6    854                n
18   981                0
5   1021             1500
10  1584            88929
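
Both snippets above work on a single record and do not cover the n/a part of the question. One possible way to extend the idea to several records is sketched here, under assumptions: the sample data is shortened from the question's input, and parse_line and format_record are hypothetical helper names, not pandas functions. Each line is parsed into a dict with integer keys, pandas fills absent keys with NaN, and each record is printed with its present keys first (in numeric order) and its missing keys last.

```python
import io
import pandas as pd

# Two shortened records in the question's format (sample data, assumed).
raw = u"""3=1388|5=M|1021=1500|388=157.75|16=1468416600.6006
3=1388|5=M|1021=0|16=1468416601.372"""

def parse_line(line):
    """Turn 'k=v|k=v' into a dict with integer keys (hypothetical helper)."""
    pairs = (field.split('=', 1) for field in line.strip().split('|') if field)
    return {int(k): v for k, v in pairs}

records = [parse_line(line) for line in io.StringIO(raw) if line.strip()]

# One row per record, one column per key; keys absent from a record become NaN.
wide = pd.DataFrame(records)
# Integer column labels sort numerically, not as text.
wide = wide.reindex(columns=sorted(wide.columns))

def format_record(row):
    """Present keys in ascending order, then missing keys as n/a (hypothetical helper)."""
    present = row.dropna()
    missing = row[row.isna()]
    lines = ['%d\t%s' % (k, v) for k, v in present.items()]
    lines += ['%d\tn/a' % k for k in missing.index]
    return lines

for i, (_, row) in enumerate(wide.iterrows()):
    print('\nindex[%d]' % i)
    print('\n'.join(format_record(row)))
```

With the real file, replacing io.StringIO(raw) with an open file handle should produce the layout asked for in the question, with 388 and 400 listed last for the second record.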
