sort_values returns the wrong order

Posted 2024-04-29 05:58:13


I can't figure out what is wrong with my code:

import pandas as pd
import numpy as np
woe = [1.1147295474833758,0.364043491078754,-0.05525053172192353,-0.3950007109750665,-0.6784658191115104,-0.9522135140050229,-1.1441658353033486]
iv = [0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946,0.29078213954085946]
lis = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
fin = [lis,woe,iv]
fin = np.array(fin).T  
df_disc = pd.DataFrame(fin,columns=['Label','WoE','IV'])
print(df_disc)
df_disc = df_disc.sort_values(by=['WoE'])
df_disc = df_disc.reset_index(drop=True)
print(df_disc)

Result:

  Label                   WoE                   IV
0     A    1.1147295474833758  0.29078213954085946
1     B     0.364043491078754  0.29078213954085946
2     C  -0.05525053172192353  0.29078213954085946
3     D   -0.3950007109750665  0.29078213954085946
4     E   -0.6784658191115104  0.29078213954085946
5     F   -0.9522135140050229  0.29078213954085946
6     G   -1.1441658353033486  0.29078213954085946
  Label                   WoE                   IV
0     C  -0.05525053172192353  0.29078213954085946
1     D   -0.3950007109750665  0.29078213954085946
2     E   -0.6784658191115104  0.29078213954085946
3     F   -0.9522135140050229  0.29078213954085946
4     G   -1.1441658353033486  0.29078213954085946
5     B     0.364043491078754  0.29078213954085946
6     A    1.1147295474833758  0.29078213954085946

I expected the labels to come out in the order G, F, E, D, C, B, A, but the sorted result is wrong.


3 answers

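As noted in the other answers, the column contains strings. To sort numerically while keeping full precision, convert the series to Decimal: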

from decimal import Decimal

# ...

df_disc['WoE'] = df_disc['WoE'].apply(Decimal)
df_disc = df_disc.sort_values(by='WoE')
print(df_disc)

Prints:

  Label                   WoE                   IV
6     G   -1.1441658353033486  0.29078213954085946
5     F   -0.9522135140050229  0.29078213954085946
4     E   -0.6784658191115104  0.29078213954085946
3     D   -0.3950007109750665  0.29078213954085946
2     C  -0.05525053172192353  0.29078213954085946
1     B     0.364043491078754  0.29078213954085946
0     A    1.1147295474833758  0.29078213954085946
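
As a minimal sketch of why Decimal helps here (the three sample strings are taken from the WoE column; the variable names are only illustrative): Decimal built from a string keeps every digit as written, and compares by numeric value rather than character by character.

```python
from decimal import Decimal

# Three WoE values in the string form they have inside the DataFrame.
values = ['-0.05525053172192353', '-1.1441658353033486', '0.364043491078754']

# Comparing the raw strings is character-based: '-0' sorts before '-1'.
smallest_as_text = min(values)

# Decimal parses the text exactly and compares by numeric value.
smallest_as_number = min(Decimal(v) for v in values)

print(smallest_as_text)         # -0.05525053172192353
print(str(smallest_as_number))  # -1.1441658353033486
```

The column still has object dtype after this conversion, since pandas stores Decimal values as Python objects.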

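The problem is that the DataFrame columns hold objects (strings), not numbers.

In your code, np.array receives a mix of strings and numbers, so it converts every value to a string, and the resulting DataFrame columns end up with object dtype: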

fin = np.array(fin).T  
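
A small sketch, using the question's data without pandas, of what that line does: the array ends up with a Unicode string dtype, and sorting the WoE strings as text reproduces the unexpected order, because '-' sorts before any digit and the remaining characters are compared left to right. The helper names (rows, text_order, numeric_order) are only for illustration.

```python
import numpy as np

woe = [1.1147295474833758, 0.364043491078754, -0.05525053172192353,
       -0.3950007109750665, -0.6784658191115104, -0.9522135140050229,
       -1.1441658353033486]
lis = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

# Mixing str and float in one array makes NumPy store everything as text.
arr = np.array([lis, woe]).T
dtype_kind = arr.dtype.kind  # 'U' means fixed-width Unicode strings

rows = arr.tolist()
# Sort by the WoE cell as text (what happened in the question) ...
text_order = [label for label, _ in sorted(rows, key=lambda r: r[1])]
# ... and by its numeric value (what was intended).
numeric_order = [label for label, _ in sorted(rows, key=lambda r: float(r[1]))]

print(dtype_kind)     # U
print(text_order)     # ['C', 'D', 'E', 'F', 'G', 'B', 'A']
print(numeric_order)  # ['G', 'F', 'E', 'D', 'C', 'B', 'A']
```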

The solution is to pass a dictionary keyed by column name to DataFrame.astype:

df_disc = (pd.DataFrame(fin,columns=['Label','WoE','IV'])
             .astype({'WoE':'float', 'IV':'float'}))
print(df_disc)

df_disc = df_disc.sort_values(by=['WoE'], ignore_index=True)
print(df_disc)
  Label       WoE        IV
0     G -1.144166  0.290782
1     F -0.952214  0.290782
2     E -0.678466  0.290782
3     D -0.395001  0.290782
4     C -0.055251  0.290782
5     B  0.364043  0.290782
6     A  1.114730  0.290782

If you pass a dictionary to the DataFrame constructor instead, each column keeps its own dtype and the problem never arises:

df_disc = pd.DataFrame({'Label':lis,'WoE':woe,'IV':iv})
print(df_disc)
    
df_disc = df_disc.sort_values(by=['WoE'], ignore_index=True)
print(df_disc)
  Label       WoE        IV
0     G -1.144166  0.290782
1     F -0.952214  0.290782
2     E -0.678466  0.290782
3     D -0.395001  0.290782
4     C -0.055251  0.290782
5     B  0.364043  0.290782
6     A  1.114730  0.290782

Your columns WoE and IV have dtype object. You need to convert WoE to float to get a correct sort:

In [2723]: df_disc.dtypes
Out[2723]: 
Label    object
WoE      object
IV       object
dtype: object

In [2725]: df_disc.WoE = df_disc.WoE.astype(float)

In [2726]: df_disc.sort_values(by=['WoE'])
Out[2726]: 
  Label       WoE                   IV
6     G -1.144166  0.29078213954085946
5     F -0.952214  0.29078213954085946
4     E -0.678466  0.29078213954085946
3     D -0.395001  0.29078213954085946
2     C -0.055251  0.29078213954085946
1     B  0.364043  0.29078213954085946
0     A  1.114730  0.29078213954085946
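
In the output above only WoE was converted, so IV is still stored as strings. One possible way to convert both columns is pd.to_numeric, sketched below on a three-row sample; the sample frame and the choice of errors='coerce' (which turns unparseable strings into NaN) are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# A small frame whose numeric columns are stored as strings,
# mirroring the state of df_disc in the question.
df = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'WoE': ['1.1147295474833758', '0.364043491078754', '-0.05525053172192353'],
    'IV': ['0.29078213954085946'] * 3,
})

# Convert each text column to a numeric dtype.
for col in ['WoE', 'IV']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

result = df.sort_values(by='WoE', ignore_index=True)
labels = result['Label'].tolist()

print(result.dtypes)
print(labels)  # ['C', 'B', 'A']
```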
