I have a dataset, "banks". If I group it by the column "job" to check the count in each category, I get the following:
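I have also added the first 3 rows of the dataset I am using:
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no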
My intention is to create a small function that can be reused for other columns, so I tried to write one with the "dfply" package:
import pandas as pd
import dfply
from dfply import *

# creating the function
@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

# invoking the function
banks >> woe_iv(X.job)
However, this code gives me an error, which reads as follows:
@dfpipe
def woe_iv(df,variable):
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
    return step1
banks>>woe_iv(X.job)
Traceback (most recent call last):
  File "<ipython-input-46-d851aeac1927>", line 7, in <module>
    banks>>woe_iv(X.job)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, *args, **kwargs)
  File "<ipython-input-46-d851aeac1927>", line 5, in woe_iv
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 279, in __call__
    args = self._recursive_arg_eval(df, args[1:])
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 241, in _recursive_arg_eval
    return [
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 242, in <listcomp>
    self._symbolic_to_label(df, a) if i in eval_as_label
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 231, in _symbolic_to_label
    return self._evaluator_loop(df, arg, self._evaluate_label)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 225, in _evaluator_loop
    return eval_func(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 181, in _evaluate_label
    arg = self._evaluate(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 175, in _evaluate
    arg = arg.evaluate(df)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 71, in evaluate
    return self.function(context)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 74, in <lambda>
    return Intention(lambda x: getattr(self.function(x), attribute),
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'variable'
Please let me know if I am missing something.
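The last frames of the traceback point at the likely cause: `X.variable` asks the DataFrame for a column literally named `variable`; it does not substitute the value of the function's `variable` argument. Below is a minimal sketch of one way around this, using plain pandas instead of dfply and passing the column label as a string. The helper name `count_by` and the toy frame are made up for illustration. Note that `size()` counts every row in a group, whereas `count()` skips missing values.

```python
import pandas as pd


def count_by(df, column):
    """Return one row per category of `column` with its row count.

    `count_by` is a hypothetical helper, not part of pandas or dfply.
    """
    # `column` is a string label, so the lookup uses its value
    # instead of an attribute literally called "column".
    return df.groupby(column).size().reset_index(name="COUNT")


# Toy frame for illustration only; the real `banks` data has more columns.
banks = pd.DataFrame({
    "age": [30, 33, 35],
    "job": ["unemployed", "services", "management"],
})

counts = count_by(banks, "job")
print(counts)
```

Within dfply itself, passing the label as a string and writing `group_by(label)` with `X[label]` in place of `X.variable` may achieve the same result, but that variant is untested here.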
Shameek Mukherjee, is this the correct interpretation and indentation of the sample code? Apart from the indentation, I cannot find any difference.
Second example: