I have a DataFrame named new_emp_final_1. When I try to derive a 'difficulty' column from cookTime and prepTime by calling the function difficulty through a UDF, I get an error.
new_emp_final_1.dtypes is as follows:
[('name', 'string'), ('ingredients', 'string'), ('url', 'string'), ('image', 'string'), ('cookTime', 'string'), ('recipeYield', 'string'), ('datePublished', 'string'), ('prepTime', 'string'), ('description', 'string')]
The result of new_emp_final_1.schema is:

Code:
def difficulty(cookTime, prepTime):
    if not cookTime or not prepTime:
        return "Unkown"
    total_duration = cookTime + prepTime
    if total_duration > 3600:
        return "Hard"
    elif total_duration > 1800 and total_duration < 3600:
        return "Medium"
    elif total_duration < 1800:
        return "Easy"
    else:
        return "Unkown"

func_udf = udf(difficulty, IntegerType())
new_emp_final_1 = new_emp_final_1.withColumn("difficulty", func_udf(new_emp_final_1.cookTime, new_emp_final_1.prepTime))
new_emp_final_1.show(20,False)
The error is:
  File "/home/raghavcomp32915/mypycode.py", line 56, in <module>
    func_udf = udf(difficulty, IntegerType())
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/udf.py", line 186, in wrapper
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/udf.py", line 166, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/column.py", line 66, in _to_seq
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/column.py", line 54, in _to_java_column
TypeError: Invalid argument, not a string or column: <function difficulty at 0x7f707e9750c8> of type <type 'function'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
I expect the existing DataFrame new_emp_final_1 to gain a column named "difficulty" whose values are Hard, Medium, Easy, or Unknown.
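Looking at the UDF (difficulty), I see two things:
1. cookTime and prepTime are strings (see the dtypes above), so cookTime + prepTime concatenates them instead of adding numbers, and the comparisons against 1800 and 3600 do not behave as intended.
2. The UDF is declared with IntegerType(), but it returns strings, so the return type should be StringType().
This example worked for me: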
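A minimal sketch of a corrected version, assuming cookTime and prepTime hold whole numbers of seconds stored as strings (the question does not state the format, so this is an assumption). The helper names to_seconds and add_difficulty_column are hypothetical, and treating exactly 1800 and 3600 seconds as "Medium" is a choice made here rather than something the question specifies.

```python
def to_seconds(value):
    """Parse a value such as "1200" into an int; return None if missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def difficulty(cookTime, prepTime):
    """Classify a recipe by total duration (assumed to be in seconds)."""
    cook = to_seconds(cookTime)
    prep = to_seconds(prepTime)
    if cook is None or prep is None:
        return "Unknown"
    total_duration = cook + prep
    if total_duration > 3600:
        return "Hard"
    if total_duration >= 1800:  # assumption: both boundaries count as Medium
        return "Medium"
    return "Easy"


def add_difficulty_column(df):
    """Return df with a string column "difficulty" derived from cookTime and prepTime."""
    # Imported here so that difficulty() can be tried without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # The UDF returns strings, so it is declared with StringType().
    func_udf = F.udf(difficulty, StringType())
    return df.withColumn("difficulty", func_udf(df.cookTime, df.prepTime))
```

With this in place, new_emp_final_1 = add_difficulty_column(new_emp_final_1) followed by new_emp_final_1.show(20, False) should display the new column. The traceback passes through wrapper and __call__ in udf.py, which suggests that the name udf in the script may already have been rebound to a previously created UDF; calling F.udf through the module alias sidesteps that, though this is an inference from the traceback rather than something the question confirms.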
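Have you tried passing literal values for cookTime and prepTime? Note that lit is a function in pyspark.sql.functions, not a DataFrame method, that cookTime and prepTime must then be Python values defined beforehand, and that a literal gives every row the same value:
from pyspark.sql.functions import lit
new_emp_final_1 = new_emp_final_1.withColumn("difficulty", func_udf(lit(cookTime), lit(prepTime)))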