Problem with a UDF in Spark: TypeError: 'Column' object is not callable

import re

def removeEmoji(text):
    regrex_pattern = re.compile(pattern = "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags = re.UNICODE)
    return regrex_pattern.sub(r'', text)

TypeError                                 Traceback (most recent call last)
<ipython-input-29-e5d42d609b59> in <module>()
----> 1 new_df = new_df.withColumn("content", remove_punct(df_merge["content"]))
      2 new_df.show(5)

<ipython-input-21-dee888ef5b90> in remove_punct(text)
      2
      3 def remove_punct(text):
----> 4     return text.translate(str.maketrans('', '', string.punctuation))
      5
      6

TypeError: 'Column' object is not callable

1 Answer

User · Reply #1 · posted 2024-05-14 20:52:39

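The stack trace shows that you are calling the plain Python function directly rather than the UDF. Inside remove_punct, text is a Spark Column, so text.translate is treated as a field lookup that returns another Column, and calling that Column raises the TypeError.
remove_punct is an ordinary Python function, whereas punct_remove is a UDF and can be passed as the second argument to withColumn.
One way to fix this is to use punct_remove instead of remove_punct in the withColumn call.
Another way, which reduces the risk of confusing the Python function with the UDF, is to use the @udf decorator: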
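import string

from pyspark.sql import functions as F
from pyspark.sql import types as T

# The decorator turns remove_punct itself into a UDF, so there is no
# separate plain-Python name to call by mistake.
@F.udf(returnType=T.StringType())
def remove_punct(text):
    return text.translate(str.maketrans('', '', string.punctuation))

df.withColumn("content", remove_punct(F.col("content"))).show()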
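A variation that may be worth considering is to keep the plain Python logic and the UDF wrapper under clearly different names, so the string logic can be tested without a Spark session. The sketch below is only one possible layout: the names strip_punct, with_clean_content, and _PUNCT_TABLE are hypothetical, and the null check assumes the "content" column can contain nulls, which Spark passes to a Python UDF as None.

```python
import string

# Translation table built once; it deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punct(text):
    """Plain Python helper: operates on str values, not on Spark Columns."""
    if text is None:  # assumption: the column may hold nulls
        return None
    return text.translate(_PUNCT_TABLE)


def with_clean_content(df, column="content"):
    """Return df with `column` replaced by its punctuation-free version.

    pyspark is imported lazily so strip_punct stays usable without Spark.
    """
    from pyspark.sql import functions as F
    from pyspark.sql import types as T

    # Wrap the plain function explicitly; only the wrapper accepts Columns.
    strip_punct_udf = F.udf(strip_punct, T.StringType())
    return df.withColumn(column, strip_punct_udf(F.col(column)))
```

With this layout, with_clean_content(new_df).show(5) would take the place of the failing withColumn call, and strip_punct can be checked on ordinary strings first.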
