Python: substring results in "Column object is not callable"

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

df = spark.createDataFrame(
    [
        (324.456, "hi", "test"),
        (453.987, "hello", "python"),
        (768.66, "test", "java"),
    ],
    ["col1", "col2", "col3"],
)

new = df.withColumn(
    "col4",
    F.substring(
        (df.col1).cast(StringType()),
        1,
        F.instr((df.col1).cast(StringType()), ".") + 2,
    ),
)

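2 Answers

User

Answer 1 · edited 2024-04-18 23:04:52

What you are looking for is a way to truncate decimals. I suggest using pyspark.sql.functions.pow together with a cast to LongType. The idea is to multiply by 10^{decimal_places}, cast to long to drop the remaining (floating-point) digits, and then divide by the same number. For example: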

df2.show()
+-------+-----+------+
|   col1| col2|  col3|
+-------+-----+------+
|324.456|   hi|  test|
|453.987|hello|python|
| 768.66| test|  java|
+-------+-----+------+


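import pyspark.sql.functions as f

decimal_places = 2
# Scale factor 10^decimal_places, as a long column (100 for two places)
truncated_value_column = f.pow(f.lit(10), decimal_places).cast('long')

# Scale up, cast to long to drop the extra digits, then scale back down
df2.withColumn(
    "trunc",
    (f.col("col1") * truncated_value_column).cast("long") / truncated_value_column
).show()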
+-------+-----+------+------+
|   col1| col2|  col3| trunc|
+-------+-----+------+------+
|324.456|   hi|  test|324.45|
|453.987|hello|python|453.98|
| 768.66| test|  java|768.66|
+-------+-----+------+------+

Note: if you want the result as a string again, I suggest casting it back afterwards. Hope this helps.
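The multiply, cast, and divide steps can also be tried in plain Python, without Spark, to see the arithmetic on a single value. This is only a sketch: the helper name truncate_decimals is hypothetical, and int() is assumed here to stand in for the cast to long, since both drop the fractional part toward zero.

```python
def truncate_decimals(value: float, decimal_places: int) -> float:
    """Drop digits beyond `decimal_places` without rounding (hypothetical helper)."""
    scale = 10 ** decimal_places  # e.g. 100 for two decimal places
    # int() truncates toward zero; assumed to mirror the cast to long
    return int(value * scale) / scale


for v in (324.456, 453.987):
    print(v, "->", truncate_decimals(v, 2))
```

Because the inputs are binary floats, value * scale can land just below the intended integer for some values, so it is worth spot-checking the result on real data.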

User

Answer 2 · edited 2024-04-18 23:04:52

You can also use a regular expression with regexp_extract here:

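from pyspark.sql import functions as F

# Raw string so that \d reaches the regex engine unchanged
df.withColumn('test',
              F.regexp_extract(F.col("col1").cast("string"), r'\d+[.]\d{2}', 0)).show()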

Or, as @MohammadMurtazaHashmi suggested in the comments, without the cast:

df.withColumn('test', F.regexp_extract(F.col("col1"), r'\d+[.]\d{2}', 0)).show()

+-------+-----+------+------+
|   col1| col2|  col3|  test|
+-------+-----+------+------+
|324.456|   hi|  test|324.45|
|453.987|hello|python|453.98|
| 768.66| test|  java|768.66|
+-------+-----+------+------+
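To see what the pattern does and does not match, here is a small plain-Python sketch using the standard re module. It assumes Python's regex engine treats this particular pattern the same way Spark's does; the helper name extract_two_decimals is hypothetical, and returning an empty string on no match is chosen to imitate how regexp_extract is documented to behave.

```python
import re

# One or more digits, a literal dot, then exactly two digits
PATTERN = re.compile(r"\d+[.]\d{2}")


def extract_two_decimals(text: str) -> str:
    """Return the first match, or "" when nothing matches (hypothetical helper)."""
    match = PATTERN.search(text)
    return match.group(0) if match else ""


for s in ("324.456", "453.987", "768.66", "5.1"):
    print(repr(s), "->", repr(extract_two_decimals(s)))
```

Two things to watch for with this pattern: a value with fewer than two decimal digits (such as "5.1") does not match at all, and a leading minus sign is not captured.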
