My dataset df looks like this:
ID date class
1 2020/01/02 [math,english]
1 2020/01/03 [math,english]
1 2020/01/04 [math,english]
2 2020/01/02 [math,english]
2 2020/01/03 [math,english,art]
2 2020/01/04 [math,english]
2 2020/01/05 [math,english,art]
2 2020/01/06 [math,art]
2 2020/01/07 [math,art]
2 2020/01/08 [math,english,art]
My current code is:
from pyspark.sql import Window
from pyspark.sql.functions import rank
df.withColumn("c_order", rank()\
.over(Window.partitionBy("ID","date")\
.orderBy("class")))
I also tried dense_rank() and row_number(), but neither of them gives the desired output:
from pyspark.sql.functions import dense_rank, row_number
df.withColumn("c_order", dense_rank()\
.over(Window.partitionBy("ID","date")\
.orderBy("class")))
df.withColumn("c_order", row_number()\
.over(Window.partitionBy("ID","date")\
.orderBy("class")))
My current output looks like this:
ID date class c_order
1 2020/01/02 [math,english] 1
1 2020/01/03 [math,english] 1
1 2020/01/04 [math,english] 1
2 2020/01/02 [math,english] 1
2 2020/01/03 [math,english,art] 1
2 2020/01/04 [math,english] 1
2 2020/01/05 [math,english,art] 1
2 2020/01/06 [math,art] 1
2 2020/01/07 [math,art] 1
2 2020/01/08 [math,english,art] 1
I would like the output to look like this:
ID date class c_order
1 2020/01/02 [math,english] 1
1 2020/01/03 [math,english] 1
1 2020/01/04 [math,english] 1
2 2020/01/02 [math,english] 1
2 2020/01/03 [math,english,art] 2
2 2020/01/04 [math,english] 3
2 2020/01/05 [math,english,art] 4
2 2020/01/06 [math,art] 5
2 2020/01/07 [math,art] 5
2 2020/01/08 [math,english,art] 6
c_order should increase only when class changes. Any idea what I am doing wrong?
Thanks, everyone!
You can't do this with a rank alone. You need to compare each row with the previous row (using lag) to check when class has changed.
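One way this could be sketched, assuming the counter should restart for each ID and follow date order: flag every row whose class differs from the previous row's class (via lag), then take a running sum of those flags. The helper names c_order_reference and add_c_order and the temporary column "changed" are made up for illustration. The first function is a plain-Python model of the logic, and the second is a possible PySpark equivalent; it assumes class is a comparable column (string or array of strings) and that the date strings sort chronologically.

```python
from itertools import groupby


def c_order_reference(rows):
    """Plain-Python model of the lag + running-sum idea.

    rows: iterable of (ID, date, class) tuples (hypothetical layout).
    Returns (ID, date, class, c_order) tuples sorted by ID, then date.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        prev, counter = None, 0
        for id_, date, cls in group:
            # The first row of an ID, or a class that differs from the
            # previous row, bumps the counter; otherwise it is carried over.
            if prev is None or cls != prev:
                counter += 1
            prev = cls
            out.append((id_, date, cls, counter))
    return out


def add_c_order(df):
    """Possible PySpark version of the same logic (a sketch, not the only way).

    pyspark is imported locally so the model above can run without Spark.
    """
    from pyspark.sql import Window, functions as F

    # Partition by ID only: partitioning by ("ID", "date") leaves one row
    # per partition, which is why every rank came out as 1.
    w = Window.partitionBy("ID").orderBy("date")
    # lag() needs a window without an explicit frame; the running sum gets one.
    running = w.rowsBetween(Window.unboundedPreceding, Window.currentRow)

    prev = F.lag("class").over(w)
    changed = F.when(prev.isNull() | (F.col("class") != prev), 1).otherwise(0)

    return (
        df.withColumn("changed", changed)  # 1 where class changed, else 0
          .withColumn("c_order", F.sum("changed").over(running))
          .drop("changed")
    )
```

On the sample data from the question, the plain-Python model gives 1, 1, 1 for ID 1 and 1, 2, 3, 4, 5, 5, 6 for ID 2, which matches the desired output. The flag is written to its own column first because Spark does not allow one window function to be nested inside another.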