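I want to populate a new DataFrame with values from a master source. If the IDs do not match, I want to fill the entry with NEWCUSTOMER. I tried the following, but it throws the error "Column is not iterable".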
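Task: I have "old" customers and "new" customers. My goal is to label the "new" customers in testC, whose customerID does not appear in train, as "NEWCUSTOMER". If the customerID does exist in train, the customer in testC should get the customerCategory value from train.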
train.show(1)
testC = testC.withColumn("customerCategory", F.when(testC.customerID.contains(train.customerID),\
F.col(train.customerCategory)).otherwise("NEWCUSTOMER"))
+-----------+----------+------------+------+----+-----+--------------+-----+----------+----------+-----------+------+------------+--------------+----------------+
|orderItemID| orderDate|deliveryDate|itemID|size|color|manufacturerID|price|customerID|salutation|dateOfBirth| state|creationDate|returnShipment|customerCategory|
+-----------+----------+------------+------+----+-----+--------------+-----+----------+----------+-----------+------+------------+--------------+----------------+
|        148|2012-04-01|  2012-04-04|   651|  xl| blue|            46| 19.9|      1121|       Mrs|          ?|Berlin|  2012-04-01|             0|           GREEN|
+-----------+----------+------------+------+----+-----+--------------+-----+----------+----------+-----------+------+------------+--------------+----------------+
TypeError: Column is not iterable
TypeError                                 Traceback (most recent call last)
<command-3715636189631646> in <module>
      1 train.show(1)
      2 testC = testC.withColumn("customerCategory", F.when(testC.customerID.contains(train.customerID),\
----> 3 F.col(train.customerCategory)).otherwise("NEWCUSTOMER"))
The structure of the starting dataset testC is unclear, but IIUC you can use a left join and then call fillna on only the columns of interest.

I would try a LEFT JOIN.