I am trying to find a solution to the following problem:
import pandas as pd

# Sample data: each row maps a dataset Id to its DatasetName and a Target label
rows = {
    'Id': ['xb01', 'nt02', 'tw02', 'dt92', 'tw03', 'we04', 'er04', 'ew06', 're07', 'ti92'],
    'DatasetName': ['first label', 'second label', 'third label', 'fourth label', 'third label',
                    'third label', 'third label', 'fourth label', 'first label', 'last label'],
    'Target': ['first label', 'second label', 'the third labels', 'fourth label set', 'third label',
               'third label', 'third label sets', 'fourth label sets', 'first label', 'last labels'],
}
df = pd.DataFrame(rows, columns=['Id', 'DatasetName', 'Target'])
print(df)
The DataFrame looks like this:
Id    DatasetName   Target
xb01  first label   first label
nt02  second label  second label
tw02  third label   the third labels
dt92  fourth label  fourth label set
tw03  third label   third label
we04  third label   third label
er04  third label   third label sets
ew06  fourth label  fourth label sets
re07  first label   first label
ti92  last label    last labels
Pseudocode:
for i in range(len(df)):
    if DatasetName[i].is_unique:
        if DatasetName[i] != Target[i]:
            Target[i] = DatasetName[i] + '|' + Target[i]
    else:
        loop through the DataFrame, find all labels that belong to the same DatasetName,
        and append all of those Target names together. (Note: if the DatasetName is not the
        same as any of the Target names, the DatasetName should also be appended to the Target.)
Here we can see:
DatasetName   Appeared  Target
first label   2         first label
second label  1         second label
third label   4         the third labels | third label | third label sets
fourth label  2         fourth label set | fourth label sets | fourth label
last label    1         last labels | last label
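The Appeared count and the distinct Target values per DatasetName can be derived with a group-by. This is only a minimal sketch: `summary` is a name made up here, the column name `Appeared` simply mirrors the table, and the sketch lists the distinct Target values as they are, without appending the DatasetName when it is missing from them.

```python
import pandas as pd

# Same DatasetName/Target values as in the sample data (Id omitted for brevity)
df = pd.DataFrame({
    'DatasetName': ['first label', 'second label', 'third label', 'fourth label', 'third label',
                    'third label', 'third label', 'fourth label', 'first label', 'last label'],
    'Target': ['first label', 'second label', 'the third labels', 'fourth label set', 'third label',
               'third label', 'third label sets', 'fourth label sets', 'first label', 'last labels'],
})

# sort=False keeps the groups in order of first appearance.
# dict.fromkeys drops duplicate targets while preserving their order.
summary = df.groupby('DatasetName', sort=False).agg(
    Appeared=('Target', 'size'),
    Target=('Target', lambda s: ' | '.join(dict.fromkeys(s))),
)
print(summary)
```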
Expected output:
Id    DatasetName   Target
xb01  first label   first label
nt02  second label  second label
tw02  third label   the third labels|third label|third label sets
dt92  fourth label  fourth label set|fourth label sets|fourth label
tw03  third label   the third labels|third label|third label sets
we04  third label   the third labels|third label|third label sets
er04  third label   the third labels|third label|third label sets
ew06  fourth label  fourth label set|fourth label sets|fourth label
re07  first label   first label
ti92  last label    last labels|last label
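Note: the actual DataFrame has 100,000 rows. There may still be extra whitespace in these strings (I have already lowercased the DataFrame, removed all extra tokens, and so on). This question may also contain a few typos (I copied and pasted it several times), but I hope you get the idea of the solution I am looking for. Thanks, everyone!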
Let's try using the [...] values and the [...] returned by [...]:
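One way to reach the expected output is sketched below. It is not necessarily the approach this answer had in mind. It assumes a group-by on DatasetName, that whitespace around each Target should be stripped, and that the DatasetName goes last when it is missing from the targets (this follows the expected output rather than the pseudocode, which puts it first). The name `combined` is made up for illustration.

```python
import pandas as pd

rows = {
    'Id': ['xb01', 'nt02', 'tw02', 'dt92', 'tw03', 'we04', 'er04', 'ew06', 're07', 'ti92'],
    'DatasetName': ['first label', 'second label', 'third label', 'fourth label', 'third label',
                    'third label', 'third label', 'fourth label', 'first label', 'last label'],
    'Target': ['first label', 'second label', 'the third labels', 'fourth label set', 'third label',
               'third label', 'third label sets', 'fourth label sets', 'first label', 'last labels'],
}
df = pd.DataFrame(rows, columns=['Id', 'DatasetName', 'Target'])

# Build one combined string per DatasetName.
combined = {}
for name, targets in df.groupby('DatasetName', sort=False)['Target']:
    # Distinct targets in order of first appearance, with stray whitespace removed
    labels = list(dict.fromkeys(t.strip() for t in targets))
    # Append the dataset name itself if none of the targets equals it
    if name not in labels:
        labels.append(name)
    combined[name] = '|'.join(labels)

# Broadcast the combined string back to every row of its group.
df['Target'] = df['DatasetName'].map(combined)
print(df)
```

The Python loop runs once per distinct DatasetName rather than once per row, and the final `map` is vectorized, so this should stay manageable at 100,000 rows.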
Output: