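I'm trying to compute the attribute matching rate between items that I have paired up because their description strings are highly similar.
I tried a loop over two variables, but I get this error: 'IndexError: single positional indexer is out-of-bounds'
The code I tried is: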
nuomlist = pd.DataFrame(dfn.columns, columns=['Col'])
nuomN = nuomlist[nuomlist['Col'].str.contains('-')].index.tolist()
for i in range(int(nuomN[-1] + 1), int(dfn.columns.get_loc("sim_1"))):
    for j in dfn.index:
        (sum(dfn.iloc[j, i] == dfn.iloc[j + dfn.iloc[j, dfn.columns.get_loc('Max_row')], i])
         / int(dfn.columns.get_loc("sim_1") - (nuomN[-1] + 1)))
Here is a sample dataset:
import pandas as pd

data = {'S_ITEMCODE': ['', '81527800', '', '81527900'],
        'N': ['N', '', 'N', ''],
        'ITEMCODE': ['81527800', '81320323', '81527900', '81267337'],
        'DESC': ['Store Brand (Woongjin) SB Fresh Orange Drink Orange NO P.BTL 1.5lit', 'Store Brand (Woongjin) SB Fresh Orange Drink Orange NO P.BTL 1lit', 'Store Brand (Woongjin) SB Fresh Jeju Tang. Drink Tang. NO P.B 1.5lit', 'Store Brand (Woongjin) SB Fresh Jeju Tang. Drink Tang. NO P.B 1lit'],
        'ATTR1': ['1A', '1A', '1B', '1B'],
        'ATTR2': ['1A', '1C', '1B', '1B'],
        'ATTR3': ['1A', '1A', '1B', '1B'],
        'ROW_INDEX_SIMILAR_ITEM': [1, -1, 1, 1]}
df = pd.DataFrame(data)
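The "N" column marks new items.
For the rows where "N" == "N", I want to compute the attribute matching rate between the new item and the item with high Jaccard string similarity to it (S_ITEMCODE)
(e.g. 81527800 (new item) - 81320323, 81527900 (new item) - 81267337).
This is the result I want: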
data1 = {'S_ITEMCODE': ['', '81527800', '', '81527900'],
         'N': ['N', '', 'N', ''],
         'ITEMCODE': ['81527800', '81320323', '81527900', '81267337'],
         'DESC': ['Store Brand (Woongjin) SB Fresh Orange Drink Orange NO P.BTL 1.5lit', 'Store Brand (Woongjin) SB Fresh Orange Drink Orange NO P.BTL 1lit', 'Store Brand (Woongjin) SB Fresh Jeju Tang. Drink Tang. NO P.B 1.5lit', 'Store Brand (Woongjin) SB Fresh Jeju Tang. Drink Tang. NO P.B 1lit'],
         'ATTR1': ['1A', '1A', '1B', '1B'],
         'ATTR2': ['1A', '1C', '1B', '1B'],
         'ATTR3': ['1A', '1A', '1B', '1B'],
         'ROW_INDEX_SIMILAR_ITEM': [1, -1, 1, 1],
         'ATTR_MATCHING_RATE': [2/3, '', 1, '']}
df = pd.DataFrame(data1)
Please help me... I'm stuck.
This will give you the desired output:
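One possible approach, as a minimal sketch rather than a definitive implementation. It assumes `ROW_INDEX_SIMILAR_ITEM` is a row offset relative to the current row (which is consistent with the sample data) and that the attribute columns are the ones whose names start with `ATTR`. The helper name `add_attr_matching_rate` is made up for illustration. The `IndexError` in the question is likely caused by looping over every row: the last row has an offset of 1, so `j + offset` points one past the end of the frame. Restricting the comparison to rows where `N == 'N'` and checking bounds avoids that.

```python
import numpy as np
import pandas as pd

data = {'S_ITEMCODE': ['', '81527800', '', '81527900'],
        'N': ['N', '', 'N', ''],
        'ITEMCODE': ['81527800', '81320323', '81527900', '81267337'],
        'ATTR1': ['1A', '1A', '1B', '1B'],
        'ATTR2': ['1A', '1C', '1B', '1B'],
        'ATTR3': ['1A', '1A', '1B', '1B'],
        'ROW_INDEX_SIMILAR_ITEM': [1, -1, 1, 1]}
df = pd.DataFrame(data)


def add_attr_matching_rate(df, attr_cols, flag_col='N',
                           offset_col='ROW_INDEX_SIMILAR_ITEM'):
    """Hypothetical helper: return a copy of df with ATTR_MATCHING_RATE added."""
    out = df.reset_index(drop=True)          # positional index 0..n-1
    attrs = out[attr_cols].to_numpy()        # shape (n_rows, n_attrs)
    pos = np.arange(len(out))
    # Assumption: the offset column is relative to the current row.
    partner = pos + out[offset_col].to_numpy()
    is_new = (out[flag_col] == 'N').to_numpy()
    # Only compare new rows whose partner row actually exists.
    valid = is_new & (partner >= 0) & (partner < len(out))
    rate = np.full(len(out), np.nan)
    # Fraction of attribute columns equal between a row and its partner.
    rate[valid] = (attrs[valid] == attrs[partner[valid]]).mean(axis=1)
    out['ATTR_MATCHING_RATE'] = rate
    return out


attr_cols = [c for c in df.columns if c.startswith('ATTR')]
result = add_attr_matching_rate(df, attr_cols)
print(result[['ITEMCODE', 'N', 'ATTR_MATCHING_RATE']])
```

Rows that are not new items get `NaN` here instead of the empty string shown in the desired output, which keeps the column numeric. If the blanks are required, something like `result['ATTR_MATCHING_RATE'].astype(object).where(result['N'] == 'N', '')` should produce them.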