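Question:
I'm pulling data from multiple sources, looking for the same data points. I want to update my main DataFrame to reflect the availability found in these supplementary files. Each supplementary file is specific to one retailer found in the main DataFrame.
Data: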
import numpy as np
import pandas as pd

data = {
"sku": {
"0": "123",
"1": "321",
"2": "456",
"3": "678",
"4": "123",
"5": "321",
"6": "456",
"7": "101",
"8": "123",
"9": "101"
},
"retailer": {
"0": "BobStore",
"1": "BobStore",
"2": "BobStore",
"3": "BobStore",
"4": "SamStore",
"5": "SamStore",
"6": "RobStore",
"7": "SamStore",
"8": "RobStore",
"9": "BobStore"
},
"description": {
"0": "Jacket",
"1": "Jacket",
"2": "Jacket",
"3": "Jacket",
"4": "Jacket",
"5": "Jacket",
"6": "Jacket",
"7": "Jacket",
"8": "Jacket",
"9": "Jacket"
},
"price": {
"0": 19.99,
"1": 18.99,
"2": 12.99,
"3": 15.99,
"4": 12.99,
"5": 12.99,
"6": 11.99,
"7": 19.99,
"8": 16.99,
"9": 18.99
},
"shipping": {
"0": 6.99,
"1": 4.99,
"2": 6.99,
"3": 3.99,
"4": 6.99,
"5": 4.99,
"6": 6.99,
"7": 9.99,
"8": 1.99,
"9": 2.99,
},
"availability": {
"0": "True",
"1": "False",
"2": "",
"3": "",
"4": "False",
"5": "True",
"6": "",
"7": "",
"8": "",
"9": "True",
}
}
df = pd.DataFrame(data=data)
data = {
"sku": {
"0": "123",
"1": "101",
"2": "456",
"3": "879",
},
"availability": {
"0": "False",
"1": "True",
"2": "True",
"3": "True",
}
}
bobStore = pd.DataFrame(data=data)
data = {
"sku": {
"0": "123",
"1": "101",
},
"availability": {
"0": "False",
"1": "True",
}
}
samStore = pd.DataFrame(data=data)
Attempted solution:
df.loc[df['retailer']=='BobStore'].set_index('sku').update(bobStore.set_index('sku'))
Expected output:
data = {"availability":{"0":"False","1":"False","2":"True","3":"","4":"False","5":"True","6":"","7":"True","8":"","9":"True"},"description":{"0":"Jacket","1":"Jacket","2":"Jacket","3":"Jacket","4":"Jacket","5":"Jacket","6":"Jacket","7":"Jacket","8":"Jacket","9":"Jacket"},"price":{"0":19.99,"1":18.99,"2":12.99,"3":15.99,"4":12.99,"5":12.99,"6":11.99,"7":19.99,"8":16.99,"9":18.99},"retailer":{"0":"BobStore","1":"BobStore","2":"BobStore","3":"BobStore","4":"SamStore","5":"SamStore","6":"RobStore","7":"SamStore","8":"RobStore","9":"BobStore"},"shipping":{"0":6.99,"1":4.99,"2":6.99,"3":3.99,"4":6.99,"5":4.99,"6":6.99,"7":9.99,"8":1.99,"9":2.99},"sku":{"0":"123","1":"321","2":"456","3":"678","4":"123","5":"321","6":"456","7":"101","8":"123","9":"101"}}
do = pd.DataFrame(data=data)
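Bonus:
Can you help me understand why update doesn't work the way I expected? I can swap the values when I use a MultiIndex of sku and retailer, but not when I use a boolean mask to select the slice to update.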
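A likely explanation, sketched on a small stand-in frame rather than the full data above: df.loc[mask] with a boolean mask and .set_index('sku') each return a new DataFrame, and update() modifies the object it is called on in place and returns None. The chained call therefore updates a temporary object that is thrown away. The names tmp, lookup and new_vals are only illustrative, and the write-back at the end is one possible fix, not the only one.

```python
import pandas as pd

# Small stand-in frames (assumed for illustration, not the question's data).
df = pd.DataFrame({
    "sku": ["123", "321", "123"],
    "retailer": ["BobStore", "BobStore", "SamStore"],
    "availability": ["True", "False", "True"],
})
bobStore = pd.DataFrame({"sku": ["123"], "availability": ["False"]})

mask = df["retailer"] == "BobStore"

# Boolean-mask selection and set_index() both produce new objects,
# so update() only changes this temporary frame.
tmp = df.loc[mask].set_index("sku")
result = tmp.update(bobStore.set_index("sku"))

print(result)                          # update() returns None
print(tmp["availability"].tolist())    # the temporary frame was updated
before = df["availability"].tolist()
print(before)                          # the original frame is untouched

# One way to write the values back through the same mask:
lookup = bobStore.set_index("sku")["availability"]    # sku -> availability
new_vals = df.loc[mask, "sku"].map(lookup)            # NaN where no match
df.loc[mask, "availability"] = new_vals.fillna(df.loc[mask, "availability"])
print(df["availability"].tolist())
```

Assigning through df.loc[mask, "availability"] on the left-hand side is what makes the change land in df itself. The SamStore row with the same sku is left alone because the mask excludes it.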
MultiIndex solution:
df.set_index(['sku','retailer'], inplace=True)
bobStore['retailer'] = 'BobStore'
bobStore.set_index(['sku','retailer'],inplace=True)
df.update(bobStore)
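For reference, the same MultiIndex idea can be wrapped so it does not mutate df or the store frames and so it can be applied to both store files. This is only a sketch: update_from_store is a hypothetical helper name, and the description, price and shipping columns are left out to keep it short.

```python
import pandas as pd

df = pd.DataFrame({
    "sku": ["123", "321", "456", "678", "123", "321", "456", "101", "123", "101"],
    "retailer": ["BobStore", "BobStore", "BobStore", "BobStore", "SamStore",
                 "SamStore", "RobStore", "SamStore", "RobStore", "BobStore"],
    "availability": ["True", "False", "", "", "False", "True", "", "", "", "True"],
})
bobStore = pd.DataFrame({"sku": ["123", "101", "456", "879"],
                         "availability": ["False", "True", "True", "True"]})
samStore = pd.DataFrame({"sku": ["123", "101"],
                         "availability": ["False", "True"]})


def update_from_store(main, store, retailer):
    """Hypothetical helper: return a copy of main updated from one store file."""
    keys = ["sku", "retailer"]
    keyed = main.set_index(keys)                       # new frame, main untouched
    patch = store.assign(retailer=retailer).set_index(keys)
    keyed.update(patch)                                # in place, aligned on both keys
    return keyed.reset_index()[list(main.columns)]     # restore the flat layout


updated = df
for store, name in [(bobStore, "BobStore"), (samStore, "SamStore")]:
    updated = update_from_store(updated, store, name)

print(updated["availability"].tolist())
```

Rows in a store file with no match in the main frame (sku 879 here) are simply ignored by update(). The sketch assumes each (sku, retailer) pair appears at most once, which holds for the sample data.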
Attempt at ColdSpeed's solution:
test1 = bobStore[bobStore['sku'].isin(np.intersect1d(bobStore['sku'],df['sku']))].combine_first(df[df['retailer']=='BobStore']).combine_first(df)
This leaves off the False for "BobStore" sku 321 "availability".
I can't confirm without the expected output, but I think you may be looking for merge + combine_first.
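The code for this suggestion did not survive on this page, so the following is only one possible reading of merge + combine_first, not the answerer's original. It assumes df has the default integer index, keeps just the relevant columns, and uses the illustrative names patch and merged.

```python
import pandas as pd

df = pd.DataFrame({
    "sku": ["123", "321", "456", "678", "123", "321", "456", "101", "123", "101"],
    "retailer": ["BobStore", "BobStore", "BobStore", "BobStore", "SamStore",
                 "SamStore", "RobStore", "SamStore", "RobStore", "BobStore"],
    "availability": ["True", "False", "", "", "False", "True", "", "", "", "True"],
})
bobStore = pd.DataFrame({"sku": ["123", "101", "456", "879"],
                         "availability": ["False", "True", "True", "True"]})

# Tag the supplementary rows with their retailer so the merge uses both keys.
patch = bobStore.assign(retailer="BobStore")

# A left merge keeps every row of df in its original order; rows without
# a match get NaN in availability_new.
merged = df.merge(patch, on=["sku", "retailer"], how="left",
                  suffixes=("", "_new"))

# combine_first takes the new value where present, otherwise the old one.
df["availability"] = (
    merged["availability_new"].combine_first(merged["availability"]).to_numpy()
)

print(df["availability"].tolist())
```

Only the BobStore rows change here. Repeating the same steps with samStore and retailer="SamStore" would cover the second file.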
Another approach uses pd.concat and is based on the newly created key (i.e.
Output:
This is done using a MultiIndex :) (I used two approaches to index slicing)