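I would like to merge two columns of unequal length in a DataFrame.
I have tried several approaches, including merge, concat and join, but none of them worked.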
import pandas as pd

keyList = ["Clone", "Chain", "Fragment", "R0", "R1", "R2"]
dataDict = {key: [] for key in keyList}
# Example for different list lengths
plist1 = ["ABCD", "DJFZ", "DHRZ"]
plist2 = ["ABCD", "DJFZ", "DHRZ", "JGJZ"]
filelist = ["E2_VH_Fab_R0.fasta", "E2_VH_scFV_R0.fasta", "E2_VH_Fab_R1.fasta", "E2_VH_scFV_R1.fasta", "E2_VH_Fab_R2.fasta"]
# Subsets are:
# E1 || E2 with VH || VL with Fab || scFV with R0 || R1 || R2
for datafile in filelist:
    # Get the list of emits from the class function (clseq is defined elsewhere)
    peptidelist = clseq.processEmits()
    # Split the filename into its parts: clone, chain, fragment, round
    fileparms = datafile.split('.')[0].split('_')
    sclone, schain, sfragment, sround = fileparms
    # Iterate through the peptide list and add the subsets to the dict
    for peptide in peptidelist:
        dataDict.setdefault("Clone", []).append(sclone)
        dataDict.setdefault("Chain", []).append(schain)
        dataDict.setdefault("Fragment", []).append(sfragment)
        # Set the other rounds to "NaN" so that all lists have equal length
        if "R0" in sround:
            dataDict.setdefault("R0", []).append(peptide)
            dataDict.setdefault("R1", []).append("NaN")
            dataDict.setdefault("R2", []).append("NaN")
        elif "R1" in sround:
            dataDict.setdefault("R0", []).append("NaN")
            dataDict.setdefault("R1", []).append(peptide)
            dataDict.setdefault("R2", []).append("NaN")
        elif "R2" in sround:
            dataDict.setdefault("R0", []).append("NaN")
            dataDict.setdefault("R1", []).append("NaN")
            dataDict.setdefault("R2", []).append(peptide)
        else:
            dataDict.setdefault("R0", []).append("NaN")
            dataDict.setdefault("R1", []).append("NaN")
            dataDict.setdefault("R2", []).append("NaN")
# dtframe is an existing DataFrame defined elsewhere; merge returns a new DataFrame
dtframe.merge(pd.DataFrame(dataDict), on=['Clone', 'Chain', 'Fragment'], how='inner')
The problem is that my lists have different lengths. I would like to merge them into a single DataFrame and fill the remaining cells with NaN.
This:
   Clone Chain Fragment   R0   R1
0     E2    VH      Fab   r0  nan
1     E2    VH      Fab   r0  nan
2     E2    VH      Fab   r0  nan
3     E2    VH      Fab   r0  nan
4     E2    VH      Fab   r0  nan
5     E2    VH      Fab   r0  nan
and this:
   Clone Chain Fragment   R0   R1
0     E2    VH      Fab  nan   r1
1     E2    VH      Fab  nan   r1
2     E2    VH      Fab  nan   r1
3     E2    VH      Fab  nan   r1
4     E2    VH      Fab  nan   r1
5     E2    VH      Fab  nan   r1
6     E2    VH      Fab  nan   r1
7     E2    VH      Fab  nan   r1
should result in:
   Clone Chain Fragment   R0   R1
0     E2    VH      Fab   r0   r1
1     E2    VH      Fab   r0   r1
2     E2    VH      Fab   r0   r1
3     E2    VH      Fab   r0   r1
4     E2    VH      Fab   r0   r1
5     E2    VH      Fab   r0   r1
6     E2    VH      Fab  nan   r1
7     E2    VH      Fab  nan   r1
Note that all of my data fields are strings.
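This is a job for combine_first. We need to set the index to the three columns being merged on, and then add an extra cumcount level so that the rows line up one by one within each group, which matters for real data with many different groups. Call the two frames from the question df1 and df2.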
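The approach described above could look roughly like the following. This is a minimal sketch, not the answerer's original code: the helper names add_counter_index and combine_rounds, the counter level name "n", and the sample frames are made up for illustration. It also assumes that the placeholder string "NaN" has to be turned into a real missing value first, because combine_first only fills cells that pandas treats as missing.

```python
import pandas as pd

KEYS = ["Clone", "Chain", "Fragment"]


def add_counter_index(df, keys=KEYS):
    """Index df by the key columns plus a running counter within each key group."""
    # cumcount numbers the rows of each group 0, 1, 2, ... so that
    # row i of one frame lines up with row i of the other frame.
    counter = df.groupby(keys).cumcount().rename("n")
    return df.set_index(keys + [counter])


def combine_rounds(df1, df2, keys=KEYS):
    """Fill the missing cells of df1 with the values found in df2."""
    # Assumption: "NaN" is a placeholder string, so mask it to a real missing value.
    left = add_counter_index(df1.mask(df1 == "NaN"), keys)
    right = add_counter_index(df2.mask(df2 == "NaN"), keys)
    combined = left.combine_first(right)
    # Drop the helper counter level and turn the keys back into columns.
    return combined.reset_index(level="n", drop=True).reset_index()


# Hypothetical sample data shaped like the two frames in the question.
df1 = pd.DataFrame({"Clone": ["E2"] * 6, "Chain": ["VH"] * 6, "Fragment": ["Fab"] * 6,
                    "R0": ["r0"] * 6, "R1": ["NaN"] * 6})
df2 = pd.DataFrame({"Clone": ["E2"] * 8, "Chain": ["VH"] * 8, "Fragment": ["Fab"] * 8,
                    "R0": ["NaN"] * 8, "R1": ["r1"] * 8})

result = combine_rounds(df1, df2)
print(result)
```

The counter level is what keeps the repeated (Clone, Chain, Fragment) keys unique. Without it, the index would contain duplicates and the rows of the two frames could not be matched one to one.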