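I am copying an excerpt of a badly formatted Excel sheet (with pd.read_clipboard). It is about 120 columns wide, and the columns have different lengths. After every third column, the next column should be appended below the first column, so I should end up with three columns.
I have set up a sample DataFrame: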
import numpy as np
import pandas as pd

# Mixing np.nan with strings makes NumPy store it as the string "nan".
df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),
    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5),
})
The result looks like this:
1 2 3 4 5 6
0 925846412 nan 1.0 994235729 nan NaN
1 991877917 B 1.0 970766032 nan NaN
2 931608603 B NaN 937096948 B NaN
3 977083128 A NaN 974190653 B 1.0
4 937344792 nan NaN 972948910 B 1.0
This is what I have so far:
col_counter = 0
df_neu = pd.DataFrame(columns=["A", "B", "C"])
for column in df.columns:
    if col_counter == 3:
        col_counter = 0
    if col_counter == 0:
        # set_trace()
        df_neu["A"] = df_neu["A"].append(df[column]).reset_index(drop=True)
    elif col_counter == 1:
        df_neu["B"] = df_neu["B"].append(df[column]).reset_index(drop=True)
    elif col_counter == 2:
        df_neu["C"] = df_neu["C"].append(df[column]).reset_index(drop=True)
    col_counter += 1
The desired result is:
A B C
0 925846412 nan 1.0
1 991877917 B 1.0
2 931608603 B NaN
3 977083128 A NaN
4 937344792 nan NaN
5 994235729 nan NaN
6 970766032 nan NaN
7 937096948 B NaN
8 974190653 B 1.0
9 972948910 B 1.0
But I get the following:
A B C
0 925846412 NaN NaN
1 991877917 NaN NaN
2 931608603 NaN NaN
3 977083128 NaN NaN
4 937344792 NaN NaN
So only the first column from the first iteration is appended. All other columns are ignored.
So my question is:
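You can create a MultiIndex in the columns by integer division and modulo division of an array built from the number of columns, then reshape and, as the last step, remove the MultiIndex.

The approach from the question works if you append to a Series and only create the DataFrame with the constructor at the end.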
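The MultiIndex reshape can be sketched roughly as follows. This is a minimal sketch rather than the answer's original code: it assumes the reshape step is `DataFrame.stack`, and the helper name `stack_column_groups`, its `width` and `names` parameters, and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def stack_column_groups(df, width=3, names=("A", "B", "C")):
    """Stack every block of `width` columns underneath the first block."""
    pos = np.arange(len(df.columns))
    wide = df.copy()
    # Level 0: which block a column belongs to (integer division).
    # Level 1: the column's position inside its block (modulo).
    wide.columns = pd.MultiIndex.from_arrays([pos // width, pos % width])
    # Move the block number from the columns into the row index.
    tall = wide.stack(level=0)
    # Order by block first, then by original row, and drop the helper index.
    tall = tall.sort_index(level=[1, 0]).reset_index(drop=True)
    tall.columns = list(names)
    return tall


# Hypothetical demo data: two blocks of three columns each.
df = pd.DataFrame({
    "1": [11, 12], "2": ["x", "y"], "3": [1.0, np.nan],
    "4": [21, 22], "5": ["z", "x"], "6": [np.nan, 1.0],
})
result = stack_column_groups(df)
print(result)
```

Depending on the pandas version, `stack` may drop rows that are entirely NaN, which is worth checking when the source columns have different lengths.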
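The Series-based variant might look like the sketch below. It assumes `pd.concat` in place of `Series.append`, which was removed in pandas 2.0; the helper name `concat_every_nth` and the demo frame are hypothetical. The loop in the question most likely fails because assigning a longer Series to a column of an existing DataFrame aligns it on the frame's current index, so every row beyond the first five is discarded. Building the three Series first and calling the constructor once avoids that alignment.

```python
import numpy as np
import pandas as pd


def concat_every_nth(df, names=("A", "B", "C")):
    """Concatenate every len(names)-th column into one long Series each."""
    width = len(names)
    columns = {}
    for offset, name in enumerate(names):
        # Columns offset, offset + width, offset + 2 * width, ... belong together.
        parts = [df.iloc[:, i] for i in range(offset, df.shape[1], width)]
        columns[name] = pd.concat(parts, ignore_index=True)
    # Build the frame only once, after every Series has its final length.
    return pd.DataFrame(columns)


# Hypothetical demo data: two blocks of three columns each.
df = pd.DataFrame({
    "1": [11, 12], "2": ["x", "y"], "3": [1.0, np.nan],
    "4": [21, 22], "5": ["z", "x"], "6": [np.nan, 1.0],
})
stacked = concat_every_nth(df)
print(stacked)
```

Unlike the stack-based reshape, this version keeps trailing NaN rows from shorter columns, so a final `dropna(how="all")` may be needed for real data.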