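<p>I wrote a small function that concatenates the values of columns sharing the same name.
A note on ordering: the function sorts its result by column name, so if the columns of the original DataFrame are unsorted, the columns of the output will be sorted.</p>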
<pre><code>import re

def concat_duplicate_columns(df):
    # work on a copy so the caller's DataFrame is not renamed in place
    df = df.copy()
    dupli = {}
    # populate dictionary with column names and count for duplicates
    for column in df.columns:
        dupli[column] = dupli.get(column, 0) + 1
    # build the new names in the original column order, giving every
    # duplicated name a °°° number suffix (name°°°0, name°°°1, ...)
    seen = {}
    renamed = []
    for column in df.columns:
        if dupli[column] > 1:
            renamed.append(column + '°°°' + str(seen.get(column, 0)))
            seen[column] = seen.get(column, 0) + 1
        else:
            renamed.append(column)
    # rename columns so that we can now access ambiguous column names
    df.columns = renamed
    # for each column name (names that occur once skip the inner loop)
    for name, count in dupli.items():
        # for each further duplicate of that column name
        for k in range(1, count):
            # concatenate values in duplicated columns
            df[name + '°°°0'] = df[name + '°°°0'].astype(str) + df[name + '°°°' + str(k)].astype(str)
            # drop duplicated columns from which we have acquired data
            df = df.drop(columns=name + '°°°' + str(k))
    # strip the suffix again, then sort the columns by name
    df.columns = [re.sub('°°°.*', '', c) for c in df.columns]
    return df[sorted(df.columns)]
</code></pre>
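<p>For comparison, here is a minimal sketch of the same idea without the temporary suffix renaming. It selects each group of same-named columns with a boolean mask and joins the values row by row. The name <code>concat_duplicate_columns_compact</code> and the sample data are hypothetical, not part of the original function, and the sketch assumes that all column names are strings (so they can be sorted). Like the function above, it only converts values to strings for names that actually occur more than once.</p>

```python
import pandas as pd

def concat_duplicate_columns_compact(df):
    # hypothetical helper, shown only as an illustration
    out = {}
    for name in sorted(set(df.columns)):
        # all columns carrying this name, in their original left-to-right order
        block = df.loc[:, df.columns == name]
        if block.shape[1] == 1:
            # unique name: keep the column and its dtype unchanged
            out[name] = block.iloc[:, 0]
        else:
            # duplicated name: cast to str and join the values of each row
            out[name] = block.astype(str).apply(''.join, axis=1)
    return pd.DataFrame(out, index=df.index)

# made-up sample data with the duplicated column name 'b'
df = pd.DataFrame([[1, 'x', 2], [3, 'y', 4]], columns=['b', 'a', 'b'])
result = concat_duplicate_columns_compact(df)
print(result)
```

<p>Because the columns are picked by position mask rather than by renamed labels, the sketch does not depend on the duplicates being adjacent in the input.</p>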