Enumerating columns that share the same prefix

import pandas as pd

df = pd.DataFrame({'A': list('abcd'),
                   'B': list('efgh'),
                   'Data_mean': [1, 2, 3, 4],
                   'Data_std': [5, 6, 7, 8],
                   'Data_corr': [9, 10, 11, 12],
                   'Text_one': ['foo', 'bar', 'foobar', 'barfoo'],
                   'Text_two': ['bar', 'foo', 'barfoo', 'foobar'],
                   'Text_three': ['bar', 'bar', 'barbar', 'foofoo']})

   A  B  Data_mean  Data_std  Data_corr Text_one Text_two Text_three
0  a  e          1         5          9      foo      bar        bar
1  b  f          2         6         10      bar      foo        bar
2  c  g          3         7         11   foobar   barfoo     barbar
3  d  h          4         8         12   barfoo   foobar     foofoo

   A  B  Data_mean1  Data_std2  Data_corr3 Text_one1 Text_two2 Text_three3
0  a  e           1          5           9       foo       bar         bar
1  b  f           2          6          10       bar       foo         bar
2  c  g           3          7          11    foobar    barfoo      barbar
3  d  h           4          8          12    barfoo    foobar      foofoo

def enumerate_cols(dataframe, prefix):
    cols = []
    num = 1
    for col in dataframe.columns:
        if col.startswith(prefix):
            cols.append(col + str(num))
            num += 1
        else:
            cols.append(col)
    return cols

3 Answers

User

#1 · edited 2024-05-26 11:56:40

You can also use a defaultdict to keep a separate counter for each prefix.

from collections import defaultdict

# The first two columns (A and B) have no prefix and are left untouched.
prefix_starting_location = 2
columns = df.columns[prefix_starting_location:]
prefixes = set(col.split('_')[0] for col in columns)

new_cols = []
dd = defaultdict(int)
for col in columns:
    prefix = col.split('_')[0]
    dd[prefix] += 1
    new_cols.append(col + str(dd[prefix]))
df.columns = df.columns[:prefix_starting_location].tolist() + new_cols
>>> df
   A  B  Data_mean1  Data_std2  Data_corr3 Text_one1 Text_two2 Text_three3
0  a  e           1          5           9       foo       bar         bar
1  b  f           2          6          10       bar       foo         bar
2  c  g           3          7          11    foobar    barfoo      barbar
3  d  h           4          8          12    barfoo    foobar      foofoo

If the prefixes are known:

prefixes = ['Data', 'Text']
new_cols = []
dd = defaultdict(int)
for col in df.columns:
    prefix = col.split('_')[0]
    if prefix in prefixes:
        dd[prefix] += 1
        new_cols.append(col + str(dd[prefix]))
    else:
        new_cols.append(col)

If the split character _ does not appear in any of the other column names (the ones that should not be numbered):

new_cols = []
dd = defaultdict(int)
for col in df.columns:
    if '_' in col:
        prefix = col.split('_')[0]
        dd[prefix] += 1
        new_cols.append(col + str(dd[prefix]))
    else:
        new_cols.append(col)

df.columns = new_cols
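
The variants above can be folded into one helper that works on any sequence of column names. This is only a sketch: the name `suffix_by_prefix` and its `prefixes` and `sep` parameters are made up for illustration and do not come from the answer.

```python
from collections import defaultdict


def suffix_by_prefix(columns, prefixes=None, sep="_"):
    """Append a per-prefix running number to each matching column name.

    Hypothetical helper: if `prefixes` is given, only those prefixes are
    numbered; otherwise every name that contains `sep` is numbered.
    """
    counters = defaultdict(int)  # an unseen prefix starts at 0
    renamed = []
    for col in columns:
        prefix, found_sep, _ = col.partition(sep)
        if prefixes is None:
            matches = bool(found_sep)     # the "separator present" variant
        else:
            matches = prefix in prefixes  # the "known prefixes" variant
        if matches:
            counters[prefix] += 1
            renamed.append(f"{col}{counters[prefix]}")
        else:
            renamed.append(col)
    return renamed


columns = ["A", "B", "Data_mean", "Data_std", "Data_corr",
           "Text_one", "Text_two", "Text_three"]
new_columns = suffix_by_prefix(columns)
print(new_columns)
```

With a DataFrame this would be used as `df.columns = suffix_by_prefix(df.columns)`.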

User

#2 · edited 2024-05-26 11:56:40

You can use rename, for example:

l_word = ['Data','Text']
df = df.rename(columns={ col:col+str(i+1) 
                         for word in l_word 
                         for i, col in enumerate(df.filter(like=word))})
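
One caveat: `filter(like=word)` keeps every column whose name contains `word` anywhere, not only at the start, so a column such as `Meta_Data` would also be numbered. If only true prefixes should match, the mapping can be built with `str.startswith` instead. A minimal sketch follows; `build_rename_map` is a hypothetical helper name, and the `Meta_Data` column exists only to show the difference.

```python
def build_rename_map(columns, words):
    """Map each column that starts with one of `words` to name + running number."""
    mapping = {}
    for word in words:
        # Keep only the columns that begin with this word, in original order.
        matching = [col for col in columns if col.startswith(word)]
        for i, col in enumerate(matching, start=1):
            mapping[col] = f"{col}{i}"
    return mapping


columns = ["A", "B", "Data_mean", "Data_std", "Meta_Data", "Text_one"]
rename_map = build_rename_map(columns, ["Data", "Text"])
print(rename_map)
```

The result can be passed straight to `df.rename(columns=rename_map)`; columns missing from the mapping keep their names.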

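User

#3 · edited 2024-05-26 11:56:40

The idea is to group the columns that share a prefix and build a cumcount for them.

Because the columns without a prefix have to be handled separately, this takes two steps, using GroupBy.cumcount and np.where: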

import numpy as np

# Prefix of every column, as a Series indexed by that same prefix.
cols = df.columns.str.split('_').str[0].to_series()

df.columns = np.where(
    cols.groupby(level=0).transform('count') > 1, 
    cols.groupby(level=0).cumcount().add(1).astype(str).radd(df.columns), 
    cols
)

df
   A  B  Data_mean1  Data_std2  Data_corr3 Text_one1 Text_two2 Text_three3
0  a  e           1          5           9       foo       bar         bar
1  b  f           2          6          10       bar       foo         bar
2  c  g           3          7          11    foobar    barfoo      barbar
3  d  h           4          8          12    barfoo    foobar      foofoo
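
To make the two steps easier to follow, the sketch below computes the intermediate values on a bare Index of column names. It groups by the prefix values themselves rather than by `level=0`, and the variable names are mine. Note that the `> 1` test means a prefix that occurs only once is left unnumbered.

```python
import numpy as np
import pandas as pd

columns = pd.Index(["A", "B", "Data_mean", "Data_std", "Text_one", "Text_two"])

# Prefix of each column name (the part before the first underscore).
prefixes = pd.Series(columns.str.split("_").str[0])

grouped = prefixes.groupby(prefixes)
position = grouped.cumcount() + 1          # 1-based position within each prefix group
group_size = grouped.transform("count")    # how many columns share each prefix

# Step 1: candidate names with the running number appended.
numbered = pd.Series(columns) + position.astype(str)

# Step 2: keep the numbered name only where the prefix is shared.
new_columns = np.where(group_size > 1, numbered, pd.Series(columns)).tolist()
print(new_columns)
```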

A simpler solution is to set the columns that should not get a suffix as the index. Then you can simply do:

df.set_index(['A', 'B'], inplace=True)
df.columns = (
    df.columns.str.split('_')
      .str[0]
      .to_series()
      .groupby(level=0)
      .cumcount()
      .add(1)
      .astype(str)
      .radd(df.columns))

df
     Data_mean1  Data_std2  Data_corr3 Text_one1 Text_two2 Text_three3
A B                                                                   
a e           1          5           9       foo       bar         bar
b f           2          6          10       bar       foo         bar
c g           3          7          11    foobar    barfoo      barbar
d h           4          8          12    barfoo    foobar      foofoo
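
With A and B moved into the index, they no longer appear in `df.columns`. If they are needed as ordinary columns again afterwards, `reset_index()` brings them back. A small sketch on a reduced frame; the data and variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": list("ab"), "B": list("ef"),
                   "Data_mean": [1, 2], "Data_std": [5, 6],
                   "Text_one": ["foo", "bar"]})

# Park the unprefixed columns in the index so they are skipped.
indexed = df.set_index(["A", "B"])

# Running number per prefix, counted over the remaining columns.
prefixes = pd.Series(indexed.columns.str.split("_").str[0])
numbers = prefixes.groupby(prefixes).cumcount() + 1
indexed.columns = [f"{col}{n}" for col, n in zip(indexed.columns, numbers)]

# Turn A and B back into regular columns.
result = indexed.reset_index()
print(result.columns.tolist())
```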
