使用Pandas将整个数据帧从小写转换为大写

# Create an example dataframe about a fictional army raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'], 'company': ['1st', '1st', '2nd', '2nd'], 'deaths': ['kkk', 52, '25', 616], 'battles': [5, '42', 2, 2], 'size': ['l', 'll', 'l', 'm']} df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'deaths', 'battles', 'size'])

3条回答

网友

1楼 · 编辑于 2024-04-29 05:52:31

这可以通过以下applymap操作解决：

df = df.applymap(lambda s:s.lower() if type(s) == str else s)

网友

2楼 · 编辑于 2024-04-29 05:52:31

由于str仅适用于序列，因此可以将其分别应用于每个列，然后连接：

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

编辑：性能比较

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop

两个答案在一个小数据帧上的性能相同。

In [15]: df = pd.concat(10000 * [df])

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop

在一个大数据帧上，我的回答稍微快一点。

网友

3楼 · 编辑于 2024-04-29 05:52:31

astype()将把每个序列强制转换为dtype对象（字符串），然后对转换后的序列调用str()方法以逐字获取字符串并对其调用函数upper()。注意，在此之后，所有列的数据类型都将更改为object。

In [17]: df
Out[17]: 
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

稍后，您可以使用to_numeric()将“battles”列再次转换为数值：

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]: 
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]: 
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object

相关问题更多 >

编程相关推荐

热门问题

热门文章