Selecting multiple DataFrame columns by position in pandas

Posted 2024-05-20 00:55:37


I have a (large) DataFrame. How can I select specific columns by position, e.g. columns 1..3, 5 and 6?

I don't want to simply drop column 4. I'm trying it this way because my dataset has a lot of columns and I want to select them by position:

 df=df[df.columns[0:2,4:5]]

But this raises IndexError: too many indices for array

DF input

 Col1     Col2     Col3       Col4        Col5       Col6
 1        apple    tomato     pear        banana     banana
 1        apple    grape      nan         banana     banana
 1        apple    nan        banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        apple    tomato     banana      banana     banana
 1        avacado  tomato     banana      banana     banana
 1        toast    tomato     banana      banana     banana
 1        grape    tomato     egg         banana     banana

DF output (desired)

 Col1     Col2     Col3       Col5       Col6
 1        apple    tomato     banana     banana
 1        apple    grape      banana     banana
 1        apple    nan        banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana
 1        apple    tomato     banana     banana
 1        avacado  tomato     banana     banana
 1        toast    tomato     banana     banana
 1        grape    tomato     banana     banana
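The df.columns[0:2,4:5] attempt fails because Python packs the two slices inside the brackets into a single tuple, and the one-dimensional column Index treats that tuple as a request for two dimensions. A minimal sketch of one way to keep the label-based approach, assuming a small frame rebuilt from the first three rows of the input table: build one flat list of 0-based positions first, then index df.columns with it. The desired columns Col1..Col3, Col5 and Col6 sit at positions 0, 1, 2, 4 and 5, and a slice's stop value is excluded, so the ranges are 0:3 and 4:6.

```python
import numpy as np
import pandas as pd

# Small frame rebuilt from the first three rows of the question's input table
df = pd.DataFrame({
    "Col1": [1, 1, 1],
    "Col2": ["apple", "apple", "apple"],
    "Col3": ["tomato", "grape", np.nan],
    "Col4": ["pear", np.nan, "banana"],
    "Col5": ["banana", "banana", "banana"],
    "Col6": ["banana", "banana", "banana"],
})

# df.columns[0:2, 4:5] hands the tuple (slice(0, 2), slice(4, 5)) to a 1-D Index,
# hence "too many indices". Concatenate the ranges into one flat list instead.
positions = [*range(0, 3), *range(4, 6)]  # [0, 1, 2, 4, 5], 0-based

# Map the positions to labels, then select by label as in the original attempt
selected = df[df.columns[positions]]
print(selected.columns.tolist())  # ['Col1', 'Col2', 'Col3', 'Col5', 'Col6']
```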

3 answers

What you need is numpy's np.r_:

df.iloc[:,np.r_[0:2,4:5]]
Out[265]: 
   Col1     Col2    Col5
0     1    apple  banana
1     1    apple  banana
2     1    apple  banana
3     1    apple  banana
4     1    apple  banana
5     1    apple  banana
6     1  avacado  banana
7     1    toast  banana
8     1    grape  banana
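The call above returns positions 0, 1 and 4 (Col1, Col2, Col5), which is what the slices 0:2 and 4:5 literally mean, because each slice stops before its end value. To reproduce the desired output (Col1..Col3, Col5, Col6), each stop has to be one past the last wanted position. A minimal sketch, again assuming a small frame rebuilt from the first rows of the question's input:

```python
import numpy as np
import pandas as pd

# Small frame rebuilt from the first three rows of the question's input table
df = pd.DataFrame({
    "Col1": [1, 1, 1],
    "Col2": ["apple", "apple", "apple"],
    "Col3": ["tomato", "grape", np.nan],
    "Col4": ["pear", np.nan, "banana"],
    "Col5": ["banana", "banana", "banana"],
    "Col6": ["banana", "banana", "banana"],
})

# np.r_ concatenates slices into one integer array: 0:3 -> 0, 1, 2 and 4:6 -> 4, 5
cols = np.r_[0:3, 4:6]
print(cols)  # [0 1 2 4 5]

# Keep every row, and only those column positions
out = df.iloc[:, cols]
print(out.columns.tolist())  # ['Col1', 'Col2', 'Col3', 'Col5', 'Col6']
```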

Use the pandas iloc method:

df_filtered = df.iloc[:, [0, 1, 2, 4, 5]]  # 0-based positions of Col1..Col3, Col5, Col6
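iloc positions are 0-based, while the question counts columns from 1 ("columns 1..3, 5 and 6"). Passing the 1-based numbers straight to iloc shifts the selection by one column and, on a six-column frame, asks for a position 6 that does not exist. A minimal sketch of converting first, assuming a one-row frame taken from the question's input; one_based and zero_based are hypothetical names for illustration:

```python
import pandas as pd

# One row from the question's input table, columns Col1..Col6
df = pd.DataFrame(
    [[1, "apple", "tomato", "pear", "banana", "banana"]],
    columns=[f"Col{i}" for i in range(1, 7)],
)

one_based = [1, 2, 3, 5, 6]              # column numbers as a person counts them
zero_based = [n - 1 for n in one_based]  # [0, 1, 2, 4, 5], what iloc expects
df_filtered = df.iloc[:, zero_based]
print(df_filtered.columns.tolist())      # ['Col1', 'Col2', 'Col3', 'Col5', 'Col6']

# Using the 1-based numbers directly fails: position 6 is out of bounds
raised = False
try:
    df.iloc[:, one_based]
except IndexError:
    raised = True
print(raised)  # True
```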

Columns 0, 1 and 4 can be selected with:

df.iloc[:, [0, 1, 4]]

You can read more about this in the Indexing and Selecting Data section of the pandas documentation:

• iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing. (this conforms with python/numpy slice semantics). Allowed inputs are:

◦ An integer e.g. 5

◦ A list or array of integers [4, 3, 0]

◦ A slice object with ints 1:7

◦ A boolean array

◦ A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above)
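A short sketch that exercises each input kind listed above on the column axis, assuming a tiny three-column frame invented for illustration. (The Panel type mentioned in the quote no longer exists in current pandas, so only DataFrame is used here.)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

single = df.iloc[:, 1]                               # integer -> Series (column "b")
listed = df.iloc[:, [2, 0]]                          # list of ints -> columns in the given order
sliced = df.iloc[:, 1:10]                            # slice: an out-of-bounds stop is allowed
masked = df.iloc[:, np.array([True, False, True])]   # boolean array, one flag per column
called = df.iloc[:, lambda d: [0, d.shape[1] - 1]]   # callable gets df, returns a valid indexer

print(single.tolist())          # [3, 4]
print(listed.columns.tolist())  # ['c', 'a']
print(sliced.columns.tolist())  # ['b', 'c']
print(masked.columns.tolist())  # ['a', 'c']
print(called.columns.tolist())  # ['a', 'c']

# A plain integer past the last column raises IndexError
raised = False
try:
    df.iloc[:, 5]
except IndexError:
    raised = True
print(raised)  # True
```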
