Drop groups if NaN appears anywhere in multiple columns

  State  Year  Base_2007  Base_2011           County
0    AL  2012        NaN       14.0  Alabama_Country
1    AL  2013       12.0       20.0  Alabama_Country
2    AL  2014       13.0        NaN  Alabama_Country
3    DC  2011        NaN       20.0          Trenton
4    DC  2012       19.0        NaN          Trenton
5    DC  2013       20.0       21.0          Trenton
6    DC  2014       25.0       30.0          Trenton

{'State': {0: 'AL', 1: 'AL', 2: 'AL', 3: 'AL', 4: 'AL', 5: 'AL', 6: 'AL', 7: 'AL', 8: 'AL', 9: 'AL'}, 'County': {0: 'Autauga', 1: 'Autauga', 2: 'Autauga', 3: 'Autauga', 4: 'Autauga', 5: 'Autauga', 6: 'Autauga', 7: 'Autauga', 8: 'Autauga', 9: 'Autauga'}, 'FIPS code': {0: 1001, 1: 1001, 2: 1001, 3: 1001, 4: 1001, 5: 1001, 6: 1001, 7: 1001, 8: 1001, 9: 1001}, 'Year': {0: 1986, 1: 1987, 2: 1988, 3: 1989, 4: 1990, 5: 1991, 6: 1992, 7: 1993, 8: 1994, 9: 1995}, 'Annual_pct_change': {0: nan, 1: -2.17, 2: 3.24, 3: 4.16, 4: -0.35, 5: 2.69, 6: 2.85, 7: 3.34, 8: 4.33, 9: 3.48}, 'HPI': {0: 100.0, 1: 97.83, 2: 100.99, 3: 105.19, 4: 104.82, 5: 107.64, 6: 110.7, 7: 114.4, 8: 119.35, 9: 123.5}, 'HPI1990': {0: 95.4, 1: 93.33, 2: 96.35, 3: 100.36, 4: 100.0, 5: 102.69, 6: 105.61, 7: 109.14, 8: 113.86, 9: 117.82}, 'HPI2000': {0: 71.03, 1: 69.49, 2: 71.74, 3: 74.72, 4: 74.45, 5: 76.46, 6: 78.63, 7: 81.26, 8: 84.77, 9: 87.72}, 'CountyName': {0: 'Autauga County', 1: 'Autauga County', 2: 'Autauga County', 3: 'Autauga County', 4: 'Autauga County', 5: 'Autauga County', 6: 'Autauga County', 7: 'Autauga County', 8: 'Autauga County', 9: 'Autauga County'}}

3 Answers

User

Answer 1 · edited 2024-04-19 01:01:15

Use query in pandas to check for nulls and find the unique counties:

county = data.query("Base_2011.isnull() or Base_2007.isnull()", engine='python').County.unique()

Then select all rows whose county is not in that list, which leaves only the remaining counties:

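The code for this step did not survive on this page. What follows is a guess at the lost step, not the original author's code. It assumes `data` is the nine-row sample frame used in the third answer (the question's rows plus two "Dummy" rows), and the name `result` is made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame: the question's rows plus the two "Dummy" rows used in the answers.
data = pd.DataFrame({
    "State": ["AL", "AL", "AL", "DC", "DC", "DC", "DC", "DM", "DM"],
    "Year": [2012, 2013, 2014, 2011, 2012, 2013, 2014, 2013, 2012],
    "Base_2007": [np.nan, 12.0, 13.0, np.nan, 19.0, 20.0, 25.0, 34.0, 34.0],
    "Base_2011": [14.0, 20.0, np.nan, 20.0, np.nan, 21.0, 30.0, 45.0, 45.0],
    "County": ["Alabama_Country"] * 3 + ["Trenton"] * 4 + ["Dummy"] * 2,
})

# Counties with a NaN in either column (same query as in the answer).
county = data.query(
    "Base_2011.isnull() or Base_2007.isnull()", engine="python"
).County.unique()

# Assumed reconstruction: keep only rows whose County is NOT in that list.
result = data[~data.County.isin(county)]
print(result)
```

With the sample data, only the two "Dummy" rows should remain, which matches the output shown in this answer.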

  State  Year  Base_2007  Base_2011 County
7    DM  2013       34.0       45.0  Dummy
8    DM  2012       34.0       45.0  Dummy

User

Answer 2 · edited 2024-04-19 01:01:15

Just use

    df.dropna()

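For context, here is a small sketch of what `dropna` does on the sample data. The `subset` argument and the name `rows_kept` are assumptions added for illustration and are not part of this answer. `dropna` works row by row, so rows without NaN are kept even when other rows of the same county contain NaN:

```python
import numpy as np
import pandas as pd

# Sample frame: the question's rows plus the two "Dummy" rows used in the answers.
df = pd.DataFrame({
    "State": ["AL", "AL", "AL", "DC", "DC", "DC", "DC", "DM", "DM"],
    "Year": [2012, 2013, 2014, 2011, 2012, 2013, 2014, 2013, 2012],
    "Base_2007": [np.nan, 12.0, 13.0, np.nan, 19.0, 20.0, 25.0, 34.0, 34.0],
    "Base_2011": [14.0, 20.0, np.nan, 20.0, np.nan, 21.0, 30.0, 45.0, 45.0],
    "County": ["Alabama_Country"] * 3 + ["Trenton"] * 4 + ["Dummy"] * 2,
})

# Drop individual rows that have NaN in either Base column.
rows_kept = df.dropna(subset=["Base_2007", "Base_2011"])
print(rows_kept)
```

Rows from Alabama_Country and Trenton that have no NaN of their own are still present in `rows_kept`, so this removes rows rather than whole county groups.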

User

Answer 3 · edited 2024-04-19 01:01:15

I tested this on the dataset below (if the NA values are stored as strings, they first need to be replaced with np.nan via df = df.replace('NA', np.nan)).

print(df)

  State  Year  Base_2007  Base_2011           County
0    AL  2012        NaN       14.0  Alabama_Country
1    AL  2013       12.0       20.0  Alabama_Country
2    AL  2014       13.0        NaN  Alabama_Country
3    DC  2011        NaN       20.0          Trenton
4    DC  2012       19.0        NaN          Trenton
5    DC  2013       20.0       21.0          Trenton
6    DC  2014       25.0       30.0          Trenton
7    DM  2013       34.0       45.0            Dummy
8    DM  2012       34.0       45.0            Dummy
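The string replacement mentioned before the table can be sketched as follows, assuming the raw data holds the literal string 'NA'. The frame `raw` is invented for illustration, and the `astype(float)` step is an added assumption so that the column becomes numeric again:

```python
import numpy as np
import pandas as pd

# Hypothetical raw column where missing values arrived as the string 'NA'.
raw = pd.DataFrame({"Base_2007": ["NA", "12.0", "13.0"]})

# Replace the placeholder string with a real NaN, as the answer suggests.
cleaned = raw.replace("NA", np.nan)

# Assumed extra step: convert the remaining strings to floats.
cleaned["Base_2007"] = cleaned["Base_2007"].astype(float)

print(cleaned["Base_2007"].isna().tolist())
```

After this, `isna()` recognizes the missing entries, which is what the mask-based steps below rely on.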

Counties that contain NaN can be dropped as follows:

df.loc[~df.County.isin(df.loc[df[['Base_2007','Base_2011']].isna().any(axis=1),'County'])]

I will update with an explanation soon.

Explanation

The following finds every row that has a NaN in the Base_2007 or Base_2011 column:

df[['Base_2007','Base_2011']].isna().any(axis=1)
0     True
1    False
2     True
3     True
4     True
5    False
6    False
7    False
8    False

Using the output above as a boolean mask, we call df.loc[]:

df.loc[df[['Base_2007','Base_2011']].isna().any(axis=1),'County']

which gives:

0    Alabama_Country
2    Alabama_Country
3            Trenton
4            Trenton

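Note that we take only the County column. The reason becomes clear in the next step.

We take the output above and use isin() to check whether each cell in the County column of the original dataframe appears in it.

This returns True for every row whose County is present in the output of df.loc[].

Then we negate the result with the inversion operator ~, which turns every True into False and vice versa.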

~df.County.isin(df.loc[df[['Base_2007','Base_2011']].isna().any(axis=1),'County'])
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
8     True

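Once this mask is ready, we apply the same df.loc[] logic to it.

Finally we get a dataframe that contains only those counties with no NaN in Base_2007 and Base_2011.

Note: if we want the index to start from 0 instead of keeping the labels of the slice, we can append reset_index(drop=True) to the end of the code, as follows: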

df_new = df.loc[~df.County.isin(df.loc[df[['Base_2007','Base_2011']].isna()
                    .any(axis=1),'County'])].reset_index(drop=True)

   State  Year  Base_2007  Base_2011 County
0    DM  2013       34.0       45.0  Dummy
1    DM  2012       34.0       45.0  Dummy
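As a cross-check, the same result can be sketched with `groupby(...).filter(...)`. This is a different technique from the `isin` mask used in this answer, shown only for comparison; the names `cols`, `bad_counties`, `df_mask` and `df_filter` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample frame from this answer.
df = pd.DataFrame({
    "State": ["AL", "AL", "AL", "DC", "DC", "DC", "DC", "DM", "DM"],
    "Year": [2012, 2013, 2014, 2011, 2012, 2013, 2014, 2013, 2012],
    "Base_2007": [np.nan, 12.0, 13.0, np.nan, 19.0, 20.0, 25.0, 34.0, 34.0],
    "Base_2011": [14.0, 20.0, np.nan, 20.0, np.nan, 21.0, 30.0, 45.0, 45.0],
    "County": ["Alabama_Country"] * 3 + ["Trenton"] * 4 + ["Dummy"] * 2,
})
cols = ["Base_2007", "Base_2011"]

# Mask approach from this answer, split into two steps for readability.
bad_counties = df.loc[df[cols].isna().any(axis=1), "County"]
df_mask = df.loc[~df.County.isin(bad_counties)].reset_index(drop=True)

# Alternative: keep a County group only if none of its cells in `cols` is NaN.
df_filter = (
    df.groupby("County")
      .filter(lambda g: g[cols].notna().all().all())
      .reset_index(drop=True)
)

print(df_mask.equals(df_filter))
```

The mask version avoids calling a Python function once per group, which may matter on frames with many counties. The `filter` version states the "drop whole groups" intent more directly.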
