Pandas warning when using map: A value is trying to be set on a copy of a slice from a DataFrame

Posted 2024-05-15 20:54:24


I have the code below, and it works. It basically renames values in a column so that the rows can be merged later.

import pandas as pd

pop = pd.read_csv('population.csv')
pop_recent = pop[pop['Year'] == 2014]

mapping = {
    'Korea, Rep.': 'South Korea',
    'Taiwan, China': 'Taiwan'
}
f = lambda x: mapping.get(x, x)  # keep the original name if it is not in mapping
pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I have googled this, but there do not seem to be any examples that use map, so I am at a loss...


2 Answers

I suggest resetting the index when you create pop_recent = pop[pop['Year'] == 2014], that is, appending .reset_index() to that expression.

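If you want to apply a function to a column of a DataFrame, try apply from the pandas API (here Series.apply, since a single column is a Series). A simple demo: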

import pandas

mapping = {
    'Korea, Rep.': 'South Korea',
    'Taiwan, China': 'Taiwan'
}
df = pandas.DataFrame({'Country': ['Korea, Rep.', 'Taiwan, China', 'Japan', 'USA'], 'date': [2014, 2014, 2015, 2014]})
# reset_index() returns a new DataFrame; the old index is kept as an 'index' column
df_recent = df[df['date'] == 2014].reset_index()
df_recent['Country'] = df_recent['Country'].apply(lambda x: mapping.get(x, x))

Output:

>>> df_recent
   index      Country  date
0      0  South Korea  2014
1      1       Taiwan  2014
2      3          USA  2014

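The problem is chained indexing. What you are effectively doing is setting values on pop[pop['Year'] == 2014]['Country Name']. This does not work most of the time (the linked documentation explains it well), because it consists of two separate indexing calls, and one of them may return a copy of the DataFrame (I believe boolean indexing returns a copy).

So when you set values on that copy, the change is not reflected in the original DataFrame. Example: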

In [6]: df
Out[6]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [7]: df[df['A']==1]['B'] = 10
/path/to/ipython-script.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [8]: df
Out[8]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9
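
The session above can be reproduced as a self-contained script. This is only a minimal sketch: the warning filter is added for illustration so that the script runs quietly, since the exact warning class differs between pandas releases. It contrasts the chained assignment with a single .loc call on the same data:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [1, 3, 4, 6, 8], "B": [2, 4, 5, 7, 9]})

# Chained indexing: df[df["A"] == 1] builds a new object first, so the
# assignment to ["B"] lands on that temporary object, not on df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the chained-assignment warning
    df[df["A"] == 1]["B"] = 10

b_after_chained = df["B"].tolist()  # df has not been modified

# Single .loc call: rows and column are selected together, so df is updated.
df.loc[df["A"] == 1, "B"] = 10
b_after_loc = df["B"].tolist()

print(b_after_chained)
print(b_after_loc)
```

The first printed list still has 2 in the first position, while the second has 10 there.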

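As noted above, instead of chained indexing you should use DataFrame.loc to select the rows and the column you want to update in a single call, which avoids this warning. Example:

pop.loc[(pop['Year'] == 2014), 'Country Name'] = pop.loc[(pop['Year'] == 2014), 'Country Name'].map(f)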

Or, if that is too long for you, you can create the mask (a boolean Series) beforehand, assign it to a variable, and use it in the statement above. Example:

mask = pop['Year'] == 2014
pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)
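
Applied to the data in the question, the mask approach might look like the sketch below. The small in-memory DataFrame is a hypothetical stand-in for population.csv (only the two columns used here, with made-up rows), and f is written as a named function rather than a lambda purely for readability:

```python
import pandas as pd

# Hypothetical stand-in for population.csv (only the columns used here).
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "Japan", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}


def f(name):
    # Fall back to the original name when there is no mapping entry.
    return mapping.get(name, name)


# Select rows and column in one .loc call on both sides of the assignment.
mask = pop["Year"] == 2014
pop.loc[mask, "Country Name"] = pop.loc[mask, "Country Name"].map(f)

print(pop)
```

Only the 2014 rows are renamed; the 2013 row keeps 'Korea, Rep.' because the mask excludes it.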

Demo:

In [9]: df
Out[9]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [10]: mapping = { 1:2 , 3:4}

In [11]: f= lambda x: mapping.get(x, x)

In [12]: df.loc[(df['B']==2),'A'] = df.loc[(df['B']==2),'A'].map(f)

In [13]: df
Out[13]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Demo using the mask approach:

In [18]: df
Out[18]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [19]: mask = df['B']==2

In [20]: df.loc[mask,'A'] = df.loc[mask,'A'].map(f)

In [21]: df
Out[21]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9
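
The .loc examples above update pop in place. If you would rather keep a separate pop_recent, as in the question, one alternative that is not part of the answers above is to take an explicit copy of the filtered rows before assigning. A minimal sketch, again using a hypothetical in-memory stand-in for population.csv:

```python
import pandas as pd

# Hypothetical stand-in for population.csv (only the columns used here).
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "Japan", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}

# An explicit copy makes pop_recent an independent DataFrame, so assigning
# to one of its columns does not touch pop.
pop_recent = pop[pop["Year"] == 2014].copy()
pop_recent["Country Name"] = pop_recent["Country Name"].map(
    lambda name: mapping.get(name, name)
)

print(pop_recent)
```

Here pop itself keeps the original country names, and only pop_recent carries the renamed values.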
