How to deal with SettingWithCopyWarning in Pandas?

Published 2024-04-26 18:12:21


Background

I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now the application is emitting many new warnings. One of them looks like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

What exactly does this mean? Do I need to change something?

If I insist on using quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE, how should I suppress the warning?

The function that produces the warnings

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

    return quote_df

More warning messages

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

3 answers

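In general, the point of the SettingWithCopyWarning is to show users (and especially new users) that they may be operating on a copy and not on the original, as they think. There are false positives (in other words, if you know what you are doing, it could be fine). One possibility is simply to turn off the warning (it defaults to warn), as @Garrett suggests.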

Here is another option:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

You can set the is_copy flag to False, which effectively turns off the check for that object:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you copy explicitly, no further warning is raised:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

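The code the OP shows above, while legitimate and probably something I do as well, is technically a case for this warning and not a false positive. Another way to avoid the warning is to do the selection via reindex, e.g.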

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21
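As a minimal sketch of the reindex approach: the frame below is a hypothetical stand-in for the question's quote_df, and both the column names and the TVOL_SCALE value are assumptions made for illustration. reindex returns a new object, so the later column assignment does not touch the source frame.

```python
import pandas as pd

# Hypothetical stand-in for the question's data; names and values are illustrative.
raw = pd.DataFrame({"STK": ["a", "b"], "TVol": [1000.0, 2000.0], "Extra": [1, 2]})
TVOL_SCALE = 100  # assumed value; the question does not show it

# Select the wanted columns through reindex, which returns a new DataFrame.
quote_df = raw.reindex(columns=["STK", "TVol"])

# The assignment now targets the new object only.
quote_df["TVol"] = quote_df["TVol"] / TVOL_SCALE
```

After this runs, raw still holds the unscaled values and the extra column.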

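The SettingWithCopyWarning was created to flag potentially confusing "chained" assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [See GH5390 and GH5597 for background discussion.]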

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning suggests rewriting it as follows:

df.loc[df['A'] > 2, 'B'] = new_val

However, this doesn't fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val

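While it's clear that you don't care about writes making it back to the original frame (since you are overwriting the reference to it), unfortunately this pattern cannot be distinguished from the first chained assignment example, hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this new warning with the following assignment.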

pd.options.mode.chained_assignment = None  # default='warn'

How to deal with SettingWithCopyWarning in Pandas?

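This post is meant for readers who

  1. Would like to understand what this warning means
  2. Would like to understand different ways of suppressing this warning
  3. Would like to understand how to improve their code and follow good practices to avoid this warning in the future.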

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

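What is the SettingWithCopyWarning?

To know how to deal with this warning, it is important to understand what it means and why it is raised in the first place.

When filtering DataFrames, it is possible to slice/index a frame to return either a view or a copy, depending on the internal layout and various implementation details. A "view" is, as the term suggests, a view into the original data, so modifying the view may modify the original object. A "copy", on the other hand, is a replication of data from the original, and modifying the copy has no effect on the original.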
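The view/copy distinction is easiest to see one level down, in NumPy, which pandas builds on. The sketch below is an illustration with NumPy arrays rather than DataFrames, because NumPy's rules are fixed: basic slicing returns a view, and indexing with a list of integers returns a copy. Whether a given pandas operation yields a view or a copy depends on the pandas version and internals.

```python
import numpy as np

original = np.array([5, 9, 7])

# Basic slicing returns a view: it shares memory with `original`.
view = original[1:]
view[0] = 100          # writes through to `original`

# Integer-array ("fancy") indexing returns a copy with its own memory.
copy = original[[1, 2]]
copy[1] = -1           # leaves `original` untouched

print(original)        # the write through the view is visible here
print(np.shares_memory(view, original), np.shares_memory(copy, original))
```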

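As mentioned in other answers, the SettingWithCopyWarning was created to flag "chained assignment" operations. Consider df in the setup above. Suppose you would like to select all values in column "B" where values in column "A" are > 5. Pandas allows you to do this in different ways, some more correct than others. For example,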

df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

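These return the same result, so if you are only reading these values, it makes no difference. So, what is the issue? The problem with chained assignment is that it is generally difficult to predict whether a view or a copy is returned, so it mostly becomes an issue when you attempt to assign values back. To build on the earlier example, consider how this code is executed by the interpreter: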

df.loc[df.A > 5, 'B'] = 4
# becomes
df.loc.__setitem__((df.A > 5, 'B'), 4)

With a single __setitem__ call on df.loc, which writes to df itself. On the other hand, consider this code:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)

Now, depending on whether __getitem__ returned a view or a copy, the __setitem__ operation may not work.
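The difference between the two call sequences can be checked directly. This is a small sketch on a two-column version of the setup frame; the warnings filter is only there to keep the demonstration quiet and is not part of the original answer.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence the warning for this demo only
    # __getitem__ with a boolean mask builds a temporary object;
    # the following __setitem__ writes to that temporary, not to df.
    df[df.A > 5]["B"] = 4
b_after_chained = df["B"].tolist()

# A single __setitem__ call on df.loc writes to df itself.
df.loc[df.A > 5, "B"] = 4
b_after_loc = df["B"].tolist()

print(b_after_chained, b_after_loc)
```

Here the chained form leaves column "B" unchanged, while the loc form updates the two matching rows.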

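In general, you should use loc for label-based assignment and iloc for integer/position-based assignment, as the spec guarantees that they always operate on the original. Additionally, for setting a single cell, you should use at and iat.

More can be found in the documentation.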

Note
All boolean indexing operations done with loc can also be done with iloc. The only difference is that iloc expects either integers/positions for index or a numpy array of boolean values, and integer/position indexes for the columns.

For example,

df.loc[df.A > 5, 'B'] = 4

Can be written as

df.iloc[(df.A > 5).values, 1] = 4

And,

df.loc[1, 'A'] = 100

Can be written as

df.iloc[1, 0] = 100

And so on.
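A quick way to convince yourself of the equivalence is to apply the label-based and position-based forms to two identical frames and compare the results. The two-column frame here is a reduced version of the setup, and to_numpy() is used in place of .values; both are choices made for this sketch.

```python
import pandas as pd

data = {"A": [5, 9, 7], "B": [0, 3, 6]}
by_label = pd.DataFrame(data)
by_position = pd.DataFrame(data)

# Boolean-mask assignment: label-based vs. position-based.
by_label.loc[by_label.A > 5, "B"] = 4
by_position.iloc[(by_position.A > 5).to_numpy(), 1] = 4

# Single-cell assignment: label-based vs. position-based.
by_label.loc[1, "A"] = 100
by_position.iloc[1, 0] = 100

print(by_label.equals(by_position))
```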


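Just tell me how to suppress the warning!

Consider a simple operation on the "A" column of df. Selecting "A" and dividing by 2 will raise the warning, but the operation will work.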

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple of ways to silence this warning directly:

  1. Make a deepcopy

    df2 = df[['A']].copy(deep=True)
    df2['A'] /= 2
    
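  2. Change pd.options.mode.chained_assignment
    Can be set to None, "warn", or "raise". "warn" is the default. None will suppress the warning entirely, and "raise" will throw a SettingWithCopyError, preventing the operation from going through.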

    pd.options.mode.chained_assignment = None
    df2['A'] /= 2
    

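In the comments, @Peter Cotton came up with a nice way of changing the mode non-intrusively (modified from this gist) using a context manager, which sets the mode only for as long as it is required and then resets it to the original state when finished.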

class ChainedAssignent:
    def __init__(self, chained=None):
        acceptable = [None, 'warn', 'raise']
        assert chained in acceptable, "chained must be in " + str(acceptable)
        self.swcw = chained

    def __enter__(self):
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        pd.options.mode.chained_assignment = self.saved_swcw

The usage is as follows:

# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows

Or, to raise the exception

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

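The "XY Problem": What am I doing wrong?

A lot of the time, users look for ways of suppressing this warning without fully understanding why it was raised in the first place. This is a good example of an XY problem, where users attempt to solve a problem "Y" that is actually a symptom of a deeper-rooted problem "X". Questions are raised below based on common situations that trigger this warning, and solutions are then presented.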

Question 1
I have a DataFrame

df
       A  B  C  D  E
    0  5  0  3  3  7
    1  9  3  5  2  4
    2  7  6  8  8  1

I want to assign values in col "A" > 5 to 1000. My expected output is

      A  B  C  D  E
0     5  0  3  3  7
1  1000  3  5  2  4
2  1000  6  8  8  1

Wrong ways to do this:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A > 5]['A'] = 1000   # does not work

Right way, using loc:

df.loc[df.A > 5, 'A'] = 1000


Question 2 [1]
I am trying to set the value in cell (1, 'D') to 12345. My expected output is

   A  B  C      D  E
0  5  0  3      3  7
1  9  3  5  12345  4
2  7  6  8      8  1

I have tried different ways of accessing this cell, such as df['D'][1]. What is the best way to do this?

[1] This question isn't specifically related to the warning, but it is good to understand how to do this particular operation correctly so as to avoid situations where the warning could potentially arise in future.

You can use any of the following methods to do this.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345
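To confirm that the four accessors are interchangeable for this task, the sketch below applies each one to its own fresh copy of the setup frame and compares the results. The fresh() helper is a hypothetical convenience for this example only.

```python
import pandas as pd

def fresh():
    """Rebuild the setup frame so each accessor starts from the same data."""
    return pd.DataFrame({
        "A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
        "D": [3, 2, 8], "E": [7, 4, 1],
    })

frames = [fresh() for _ in range(4)]
frames[0].loc[1, "D"] = 12345    # label-based, general purpose
frames[1].iloc[1, 3] = 12345     # position-based, general purpose
frames[2].at[1, "D"] = 12345     # label-based, single cell
frames[3].iat[1, 3] = 12345      # position-based, single cell

print(all(f.equals(frames[0]) for f in frames))
```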


Question 3
I am trying to subset values based on some condition. I have a DataFrame

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

I would like to assign values in "D" to 123 such that "C" == 5. I tried

df2.loc[df2.C == 5, 'D'] = 123

Which seems fine but I am still getting the SettingWithCopyWarning! How do I fix this?

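This is probably because of code higher up in your pipeline. Did you create df2 from something larger, like

df2 = df[df.A > 5]

? In this case, df2 is the result of boolean indexing on df, and pandas keeps track of the fact that it was derived from the original, so a later assignment on df2 triggers the warning. What you need to do is assign df2 to a copy: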

df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]
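Put together with the setup frame, the fix looks roughly like this. It is a sketch: the explicit copy makes df2 independent, so the loc assignment affects df2 only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
    "D": [3, 2, 8], "E": [7, 4, 1],
})

# Take the subset and copy it explicitly, so df2 owns its data.
df2 = df[df.A > 5].copy()

# This assignment now targets df2 only; df is left as it was.
df2.loc[df2.C == 5, "D"] = 123

print(df2)
```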


Question 4
I'm trying to drop column "C" in-place from

   A  B  C  D  E
1  9  3  5  2  4
2  7  6  8  8  1

But using

df2.drop('C', axis=1, inplace=True)

Throws SettingWithCopyWarning. Why is this happening?

This is because df2 must have been created by some other slicing operation, such as

df2 = df[df.A > 5]

The solution here is to either make a copy() of df, or use loc, as before.
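A minimal sketch of the copy-based fix, using the setup frame. The second variant, which avoids inplace=True by chaining a non-inplace drop, is an alternative added here and is not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [5, 9, 7], "B": [0, 3, 6], "C": [3, 5, 8],
    "D": [3, 2, 8], "E": [7, 4, 1],
})

# Variant 1: copy the subset first, then drop in place on the copy.
df2 = df[df.A > 5].copy()
df2.drop("C", axis=1, inplace=True)

# Variant 2 (alternative): skip inplace and let drop return a new frame.
df3 = df[df.A > 5].drop(columns="C")

print(list(df2.columns), list(df3.columns))
```

In both variants the original df keeps its "C" column.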
