How and when should you use chained indexing in Python pandas?

Posted 2024-04-25 16:35:40


I am taking a data science course on data analysis with Python. At one point in the course, the professor says:

You can chain operations together. For instance, we could have rewritten the query for all Store 1 costs as df.loc['Store 1']['Cost']. This looks pretty reasonable and gets us the result we wanted. But chaining can come with some costs and is best avoided if you can use another approach. In particular, chaining tends to cause Pandas to return a copy of the DataFrame instead of a view on the DataFrame. For selecting data, this is not a big deal, though it might be slower than necessary. If you are changing data though, this is an important distinction and can be a source of error.
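To make the professor's point concrete, here is a minimal sketch. The DataFrame contents (names, items, costs) are made up for illustration and are not the course's data. It shows that the chained and single-call forms agree when reading, and that only the single-call form is a dependable way to write.

```python
import warnings

import pandas as pd

# Hypothetical purchase records; 'Store 1' appears twice in the index on purpose.
df = pd.DataFrame(
    {
        "Name": ["Chris", "Kevyn", "Vinod"],
        "Item Purchased": ["Dog Food", "Kitty Litter", "Bird Seed"],
        "Cost": [22.5, 2.5, 5.0],
    },
    index=["Store 1", "Store 1", "Store 2"],
)

# Reading: both forms select the same values.
chained = df.loc["Store 1"]["Cost"]   # two indexing steps
direct = df.loc["Store 1", "Cost"]    # one indexing step
same_values = chained.equals(direct)

# Writing through a chain: the assignment targets whatever intermediate
# object df.loc["Store 1"] produced. That is typically a temporary copy,
# so df itself may be left unchanged (pandas usually warns about this;
# the warning is silenced here only to keep the sketch quiet).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.loc["Store 1"]["Cost"] = 99.0

# Writing with a single .loc call: this updates df itself.
df.loc["Store 1", "Cost"] = 0.0
costs_after = df["Cost"].tolist()
```

After the final assignment, both Store 1 rows hold a cost of 0.0 and the Store 2 row is untouched.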

Later, he describes chained indexing as:

Generally bad, pandas could return a copy of a view depending upon NumPy

He therefore recommends multi-axis indexing (df.loc['a', '1']) instead.
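As a rough illustration of multi-axis indexing, the sketch below uses a small made-up frame with row labels 'a' and 'b' and column labels '1' and '2' (the values are assumptions chosen for the example). The row and column selectors go into one .loc call, for both reading and assignment.

```python
import pandas as pd

# Hypothetical frame whose labels match the df.loc['a', '1'] example.
df = pd.DataFrame({"1": [1, 2], "2": [3, 4]}, index=["a", "b"])

# Read a single cell: row label 'a', column label '1'.
value = df.loc["a", "1"]

# Assign to the same cell with one call, so pandas knows the target is df.
df.loc["a", "1"] = 10

# A boolean row mask and a column label can also be combined in one call.
df.loc[df["1"] > 5, "2"] = 0
```

Because every assignment here is a single .loc call on df, there is no intermediate object that could turn out to be a copy.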

I am wondering whether it is always best to avoid chained indexing, or whether there are specific use cases where it is appropriate.

Also, if it really can return either a view or a copy (depending on NumPy), what exactly does that depend on? Can I influence it to get the result I want?

I found this answer, which says:

When you use df['1']['a'], you are first accessing the series object s = df['1'], and then accessing the series element s['a'], resulting in two __getitem__ calls, both of which are heavily overloaded (handle a lot of scenarios, like slicing, boolean mask indexing, and so on).
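The two-step evaluation described in that answer can be spelled out explicitly. This is only a sketch on a made-up two-by-two frame; df.at is mentioned as an alternative scalar accessor and is not part of the quoted answer.

```python
import pandas as pd

# Hypothetical frame with column labels '1', '2' and row labels 'a', 'b'.
df = pd.DataFrame({"1": [1, 2], "2": [3, 4]}, index=["a", "b"])

# df['1']['a'] is evaluated as two separate __getitem__ calls:
s = df["1"]       # first call: DataFrame.__getitem__ returns a Series
elem = s["a"]     # second call: Series.__getitem__ returns the element

# The same element fetched with a single indexing call.
via_loc = df.loc["a", "1"]
via_at = df.at["a", "1"]   # scalar-only accessor
```

All three expressions yield the same value; the difference is how many indexing calls pandas has to dispatch and whether an intermediate Series is created along the way.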

...which makes it sound as though chained indexing is always bad. Thoughts?

