InvalidIndexError when trying to map a Series with duplicate values

Posted 2024-04-25 11:48:44


I'm trying to map hospital names to their UK postcodes. I have a CSV of spinal surgery at those hospitals (known as "Trusts" in the UK); the CSV is kate_spine.csv.

I import a single column from it (Trust) to keep things simple.

import pandas as pd
spine = pd.read_csv('~/Dropbox/Work/NNAP/Spine/Kate_W/kate_spine2.csv', usecols = ['Trust'])

Showing the import:

spine.head()


Trust
0   THE WALTON CENTRE NHS FOUNDATION TRUST
1   CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION ...
2   KING'S COLLEGE HOSPITAL NHS FOUNDATION TRUST
3   LEEDS TEACHING HOSPITALS NHS TRUST
4   NT424

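These are the Trust names, with an index. My postcodes are all in the CSV all_all.csv. I import that file with 'Trust' as the index column, again to keep things simple. The table below shows the postcodes.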

postcodes_all = pd.read_csv('all_all.csv', index_col = 'Trust')
postcodes_all.head()

                                                    Unnamed: 0  postcode
Trust
MANCHESTER UNIVERSITY NHS FOUNDATION TRUST                   0   M13 9WL
SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST           1   SR4 7TP
WORCESTERSHIRE HEALTH AND CARE NHS TRUST                     2   WR5 1JR
SOLENT NHS TRUST                                             3  SO19 8BR
SHROPSHIRE COMMUNITY HEALTH NHS TRUST                        4   SY3 8XL

I'm trying to use map to pull about 200 postcodes out of the CSV of 14,000. Here is my code:

spine['Trust'].map(postcodes_all['postcode'])

The error is:

InvalidIndexError                         Traceback (most recent call last)
<ipython-input-6-25212fe14f16> in <module>
----> 1 spine['Trust'].map(postcodes_all['postcode'])

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3826         dtype: object
   3827         """
-> 3828         new_values = super()._map_values(arg, na_action=na_action)
   3829         return self._constructor(new_values, index=self.index).__finalize__(self)
   3830 

~/anaconda3/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
   1275                 values = self.values
   1276 
-> 1277             indexer = mapper.index.get_indexer(values)
   1278             new_values = algorithms.take_1d(mapper._values, indexer)
   1279 

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

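The Trust column in the spine file does contain duplicate values, as each row describes an individual doctor's surgical activity within the Trust and there will be up to 10 doctors (therefore 10 duplicate Trust names) in the series. I thought of trying this after extracting the unique Trust names. Ideally, though, I'd like to be able to do it on the series with its duplicates.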


1 answer
User
#1 · Posted 2024-04-25 11:48:44

The Trust column in the spine file does contain duplicate values, as each row describes an individual doctor's surgical activity within the Trust and there will be up to 10 doctors (therefore 10 duplicate Trust names) in the series.

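Duplicates are the problem, with one caveat: the traceback shows the error is raised because the index of the Series passed to map (here postcodes_all['postcode']) is not unique, not because the values being mapped repeat. When a label appears more than once in that index, pandas doesn't know which value to use. See the example below.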

import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit', 'cat'])
s
## Out
0       cat
1       dog
2    rabbit
3       cat
dtype: object
s2 = pd.Series(['carnivore', 'omnivore', 'herbivore', 'carnivore'])
# Set the value of `s` as the index of `s2`, since map looks at the Series index.
s2.index = s
s2
## Out
cat       carnivore
dog        omnivore
rabbit    herbivore
cat       carnivore
dtype: object

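Because cat appears twice in the index of s2, pandas doesn't know which value to use when mapping s2 onto s (you could say there is a one-to-two mapping from cat to feeding behaviour). As a result, trying to use map now throws an InvalidIndexError: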

s.map(s2)
## Out
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-43-1950a0742767> in <module>()
----> 1 s.map(s2)


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3826         dtype: object
   3827         """
-> 3828         new_values = super()._map_values(arg, na_action=na_action)
   3829         return self._constructor(new_values, index=self.index).__finalize__(self)
   3830 


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
   1275                 values = self.values
   1276 
-> 1277             indexer = mapper.index.get_indexer(values)
   1278             new_values = algorithms.take_1d(mapper._values, indexer)
   1279 


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987 


InvalidIndexError: Reindexing only valid with uniquely valued Index objects
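
Before calling map, it may help to confirm that the lookup Series is the one with the non-unique index. The following is a minimal sketch that rebuilds the toy s2 from this answer; the names mapper_is_unique and repeated_labels are illustrative only.

```python
import pandas as pd

# Toy lookup Series with a repeated index label, as in the example above.
s2 = pd.Series(
    ['carnivore', 'omnivore', 'herbivore', 'carnivore'],
    index=['cat', 'dog', 'rabbit', 'cat'],
)

# Index.is_unique is False as soon as any label occurs more than once.
mapper_is_unique = s2.index.is_unique

# Labels that occur more than once in the lookup index.
repeated_labels = s2.index[s2.index.duplicated()].unique().tolist()

print(mapper_is_unique)
print(repeated_labels)
```

If mapper_is_unique comes back False, map will raise on that lookup no matter what the Series being mapped contains.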

You need to inspect the duplicated values and decide which one to use. You can do that like this:

s2[s2.index.duplicated(keep=False)]
## Out
cat    carnivore
cat    carnivore
dtype: object
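
With thousands of rows, reading that output by eye may not be practical. One possible way to find the labels whose repeated rows actually disagree is to count distinct values per label. This is a sketch on made-up data: the lookup Series below is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical lookup: 'cat' repeats with the same value,
# 'dog' repeats with two different values.
lookup = pd.Series(
    ['carnivore', 'omnivore', 'carnivore', 'carnivore'],
    index=['cat', 'dog', 'cat', 'dog'],
)

# Count the distinct values recorded for each index label.
distinct_per_label = lookup.groupby(level=0).nunique()

# Labels with more than one distinct value need a manual decision.
conflicting = distinct_per_label[distinct_per_label > 1].index.tolist()
print(conflicting)
```

An empty list would mean every duplicated label carries a single value, so dropping the extra rows loses nothing.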

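In this example both values for cat are the same, so we can drop one of them (your description suggests the same is true in your case). If they were not the same, you would have to choose which one to keep.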

# `~` negates/inverses the indexing
s2 = s2[~s2.index.duplicated()]
s2
## Out
cat       carnivore
dog        omnivore
rabbit    herbivore
dtype: object

s2 now has a one-to-one mapping from animal to feeding behaviour, and we can safely map s2 onto s.

s.map(s2)
## Out
0    carnivore
1     omnivore
2    herbivore
3    carnivore
dtype: object
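
Applied to the question's data, the same idea might look like the sketch below. It uses small stand-in frames instead of the real CSV files, and it assumes the duplicated labels are in the index of postcodes_all (the question does not show them, but the traceback implies it). The variable name lookup is illustrative.

```python
import pandas as pd

# Stand-in for the spine data: Trust names repeat, one row per doctor.
spine = pd.DataFrame({
    'Trust': [
        'SOLENT NHS TRUST',
        'MANCHESTER UNIVERSITY NHS FOUNDATION TRUST',
        'SOLENT NHS TRUST',
    ]
})

# Stand-in for the postcode lookup, with one Trust listed twice (assumed).
postcodes_all = pd.DataFrame(
    {'postcode': ['M13 9WL', 'SO19 8BR', 'SO19 8BR']},
    index=pd.Index(
        [
            'MANCHESTER UNIVERSITY NHS FOUNDATION TRUST',
            'SOLENT NHS TRUST',
            'SOLENT NHS TRUST',
        ],
        name='Trust',
    ),
)

# Keep the first row for each Trust so the lookup index is unique.
lookup = postcodes_all.loc[~postcodes_all.index.duplicated(keep='first'), 'postcode']

# Repeated Trust names in spine are fine; each row gets its own lookup result.
spine['postcode'] = spine['Trust'].map(lookup)
print(spine)
```

Trust names that are missing from the lookup come back as NaN rather than raising, so it is worth checking the result for missing values afterwards.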
