pandas 连接/合并 '重建索引仅在唯一值索引有效

3 投票
1 回答
9879 浏览
提问于 2025-04-17 18:46

我在进行连接操作时遇到了一个很棘手的错误。我尝试用merge(left_index, right_index)来解决,但结果还是一样。

这两个索引是完全相同的(这是故意设计的),我通过index.is_unique(返回TRUE)和index.get_duplicates()(返回EMPTY)检查了这两个索引。

基本版本:

df1.join(series)
merge(df1, series_as_df, 

print tempres.index

<class 'pandas.tseries.index.DatetimeIndex'> [2013-01-14 17:04:45, ..., 2013-01-14 16:53:05] 长度:89,频率:无,时区:无


奇怪的是,打印出来的值是这样的: print tempres.index.values [1970-01-16 121:04:45 1970-01-16 121:04:35 1970-01-16 121:04:25 1970-01-16 121:04:15 1970-01-16 121:04:05 1970-01-16 121:03:55 1970-01-16 121:03:45 1970-01-16 121:03:35 1970-01-16 121:03:25 1970-01-16 121:03:15 1970-01-16 121:03:05 1970-01-16 121:02:55 1970-01-16 121:02:45 1970-01-16 121:02:35 1970-01-16 121:02:25 1970-01-16 121:02:15 1970-01-16 121:02:05 1970-01-16 121:01:55 1970-01-16 121:01:45 1970-01-16 121:01:35 1970-01-16 121:01:25 ...]

如果需要的话,我可以提供序列和数据框的pickle文件...

我使用的是最新的pandas版本0.10.x

谢谢,

Luc

我的代码(从更大的代码中剪切出来)

XYTparams (existing dataframe)
prep_functions[funcname] = [list of values, same length as XYTparams]

iSeries = Series(prep_functions[funcname], index = XYTparams.index, name = funcname)
XYTparams = XYTparams.join(iSeries)

回顾我的问题:

我在一个基本的数据框上连续使用merge和join。在某个时刻,我开始在尝试下一个merge/join时遇到错误。我无法在简单的测试中重现这个问题,但我在问题开始之前保存了数据框。

我找不到问题出在哪里。

base_df = load('SPOparams.pic')
lookup_df = load('lookup.pic')

print base_df
print lookup_df

print base_df.count()

print base_df['VKCSKEY1']
print lookup_df['traf_key']

# reset index does not change a thing
base_df = base_df.reset_index(drop=True)

print base_df.index
print base_df.index.get_duplicates()
print lookup_df.index
print lookup_df.index.get_duplicates()


# checking value matches
for k in lookup_df['traf_key']:
    print k, k in  base_df['VKCSKEY1'].values

# why does this merge is unsuccesfull ???
# in any combination of the parameters
df_result =merge(base_df, lookup_df, 
             how='left', 
             #how = 'outer',
             left_on ='VKCSKEY1', 
             right_on ='traf_key',
             #left_index=True, 
             #right_index = True,
             #sort=True, 
             #suffixes=('', '.m'), copy=True
             )
print df_result

输出:

1.6.1
0.10.1
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns:
T                        89  non-null values
X                        89  non-null values
Y                        89  non-null values
precip_quantity_1hour    89  non-null values
pressure                 89  non-null values
rel_humidity             89  non-null values
temp                     89  non-null values
temp_max                 0  non-null values
temp_min                 0  non-null values
wind_direction           89  non-null values
wind_speed               89  non-null values
BC_TRAF                  89  non-null values
closest                  89  non-null values
closest.m                89  non-null values
AGGP.P50_ID              89  non-null values
AGGP.FUNC_CLASS          89  non-null values
AGGP.SPEED_CAT           89  non-null values
LINK_ID                  89  non-null values
FUNC_CLASS               89  non-null values
SPEED_CAT                89  non-null values
AR_AUTO                  89  non-null values
AR_BUS                   89  non-null values
AR_TAXIS                 89  non-null values
AR_CARPOOL               89  non-null values
AR_PEDEST                89  non-null values
AR_TRUCKS                89  non-null values
STCA20_PCT               89  non-null values
VKC_LINKNR               89  non-null values
TRVIC150R1               89  non-null values
closest.m                89  non-null values
closest.m.m              89  non-null values
VKCP.LINK_ID             89  non-null values
VKCP.FUNC_CLASS          89  non-null values
VKCP.SPEED               89  non-null values
VKCP.LINKNR              89  non-null values
VKCP.TWIN_ID             89  non-null values
VKCSKEY1                 89  non-null values
dtypes: datetime64[ns](1), float64(13), int64(9), object(14)
<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, (60744, 0) to (58314, 0)
Data columns:
traf_key      30  non-null values
weekday_nr    30  non-null values
linknr        30  non-null values
 weekday      30  non-null values
vr0           30  non-null values
vr1           30  non-null values
vr2           30  non-null values
vr3           30  non-null values
vr4           30  non-null values
vr5           30  non-null values
vr6           30  non-null values
vr7           30  non-null values
vr8           30  non-null values
vr9           30  non-null values
vr10          30  non-null values
vr11          30  non-null values
vr12          30  non-null values
vr13          30  non-null values
vr14          30  non-null values
vr15          30  non-null values
vr16          30  non-null values
vr17          30  non-null values
vr18          30  non-null values
vr19          30  non-null values
vr20          30  non-null values
vr21          30  non-null values
vr22          30  non-null values
vr23          30  non-null values
au0           30  non-null values
au1           30  non-null values
au2           30  non-null values
au3           30  non-null values
au4           30  non-null values
au5           30  non-null values
au6           30  non-null values
au7           30  non-null values
au8           30  non-null values
au9           30  non-null values
au10          30  non-null values
au11          30  non-null values
au12          30  non-null values
au13          30  non-null values
au14          30  non-null values
au15          30  non-null values
au16          30  non-null values
au17          30  non-null values
au18          30  non-null values
au19          30  non-null values
au20          30  non-null values
au21          30  non-null values
au22          30  non-null values
au23          30  non-null values
sn0           30  non-null values
sn1           30  non-null values
sn2           30  non-null values
sn3           30  non-null values
sn4           30  non-null values
sn5           30  non-null values
sn6           30  non-null values
sn7           30  non-null values
sn8           30  non-null values
sn9           30  non-null values
sn10          30  non-null values
sn11          30  non-null values
sn12          30  non-null values
sn13          30  non-null values
sn14          30  non-null values
sn15          30  non-null values
sn16          30  non-null values
sn17          30  non-null values
sn18          30  non-null values
sn19          30  non-null values
sn20          30  non-null values
sn21          30  non-null values
sn22          30  non-null values
sn23          30  non-null values
dtypes: float64(24), int64(50), object(2)
T                        89
X                        89
Y                        89
precip_quantity_1hour    89
pressure                 89
rel_humidity             89
temp                     89
temp_max                  0
temp_min                  0
wind_direction           89
wind_speed               89
BC_TRAF                  89
closest                  89
closest.m                89
AGGP.P50_ID              89
AGGP.FUNC_CLASS          89
AGGP.SPEED_CAT           89
LINK_ID                  89
FUNC_CLASS               89
SPEED_CAT                89
AR_AUTO                  89
AR_BUS                   89
AR_TAXIS                 89
AR_CARPOOL               89
AR_PEDEST                89
AR_TRUCKS                89
STCA20_PCT               89
VKC_LINKNR               89
TRVIC150R1               89
closest.m                89
closest.m.m              89
VKCP.LINK_ID             89
VKCP.FUNC_CLASS          89
VKCP.SPEED               89
VKCP.LINKNR              89
VKCP.TWIN_ID             89
VKCSKEY1                 89
0     (60744, 0)
1     (60744, 0)
2     (60744, 0)
3     (60750, 0)
4     (60768, 0)
5     (60768, 0)
6     (60758, 0)
7     (60758, 0)
8     (69223, 0)
9     (69223, 0)
10    (69223, 0)
11    (64265, 0)
12    (64265, 0)
13    (64265, 0)
14    (64265, 0)
15    (64265, 0)
16    (64265, 0)
17    (64265, 0)
18    (64265, 0)
19    (64265, 0)
20    (64216, 0)
21    (64216, 0)
22    (64216, 0)
23    (64216, 0)
24    (64216, 0)
25    (64216, 0)
26    (64216, 0)
27    (64216, 0)
28    (64216, 0)
29    (57085, 0)
30    (57085, 0)
31    (57085, 0)
32    (57085, 0)
33    (57085, 0)
34    (57085, 0)
35    (57014, 0)
36    (57033, 0)
37    (57033, 0)
38    (64065, 0)
39    (64065, 0)
40    (64065, 0)
41    (64065, 0)
42    (64065, 0)
43    (57070, 0)
44    (64062, 0)
45    (64062, 0)
46    (64062, 0)
47    (64062, 0)
48    (57070, 0)
49    (64061, 0)
50    (64061, 0)
51    (64061, 0)
52    (64061, 0)
53    (59849, 0)
54    (59415, 0)
55    (58487, 0)
56    (58054, 0)
57    (58054, 0)
58    (58054, 0)
59    (52551, 0)
60    (58054, 0)
61    (58054, 0)
62    (58054, 0)
63    (58054, 0)
64    (52551, 0)
65    (58054, 0)
66    (58488, 0)
67    (58488, 0)
68    (58028, 0)
69    (58464, 0)
70    (58028, 0)
71    (57989, 0)
72    (58595, 0)
73    (58027, 0)
74    (57989, 0)
75    (58595, 0)
76    (58595, 0)
77    (58019, 0)
78    (58595, 0)
79    (58595, 0)
80    (58019, 0)
81    (58595, 0)
82    (58595, 0)
83    (66715, 0)
84    (58595, 0)
85    (59295, 0)
86    (67614, 0)
87    (58314, 0)
88    (58314, 0)
Name: VKCSKEY1, Length: 89
VKCSKEY1
(60744, 0)    (60744, 0)
(60750, 0)    (60750, 0)
(60768, 0)    (60768, 0)
(60758, 0)    (60758, 0)
(69223, 0)    (69223, 0)
(64265, 0)    (64265, 0)
(64216, 0)    (64216, 0)
(57085, 0)    (57085, 0)
(57014, 0)    (57014, 0)
(57033, 0)    (57033, 0)
(64065, 0)    (64065, 0)
(57070, 0)    (57070, 0)
(64062, 0)    (64062, 0)
(64061, 0)    (64061, 0)
(59849, 0)    (59849, 0)
(59415, 0)    (59415, 0)
(58487, 0)    (58487, 0)
(58054, 0)    (58054, 0)
(52551, 0)    (52551, 0)
(58488, 0)    (58488, 0)
(58028, 0)    (58028, 0)
(58464, 0)    (58464, 0)
(57989, 0)    (57989, 0)
(58595, 0)    (58595, 0)
(58027, 0)    (58027, 0)
(58019, 0)    (58019, 0)
(66715, 0)    (66715, 0)
(59295, 0)    (59295, 0)
(67614, 0)    (67614, 0)
(58314, 0)    (58314, 0)
Name: traf_key
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], dtype=int64)
[]
Index([(60744, 0), (60750, 0), (60768, 0), (60758, 0), (69223, 0), (64265, 0), (64216, 0), (57085, 0), (57014, 0), (57033, 0), (64065, 0), (57070, 0), (64062, 0), (64061, 0), (59849, 0), (59415, 0), (58487, 0), (58054, 0), (52551, 0), (58488, 0), (58028, 0), (58464, 0), (57989, 0), (58595, 0), (58027, 0), (58019, 0), (66715, 0), (59295, 0), (67614, 0), (58314, 0)], dtype=object)
[]
(60744, 0) True
(60750, 0) True
(60768, 0) True
(60758, 0) True
(69223, 0) True
(64265, 0) True
(64216, 0) True
(57085, 0) True
(57014, 0) True
(57033, 0) True
(64065, 0) True
(57070, 0) True
(64062, 0) True
(64061, 0) True
(59849, 0) True
(59415, 0) True
(58487, 0) True
(58054, 0) True
(52551, 0) True
(58488, 0) True
(58028, 0) True
(58464, 0) True
(57989, 0) True
(58595, 0) True
(58027, 0) True
(58019, 0) True
(66715, 0) True
(59295, 0) True
(67614, 0) True
(58314, 0) True
Traceback (most recent call last):
  File "L:\temp\pandas_join_bug.py", line 43, in <module>
    right_on ='traf_key',
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 36, in merge
    return op.get_result()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 185, in get_result
    ldata, rdata = self._get_merge_data()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 277, in _get_merge_data
    copydata=False)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1194, in _maybe_rename_join
    to_rename = self.items.intersection(other.items)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 666, in intersection
    indexer = self.get_indexer(other.values)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 812, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

一旦出现错误,我就无法成功执行任何merge或join操作。起初我没有意识到这个错误与重复的merge/join操作有关。现在,最新的一组操作中的任何单个merge/join都能正常工作。但只要我需要另一个merge/join,就会出现同样的错误。

1 个回答

16

重复的列名会导致这个错误,试着去掉重复的列名。

撰写回答