pandas 连接/合并 '重建索引仅在唯一值索引有效
我在进行连接操作时遇到了一个很棘手的错误。我尝试用merge(left_index, right_index)来解决,但结果还是一样。
这两个索引是完全相同的(这是故意设计的),我通过index.is_unique(返回TRUE)和index.get_duplicates()(返回EMPTY)检查了这两个索引。
基本版本:
df1.join(series)
merge(df1, series_as_df,
print tempres.index
<class 'pandas.tseries.index.DatetimeIndex'> [2013-01-14 17:04:45, ..., 2013-01-14 16:53:05] 长度:89,频率:无,时区:无
奇怪的是,打印出来的值是这样的: print tempres.index.values [1970-01-16 121:04:45 1970-01-16 121:04:35 1970-01-16 121:04:25 1970-01-16 121:04:15 1970-01-16 121:04:05 1970-01-16 121:03:55 1970-01-16 121:03:45 1970-01-16 121:03:35 1970-01-16 121:03:25 1970-01-16 121:03:15 1970-01-16 121:03:05 1970-01-16 121:02:55 1970-01-16 121:02:45 1970-01-16 121:02:35 1970-01-16 121:02:25 1970-01-16 121:02:15 1970-01-16 121:02:05 1970-01-16 121:01:55 1970-01-16 121:01:45 1970-01-16 121:01:35 1970-01-16 121:01:25 ...]
如果需要的话,我可以提供序列和数据框的pickle文件...
我使用的是最新的pandas版本0.10.x
谢谢,
Luc
我的代码(从更大的代码中剪切出来)
XYTparams (existing dataframe)
prep_functions[funcname] = [list of values, same length as XYTparams]
iSeries = Series(prep_functions[funcname], index = XYTparams.index, name = funcname)
XYTparams = XYTparams.join(iSeries)
回顾我的问题:
我在一个基本的数据框上连续使用merge和join。在某个时刻,我开始在尝试下一个merge/join时遇到错误。我无法在简单的测试中重现这个问题,但我在问题开始之前保存了数据框。
我找不到问题出在哪里。
base_df = load('SPOparams.pic')
lookup_df = load('lookup.pic')
print base_df
print lookup_df
print base_df.count()
print base_df['VKCSKEY1']
print lookup_df['traf_key']
# reset index does not change a thing
base_df = base_df.reset_index(drop=True)
print base_df.index
print base_df.index.get_duplicates()
print lookup_df.index
print lookup_df.index.get_duplicates()
# checking value matches
for k in lookup_df['traf_key']:
print k, k in base_df['VKCSKEY1'].values
# why does this merge is unsuccesfull ???
# in any combination of the parameters
df_result =merge(base_df, lookup_df,
how='left',
#how = 'outer',
left_on ='VKCSKEY1',
right_on ='traf_key',
#left_index=True,
#right_index = True,
#sort=True,
#suffixes=('', '.m'), copy=True
)
print df_result
输出:
1.6.1
0.10.1
<class 'pandas.core.frame.DataFrame'>
Int64Index: 89 entries, 0 to 88
Data columns:
T 89 non-null values
X 89 non-null values
Y 89 non-null values
precip_quantity_1hour 89 non-null values
pressure 89 non-null values
rel_humidity 89 non-null values
temp 89 non-null values
temp_max 0 non-null values
temp_min 0 non-null values
wind_direction 89 non-null values
wind_speed 89 non-null values
BC_TRAF 89 non-null values
closest 89 non-null values
closest.m 89 non-null values
AGGP.P50_ID 89 non-null values
AGGP.FUNC_CLASS 89 non-null values
AGGP.SPEED_CAT 89 non-null values
LINK_ID 89 non-null values
FUNC_CLASS 89 non-null values
SPEED_CAT 89 non-null values
AR_AUTO 89 non-null values
AR_BUS 89 non-null values
AR_TAXIS 89 non-null values
AR_CARPOOL 89 non-null values
AR_PEDEST 89 non-null values
AR_TRUCKS 89 non-null values
STCA20_PCT 89 non-null values
VKC_LINKNR 89 non-null values
TRVIC150R1 89 non-null values
closest.m 89 non-null values
closest.m.m 89 non-null values
VKCP.LINK_ID 89 non-null values
VKCP.FUNC_CLASS 89 non-null values
VKCP.SPEED 89 non-null values
VKCP.LINKNR 89 non-null values
VKCP.TWIN_ID 89 non-null values
VKCSKEY1 89 non-null values
dtypes: datetime64[ns](1), float64(13), int64(9), object(14)
<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, (60744, 0) to (58314, 0)
Data columns:
traf_key 30 non-null values
weekday_nr 30 non-null values
linknr 30 non-null values
weekday 30 non-null values
vr0 30 non-null values
vr1 30 non-null values
vr2 30 non-null values
vr3 30 non-null values
vr4 30 non-null values
vr5 30 non-null values
vr6 30 non-null values
vr7 30 non-null values
vr8 30 non-null values
vr9 30 non-null values
vr10 30 non-null values
vr11 30 non-null values
vr12 30 non-null values
vr13 30 non-null values
vr14 30 non-null values
vr15 30 non-null values
vr16 30 non-null values
vr17 30 non-null values
vr18 30 non-null values
vr19 30 non-null values
vr20 30 non-null values
vr21 30 non-null values
vr22 30 non-null values
vr23 30 non-null values
au0 30 non-null values
au1 30 non-null values
au2 30 non-null values
au3 30 non-null values
au4 30 non-null values
au5 30 non-null values
au6 30 non-null values
au7 30 non-null values
au8 30 non-null values
au9 30 non-null values
au10 30 non-null values
au11 30 non-null values
au12 30 non-null values
au13 30 non-null values
au14 30 non-null values
au15 30 non-null values
au16 30 non-null values
au17 30 non-null values
au18 30 non-null values
au19 30 non-null values
au20 30 non-null values
au21 30 non-null values
au22 30 non-null values
au23 30 non-null values
sn0 30 non-null values
sn1 30 non-null values
sn2 30 non-null values
sn3 30 non-null values
sn4 30 non-null values
sn5 30 non-null values
sn6 30 non-null values
sn7 30 non-null values
sn8 30 non-null values
sn9 30 non-null values
sn10 30 non-null values
sn11 30 non-null values
sn12 30 non-null values
sn13 30 non-null values
sn14 30 non-null values
sn15 30 non-null values
sn16 30 non-null values
sn17 30 non-null values
sn18 30 non-null values
sn19 30 non-null values
sn20 30 non-null values
sn21 30 non-null values
sn22 30 non-null values
sn23 30 non-null values
dtypes: float64(24), int64(50), object(2)
T 89
X 89
Y 89
precip_quantity_1hour 89
pressure 89
rel_humidity 89
temp 89
temp_max 0
temp_min 0
wind_direction 89
wind_speed 89
BC_TRAF 89
closest 89
closest.m 89
AGGP.P50_ID 89
AGGP.FUNC_CLASS 89
AGGP.SPEED_CAT 89
LINK_ID 89
FUNC_CLASS 89
SPEED_CAT 89
AR_AUTO 89
AR_BUS 89
AR_TAXIS 89
AR_CARPOOL 89
AR_PEDEST 89
AR_TRUCKS 89
STCA20_PCT 89
VKC_LINKNR 89
TRVIC150R1 89
closest.m 89
closest.m.m 89
VKCP.LINK_ID 89
VKCP.FUNC_CLASS 89
VKCP.SPEED 89
VKCP.LINKNR 89
VKCP.TWIN_ID 89
VKCSKEY1 89
0 (60744, 0)
1 (60744, 0)
2 (60744, 0)
3 (60750, 0)
4 (60768, 0)
5 (60768, 0)
6 (60758, 0)
7 (60758, 0)
8 (69223, 0)
9 (69223, 0)
10 (69223, 0)
11 (64265, 0)
12 (64265, 0)
13 (64265, 0)
14 (64265, 0)
15 (64265, 0)
16 (64265, 0)
17 (64265, 0)
18 (64265, 0)
19 (64265, 0)
20 (64216, 0)
21 (64216, 0)
22 (64216, 0)
23 (64216, 0)
24 (64216, 0)
25 (64216, 0)
26 (64216, 0)
27 (64216, 0)
28 (64216, 0)
29 (57085, 0)
30 (57085, 0)
31 (57085, 0)
32 (57085, 0)
33 (57085, 0)
34 (57085, 0)
35 (57014, 0)
36 (57033, 0)
37 (57033, 0)
38 (64065, 0)
39 (64065, 0)
40 (64065, 0)
41 (64065, 0)
42 (64065, 0)
43 (57070, 0)
44 (64062, 0)
45 (64062, 0)
46 (64062, 0)
47 (64062, 0)
48 (57070, 0)
49 (64061, 0)
50 (64061, 0)
51 (64061, 0)
52 (64061, 0)
53 (59849, 0)
54 (59415, 0)
55 (58487, 0)
56 (58054, 0)
57 (58054, 0)
58 (58054, 0)
59 (52551, 0)
60 (58054, 0)
61 (58054, 0)
62 (58054, 0)
63 (58054, 0)
64 (52551, 0)
65 (58054, 0)
66 (58488, 0)
67 (58488, 0)
68 (58028, 0)
69 (58464, 0)
70 (58028, 0)
71 (57989, 0)
72 (58595, 0)
73 (58027, 0)
74 (57989, 0)
75 (58595, 0)
76 (58595, 0)
77 (58019, 0)
78 (58595, 0)
79 (58595, 0)
80 (58019, 0)
81 (58595, 0)
82 (58595, 0)
83 (66715, 0)
84 (58595, 0)
85 (59295, 0)
86 (67614, 0)
87 (58314, 0)
88 (58314, 0)
Name: VKCSKEY1, Length: 89
VKCSKEY1
(60744, 0) (60744, 0)
(60750, 0) (60750, 0)
(60768, 0) (60768, 0)
(60758, 0) (60758, 0)
(69223, 0) (69223, 0)
(64265, 0) (64265, 0)
(64216, 0) (64216, 0)
(57085, 0) (57085, 0)
(57014, 0) (57014, 0)
(57033, 0) (57033, 0)
(64065, 0) (64065, 0)
(57070, 0) (57070, 0)
(64062, 0) (64062, 0)
(64061, 0) (64061, 0)
(59849, 0) (59849, 0)
(59415, 0) (59415, 0)
(58487, 0) (58487, 0)
(58054, 0) (58054, 0)
(52551, 0) (52551, 0)
(58488, 0) (58488, 0)
(58028, 0) (58028, 0)
(58464, 0) (58464, 0)
(57989, 0) (57989, 0)
(58595, 0) (58595, 0)
(58027, 0) (58027, 0)
(58019, 0) (58019, 0)
(66715, 0) (66715, 0)
(59295, 0) (59295, 0)
(67614, 0) (67614, 0)
(58314, 0) (58314, 0)
Name: traf_key
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], dtype=int64)
[]
Index([(60744, 0), (60750, 0), (60768, 0), (60758, 0), (69223, 0), (64265, 0), (64216, 0), (57085, 0), (57014, 0), (57033, 0), (64065, 0), (57070, 0), (64062, 0), (64061, 0), (59849, 0), (59415, 0), (58487, 0), (58054, 0), (52551, 0), (58488, 0), (58028, 0), (58464, 0), (57989, 0), (58595, 0), (58027, 0), (58019, 0), (66715, 0), (59295, 0), (67614, 0), (58314, 0)], dtype=object)
[]
(60744, 0) True
(60750, 0) True
(60768, 0) True
(60758, 0) True
(69223, 0) True
(64265, 0) True
(64216, 0) True
(57085, 0) True
(57014, 0) True
(57033, 0) True
(64065, 0) True
(57070, 0) True
(64062, 0) True
(64061, 0) True
(59849, 0) True
(59415, 0) True
(58487, 0) True
(58054, 0) True
(52551, 0) True
(58488, 0) True
(58028, 0) True
(58464, 0) True
(57989, 0) True
(58595, 0) True
(58027, 0) True
(58019, 0) True
(66715, 0) True
(59295, 0) True
(67614, 0) True
(58314, 0) True
Traceback (most recent call last):
File "L:\temp\pandas_join_bug.py", line 43, in <module>
right_on ='traf_key',
File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 36, in merge
return op.get_result()
File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 185, in get_result
ldata, rdata = self._get_merge_data()
File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 277, in _get_merge_data
copydata=False)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1194, in _maybe_rename_join
to_rename = self.items.intersection(other.items)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 666, in intersection
indexer = self.get_indexer(other.values)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 812, in get_indexer
raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
一旦出现错误,我就无法成功执行任何merge或join操作。起初我没有意识到这个错误与重复的merge/join操作有关。现在,最新的一组操作中的任何单个merge/join都能正常工作。但只要我需要另一个merge/join,就会出现同样的错误。
1 个回答
重复的列名会导致这个错误,试着去掉重复的列名。