I am trying to apply a transform to a groupby object in pandas.
The code is as follows:
import uuid
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['012', '013', '014', '014', '015', '015', '016', '016', '017', '017'],
    'date': pd.to_datetime(
        ['2008-11-05', 'NaT', 'NaT', '2008-11-05', 'NaT', '2008-11-05',
         'NaT', '2008-11-05', 'NaT', '2008-11-05']),
    'grade': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
              np.nan, np.nan],
    'length': [1, 2, 3, 4, 5, 6, 7, 8, np.nan, 10]})
df['uuid'] = np.nan  # placeholder column, to be filled per group
df
Out[7]:
    id       date  grade  length  uuid
0  012 2008-11-05    NaN     1.0   NaN
1  013        NaT    NaN     2.0   NaN
2  014        NaT    NaN     3.0   NaN
3  014 2008-11-05    NaN     4.0   NaN
4  015        NaT    NaN     5.0   NaN
5  015 2008-11-05    NaN     6.0   NaN
6  016        NaT    NaN     7.0   NaN
7  016 2008-11-05    NaN     8.0   NaN
8  017        NaT    NaN     NaN   NaN
9  017 2008-11-05    NaN    10.0   NaN
In[8]:
df.groupby(['id', 'date']).uuid.transform(lambda g: uuid.uuid4())
Out[9]:
...
...
ValueError: Length mismatch: Expected axis has 5 elements, new values have 10 elements
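As in this question, I assumed the problem was the NaT values in the date column, so I tried df.fillna('nan'). Unfortunately, this raised the same error. Is that because the date column interprets the string 'nan' as np.nan?
I then tried filling with the string 'nullv', which gave 'ValueError: could not convert string to Timestamp'.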
So my current workaround is:
^{pr2}$
Surely there is another way that doesn't involve converting to a string and back?
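A minimal sketch of what a string-based workaround of this kind could look like, not necessarily the exact code used in the question. It assumes a smaller example frame, a temporary helper column named `date_key`, and the sentinel string `'missing'`; all three are made up for illustration.

```python
import uuid
import pandas as pd

# Small illustrative frame (hypothetical data, same shape of problem).
df = pd.DataFrame({
    'id': ['014', '014', '014', '015'],
    'date': pd.to_datetime(['NaT', 'NaT', '2008-11-05', 'NaT']),
})

# Assumption: build a string version of the date so that missing dates
# become an ordinary value ('missing') instead of NaT, which groupby
# would otherwise exclude from the groups.
df['date_key'] = df['date'].dt.strftime('%Y-%m-%d').fillna('missing')

# One UUID string per (id, date_key) group, broadcast to every row in it.
df['uuid'] = (
    df.groupby(['id', 'date_key'])['id']
      .transform(lambda g: str(uuid.uuid4()))
)

# The original datetime column was never modified, so only the helper
# column needs to be removed afterwards.
df = df.drop(columns='date_key')
```

Keeping the string in a separate helper column avoids converting the real `date` column to strings and back, at the cost of one temporary column.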
It seems this is an open issue with groupby(), and the approach I presented above is indeed the current workaround; see here.
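For readers on newer pandas, one option that may avoid the string conversion is the `dropna` argument of `groupby`, which was added in pandas 1.1. The sketch below assumes a recent pandas version and uses a small made-up frame; it has not been checked against the version used in the question, and some early releases with `dropna` had bugs around `transform`.

```python
import uuid
import pandas as pd

# Small illustrative frame (hypothetical data).
df = pd.DataFrame({
    'id': ['014', '014', '014', '015'],
    'date': pd.to_datetime(['NaT', 'NaT', '2008-11-05', 'NaT']),
})

# dropna=False keeps rows whose group key is missing (here NaT),
# so each (id, NaT) combination forms its own group.
grouped = df.groupby(['id', 'date'], dropna=False)

# One UUID string per group, broadcast to every row of that group.
df['uuid'] = grouped['id'].transform(lambda g: str(uuid.uuid4()))
```

With the default `dropna=True`, rows with a missing key are left out of the groups, which is consistent with the length mismatch reported in the question.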