<p>You can use <code>.pivot_table()</code> with <code>aggfunc=</code> set to plain <code>list</code>:</p>
<pre><code>example_df['combined'] = example_df[['x', 'y']].values.tolist()
example_df = example_df.pivot_table(index=['measurement_id', 'min', 'grp'], columns=['grp2'], values=['combined'], aggfunc=list)
example_df['res'] = example_df.values.tolist()
example_df = example_df.drop(columns=['combined'])
</code></pre>
<p>Prints:</p>
<pre><code>                                                                      res
grp2
measurement_id min grp
0              0   A    [[[0.9303000896627107, 42.806752849742715], [-...
               1   B    [[[-18.605643711859955, 117.83261611194004], [...
               2   A    [[[-7.304055455430749, 18.06452177236371], [-1...
...
</code></pre>
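<p>To make the shape of the pivoted cells easier to see, here is a minimal sketch on a tiny frame. The toy data and the names <code>toy</code>, <code>key</code> and <code>wide</code> are made up for illustration and are not part of the question; only the <code>combined</code> / <code>pivot_table(..., aggfunc=list)</code> steps mirror the snippet above.</p>

```python
import pandas as pd

# Hypothetical toy data (not from the question): one index column and grp2.
toy = pd.DataFrame({
    "key": ["a", "a", "a", "b"],
    "grp2": [0, 0, 1, 0],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# Pack each row's (x, y) pair into one list-valued cell.
toy["combined"] = toy[["x", "y"]].values.tolist()

# aggfunc=list gathers every 'combined' cell that shares (key, grp2) into a list.
wide = toy.pivot_table(index="key", columns="grp2", values="combined", aggfunc=list)

print(wide.at["a", 0])  # the two [x, y] points with key == 'a' and grp2 == 0
print(wide.at["a", 1])  # the single point with key == 'a' and grp2 == 1
print(wide.at["b", 1])  # NaN: there is no row with key == 'b' and grp2 == 1
```

<p>Each cell ends up holding the list of <code>[x, y]</code> points for one <code>grp2</code> value, and combinations that never occur are left as <code>NaN</code>.</p>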
<hr/>
<p>Benchmark using <code>timeit</code>:</p>
<pre><code>import numpy as np
import pandas as pd
from timeit import timeit

example_df = pd.DataFrame({'measurement_id': np.concatenate([[0] * 300, [1] * 300]),
                           'min': np.concatenate([np.repeat(range(0, 30), 10),
                                                  np.repeat(range(0, 30), 10)]),
                           'grp': list(np.repeat(['A', 'B'], 10)) * 30,
                           'grp2': list(np.random.choice([0, 1, 2], 10)) * 60,
                           'obj': np.array(list(range(0, 10)) * 60),
                           'x': np.random.normal(0.0, 10.0, 600),
                           'y': np.random.normal(50.0, 40.0, 600)})


def get_df():
    # Give every run a fresh copy so the solutions cannot affect each other.
    return example_df.copy()


def solution_1():
    # Original approach: nested groupby plus a row-by-row loop.
    def df_to_points(df):
        points = []
        for index, row in df.iterrows():
            points.append(tuple(row))
        return points

    example_df = get_df()
    res = example_df \
        .groupby(['measurement_id', 'min', 'grp']) \
        .apply(lambda x: [df_to_points(g[['x', 'y']]) for _, g in x.groupby('grp2')])
    return res


def solution_2():
    # pivot_table approach from this answer.
    example_df = get_df()
    example_df['combined'] = example_df[['x', 'y']].values.tolist()
    example_df = example_df.pivot_table(index=['measurement_id', 'min', 'grp'], columns=['grp2'], values=['combined'], aggfunc=list)
    example_df['res'] = example_df.values.tolist()
    example_df = example_df.drop(columns=['combined'])
    return example_df


# timeit returns the total time in seconds for all 100 calls.
t1 = timeit(lambda: solution_1(), number=100)
t2 = timeit(lambda: solution_2(), number=100)

print(t1)
print(t2)
</code></pre>
<p>Prints:</p>
<pre><code>21.74300919502275
3.124330924008973
</code></pre>
<hr/>
<p>EDIT: With the updated question, you can do the following:</p>
<pre><code>example_df['combined'] = example_df[['x', 'y']].values.tolist()
example_df = example_df.pivot_table(index=['measurement_id', 'min', 'grp'], columns=['grp2'], values=['combined'], aggfunc=list)
example_df.apply(lambda x: list(x[x.notna()]), axis=1)
</code></pre>
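<p>The last line is there because a pivoted row contains <code>NaN</code> wherever a <code>grp2</code> value never occurs for that index combination. A small sketch of that filtering step follows; the toy frame and the names <code>toy</code>, <code>key</code>, <code>wide</code> and <code>res</code> are hypothetical, chosen only to show the effect.</p>

```python
import pandas as pd

# Hypothetical toy data: key 'b' has no rows with grp2 == 1.
toy = pd.DataFrame({
    "key": ["a", "a", "b"],
    "grp2": [0, 1, 0],
    "x": [1.0, 3.0, 4.0],
    "y": [10.0, 30.0, 40.0],
})
toy["combined"] = toy[["x", "y"]].values.tolist()
wide = toy.pivot_table(index="key", columns="grp2", values="combined", aggfunc=list)

# Keep only the non-missing cells of each row, so absent grp2 groups are
# dropped rather than showing up as NaN entries in the result list.
res = wide.apply(lambda row: list(row[row.notna()]), axis=1)

print(res.loc["a"])  # one inner list per grp2 value present for 'a'
print(res.loc["b"])  # only the grp2 == 0 group survives for 'b'
```

<p>The result is a Series indexed like the pivot table, where each entry lists the point groups that actually exist for that row.</p>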
<p>Benchmark:</p>
<pre><code>import numpy as np
import pandas as pd
from timeit import timeit

# Note: this first definition is immediately overwritten by the second one
# below, so the timings reflect the three-group ('A', 'B', 'C') data.
example_df = pd.DataFrame({'measurement_id': np.concatenate([[0] * 300, [1] * 300]),
                           'min': np.concatenate([np.repeat(range(0, 30), 10),
                                                  np.repeat(range(0, 30), 10)]),
                           'grp': list(np.repeat(['A', 'B'], 5)) * 60,
                           'grp2': list(np.random.choice([0, 1, 2], 10)) * 60,
                           'obj': np.array(list(range(0, 10)) * 60),
                           'x': np.random.normal(0.0, 10.0, 600),
                           'y': np.random.normal(50.0, 40.0, 600)})

example_df = pd.DataFrame({'measurement_id': np.concatenate([[0] * 300, [1] * 300]),
                           'min': np.concatenate([np.repeat(range(0, 30), 10),
                                                  np.repeat(range(0, 30), 10)]),
                           'grp': list(np.repeat(['A', 'B', 'C'], [4, 4, 2])) * 60,
                           'grp2': list(np.random.choice([0, 1, 2], 10)) * 60,
                           'obj': np.array(list(range(0, 10)) * 60),
                           'x': np.random.normal(0.0, 10.0, 600),
                           'y': np.random.normal(50.0, 40.0, 600)})


def get_df():
    # Give every run a fresh copy so the solutions cannot affect each other.
    return example_df.copy()


def solution_1():
    # Original approach: nested groupby plus a row-by-row loop.
    def df_to_points(df):
        points = []
        for index, row in df.iterrows():
            points.append(tuple(row))
        return points

    example_df = get_df()
    res = example_df \
        .groupby(['measurement_id', 'min', 'grp']) \
        .apply(lambda x: [df_to_points(g[['x', 'y']]) for _, g in x.groupby('grp2')])
    return res


def solution_2():
    # pivot_table approach, dropping the NaN cells of each row.
    example_df = get_df()
    example_df['combined'] = example_df[['x', 'y']].values.tolist()
    example_df = example_df.pivot_table(index=['measurement_id', 'min', 'grp'], columns=['grp2'], values=['combined'], aggfunc=list)
    return example_df.apply(lambda x: list(x[x.notna()]), axis=1)


# timeit returns the total time in seconds for all 100 calls.
t1 = timeit(lambda: solution_1(), number=100)
t2 = timeit(lambda: solution_2(), number=100)

print(t1)
print(t2)
</code></pre>
<p>Prints:</p>
<pre><code>45.391786905995104
13.506823723029811
</code></pre>