<p>Edit:</p>
<pre><code>import pandas as pd

# df is assumed to have the columns static, language and keys
# collect the set of keys for each (static, language) group
a = df.groupby(["static",'language'])['keys'].apply(set).reset_index()
# reference sets: the keys of the en language for each static
b = df[df['language'] == 'en'].groupby("static")['keys'].apply(set)
# keys present in en but absent from each group, joined into one string
c = (a['static'].map(b) - a['keys']).str.join(', ').rename('Keys')
# subtract the set sizes to count the missing keys
m = (a['static'].map(b).str.len() - a['keys'].str.len()).rename('Missing')
df = pd.concat([a[['static','language']], m, c], axis=1)
print(df)
  static language  Missing          Keys
0      x       de        0
1      x       en        0
2      x       nl        1         key_3
3      x       ua        2  key_3, key_2
</code></pre>
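<p>For a single row, the subtraction of the two set columns amounts to an ordinary Python set difference. The sketch below is only an illustration: the two key sets are assumptions chosen to match the <code>x</code>/<code>ua</code> row of the printed output, and the variable names are hypothetical.</p>

```python
# Assumed key sets for static "x", chosen to match the printed output.
en_keys = {"key_1", "key_2", "key_3"}   # reference language
ua_keys = {"key_1"}                     # language being checked

# Keys that exist in en but not in ua.
missing = en_keys - ua_keys
missing_count = len(missing)

# Sets are unordered, so sort before joining to get a stable string.
missing_label = ", ".join(sorted(missing))
print(missing_count, missing_label)
```

<p>Because sets have no defined order, joining them without sorting can produce <code>key_3, key_2</code> on one run and <code>key_2, key_3</code> on another.</p>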
<p>Edit:</p>
<p>I tried changing the data:</p>
<pre><code>import pandas as pd

rows = [
    ['x', 'en', 'key_1', 'value_en_1'],
    ['x', 'en', 'key_2', 'value_en_2'],
    ['x', 'en', 'key_3', 'value_en_3'],
    ['x', 'de', 'key_1', 'value_de_1'],
    ['x', 'de', 'key_2', 'value_de_2'],
    ['x', 'de', 'key_3', 'value_de_3'],
    ['x', 'nl', 'key_1', 'value_nl_1'],
    ['x', 'nl', 'key_2', 'value_nl_2'],
    ['x', 'ua', 'key_1', 'value_en_1'],
    ['y', 'en', 'key_1', 'value_en_1'],
    ['y', 'en', 'key_2', 'value_en_2'],
    ['y', 'de', 'key_4', 'value_en_3'],
    ['y', 'de', 'key_1', 'value_de_1'],
    ['y', 'de', 'key_2', 'value_de_2'],
    ['y', 'de', 'key_3', 'value_de_3'],
    ['y', 'de', 'key_5', 'value_nl_1'],
    ['y', 'nl', 'key_2', 'value_nl_2'],
    ['y', 'ua', 'key_1', 'value_en_1']
]
# create a DataFrame from the rows of data
df = pd.DataFrame(rows, columns=["static", "language", "keys", "values"])
# uncomment to inspect the raw DataFrame
#print(df)
</code></pre>
<p>The output is:</p>
<pre><code>print(df)
  static language  Missing          Keys
0      x       de        0
1      x       en        0
2      x       nl        1         key_3
3      x       ua        2  key_3, key_2
4      y       de       -3
5      y       en        0
6      y       nl        1         key_1
7      y       ua        1         key_2
</code></pre>
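<p>The problem is that for static <code>y</code>, the <code>de</code> language has more keys than <code>en</code>, so subtracting the set sizes gives a negative count.</p>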
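<p>One possible way around the negative count, offered as a sketch and not as the definitive fix, is to count the elements of the set difference itself instead of subtracting the two set sizes. The example reuses the sample rows without the unused <code>values</code> column; the names <code>en_sets</code>, <code>missing_sets</code> and <code>out</code> are hypothetical.</p>

```python
import pandas as pd

# Sample rows from the post, without the unused "values" column.
rows = [
    ["x", "en", "key_1"], ["x", "en", "key_2"], ["x", "en", "key_3"],
    ["x", "de", "key_1"], ["x", "de", "key_2"], ["x", "de", "key_3"],
    ["x", "nl", "key_1"], ["x", "nl", "key_2"],
    ["x", "ua", "key_1"],
    ["y", "en", "key_1"], ["y", "en", "key_2"],
    ["y", "de", "key_4"], ["y", "de", "key_1"], ["y", "de", "key_2"],
    ["y", "de", "key_3"], ["y", "de", "key_5"],
    ["y", "nl", "key_2"],
    ["y", "ua", "key_1"],
]
df = pd.DataFrame(rows, columns=["static", "language", "keys"])

# Set of keys per (static, language) group, as in the original approach.
a = df.groupby(["static", "language"])["keys"].apply(set).reset_index()
# Reference set of en keys per static.
b = df[df["language"] == "en"].groupby("static")["keys"].apply(set)

# Align the en reference set with every row, then take the set difference.
en_sets = a["static"].map(b)
missing_sets = [en - own for en, own in zip(en_sets, a["keys"])]

out = a[["static", "language"]].copy()
# Counting the difference itself can never go below zero.
out["Missing"] = [len(s) for s in missing_sets]
# Sort before joining so the string does not depend on set ordering.
out["Keys"] = [", ".join(sorted(s)) for s in missing_sets]
print(out)
```

<p>With this data the <code>y</code>/<code>de</code> row reports 0 missing keys instead of -3, because the extra keys that exist only in <code>de</code> no longer affect the count. This assumes every static has at least one <code>en</code> row; otherwise <code>map</code> yields a missing value for that static and the difference would fail.</p>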