<p>You can use <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.setdiff1d.html" rel="nofollow noreferrer"><code>numpy.setdiff1d</code></a>:</p>
<pre><code>df['A-B'] = df.apply(lambda x: ' '.join(np.setdiff1d(x['A'].lower().split(),
                                                     x['B'].lower().split())), axis=1)
print(df)
</code></pre>
<hr/>
<pre><code>                          A            B           A-B
0  Stack Overlflow is great  stack great  is overlflow
</code></pre>
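<p>One detail worth knowing: <code>np.setdiff1d</code> returns the sorted, unique values of its first argument that are missing from the second, so the words come back in alphabetical order rather than in their original order. A minimal sketch on a single pair of strings (the variable names here are illustrative, not from the question):</p>

```python
import numpy as np

a_words = "Stack Overlflow is great".lower().split()
b_words = "stack great".lower().split()

# setdiff1d keeps the values of a_words that are not in b_words,
# and returns them sorted and de-duplicated
diff = np.setdiff1d(a_words, b_words)
result = " ".join(diff)
print(result)  # is overlflow
```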
<p>Your solution was close; you only need to add <a href="https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Series.str.lower.html" rel="nofollow noreferrer"><code>Series.str.lower</code></a> when zipping the columns:</p>
<pre><code>df['A-B'] = [' '.join(set(a.split()) - set(b.split()))
             for a, b in zip(df['A'].str.lower(), df['B'].str.lower())]
</code></pre>
<hr/>
<p>If the series contain duplicated strings, use <a href="https://docs.python.org/2/library/collections.html#collections.OrderedDict" rel="nofollow noreferrer"><code>OrderedDict</code></a>, which removes duplicates like <code>set()</code> but also preserves the original order:</p>
<pre><code>df = pd.DataFrame({'A': ['Stack Overlflow is great is great'], 'B': ['stack great']})
print(df)
#                                    A            B
# 0  Stack Overlflow is great is great  stack great
</code></pre>
<hr/>
<pre><code>from collections import OrderedDict
df['A-B'] = [' '.join([ele for ele in OrderedDict.fromkeys(a) if ele not in b])
             for a, b in zip(df.A.str.lower().str.split(), df.B.str.lower().str.split())]
print(df)
</code></pre>
<hr/>
<pre><code>                                   A            B           A-B
0  Stack Overlflow is great is great  stack great  overlflow is
</code></pre>
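<p>On Python 3.7 and later, a plain <code>dict</code> also preserves insertion order, so the same idea can be sketched without <code>OrderedDict</code>. The helper name <code>words_only_in_first</code> below is hypothetical, and the set used for the lookups in <code>B</code> is an optional tweak rather than part of the approach above:</p>

```python
import pandas as pd


def words_only_in_first(a, b):
    """Return the lowercased words of `a` that do not occur in `b`, in first-seen order."""
    b_words = set(b.lower().split())  # set gives fast membership tests
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+)
    unique_a = dict.fromkeys(a.lower().split())
    return " ".join(w for w in unique_a if w not in b_words)


df = pd.DataFrame({"A": ["Stack Overlflow is great is great"], "B": ["stack great"]})
df["A-B"] = [words_only_in_first(a, b) for a, b in zip(df["A"], df["B"])]
print(df["A-B"].tolist())  # ['overlflow is']
```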