<p>Consider <code>df</code> and <code>df2</code>:</p>
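<pre><code>import numpy as np
import pandas as pd

# Strings to search in
df = pd.DataFrame(dict(
    a=['abcd', 'stk', 'shij', 'dfffedeffj', 'abcdefghijk'],
))
# Substrings to look for (b) and the value to record on a match (c)
df2 = pd.DataFrame(dict(
    b=['abc', 'hij', 'def'],
    c=[1, 2, 3]
))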
</code></pre>
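<p>Scalar accessors give decent speed here. The original version of this answer used <code>get_value</code> and <code>set_value</code>, which were deprecated in pandas 0.21 and removed in 1.0; <code>.at</code> is the equivalent scalar accessor. The results are stored in a DataFrame:</p>
<pre><code>density = pd.DataFrame(index=df.index, columns=df2.index)
for i in df.index:
    for j in df2.index:
        a = df.at[i, 'a']
        b = df2.at[j, 'b']
        # str.find returns -1 when b is not a substring of a
        if a.find(b) >= 0:
            density.at[i, j] = df2.at[j, 'c']
print(density)
     0    1    2
0    1  NaN  NaN
1  NaN  NaN  NaN
2  NaN    2  NaN
3  NaN  NaN    3
4    1    2    3
</code></pre>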
<p>Alternatively, combine <code>numpy</code> with the pandas <code>str</code> methods:</p>
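<pre><code># t[j, i] is True when df2.b[j] occurs in df.a[i]; regex=False matches literally, like str.find
t = df2.b.apply(lambda x: df.a.str.contains(x, regex=False)).values
c = df2.c.values[:, None]  # column vector, shape (len(df2), 1)
density = pd.DataFrame(
    np.where(t, np.hstack([c] * t.shape[1]), np.nan).T,
    df.index, df2.index)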
</code></pre>
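<p>As a cross-check on both approaches, here is a minimal self-contained sketch that builds the same matrix with Python's <code>in</code> operator and NumPy broadcasting. The helper name <code>substring_value_matrix</code> is hypothetical and not part of the original answer; it assumes every entry is a plain string and makes no claim about relative speed.</p>

```python
import numpy as np
import pandas as pd


def substring_value_matrix(texts, patterns, values):
    """Hypothetical helper: cell (i, j) holds values[j] if patterns[j]
    is a substring of texts[i], and NaN otherwise."""
    # Boolean hit matrix, shape (len(texts), len(patterns))
    hits = np.array([[p in s for p in patterns] for s in texts], dtype=bool)
    # values has shape (len(patterns),) and broadcasts across the rows
    vals = np.asarray(values, dtype=float)
    return pd.DataFrame(np.where(hits, vals, np.nan),
                        index=texts.index, columns=patterns.index)


df = pd.DataFrame(dict(
    a=['abcd', 'stk', 'shij', 'dfffedeffj', 'abcdefghijk'],
))
df2 = pd.DataFrame(dict(
    b=['abc', 'hij', 'def'],
    c=[1, 2, 3],
))

density = substring_value_matrix(df['a'], df2['b'], df2['c'])
print(density)
```

<p>Unlike the loop version, this returns a float-typed DataFrame, so the matched values appear as <code>1.0</code>, <code>2.0</code>, <code>3.0</code>.</p>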