<pre><code>import pandas as pd

df = pd.DataFrame([['Apple', 'Med6g7867'],
                   ['Orange', 'Med7g8976'],
                   ['Banana', 'Signal'],
                   ['Peach', 'Med8g8989'],
                   ['Mango', 'Possible result %gggyy']],
                  columns=['A', 'B'])
# Keep only what the capturing group matched; rows matched by the
# non-capturing MedXgXXXX branch become NaN.
df['B'] = df['B'].str.extract(r'(?:^Med.g.{4})|([^%]+)', expand=False)
print(df)
</code></pre>
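<p>yields a column <code>B</code> in which the three <code>MedXgXXXX</code> rows are <code>NaN</code>, the <code>Banana</code> row keeps <code>Signal</code>, and the <code>Mango</code> row keeps only the text before the <code>%</code> (<code>Possible result </code>, including the trailing space).</p>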
<hr/>
<p>The regex pattern has the following meaning:</p>
<pre><code>(?:        # start a non-capturing group
  ^        # match the start of the string
  Med      # match the literal string Med
  .        # followed by any character
  g        # a literal g
  .{4}     # followed by any 4 characters
)          # end the non-capturing group
|          # OR
(          # start a capturing group
  [^%]+    # 1-or-more of any characters except %
)          # end capturing group
</code></pre>
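<p>As a quick check of the breakdown above, the same commented pattern can be compiled with Python's <code>re.VERBOSE</code> flag, which ignores whitespace and <code>#</code> comments inside the pattern. This is only a sketch using the standard <code>re</code> module rather than pandas; the names <code>pattern</code> and <code>captured</code> are made up for illustration.</p>
<pre><code>import re

# The annotated pattern, compiled as-is: re.VERBOSE skips the whitespace
# and the # comments inside the pattern string.
pattern = re.compile(r"""
    (?:         # start a non-capturing group
      ^Med      # the literal string Med at the start of the string
      .g        # any character, then a literal g
      .{4}      # any 4 characters
    )           # end the non-capturing group
    |           # OR
    ([^%]+)     # capturing group: 1-or-more characters except %
""", re.VERBOSE)


def captured(value):
    """Return what group 1 captured, or None if group 1 did not take part."""
    match = pattern.search(value)
    return match.group(1) if match else None


print(captured('Med6g7867'))                     # None
print(captured('Signal'))                        # Signal
print(repr(captured('Possible result %gggyy')))  # 'Possible result '
</code></pre>
<p>The <code>None</code> for the first value is what <code>str.extract</code> reports as <code>NaN</code>.</p>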
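<p>If the value in column <code>B</code> begins with a unique identifier of the form
<code>MedXgXXXX</code>, the non-capturing group matches. Since <code>str.extract</code>
returns only the values of capturing groups, the <code>Series</code> returned by
<code>str.extract</code> has a <code>NaN</code> at that location.</p>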
<p>If the capturing group matches instead, <code>str.extract</code> returns
the matched value.</p>
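<p>The two cases can be seen side by side on a small <code>Series</code>. This is a minimal sketch with three of the sample values; the variable names <code>s</code> and <code>result</code> are illustrative, and the final <code>str.strip()</code> is an optional extra step, not part of the pattern above.</p>
<pre><code>import pandas as pd

s = pd.Series(['Med6g7867', 'Signal', 'Possible result %gggyy'])
result = s.str.extract(r'(?:^Med.g.{4})|([^%]+)', expand=False)

# Row 0 matched the non-capturing branch, so there is no group value to return.
print(result.isna().tolist())           # [True, False, False]
# Rows 1 and 2 matched the capturing group; row 2 keeps its trailing space.
print(result.tolist()[1:])              # ['Signal', 'Possible result ']
# Optional: strip the whitespace left in front of the removed % part.
print(result.str.strip().tolist()[1:])  # ['Signal', 'Possible result']
</code></pre>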