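<p>I have a dataframe <code>data</code> (with long, inconsistent free-text notes) and matching IDs. My goal is to use a <code>list</code> of substrings to extract the relevant substrings from each note and put them in a new column. I've been told regex is a good place to start, but I haven't come up with a pattern that produces the matches I need. I hope someone can point me in the right direction.</p>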
<pre><code>list = ['sentara williamsburg regional medical',
'shady grove adventist hospital',
'sibley memorial hospital',
'southern maryland hospital center',
'st. mary`s hospital',
'suburban hospital healthcare system',
'the cancer center at lake manassas',
'ucla medical center',
'united medical center- greater southeast community',
'univ of md charles regional medical ctr',
'university of maryland medical center',
'university of north carolina hospital',
'university of virginia health system',
'unknown facility',
'va medical center',
'virginia hospital center-arlington',
'walter reed army medical center',
'washington adventist hospital',
'washington hospital center',
'wellstar health system, inc',
'winchester medical center']
data:
ID Notes
530.0 Cancer is best diag @Wwashington Adventist Hospital
651.0 nan
692.0 GMC-009 can be accessed at ST. Mary`s but not in UCLA Med. Center
993.0 I'm not sure of Sibley; however, Shady Grove Adventist Hosp. is great hospital
044.0 nan
055.0 2015-01-20 was the day she visited WR Army Medical Center in WDC
476.0 nan
</code></pre>
<p>Expected output (letter case really doesn't matter):</p>
<pre><code> data_out:
ID Notes
530.0 Washington Adventist Hospital
651.0 nan
692.0 ST. Mary`s Hospital, UCLA Medical Center
993.0 Sibley Memorial Hospital, Shady Grove Adventist Hospital
044.0 nan
055.0 Walter Reed Army Medical Center
476.0 nan
</code></pre>
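<p>One possible starting point, offered as a rough sketch rather than a definitive solution: a plain regex alternation only finds literal spellings, so the example below swaps in approximate matching with the standard library's <code>difflib.SequenceMatcher</code>. It slides a window of words over each note and compares it to every facility name. The names <code>facilities</code>, <code>tokenize</code>, <code>extract_facilities</code> and the <code>threshold</code> value of 0.85 are assumptions made for illustration and are not from the original post.</p>

```python
import re
from difflib import SequenceMatcher

# Hypothetical subset of the question's list, renamed so the
# built-in `list` is not shadowed.
facilities = [
    'shady grove adventist hospital',
    'sibley memorial hospital',
    'st. mary`s hospital',
    'ucla medical center',
    'walter reed army medical center',
    'washington adventist hospital',
    'washington hospital center',
]


def tokenize(text):
    """Lowercase, split on whitespace, and trim stray punctuation such as '@' or ';'."""
    words = text.lower().split()
    # Strip leading non-word characters, and trailing ones except '.' and '`'.
    cleaned = (re.sub(r"^[^\w]+|[^\w.`]+$", "", w) for w in words)
    return [w for w in cleaned if w]


def extract_facilities(note, names, threshold=0.85):
    """Return the names that closely match some run of words in `note`."""
    if not isinstance(note, str):  # NaN or otherwise missing note
        return []
    tokens = tokenize(note)
    found = []
    for name in names:
        n = len(name.split())
        # Compare the name against every window of n consecutive words.
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            if SequenceMatcher(None, window, name).ratio() >= threshold:
                found.append(name)
                break
    return found


print(extract_facilities(
    "Cancer is best diag @Wwashington Adventist Hospital", facilities))
# ['washington adventist hospital']
```

<p>Assuming the notes live in <code>data['Notes']</code>, something like <code>data['Notes'].apply(lambda s: ', '.join(extract_facilities(s, facilities)))</code> would build the new column (missing notes come back as empty strings in that case). Misspellings and short truncations tend to score well with this approach, but heavy abbreviations such as "WR Army Medical Center" or a bare "Sibley" are unlikely to clear the threshold; those would probably need a hand-made alias table or a lower, carefully checked threshold.</p>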