How to write a regex pattern to extract substrings from a larger text string in Python

<p>I have a dataframe <code>data</code> (with lengthy, inconsistent free-text notes) and matching IDs. My goal is to use a <code>list</code> of substrings to extract the relevant substrings and create a new column for the extracted substrings. I've been told regex is a good place to start, but I haven't come up with a pattern that produces the matching results. I'm hoping someone sees this and can point me in the right direction.</p> <pre><code>list = ['sentara williamsburg regional medical',
        'shady grove adventist hospital',
        'sibley memorial hospital',
        'southern maryland hospital center',
        'st. mary`s hospital',
        'suburban hospital healthcare system',
        'the cancer center at lake manassas',
        'ucla medical center',
        'united medical center- greater southeast community',
        'univ of md charles regional medical ctr',
        'university of maryland medical center',
        'university of north carolina hospital',
        'university of virginia health system',
        'unknown facility',
        'va medical center',
        'virginia hospital center-arlington',
        'walter reed army medical center',
        'washington adventist hospital',
        'washington hospital center',
        'wellstar health system, inc',
        'winchester medical center']

data:
ID      Notes
530.0   Cancer is best diag @Wwashington Adventist Hospital
651.0   nan
692.0   GMC-009 can be accessed at ST. Mary`s but not in UCLA Med. Center
993.0   I'm not sure of Sibley; however, Shady Grove Adventist Hosp. is great hospital
044.0   nan
055.0   2015-01-20 was the day she visited WR Army Medical Center in WDC
476.0   nan
</code></pre> <p>Expected output (letter case really doesn't matter!):</p> <pre><code>data_out:
ID      Notes
530.0   Washington Adventist Hospital
651.0   nan
692.0   ST. Mary`s Hospital, UCLA Medical Center
993.0   Sibley Memorial Hospital, Shady Grove Adventist Hospital
044.0   nan
055.0   Walter Reed Army Medical Center
476.0   nan
</code></pre>
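One possible starting point, offered as a sketch rather than a definitive answer: the notes abbreviate the facility names ("Hosp.", "Med. Center", "WR Army"), so a single pattern built directly from the full names in `list` would probably miss most of them. The example below instead maps each canonical name to a short, case-insensitive alias pattern. The `ALIASES` dictionary and the `extract_facilities` helper are hypothetical names introduced here, and the patterns are assumptions tuned only to the sample rows in the question; real data would need more aliases.

```python
import re

# Hypothetical alias patterns (assumptions, hand-written for the sample notes).
# Keys are canonical names taken from the question's list; values are regexes
# that match the distinctive part of each name, including abbreviated forms.
ALIASES = {
    "washington adventist hospital": r"washington\s+adventist",
    "st. mary`s hospital": r"\bst\.?\s*mary[`']s",
    "ucla medical center": r"\bucla\b",
    "sibley memorial hospital": r"\bsibley\b",
    "shady grove adventist hospital": r"\bshady\s+grove\b",
    "walter reed army medical center": r"\b(?:walter\s+reed|wr)\s+army\b",
}

# Compile once; IGNORECASE because the question says case does not matter.
COMPILED = {name: re.compile(pat, re.IGNORECASE) for name, pat in ALIASES.items()}


def extract_facilities(note):
    """Return the canonical names found in a note, in order of appearance."""
    if not isinstance(note, str):
        return note  # leave missing values (NaN / None) untouched
    hits = []
    for name, rx in COMPILED.items():
        match = rx.search(note)
        if match:
            hits.append((match.start(), name))
    if not hits:
        return None
    # Sort by match position so the output follows the order in the text.
    return ", ".join(name for _, name in sorted(hits))


notes = [
    "Cancer is best diag @Wwashington Adventist Hospital",
    float("nan"),
    "GMC-009 can be accessed at ST. Mary`s but not in UCLA Med. Center",
    "I'm not sure of Sibley; however, Shady Grove Adventist Hosp. is great hospital",
    "2015-01-20 was the day she visited WR Army Medical Center in WDC",
]

for note in notes:
    print(extract_facilities(note))
```

With a pandas dataframe, the same helper could be applied as `data["Notes"].apply(extract_facilities)` and assigned to a new column. The first pattern deliberately has no leading `\b` so that it still matches the misspelled "@Wwashington" in the sample. If maintaining aliases by hand becomes impractical, fuzzy string matching may be a better fit than regex for this kind of data.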
