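<p>To start with, I need to identify all the abbreviations and hyphenated words in my sentences and print them as they are found. My code does not seem to identify them correctly.</p>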
<pre class="lang-py prettyprint-override"><code>import re
sentence_stream2=df1['Open End Text']
for sent in sentence_stream2:
    abbs_ = re.findall(r'(?:[A-Z]\.)+', sent)    # abbreviations
    hypns_ = re.findall(r'\w+(?:-\w+)*', sent)   # hyphenated words
    print("new sentence:")
    print(sent)
    print(abbs_)
    print(hypns_)
</code></pre>
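<p>One of the sentences in my corpus is:
DevOps with APIs &amp; event-driven architecture using cloud Data Analytics environment Self-service BI</p>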
<p>The output is:</p>
<pre><code>new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
[]
['DevOps', 'with', 'APIs', 'event-driven', 'architecture', 'using', 'cloud', 'Data', 'Analytics', 'environment', 'Self-service', 'BI']
</code></pre>
<p>The expected output is:</p>
<pre><code>new sentence:
DevOps with APIs & event-driven architecture using cloud Data Analytics environment Self-service BI
['APIs','BI']
['event-driven','Self-service']
</code></pre>
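<p>For reference, here is a minimal sketch of patterns that would produce the expected output for this one sentence. It rests on assumptions that are not stated above: an "abbreviation" is taken to be a standalone token of two or more capital letters with an optional trailing lowercase <code>s</code>, and a "hyphenated word" must contain at least one hyphen. The original <code>(?:[A-Z]\.)+</code> only matches dotted forms such as <code>U.S.</code>, and the <code>*</code> in <code>\w+(?:-\w+)*</code> makes the hyphen part optional, so every word matches. The function name and the sample list are hypothetical stand-ins for <code>df1['Open End Text']</code>.</p>

```python
import re

# Assumption: an abbreviation is a whole token of 2+ capitals, optionally pluralised with "s".
ABBREVIATION_RE = re.compile(r'\b[A-Z]{2,}s?\b')
# "+" instead of "*" so at least one "-part" is required.
HYPHENATED_RE = re.compile(r'\b\w+(?:-\w+)+\b')


def find_abbreviations_and_hyphenated(sent):
    """Return (abbreviations, hyphenated_words) found in one sentence."""
    return ABBREVIATION_RE.findall(sent), HYPHENATED_RE.findall(sent)


# Hypothetical sample standing in for df1['Open End Text'].
sentences = [
    "DevOps with APIs & event-driven architecture using cloud "
    "Data Analytics environment Self-service BI",
]

for sent in sentences:
    abbs_, hypns_ = find_abbreviations_and_hyphenated(sent)
    print("new sentence:")
    print(sent)
    print(abbs_)
    print(hypns_)
```

<p>Under these assumptions, mixed-case tokens such as <code>DevOps</code> are not reported as abbreviations, and dotted abbreviations would need the original dotted pattern added as an alternative.</p>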