re.split() special case for splitting a comma-separated string

Posted 2024-06-02 07:11:17


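I want to use Python's re.split() to split a sentence into multiple strings at its commas, but I don't want the split applied when the comma follows a single word, for example:

Example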

s = "Yes, alcohol can have a place in a healthy diet."
desired result = ["Yes, alcohol can have a place in a healthy diet."]

Another example:

s = "But, of course, excess alcohol is terribly harmful to health in a variety of ways, and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."
desired output = ["But, of course" , "excess alcohol is terribly harmful to health in a variety of ways" , "and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."] 

Any suggestions, please?


1 answer
User
#1 · Posted 2024-06-02 07:11:17

Since Python's re module does not support variable-length lookbehind assertions, I would use re.findall() instead.
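To see the limitation concretely, here is a minimal sketch. The lookbehind pattern is a hypothetical attempt of my own, not something from the question: it tries to split on a comma unless the comma directly follows a lone leading word. Because `\w+` has no fixed width, the standard `re` module refuses to compile it.

```python
import re

# Hypothetical pattern (an assumption for illustration): a negative
# lookbehind containing \w+, which can match any number of characters.
pattern = r"(?<!^\w+),"

try:
    re.compile(pattern)
    compiled = True
except re.error as exc:
    # The stdlib re module only accepts fixed-width lookbehinds.
    compiled = False
    message = str(exc)
    print("re rejected the pattern:", message)
```

That is why the approach here matches the pieces to keep instead of describing the commas to split on.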

In [3]: re.findall(r"\s*((?:\w+,)?[^,]+)",s)
Out[3]:
['But, of course',
 'excess alcohol is terribly harmful to health in a variety of ways',
 'and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer.']

Explanation:

\s*        # Match optional leading whitespace, don't capture that
(          # Capture in group 1:
 (?:\w+,)? #  optionally: A single "word", followed by a comma 
 [^,]+     #  and/or one or more characters except commas
)          # End of group 1
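For use outside an interactive session, the same pattern can be wrapped in a small helper. This is only a sketch: the name `split_clauses` is made up for illustration, and the two inputs are the sentences from the question.

```python
import re

# Same pattern as above, compiled once for reuse.
CLAUSE_RE = re.compile(r"\s*((?:\w+,)?[^,]+)")

def split_clauses(sentence):
    """Return the comma-separated pieces of `sentence`, keeping a single
    word plus its comma attached to the text that follows it."""
    # With exactly one capture group, findall() returns the group contents.
    return CLAUSE_RE.findall(sentence)

s1 = "Yes, alcohol can have a place in a healthy diet."
s2 = ("But, of course, excess alcohol is terribly harmful to health in a "
      "variety of ways, and even moderatealcohol intake is associated with "
      "an increase in the number two cause of premature death: cancer.")

print(split_clauses(s1))
print(split_clauses(s2))
```

One thing to keep in mind: the optional `(?:\w+,)?` is tried at the start of every piece, not only at the start of the sentence, so a one-word item anywhere gets glued to what follows it. For instance, `split_clauses("one, two, three, four")` gives `['one, two', 'three, four']`, which may or may not be what you want.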
