How do I remove a substring that starts with an uppercase letter from a Python string?

2 answers

User

#1 · edited 2024-06-02 06:30:17

Why not just use slicing?

# `text` is the string from the question; its title is the first 44 characters.
title = text[:44]
print(title)

Read more: Indonesia to Get Moderna Vaccines

User

#2 · edited 2024-06-02 06:30:17

You can match the title by matching a sequence of capitalized words and words that may stay lowercase in titles:

^(?:Read\s+more\s*:)?\s*(?:(?:[A-Z]\S*|the|an?|[io]n|at|with(?:out)?|from|for|and|but|n?or|yet|[st]o|around|by|after|along|from|of)\s+)*(?=[A-Z])

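Details:

^ - start of the string
(?:Read\s+more\s*:)? - an optional non-capturing group matching Read, one or more whitespace characters, more, zero or more whitespace characters, and a :
\s* - zero or more whitespace characters
(?:(?:[A-Z]\S*|the|an?|[io]n|at|with(?:out)?|from|for|and|but|n?or|yet|[st]o|around|by|after|along|from|of)\s+)* - zero or more occurrences of:
- (?:[A-Z]\S*|the|an?|[io]n|at|with(?:out)?|from|for|and|but|n?or|yet|[st]o|around|by|after|along|from|of) - a capitalized word that may contain any non-whitespace characters, or one of the words that may stay lowercase in an English title
- \s+ - one or more whitespace characters
(?=[A-Z]) - a positive lookahead requiring that an uppercase letter comes next, without consuming it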
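
To see how the pieces fit together, here is a minimal sketch that applies the pattern with Python's built-in re module. The sample text is made up for illustration (the question's actual input is not shown here), and the duplicated from alternative is dropped, which does not change what the pattern matches.

```python
import re

# Pattern from the answer above, split across lines for readability.
TITLE_RE = re.compile(
    r"^(?:Read\s+more\s*:)?\s*"
    r"(?:(?:[A-Z]\S*|the|an?|[io]n|at|with(?:out)?|from|for|and|but|n?or|yet"
    r"|[st]o|around|by|after|along|of)\s+)*"
    r"(?=[A-Z])"
)

# Hypothetical input: a title run together with the first sentence of the body.
text = "Read more: Indonesia to Get Moderna Vaccines The government said it expects shipments soon."

match = TITLE_RE.match(text)
# The match includes the whitespace after the title, so strip it.
title = match.group().strip() if match else ""
# Remove the matched title (at most once) to keep only the body.
body = TITLE_RE.sub("", text, count=1)

print(title)
print(body)
```

This is a heuristic: if the body itself opens with several capitalized words (for example "The United States ..."), the lookahead only stops at the last of them, so the title ends up too long.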

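Note: you mentioned that your language is not English, so

you need to find a list of the words in your language that may stay lowercase in titles, and use them in place of the the|an?|[io]n|at|with(?:out)?|from|for|and|but|n?or|yet|[st]o|around|by|after|along|from|of alternatives.
You may also want to replace [A-Z] with \p{Lu} to match any Unicode uppercase letter, and \S* with \p{L}* to match zero or more Unicode letters. If you do, make sure to use the regex library from PyPI, because Python's built-in re does not support \p{...} Unicode category classes.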
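
If installing a third-party package is not an option, the same heuristic can be approximated with the standard library alone, because str.isupper() is Unicode-aware. The sketch below is a different technique from the regex above, not a translation of it: it splits on whitespace, ignores the optional "Read more:" prefix, and collapses repeated whitespace when rejoining. The function name split_title, the French sample text, and the SMALL_WORDS list are all hypothetical.

```python
def split_title(text, small_words):
    """Return (title, rest) using the capitalized-word heuristic."""
    words = text.split()
    end = 0
    # Greedily take words that start with an uppercase letter (any script)
    # or that are on the list of words allowed to stay lowercase.
    while end < len(words) and (words[end][0].isupper() or words[end] in small_words):
        end += 1
    # Mirror the (?=[A-Z]) lookahead: back off until the remainder
    # begins with an uppercase letter.
    while end > 0 and (end == len(words) or not words[end][0].isupper()):
        end -= 1
    return " ".join(words[:end]), " ".join(words[end:])


# Hypothetical French example and small-word list, for illustration only.
SMALL_WORDS = {"en", "de", "du", "la", "le", "les", "et"}
text = "Élections Régionales en Île-de-France Les résultats sont attendus dimanche."

title, rest = split_title(text, SMALL_WORDS)
print(title)
print(rest)
```

Passing the small-word list as an argument keeps the language-specific part separate from the splitting logic, so it can be swapped out per language.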