Regex to extract words before punctuation

Posted 2024-06-16 17:12:14


I am trying to extract the phrases that appear before a punctuation mark and whose words are capitalized.

Abstract Algebra. the area of modern mathematics that considers algebraic structures to be sets with operations defined on them, and extends algebraic concepts usually associated with the real number system to other more general systems, such as groups, rings, fields, modules and vector spaces.

Algebra. a branch of mathematics that uses symbols or letters to represent variables, values or numbers, which can then be used to express operations and relationships and to solve equations.

Algebraic Expression. a combination of numbers and letters equivalent to a phrase in language, e.g. x2 + 3x - 4.

Analytic (Cartesian) Geometry: the study of geometry using a coordinate system and the principles of algebra and analysis, thus defining geometrical shapes in a numerical way and extracting numerical information from that representation.

Inductive reasoning or logic: a type of reasoning that involves moving from a set of specific facts to a general conclusion, indicating some degree of support for the conclusion without actually ensuring its truth.

I am currently using the following regex:

(([? ])([A-Z][a-z\s]+)?([A-Z][a-z\s]+?[.:]))

I have two problems.

  1. I don't think this is the best way to write it.
  2. It fails to capture phrases that contain more than two words.

2 answers

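One reason the pattern does not match more than one word in the current data is that it starts with [? ], which must match either a space or a question mark. A phrase at the start of a line has neither in front of it, so its first word can never be part of the match.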
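To make that concrete, here is a small sketch that runs the question's pattern against shortened copies of two of the sample lines. The variable names are only for illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(([? ])([A-Z][a-z\s]+)?([A-Z][a-z\s]+?[.:]))")

# Shortened copies of two sample lines from the question.
two_words = "Abstract Algebra. the area of modern mathematics"
one_word = "Algebra. a branch of mathematics"

m = ORIGINAL.search(two_words)
first = m.group(0) if m else None
second = ORIGINAL.search(one_word)

print(repr(first))  # the match has to start at the space, so "Abstract" is lost
print(second)       # no capital letter follows a space or "?", so there is no match
```

The first line only yields " Algebra." and the second yields nothing at all, which is the behaviour described in the question.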

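You can also drop some of the capture groups and use a single group. Note that there is no need for ? to make [a-z\s]+?[.:] non-greedy, because the character class does not contain . or :, so it cannot run past the closing punctuation.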

To get the capitalized words that are directly followed by . or :, you could use:

\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)[.:]

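Explanation

  • \b Word boundary
  • ( Capture group 1
    • [A-Z][a-z]+ Match a char A-Z followed by 1+ chars a-z
    • (?:\s+[A-Z][a-z]+)* Repeat 0+ times: 1+ whitespace chars, a char A-Z, and 1+ chars a-z
  • ) Close the group
  • [.:] Match either . or :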
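As a rough check, the pattern can be tried in Python with re.findall, which returns the contents of group 1 for each match. The sample strings are shortened copies of lines from the question, and the variable names are just for this sketch.

```python
import re

# Pattern from the answer: capitalized words directly followed by "." or ":".
PHRASE = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)[.:]")

# Shortened copies of sample lines from the question.
samples = [
    "Abstract Algebra. the area of modern mathematics",
    "Algebra. a branch of mathematics",
    "Algebraic Expression. a combination of numbers and letters",
    "Analytic (Cartesian) Geometry: the study of geometry",
]

# findall returns the text captured by group 1 for every match in a line.
results = [PHRASE.findall(line) for line in samples]
for found in results:
    print(found)
```

The first three lines give the full phrase. The fourth gives only "Geometry", because "(Cartesian)" is not a plain capitalized word and interrupts the run.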


If you also want to match words surrounded by (), you could use an alternation.

\b((?:\([A-Z][a-z]+\)|[A-Z][a-z]+)(?:\s+(?:\([A-Z][a-z]+\)|[A-Z][a-z]+))*)[.:]
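A quick sketch of the alternation version in Python, using a shortened copy of the "Analytic (Cartesian) Geometry" line from the question. The pattern is split over two string literals only to keep the line short.

```python
import re

# Same pattern as above: each word is either "(Word)" or "Word".
PHRASE_PARENS = re.compile(
    r"\b((?:\([A-Z][a-z]+\)|[A-Z][a-z]+)"
    r"(?:\s+(?:\([A-Z][a-z]+\)|[A-Z][a-z]+))*)[.:]"
)

line = "Analytic (Cartesian) Geometry: the study of geometry"
match = PHRASE_PARENS.search(line)
phrase = match.group(1) if match else None
print(phrase)
```

Here group 1 holds the whole phrase, parenthesized word included.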


Try ^[A-Z][^.,:';]+

Explanation:

^ - start of the line

[A-Z] - a single uppercase character

[^.,:';]+ - one or more characters other than . , : ' ;
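A minimal sketch of this approach in Python, applied one line at a time so that the negated character class cannot run across a line break. The sample strings are shortened copies of lines from the question.

```python
import re

# Pattern from this answer: a leading capital, then everything up to
# the first of . , : ' ;
LEADING = re.compile(r"^[A-Z][^.,:';]+")

samples = [
    "Abstract Algebra. the area of modern mathematics",
    "Analytic (Cartesian) Geometry: the study of geometry",
    "Inductive reasoning or logic: a type of reasoning",
]

leading = []
for line in samples:
    m = LEADING.match(line)
    leading.append(m.group(0) if m else None)
print(leading)
```

This version also handles "Inductive reasoning or logic", since it does not require every word to be capitalized. The trade-off is that it never checks for a closing . or :, so a line that starts with a capital and contains no punctuation at all would match in full.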

