What regular expression returns the single-word names from lines that contain the | (pipe) special character?

Posted 2024-06-09 08:49:15


I have data like this:

John | Gilbert | alan
Stephen | king | harris
| | Steve
Barack | | Obama
Tom | George | Stevenson 
Donald | | 
 | Alan | 
Sir | Alex | 
Stewart | | 
John | new | man

I want to return only the single-word names, like this:

Steve
Alan
Stewart

I tried:

Name = re.search('\| (.*)',name)

The code above returns everything after the first pipe instead of only the single-word names.
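
For context on why the attempt captures too much: re.search stops at the first match, and the greedy (.*) then runs to the end of the line. The following is a minimal sketch using one line of the sample data; the variable names line, match and captured are illustrative and not from the question.

```python
import re

line = "John | Gilbert | alan"

# re.search returns only the first match. The greedy (.*) then consumes the
# rest of the line, so the capture still contains the remaining pipe.
match = re.search(r"\| (.*)", line)
captured = match.group(1)
print(captured)  # Gilbert | alan
```

This suggests two changes: restrict what the capture group may contain (for example \S+ instead of .*), and use re.findall so that every line is examined.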


3 answers

A simple modification of your existing regex pattern is enough:

>>> import re
>>> name = """
|| John Deere
|| Stephen king
|| Steve
|| Barack Hussein Obama
|| Donald Trump 
|| Alan
|| Stewart"""
>>> re.findall(r'\| ([^\s]*)(?:\n|$)', name)
['Steve', 'Alan', 'Stewart']

You can use re.findall to find all matches in the input string.

Edit: for the edited input, which has | between the names, this works:

>>> name = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""
>>> re.findall(r'^[|\W]*([^\s]+)(?:\n|$)', name, re.MULTILINE)
['Steve', 'Alan', 'Stewart']

Use

(?m)^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$


Explanation

--------------------------------------------------------------------------------
  (?m)                     multiline mode (= re.M / re.MULTILINE)
--------------------------------------------------------------------------------
  ^                        the beginning of a line (because of (?m))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \|                       '|'
--------------------------------------------------------------------------------
    [^\S\n]*                 any whitespace character except '\n'
                             (newline), i.e. horizontal whitespace
                             such as " " and \t (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [^\S\n]*                 any whitespace character except '\n'
                           (newline), i.e. horizontal whitespace
                           such as " " and \t (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        the end of a line (before a \n, or at the
                           end of the string) because of (?m)
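
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch of the same pattern in a commented layout; the names SINGLE_NAME, sample and names are illustrative, and the shortened sample is an assumption made to keep the example small.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries a comment.
SINGLE_NAME = re.compile(
    r"""
    ^                   # start of a line (re.MULTILINE is set below)
    (?: \| [^\S\n]* )*  # any number of pipes, each followed by optional spaces or tabs
    (\S+)               # capture one run of non-whitespace: the single name
    [^\S\n]*            # optional trailing spaces or tabs
    $                   # end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = "| John | Gilbert | alan\n| | Steve\n| Barack | | Obama\n| | Alan"
names = SINGLE_NAME.findall(sample)
print(names)  # ['Steve', 'Alan']
```

In verbose mode, whitespace outside character classes is ignored, which is why the pieces can be spaced out freely.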

Python code:

import re
string = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""
pattern = r"^(?:\|[^\S\n]*)*(\S+)[^\S\n]*$"
print(re.findall(pattern, string, re.M))

Result: ['Steve', 'Alan', 'Stewart']

You can try using re.findall with the pattern (?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$), which matches only the single-word names:

import re

inp = """| John | Gilbert | alan
| Stephen | king | harris
| | Steve
| Barack | | Obama
|| Donald | | Trump 
| | Alan
| | Stewart"""

single_names = re.findall(r'(?:(?<=\n)|(?<=^))\|\s*\|\s*(\S+)(?:\n|$)', inp)
print(single_names)

This prints:

['Steve', 'Alan', 'Stewart']
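
All the patterns above assume that each line starts with a pipe and that the name comes last. The sample in the question has no leading pipe and places the lone name in different columns, so a split-based approach may be easier to adapt. The sketch below is an alternative, not one of the posted answers; the function single_names is hypothetical, and it assumes "single-word name" means a line with exactly one non-empty field holding one word. Under that assumption it also returns Donald, which the question's expected output omits.

```python
sample = """John | Gilbert | alan
Stephen | king | harris
| | Steve
Barack | | Obama
Tom | George | Stevenson
Donald | |
 | Alan |
Sir | Alex |
Stewart | |
John | new | man"""


def single_names(text):
    """Return the name from each line that has exactly one non-empty, one-word field."""
    result = []
    for line in text.splitlines():
        # Split on the pipe and drop fields that are empty or whitespace-only.
        fields = [field.strip() for field in line.split("|") if field.strip()]
        # Keep the line only if one field remains and it is a single word.
        if len(fields) == 1 and len(fields[0].split()) == 1:
            result.append(fields[0])
    return result


names = single_names(sample)
print(names)  # ['Steve', 'Donald', 'Alan', 'Stewart']
```

If Donald should be excluded, the rule that separates it from Stewart would need to be stated first, since both lines have the same shape.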
