Regex to find all the words after a specific word?

Posted 2024-06-16 11:22:57


I have a string like this:

Features:  -Includes hanging accessories.  -Artist: William-Adolphe Bouguereau.  -Made with 100pct cotton canvas.  -100pct Anti-shrink pine wood bars and Epson anti-fade ultra chrome inks.  -100pct Hand-made and inspected in the U.S.A.  -Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both.  Size: -Mini 17'' and under/Small 18''-24''/Medium 25''-32''/Large 33''-40''/Oversized 41'' and above.  Style: -Fine art.  Color: -Blue.  Country of Manufacture: -United States.  Product Type: -Print of painting.  Region: -Europe.  Primary Art Material: -Canvas. Dimensions:  -8'' H x 12'' W x 0.75'' D: 0.72 lb.  -12'' H x 18'' W x 0.75'' D: 1.14 lbs.  -12'' H x 18'' W x 1.5'' D: 2.45 lbs.  -18'' H x 26'' W x 0.75'' D: 1.44 lbs.  Paintings Prints Tori White Wildon Photography Photos Posters Abstract Black D cor Designs Framed Hazelwood Hokku Home Landscape Oil Accent 075 12 15 18 26 40 60 8 D H W x 1 1017 1824 2532 holidays, christmas gift gifts for girls boys

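I need to find the words that follow a specific word.

In the example above, I want to extract the words that follow the word "Subject".

The expected output is: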

Subject: -Figures/Nautical and beach.

I tried the following regex:

re.compile('(?<=subject)(.{30}(?:\s|.))',re.I)

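However, the text after the Subject keyword has no fixed length, so I cannot specify an exact number of characters.

How can I make the match stop at a period or a space? There is no single, well-defined stopping criterion.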


3 answers

Regex:

(Subject:.+)\*\*

This matches "Subject:" and everything after it, up to the closing '**'. Only the part before '**' is captured in group 1.

Code:

import re

# Double-quoted so the '' inch marks stay in the string (single quotes would split the literal there).
# Named s rather than str to avoid shadowing the built-in str type.
s = "Features:  -Includes hanging accessories.  -Artist: William-Adolphe Bouguereau.  -Made with 100pct cotton canvas.  -100pct Anti-shrink pine wood bars and Epson anti-fade ultra chrome inks.  -100pct Hand-made and inspected in the U.S.A.  -Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both.  Size: -Mini 17'' and under/Small 18''-24''/Medium 25''-32''/Large 33''-40''/Oversized 41'' and above.  Style: -Fine art.  Color: -Blue.  Country of Manufacture: -United States.  Product Type: -Print of painting.  Region: -Europe.  Primary Art Material: -Canvas. Dimensions:  -8'' H x 12'' W x 0.75'' D: 0.72 lb.  -12'' H x 18'' W x 0.75'' D: 1.14 lbs.  -12'' H x 18'' W x 1.5'' D: 2.45 lbs.  -18'' H x 26'' W x 0.75'' D: 1.44 lbs.  Paintings Prints Tori White Wildon Photography Photos Posters Abstract Black D cor Designs Framed Hazelwood Hokku Home Landscape Oil Accent 075 12 15 18 26 40 60 8 D H W x 1 1017 1824 2532 holidays, christmas gift gifts for girls boys"

a = re.search(r'(Subject:.+)\*\*', s)
if a:  # search() returns None when nothing matches
    print(a.group(1))
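One caveat about this pattern: `.+` is greedy, so it runs to the last '**' on the line. The string in the question contains only one pair of '**' markers, so this does not matter there. The sketch below uses a made-up input with two marked fields (an assumption, not data from the question) to show how the non-greedy `.+?` would behave differently.

```python
import re

# Hypothetical input with two '**'-wrapped fields, for illustration only.
text = "**Subject: -Figures/Nautical and beach.**  **Style: -Fine art.**"

greedy = re.search(r'(Subject:.+)\*\*', text)   # .+ extends to the last '**'
lazy = re.search(r'(Subject:.+?)\*\*', text)    # .+? stops at the first '**'

print(greedy.group(1))
print(lazy.group(1))
```

With a single pair of markers, both patterns return the same text.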

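Your (?<=subject)(.{30}(?:\s|.)) regex asserts a position immediately after "subject". It then consumes exactly 30 characters other than line breaks, followed by one more character that is either whitespace or any character other than a line break. That does not fit your requirement, because the substring can be of any length.

You can use an alternation-based regex with a capturing group instead: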
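To make the fixed-length problem concrete, here is a small sketch that runs the pattern from the question against a shortened, made-up product string (the `sample` value is an assumption). Whatever the Subject value is, the capture is always 31 characters long, so a short value drags in part of the next field.

```python
import re

# Hypothetical short input; the Subject value is much shorter than 30 characters.
sample = "Subject: -Portraits.  Gender: -Unisex/Both.  Size: -Mini."

fixed = re.compile(r'(?<=subject)(.{30}(?:\s|.))', re.I)
m = fixed.search(sample)

captured = m.group(1)
print(len(captured))   # 30 characters from .{30} plus 1 from (?:\s|.)
print(repr(captured))  # spills over into the Gender field
```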

subject:\s*([^.]+|\S+)


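Details

  • subject: - the literal string "subject:"
  • \s* - zero or more whitespace characters
  • ([^.]+|\S+) - Group 1, capturing either one or more characters other than a period, or one or more non-whitespace characters

Note: [^.]+ can match whitespace while \S+ cannot, so the order of the alternatives matters here. If the substring after \s* starts with a period, [^.]+ cannot match there, and \S+ matches that substring up to the next whitespace character.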
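The effect of the alternation order can be checked with a short sketch. The input strings and the `reversed_order` pattern below are made up for illustration and are not part of the answer's pattern.

```python
import re

p = re.compile(r'subject:\s*([^.]+|\S+)', re.IGNORECASE)

# Usual case: the value does not start with a period, so [^.]+ is used
# and the capture runs up to (but not including) the first period.
normal = p.search("Subject: -Figures/Nautical and beach.  Gender: -Unisex.").group(1)

# Edge case: the value starts with a period, so [^.]+ fails at that position
# and the \S+ branch matches instead, stopping at the first whitespace.
dotted = p.search("Subject: .hidden value").group(1)

# Swapping the alternatives makes \S+ win first, cutting the value at the first space.
reversed_order = re.compile(r'subject:\s*(\S+|[^.]+)', re.IGNORECASE)
cut_short = reversed_order.search("Subject: -Figures/Nautical and beach.").group(1)

print(normal)     # -Figures/Nautical and beach
print(dotted)     # .hidden
print(cut_short)  # -Figures/Nautical
```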

Python demo

import re
p = re.compile(r'subject:\s*([^.]+|\S+)', re.IGNORECASE)
s = "Features:  -Includes hanging accessories.  -Artist: William-Adolphe Bouguereau.  -Made with 100pct cotton canvas.  -100pct Anti-shrink pine wood bars and Epson anti-fade ultra chrome inks.  -100pct Hand-made and inspected in the U.S.A.  -Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both.  Size: -Mini 17'' and under/Small 18''-24''/Medium 25''-32''/Large 33''-40''/Oversized 41'' and above.  Style: -Fine art.  Color: -Blue.  Country of Manufacture: -United States.  Product Type: -Print of painting.  Region: -Europe.  Primary Art Material: -Canvas. Dimensions:  -8'' H x 12'' W x 0.75'' D: 0.72 lb.  -12'' H x 18'' W x 0.75'' D: 1.14 lbs.  -12'' H x 18'' W x 1.5'' D: 2.45 lbs.  -18'' H x 26'' W x 0.75'' D: 1.44 lbs.  Paintings Prints Tori White Wildon Photography Photos Posters Abstract Black D cor Designs Framed Hazelwood Hokku Home Landscape Oil Accent 075 12 15 18 26 40 60 8 D H W x 1 1017 1824 2532 holidays, christmas gift gifts for girls boys"
m = p.search(s)
if m:
    print(m.group())    # this includes Subject: 
    print(m.group(1))   # this does not include Subject: 

Try:

re.compile('Subject: [^*]+')
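A brief sketch of how this pattern behaves, using shortened, made-up versions of the question's string (both sample values are assumptions). It relies on the '**' markers actually being present in the data: `[^*]+` stops at the first asterisk, and if there is none it runs to the end of the string.

```python
import re

pattern = re.compile(r'Subject: [^*]+')

# With the '**' markers, the match ends just before the first asterisk.
marked = "-Orientation: Horizontal.  **Subject: -Figures/Nautical and beach.**  Gender: -Unisex/Both."
with_markers = pattern.search(marked).group()

# Without the markers, nothing stops [^*]+ before the end of the string.
unmarked = "-Orientation: Horizontal.  Subject: -Figures/Nautical and beach.  Gender: -Unisex/Both."
without_markers = pattern.search(unmarked).group()

print(with_markers)     # Subject: -Figures/Nautical and beach.
print(without_markers)  # continues through the Gender field
```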

