Regex to match all strings of the form {\…} in Python

Posted 2024-06-16 12:37:55

I can't build a regex that matches every possible string of the form

{\some_text}

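I tried to write the regex myself, but I couldn't get it to match every kind of character.

What I came up with: r"\{\\(.*)\}"

This doesn't work properly; it only matches {\~some_string}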
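
Two things may contribute to this behaviour; the following is a sketch under stated assumptions, not a confirmed diagnosis. First, `.*` is greedy, so one match can run from the first `{\` to the last `}` on the line. Second, in a normal (non-raw) string literal Python reads `\'` as a plain `'`, so the backslash the pattern requires is no longer in the text. The short `sample` string below is made up for illustration.

```python
import re

# Illustrative sample (raw string, so every backslash is kept).
sample = r"S. {\' A}vila, H. Gil-Mar{\'\i}n"

# Greedy .* swallows everything up to the LAST closing brace.
greedy = re.findall(r"\{\\(.*)\}", sample)

# A negated character class stops at the first closing brace.
bounded = re.findall(r"\{\\([^{}]*)\}", sample)

print(greedy)   # one long match spanning both groups
print(bounded)  # two separate matches

# In a non-raw literal, \' is an escape for a single quote:
plain = "{\' A}"   # the backslash is consumed by the string literal
raw = r"{\' A}"    # the backslash is preserved
print(plain, raw)
```

Declaring the input as a raw string, or reading it from a file, keeps the backslashes intact for the regex.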

This is what I'm trying to achieve:

text = "F.N. Freitas, C. Singulani, G. Vila-Verde, Linea Science Server,: The Dark Energy Survey Data Release 2. Ap._J._Supp._Ser. 255, (2021).Alam S., A. de Mattia, A. Tamone, S. {\' A}vila, J.A. Peacock, V. Gonzalez-Perez, A. Smith, A. Raichoor, A.J. Ross, J.E. Bautista, E. Burtin, J. Comparat, K.S. Dawson, H. du Mas des Bourboux, S. Escoffier, H. Gil-Mar{\'\i}n, S. Habib, K. Heitmann, J. Hou, F.G. Mohammad, E.M. Mueller, R. Neveux, R. Paviot, W.J. Percival, G. Rossi, V. Ruhlmann-Kleider, R. Tojeiro, M. Vargas Maga{\~n}a, C. Zhao, G.B. Zhao: The completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: N-body mock challenge for the eBOSS emission line galaxy sample. Mon._Not._R._Astron._Soc. 504, (2021).Alam S., J.A. Peacock, D.J. Farrow, J. Loveday, A.M. Hopkins: Using GAMA to probe the impact of small-scale galaxy physics on nonlinear redshift-space distortions. Mon._Not._R._Astron._Soc. 503, (2021).Alam S., M. Aubert, S. Avila, C. Balland, J.E. Bautista, M.A. Bershady, D. Bizyaev, M.R. Blanton, A.S. Bolton, J. Bovy, J. Brinkmann, J.R. Brownstein, E. Burtin, S. Chabanier, M.J. Chapman, P.D. Choi, C.H. Chuang, J. Comparat, M.C. Cousinou, A. Cuceu, K.S. Dawson, S. de la Torre, A. de Mattia, V.S. Agathe, H.M. des Bourboux, S. Escoffier, T. Etourneau, J. Farr, A. Font-Ribera, P.M. Frinchaboy, S. Fromenteau, H. Gil-Mar{\'\i}n, J.M. Le Goff, A.X. Gonzalez-Morales, V. Gonzalez-Perez, K. Grabowski, J. Guy, A.J. Hawken, J. Hou, H. Kong, J. Parker, M. Klaene, J.P. Kneib, S. Lin, D. Long, B.W. Lyke, A. de la Macorra, P. Martini, K. Masters, F.G. Mohammad, J. Moon, E.M. Mueller, A. Mu{\~n}oz-Guti{\'e}rrez, A.D. Myers, S. Nadathur, R. Neveux, J.A. Newman, P. Noterdaeme, A. Oravetz, D. Oravetz, N. Palanque-Delabrouille, K. Pan, R. Paviot, W.J. Percival, I. P{\'e}rez-R{\`a}fols, P. Petitjean, M.M. Pieri, A. Prakash, A. Raichoor, C. Ravoux, M. Rezaie, J. Rich, A.J. Ross, G. Rossi, R. Ruggeri, V. Ruhlmann-Kleider, A.G. S{\'a}nchez, F.J. S{\'a}nchez, J.R. 
S{\'a}nchez-Gallego, C. Sayres, D.P. Schneider, H.J. Seo, A. Shafieloo, A. Slosar, A. Smith, J. Stermer, A. Tamone, J.L. Tinker, R. Tojeiro, M. Vargas-Maga{\~n}a, A. Variu, Y. Wang, B.A. Weaver, A.M. Weijmans, C. Y{\`e}che, P. Zarrouk, C. Zhao, G.B. Zhao, Z. Zheng: Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological implications from two decades of spectroscopic surveys at the Apache Point Observatory. Physical_Review_D 103, (2021).Alam S., N.P. Ross, S. Eftekharzadeh, J.A. Peacock, J. Comparat, A.D. Myers, A.J. Ross: Quasars at intermediate redshift are not special; but they are often satellites. Mon._Not._R._Astron._Soc. 504, (2021).Alonso-Herrero A., S. Garc{\'\i}a-Burillo, S.F. H{\"o}nig, I. Garc{\'\i}a-Bernete, C. Ramos Almeida, O. Gonz{\'a}lez-Mart {'hallo}"


encodings = {
    "'": u'\u0300',
    "'\\": u'\u0301',
    "^": u'\u0302',
    "~": u'\u0303',
    "o":  u'\u00D8',
    "ss": 'ß'

}

# remove the encoding and replace it with its corresponding character
def repl(m):
    string = m.group()
    get_open_bracket_idx = string.find('{')
    get_close_bracket_idx = string.find('}')
    encoding = substring.substringByChar(
        string, startChar=string[get_open_bracket_idx + 1], endChar=string[get_close_bracket_idx - 2])
    string_content = string[get_close_bracket_idx - 1]
    string_and_encoding = encoding + string
    string_content = encodings.get(encoding, string_content) + string_content
    print()
    print(f'encoding: {encoding}')
    print(f'string content: {string_content}')
    print()
    return string_content


# This nearly works, but it also matches {'some_text}, which it shouldn't
changed_text = re.sub(r'\{\\?[^{}]*}', repl, text)
print(changed_text)


1 answer
User
#1 · Posted 2024-06-16 12:37:55

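You need a regex that matches {, then captures all the non-word characters into Group 1, then captures the letter before the closing } into Group 2. You can then inspect the group contents and build the replacement string dynamically.

The regex looks like this: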

\{([^\w\s]+|_)\s*(\w)}

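Details:

  • \{ - a { character
  • ([^\w\s]+|_) - Group 1: one or more characters that are neither word characters nor whitespace, or a single underscore
  • \s* - zero or more whitespace characters
  • (\w) - Group 2: any single word character
  • } - a } character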
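
To make the breakdown above concrete, here is a small sketch that prints what each group captures. The `sample` string is an illustrative assumption, not part of the original data.

```python
import re

# Pattern from the answer: group 1 = accent command, group 2 = base letter.
PATTERN = re.compile(r'\{([^\w\s]+|_)\s*(\w)}')

# Illustrative sample; raw string so the backslashes survive.
sample = r"{\' A}vila, Y{\`e}che, Maga{\~n}a, {'hallo}"

# Collect (whole match, group 1, group 2) for every match.
captured = [(m.group(0), m.group(1), m.group(2)) for m in PATTERN.finditer(sample)]

for whole, accent, letter in captured:
    print(f"{whole!r}: accent={accent!r}, letter={letter!r}")
```

`{'hallo}` is skipped because more than one word character follows the quote, so `(\w)}` cannot match.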

Sample implementation in Python:

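import re
text = r"{\' A}vila,  Y{\`e}che, {'hallo}"

# LaTeX accent command -> Unicode combining mark
encodings = {
    "\\'": '\u0301',  # \' is the acute accent
    "\\`": '\u0300',  # \` is the grave accent
}

def repl(m):
    encoding = m.group(1)        # accent command, e.g. \'
    string_content = m.group(2)  # base letter, e.g. A
    if encoding in encodings:
        # a combining mark goes after the letter it modifies
        return string_content + encodings[encoding]
    return string_content

changed_text = re.sub(r'\{([^\w\s]+|_)\s*(\w)}', repl, text)
print(changed_text)
# => Ávila,  Yèche, {'hallo}   (the accents are combining characters)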
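
The same idea can be extended to more accents. The following is a minimal sketch, not the answer's own code: the `COMBINING` table, the `to_unicode` helper, and the stricter pattern (which requires a backslash and also accepts the dotless `\i`) are all assumptions made for illustration. It uses `unicodedata.normalize("NFC", ...)` to fold each letter plus combining mark into a single precomposed character.

```python
import re
import unicodedata

# Hypothetical, deliberately small table: LaTeX accent command -> combining mark.
COMBINING = {
    "\\'": "\u0301",  # acute
    "\\`": "\u0300",  # grave
    "\\^": "\u0302",  # circumflex
    "\\~": "\u0303",  # tilde
    '\\"': "\u0308",  # diaeresis
}

# Group 1: backslash plus one symbol; group 2: \i (dotless i) or one word character.
PATTERN = re.compile(r"\{(\\[^\w\s])\s*(\\i|\w)\}")

def to_unicode(text):
    def repl(m):
        mark = COMBINING.get(m.group(1))
        if mark is None:
            return m.group(0)  # unknown command: leave the text untouched
        letter = "i" if m.group(2) == "\\i" else m.group(2)
        # NFC composes letter + combining mark into one code point where possible.
        return unicodedata.normalize("NFC", letter + mark)
    return PATTERN.sub(repl, text)

sample = r"S. {\' A}vila, H. Gil-Mar{\'\i}n, Maga{\~n}a, C. Y{\`e}che, S.F. H{\"o}nig, {'hallo}"
result = to_unicode(sample)
print(result)
```

Because the pattern insists on a backslash after `{`, `{'hallo}` is left alone, and unknown accent commands pass through unchanged rather than being silently stripped.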
