Regex to match all strings of the form {\…} in Python

import re
import substring  # third-party helper module that provides substringByChar

text = "F.N. Freitas, C. Singulani, G. Vila-Verde, Linea Science Server,: The Dark Energy Survey Data Release 2. Ap._J._Supp._Ser. 255, (2021).Alam S., A. de Mattia, A. Tamone, S. {\' A}vila, J.A. Peacock, V. Gonzalez-Perez, A. Smith, A. Raichoor, A.J. Ross, J.E. Bautista, E. Burtin, J. Comparat, K.S. Dawson, H. du Mas des Bourboux, S. Escoffier, H. Gil-Mar{\'\i}n, S. Habib, K. Heitmann, J. Hou, F.G. Mohammad, E.M. Mueller, R. Neveux, R. Paviot, W.J. Percival, G. Rossi, V. Ruhlmann-Kleider, R. Tojeiro, M. Vargas Maga{\~n}a, C. Zhao, G.B. Zhao: The completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: N-body mock challenge for the eBOSS emission line galaxy sample. Mon._Not._R._Astron._Soc. 504, (2021).Alam S., J.A. Peacock, D.J. Farrow, J. Loveday, A.M. Hopkins: Using GAMA to probe the impact of small-scale galaxy physics on nonlinear redshift-space distortions. Mon._Not._R._Astron._Soc. 503, (2021).Alam S., M. Aubert, S. Avila, C. Balland, J.E. Bautista, M.A. Bershady, D. Bizyaev, M.R. Blanton, A.S. Bolton, J. Bovy, J. Brinkmann, J.R. Brownstein, E. Burtin, S. Chabanier, M.J. Chapman, P.D. Choi, C.H. Chuang, J. Comparat, M.C. Cousinou, A. Cuceu, K.S. Dawson, S. de la Torre, A. de Mattia, V.S. Agathe, H.M. des Bourboux, S. Escoffier, T. Etourneau, J. Farr, A. Font-Ribera, P.M. Frinchaboy, S. Fromenteau, H. Gil-Mar{\'\i}n, J.M. Le Goff, A.X. Gonzalez-Morales, V. Gonzalez-Perez, K. Grabowski, J. Guy, A.J. Hawken, J. Hou, H. Kong, J. Parker, M. Klaene, J.P. Kneib, S. Lin, D. Long, B.W. Lyke, A. de la Macorra, P. Martini, K. Masters, F.G. Mohammad, J. Moon, E.M. Mueller, A. Mu{\~n}oz-Guti{\'e}rrez, A.D. Myers, S. Nadathur, R. Neveux, J.A. Newman, P. Noterdaeme, A. Oravetz, D. Oravetz, N. Palanque-Delabrouille, K. Pan, R. Paviot, W.J. Percival, I. P{\'e}rez-R{\`a}fols, P. Petitjean, M.M. Pieri, A. Prakash, A. Raichoor, C. Ravoux, M. Rezaie, J. Rich, A.J. Ross, G. Rossi, R. Ruggeri, V. Ruhlmann-Kleider, A.G. S{\'a}nchez, F.J. S{\'a}nchez, J.R. S{\'a}nchez-Gallego, C. Sayres, D.P. Schneider, H.J. Seo, A. Shafieloo, A. Slosar, A. Smith, J. Stermer, A. Tamone, J.L. Tinker, R. Tojeiro, M. Vargas-Maga{\~n}a, A. Variu, Y. Wang, B.A. Weaver, A.M. Weijmans, C. Y{\`e}che, P. Zarrouk, C. Zhao, G.B. Zhao, Z. Zheng: Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological implications from two decades of spectroscopic surveys at the Apache Point Observatory. Physical_Review_D 103, (2021).Alam S., N.P. Ross, S. Eftekharzadeh, J.A. Peacock, J. Comparat, A.D. Myers, A.J. Ross: Quasars at intermediate redshift are not special; but they are often satellites. Mon._Not._R._Astron._Soc. 504, (2021).Alonso-Herrero A., S. Garc{\'\i}a-Burillo, S.F. H{\"o}nig, I. Garc{\'\i}a-Bernete, C. Ramos Almeida, O. Gonz{\'a}lez-Mart {'hallo}"

encodings = {
    "'": u'\u0300',
    "'\\": u'\u0301',
    "^": u'\u0302',
    "~": u'\u0303',
    "o": u'\u00D8',
    "ss": 'ß'
}

# remove the encoding and replace it with its corresponding character
def repl(m):
    string = m.group()
    get_open_bracket_idx = string.find('{')
    get_close_bracket_idx = string.find('}')
    encoding = substring.substringByChar(
        string,
        startChar=string[get_open_bracket_idx + 1],
        endChar=string[get_close_bracket_idx - 2])
    string_content = string[get_close_bracket_idx - 1]
    string_and_encoding = encoding + string
    string_content = encodings.get(encoding, string_content) + string_content
    print()
    print(f'encoding: {encoding}')
    print(f'string content: {string_content}')
    print()
    return string_content

# This nearly works, it just matches {'some_text} which it shouldn't
changed_text = re.sub(r'\{\\?[^{}]*}', repl, text)
print(changed_text)
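To see why {'hallo} gets matched, it helps to run the question's pattern on a short fragment. The sketch below (the fragment is chosen for illustration) shows two things: the \\? part makes the backslash optional, so any {...} group matches; and because the question's text literal is not a raw string, \' inside it is only an escaped quote, so that backslash never reaches the regex engine.

```python
import re

# The pattern from the question
asker_pattern = r'\{\\?[^{}]*}'

# \\? makes the backslash optional, so a plain {'hallo} matches as well
matches = re.findall(asker_pattern, r"{\' A}vila {'hallo}")
print(matches)  # both brace groups are returned

# In a regular (non-raw) string literal, \' is only an escaped quote,
# so the backslash is gone before the regex ever sees the text
print("{\' A}" == "{' A}")  # True
```

Writing the input as a raw string (r"..."), as the sample implementation in the answer does, keeps the backslashes intact.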

1 answer

User

Answer #1 · Posted 2024-06-16 12:37:55

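You need a regex that matches {, captures the run of special characters (the accent command) into Group 1, and then captures the single letter before the closing } into Group 2. A replacement function can then inspect both groups and build the replacement string dynamically.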

The regex looks like this:

\{([^\w\s]+|_)\s*(\w)}

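Regex details:

\{ - a { char
([^\w\s]+|_) - Group 1: one or more special characters (anything that is neither a word character nor whitespace), or an underscore
\s* - zero or more whitespace characters
(\w) - Group 2: any single word character
} - a } char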
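A quick way to check what each group captures is to run the pattern against a few fragments taken from the question's text. This is only an illustrative check: the list of fragments is an assumption, and fullmatch is used so that each fragment has to match as a whole.

```python
import re

# The answer's pattern, unchanged
PATTERN = re.compile(r'\{([^\w\s]+|_)\s*(\w)}')

# Fragments copied from the question's text (raw strings keep the backslashes)
samples = [r"{\' A}", r"{\'\i}", r"{\~n}", r'{\"o}', "{'hallo}"]

for s in samples:
    m = PATTERN.fullmatch(s)
    # Group 1 holds the accent command, Group 2 the letter
    print(s, "->", m.groups() if m else "no match")
```

For {\'\i}, Group 1 is \'\ because the backslash of \i is also a non-word character, and {'hallo} does not match because more than one letter follows the quote.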

Sample implementation in Python:

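import re
import unicodedata

text = r"{\' A}vila,  Y{\`e}che, {'hallo}"

# Map LaTeX accent commands to Unicode combining marks
encodings = {
    "\\'": '\u0301',  # \' -> combining acute accent
    "\\`": '\u0300',  # \` -> combining grave accent
}

def repl(m):
    encoding = m.group(1)        # the accent command, e.g. \'
    string_content = m.group(2)  # the letter being accented
    if encoding in encodings:
        # a combining mark must follow its base letter
        return string_content + encodings[encoding]
    return string_content

changed_text = re.sub(r'\{([^\w\s]+|_)\s*(\w)}', repl, text)
# NFC composes letter + combining mark into one character (A + U+0301 -> Á)
print(unicodedata.normalize('NFC', changed_text))
# => Ávila,  Yèche, {'hallo}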
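The mapping above covers only two accents, while the question's text also contains {\~n}, {\"o} and {\'\i} (an accent on a dotless i). Below is a minimal sketch of a broader converter. The accent table, the function name latex_accents_to_unicode and the tighter pattern (which requires the backslash and accepts \i or \j as the base letter) are assumptions for illustration, not part of the original answer.

```python
import re
import unicodedata

# Assumed mapping from LaTeX accent commands to Unicode combining marks
# (only accents that occur in the question's text are listed)
ACCENTS = {
    "'": "\u0301",  # acute:      {\'e} -> é
    "`": "\u0300",  # grave:      {\`e} -> è
    "^": "\u0302",  # circumflex: {\^o} -> ô
    "~": "\u0303",  # tilde:      {\~n} -> ñ
    '"': "\u0308",  # diaeresis:  {\"o} -> ö
}

# {  backslash  accent  optional spaces  letter (or \i / \j)  }
PATTERN = re.compile(r"\{\\([`'^~\"])\s*(\\[ij]|[A-Za-z])\}")

def latex_accents_to_unicode(text: str) -> str:
    def repl(m: re.Match) -> str:
        accent, letter = m.group(1), m.group(2)
        if letter in ("\\i", "\\j"):
            letter = letter[1]  # dotless i/j: the accent takes the dot's place
        return letter + ACCENTS[accent]
    # NFC turns each letter + combining mark into one precomposed character
    return unicodedata.normalize("NFC", PATTERN.sub(repl, text))

sample = r"S. {\' A}vila, H. Gil-Mar{\'\i}n, M. Vargas Maga{\~n}a, S.F. H{\"o}nig, {'hallo}"
print(latex_accents_to_unicode(sample))
# S. Ávila, H. Gil-Marín, M. Vargas Magaña, S.F. Hönig, {'hallo}
```

Because the pattern requires a backslash right after {, plain braces such as {'hallo} are left untouched, and the final NFC step makes the output compare equal to text typed with precomposed characters.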
