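I can't build a regex that matches every possible string of the form
{\some_text}
I tried to write a regex myself, but I can't get it to match all kinds of characters.
What I came up with: r"\{\\(.*)\}"
This doesn't work properly; it only matches {\~some_string}.
This is what I'm trying to achieve: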
text = "F.N. Freitas, C. Singulani, G. Vila-Verde, Linea Science Server,: The Dark Energy Survey Data Release 2. Ap._J._Supp._Ser. 255, (2021).Alam S., A. de Mattia, A. Tamone, S. {\' A}vila, J.A. Peacock, V. Gonzalez-Perez, A. Smith, A. Raichoor, A.J. Ross, J.E. Bautista, E. Burtin, J. Comparat, K.S. Dawson, H. du Mas des Bourboux, S. Escoffier, H. Gil-Mar{\'\i}n, S. Habib, K. Heitmann, J. Hou, F.G. Mohammad, E.M. Mueller, R. Neveux, R. Paviot, W.J. Percival, G. Rossi, V. Ruhlmann-Kleider, R. Tojeiro, M. Vargas Maga{\~n}a, C. Zhao, G.B. Zhao: The completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: N-body mock challenge for the eBOSS emission line galaxy sample. Mon._Not._R._Astron._Soc. 504, (2021).Alam S., J.A. Peacock, D.J. Farrow, J. Loveday, A.M. Hopkins: Using GAMA to probe the impact of small-scale galaxy physics on nonlinear redshift-space distortions. Mon._Not._R._Astron._Soc. 503, (2021).Alam S., M. Aubert, S. Avila, C. Balland, J.E. Bautista, M.A. Bershady, D. Bizyaev, M.R. Blanton, A.S. Bolton, J. Bovy, J. Brinkmann, J.R. Brownstein, E. Burtin, S. Chabanier, M.J. Chapman, P.D. Choi, C.H. Chuang, J. Comparat, M.C. Cousinou, A. Cuceu, K.S. Dawson, S. de la Torre, A. de Mattia, V.S. Agathe, H.M. des Bourboux, S. Escoffier, T. Etourneau, J. Farr, A. Font-Ribera, P.M. Frinchaboy, S. Fromenteau, H. Gil-Mar{\'\i}n, J.M. Le Goff, A.X. Gonzalez-Morales, V. Gonzalez-Perez, K. Grabowski, J. Guy, A.J. Hawken, J. Hou, H. Kong, J. Parker, M. Klaene, J.P. Kneib, S. Lin, D. Long, B.W. Lyke, A. de la Macorra, P. Martini, K. Masters, F.G. Mohammad, J. Moon, E.M. Mueller, A. Mu{\~n}oz-Guti{\'e}rrez, A.D. Myers, S. Nadathur, R. Neveux, J.A. Newman, P. Noterdaeme, A. Oravetz, D. Oravetz, N. Palanque-Delabrouille, K. Pan, R. Paviot, W.J. Percival, I. P{\'e}rez-R{\`a}fols, P. Petitjean, M.M. Pieri, A. Prakash, A. Raichoor, C. Ravoux, M. Rezaie, J. Rich, A.J. Ross, G. Rossi, R. Ruggeri, V. Ruhlmann-Kleider, A.G. S{\'a}nchez, F.J. S{\'a}nchez, J.R. 
S{\'a}nchez-Gallego, C. Sayres, D.P. Schneider, H.J. Seo, A. Shafieloo, A. Slosar, A. Smith, J. Stermer, A. Tamone, J.L. Tinker, R. Tojeiro, M. Vargas-Maga{\~n}a, A. Variu, Y. Wang, B.A. Weaver, A.M. Weijmans, C. Y{\`e}che, P. Zarrouk, C. Zhao, G.B. Zhao, Z. Zheng: Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological implications from two decades of spectroscopic surveys at the Apache Point Observatory. Physical_Review_D 103, (2021).Alam S., N.P. Ross, S. Eftekharzadeh, J.A. Peacock, J. Comparat, A.D. Myers, A.J. Ross: Quasars at intermediate redshift are not special; but they are often satellites. Mon._Not._R._Astron._Soc. 504, (2021).Alonso-Herrero A., S. Garc{\'\i}a-Burillo, S.F. H{\"o}nig, I. Garc{\'\i}a-Bernete, C. Ramos Almeida, O. Gonz{\'a}lez-Mart {'hallo}"
import re
import substring  # third-party package that provides substringByChar

encodings = {
    "'": u'\u0300',
    "'\\": u'\u0301',
    "^": u'\u0302',
    "~": u'\u0303',
    "o": u'\u00D8',
    "ss": 'ß'
}

# remove the encoding and replace it with its corresponding character
def repl(m):
    string = m.group()
    get_open_bracket_idx = string.find('{')
    get_close_bracket_idx = string.find('}')
    encoding = substring.substringByChar(
        string,
        startChar=string[get_open_bracket_idx + 1],
        endChar=string[get_close_bracket_idx - 2])
    string_content = string[get_close_bracket_idx - 1]
    string_and_encoding = encoding + string
    string_content = encodings.get(encoding, string_content) + string_content
    print()
    print(f'encoding: {encoding}')
    print(f'string content: {string_content}')
    print()
    return string_content

# This nearly works, but it also matches {'some_text}, which it shouldn't
changed_text = re.sub(r'\{\\?[^{}]*}', repl, text)
print(changed_text)
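The remark in the code comment about {'some_text} can be checked on a short excerpt of the text. The snippet below is only a sketch: the names `sample`, `optional_backslash` and `required_backslash` are made up for illustration, and the stricter variant is an assumption rather than something taken from the question. It suggests that the optional `\\?` is what lets a plain `{'hallo}` through.

```python
import re

# Short excerpt in the style of the question's text (illustrative only).
sample = r"S. {\'A}vila, Maga{\~n}a, Gonz{\'a}lez-Mart {'hallo}"

# Pattern from the question: the backslash is optional, so plain braces match too.
optional_backslash = re.compile(r"\{\\?[^{}]*}")

# Hypothetical variant: the backslash right after the opening brace is mandatory.
required_backslash = re.compile(r"\{\\[^{}]*}")

loose = optional_backslash.findall(sample)
strict = required_backslash.findall(sample)

print(loose)   # includes {'hallo}
print(strict)  # only the groups that start with {\
```

With the backslash required, only the LaTeX-style groups remain as matches.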
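You need a regex that matches {, then captures all the non-word characters before the closing } into Group 1, and then captures the letter into Group 2. You can then inspect the group contents and build the replacement string dynamically. See the regex demo. The pattern consists of these parts:
\{ - a { character
([^\w\s]+|_) - Group 1: one or more special characters (anything that is neither a word character nor whitespace), or an underscore
\s* - zero or more whitespace characters
(\w) - Group 2: any word character
} - a } character
Sample implementation in Python: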
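A minimal sketch of how the parts listed above might be assembled and applied with `re.sub`. The assembled `PATTERN`, the `ACCENTS` table and the helper `decode_latex_accents` are assumptions made for illustration, not code from the original answer. The sketch strips the backslashes from Group 1, looks up a Unicode combining mark for the remaining symbol, and lets `unicodedata.normalize` compose it with the letter from Group 2.

```python
import re
import unicodedata

# Assumption: the listed parts concatenated in the order they are described.
PATTERN = re.compile(r"\{([^\w\s]+|_)\s*(\w)}")

# Hypothetical lookup table: LaTeX accent symbol -> Unicode combining mark.
ACCENTS = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis
}


def _replace(match):
    # Group 1 holds the symbols (backslashes included), Group 2 the letter.
    symbol = match.group(1).replace("\\", "")
    letter = match.group(2)
    mark = ACCENTS.get(symbol)
    if mark is None:
        # Unknown accent: leave the original text untouched.
        return match.group(0)
    # NFC composes letter + combining mark into a single character when one exists.
    return unicodedata.normalize("NFC", letter + mark)


def decode_latex_accents(text):
    """Replace {\\'a}-style groups with the corresponding accented letter."""
    return PATTERN.sub(_replace, text)


sample = r"S. {\' A}vila, H. Gil-Mar{\'\i}n, M. Vargas Maga{\~n}a, S.F. H{\"o}nig, {'hallo}"
print(decode_latex_accents(sample))
```

Because Group 2 is a single word character followed directly by }, a group such as {'hallo} does not match and is left as it is. Extending `ACCENTS` with further symbols would be one way to cover other LaTeX accents.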