Parsing a complex text file

savetonotherfile.write( openfileagain.read().replace( "b'<HTML>\n<HEAD>\n<TITLE> Euro Millions Winning Numbers</TITLE>\n<BODY>\n<PRE> Euro Millions Winning Numbers\n\nNo., Day,DD,MMM,YYYY, N1,N2,N3,N4,N5,L1,L2, Jackpot, Wins\n", '').replace( "\n<HR><B>All lotteries below have exceeded the 180 days expiry date</B><HR>No., Day,DD,MMM,YYYY, N1,N2,N3,N4,N5,L1,L2, Jackpot, Wins\n", '').replace( "\n\nThis page shows all the draws that used any machine and any ball set in any year.\n\nData obtained from http://lottery.merseyworld.com/Euro/\n</PRE>\n</BODY></HTML>\n'", ''))

3 answers

User

#1 · Edited 2024-04-19 04:17:05

Add r before the string literal in the first argument of replace, making it a raw string. Alternatively, change \n to \\n.
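To see why the prefix matters, here is a minimal sketch. It assumes the file holds the printed form of a bytes object, so each line break is stored as the two characters backslash and n rather than as a real newline. The sample content is made up for illustration.

```python
# Hypothetical file content: the printed form of a bytes object, so "\n" is
# stored as two characters (backslash + n), not as a real newline.
content = repr(b"<HTML>\n<HEAD>\nrow 1\n")

# Ordinary literal: "\n" is a real newline, which never occurs in content.
unchanged = content.replace("b'<HTML>\n<HEAD>\n", "")

# Raw literal: r"\n" is backslash + n, which matches what is in content.
fixed = content.replace(r"b'<HTML>\n<HEAD>\n", "")

print(unchanged == content)  # nothing was replaced by the ordinary literal
print(fixed)
```

Writing "\\n" in an ordinary literal yields the same two characters as r"\n", which is why either suggestion works.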

User

#2 · Edited 2024-04-19 04:17:05

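For complex text manipulation, it is clear that regular expressions are the tool to use.
I suggest you study the re module. You will get more out of it than from tinkering with replace().

As for the code you posted, execution proceeds like this:
- read the content of the file handle openfileagain: this creates string 1
- replace one part of this text, i.e. of string 1: this creates a new string 2
- replace the second part, i.e. the matching part of string 2: this creates a third string 3
- replace the third part, i.e. the matching part of string 3: this creates string 4.

With a regular expression, you supply one pattern describing the 3 parts to replace, and the re engine produces the same string 4 directly from string 1, without passing through strings 2 and 3.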
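A minimal sketch of that single-pass idea, assuming short made-up fragments in place of the three long strings from the question:

```python
import re

# Hypothetical stand-ins for the three long fragments in the question.
parts = ["<HEADER>\n", "<HR>expired<HR>\n", "<FOOTER>\n"]

# re.escape makes each fragment match literally; "|" joins them into one pattern.
pattern = re.compile("|".join(re.escape(p) for p in parts))

text = "<HEADER>\n1, Fri\n<HR>expired<HR>\n2, Tue\n<FOOTER>\n"

# One call removes all three fragments; only the final string is returned.
cleaned = pattern.sub("", text)
print(cleaned)
```

If one fragment could be a prefix of another, listing the longer one first keeps the alternation from matching the shorter one too early.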

User

#3 · Edited 2024-04-19 04:17:05

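Working with HTML like this is not a good idea; it is usually better to use an HTML parsing module such as BeautifulSoup (assuming this is HTML, see the edit below). Either way, if you break the code into smaller steps and factor out the long replacement strings, you will find errors much more easily. E.g.: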

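# Each pair is (text to find, text to put in its place).
replace_map = (('first string', 'replace with this'),
               ('second string', 'replace the second with this'))

# inputfilename and outputfilename are assumed to be defined elsewhere.
with open(inputfilename, 'rt') as infile:
    output = infile.read()
    # Apply the replacements one at a time, in order.
    for fromstr, tostr in replace_map:
        output = output.replace(fromstr, tostr)

with open(outputfilename, 'wt') as outfile:
    outfile.write(output)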

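Edit: after posting my answer, I noticed that you appear to be parsing text of the form "b'<html code/>'", is that right? It looks like you have a string that describes a Python bytes object. If that is really what you are doing, HTML parsing will not help you, but I suggest you seriously question why you are doing it and decide whether it is the best way to reach your end result.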
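If the text really is the printed form of a bytes object, one possible way back to ordinary text is ast.literal_eval from the standard library. This is not something the answer itself proposes. The sketch below uses made-up content, and the utf-8 encoding is an assumption, since the real page's encoding is not stated.

```python
import ast

# Hypothetical file content: the printed form of a bytes object saved as text.
saved_text = repr(b"<HTML>\n<PRE>1, Fri\n2, Tue\n</PRE></HTML>\n")

# literal_eval safely evaluates the literal and gives back the bytes object.
raw_bytes = ast.literal_eval(saved_text)

# Assumption: the page decodes as UTF-8; the real encoding is not known here.
html = raw_bytes.decode("utf-8")
print(html)
```

After decoding, the line breaks are real newlines again, so ordinary string literals or an HTML parser can be used on the result.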
