Merging specific lines from two text files

<TEXT> <Unknown1>-65535</Unknown1> <autoId>1</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_1vs1</alias> <original>Уникальная массовая дуэль: Битва один на один до полного уничтожения в один раунд</original> </TEXT> <TEXT> <Unknown1>-65535</Unknown1> <autoId>2</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_3vs3</alias> <original>Уникальная массовая дуэль: Битва трое на трое до полного уничтожения в один раунд</original> </TEXT>

<TEXT> <Unknown1>-65535</Unknown1> <autoId>1</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_1vs1</alias> <replacement>Unique mass duel one on one battle to the complete destruction of one round</replacement> </TEXT> <TEXT> <Unknown1>-65535</Unknown1> <autoId>2</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_3vs3</alias> <replacement>Unique mass duel Battle three against three to the complete destruction of one round</replacement> </TEXT>

<TEXT> <Unknown1>-65535</Unknown1> <autoId>1</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_1vs1</alias> <original>Уникальная массовая дуэль: Битва один на один до полного уничтожения в один раунд</original> <replacement>Unique mass duel one on one battle to the complete destruction of one round</replacement> </TEXT> <TEXT> <Unknown1>-65535</Unknown1> <autoId>2</autoId> <autoId2>0</autoId2> <alias>Name2.Boast_Duel_Season01_sudden_death_3vs3</alias> <original>Уникальная массовая дуэль: Битва трое на трое до полного уничтожения в один раунд</original> <replacement>Unique mass duel Battle three against three to the complete destruction of one round</replacement> </TEXT>

2 answers

User

#1 · edited 2024-04-20 00:30:54

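The honest answer to "where do I start" is: learn basic Python string manipulation. I was feeling nice and I like this sort of problem, though, so here is a (quick and dirty) solution. The only things you need to change are the "original.xml" and "replacement.xml" filenames. You will also need a working Python installation, of course. That part is up to you.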

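A couple of caveats about my code:

Parsing XML is a solved problem. Doing it with regular expressions is frowned upon, but it works, and for something this simple and fixed it really doesn't matter.
I made some assumptions when building the output XML file (for example, a 4-space indentation style), but it outputs valid XML. The application you are using should handle it fine.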
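For comparison, here is a minimal sketch of the "solved problem" route using the standard library's xml.etree.ElementTree instead of regexes. It rests on assumptions drawn only from the samples in the question: the files contain bare top-level <TEXT> elements with no XML declaration and no single root (so a throwaway <root> wrapper is added), and <alias> uniquely identifies a block. The function names parse_fragments and merge are made up for this illustration.

```python
import xml.etree.ElementTree as ET

def parse_fragments(xml_text):
    """Parse text holding several top-level <TEXT> elements.

    Such text has no single root element, so it is wrapped in a
    throwaway <root> (an assumption based on the question's samples).
    """
    return ET.fromstring("<root>" + xml_text + "</root>")

def merge(original_text, replacement_text):
    """Append each <replacement> to the <TEXT> block with the same <alias>."""
    # Index the replacement texts by alias.
    replacements = {}
    for block in parse_fragments(replacement_text).iter("TEXT"):
        alias = block.findtext("alias")
        rep = block.find("replacement")
        if alias is not None and rep is not None:
            replacements[alias] = rep.text

    # Add a <replacement> child to every original block that has a match.
    root = parse_fragments(original_text)
    for block in root.iter("TEXT"):
        alias = block.findtext("alias")
        if alias in replacements:
            ET.SubElement(block, "replacement").text = replacements[alias]

    # Serialize each <TEXT> block on its own line, dropping the wrapper.
    return "\n".join(ET.tostring(b, encoding="unicode").strip() for b in root)
```

Because blocks are matched by alias rather than by position, a missing or reordered entry in the replacement file leaves the corresponding original block unchanged instead of shifting every later pair.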

import re

def loadfile(filename):
    '''
    Returns a string containing all data from file
    '''
    with open(filename, 'r') as infile:
        return infile.read()


def main():
    #load the files into strings
    original = loadfile("original.xml")
    replacement = loadfile("replacement.xml")

    #grab all of the "replacement" lines from the replacement file
    replacement_regex = re.compile("(<replacement>.*?</replacement>)")
    replacement_list = replacement_regex.findall(replacement)

    #grab all of the "TEXT" blocks from the original file
    original_regex = re.compile("(<TEXT>.*?</TEXT>)", re.DOTALL)
    original_list = original_regex.findall(original)


    #a string to write out to the new file
    outfile_string = ""
    to_find = "</original>" #this is the point where the replacement text is going to be appended after
    additional_len = len(to_find)
    for i in range(len(original_list)): #loop through all of the original text blocks
        #build a new string with the replacement text after the original
        build_string = ""
        build_string += original_list[i][:original_list[i].find(to_find)+additional_len]
        build_string += "\n" + " "*4
        build_string += replacement_list[i]
        build_string += "\n</TEXT>\n"
        outfile_string+=build_string


    #write the outfile string out to a file
    with open("outfile.txt", 'w') as outfile:
        outfile.write(outfile_string)

if __name__ == "__main__":
    main()

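Edit (in reply to a comment): An IndexError ("list index out of range") means the regex is not doing its job: it did not find the expected number of replacement texts when collecting them into the list. I tested what I wrote against the snippets you posted, so there must be a difference between those snippets and your full XML files. If the files do not contain the same number of original and replacement tags, or something along those lines, the code will break. Without the actual files I can't figure out which it is.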
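To narrow down that kind of mismatch, a small diagnostic along these lines may help. It is only a sketch: the helper names index_replacements and find_unmatched are made up here, and it assumes every block carries a unique <alias>, as in the posted snippets. One possible cause (a guess, since the full files were never posted) is a <replacement> whose text spans several lines; a pattern compiled without re.DOTALL does not match across newlines, so this version uses re.DOTALL for every pattern.

```python
import re

TEXT_RE = re.compile(r"<TEXT>.*?</TEXT>", re.DOTALL)
ALIAS_RE = re.compile(r"<alias>(.*?)</alias>", re.DOTALL)
REPLACEMENT_RE = re.compile(r"<replacement>.*?</replacement>", re.DOTALL)

def index_replacements(replacement_text):
    """Map alias -> '<replacement>...</replacement>' for each complete block."""
    found = {}
    for block in TEXT_RE.findall(replacement_text):
        alias = ALIAS_RE.search(block)
        rep = REPLACEMENT_RE.search(block)
        if alias and rep:
            found[alias.group(1)] = rep.group(0)
    return found

def find_unmatched(original_text, replacement_text):
    """Return the aliases of original blocks that have no replacement."""
    replacements = index_replacements(replacement_text)
    missing = []
    for block in TEXT_RE.findall(original_text):
        alias = ALIAS_RE.search(block)
        key = alias.group(1) if alias else None  # None: block has no <alias>
        if key not in replacements:
            missing.append(key)
    return missing
```

Calling find_unmatched on the contents of the two files lists the entries that would otherwise throw the positional pairing off, which is usually enough to spot the odd block by hand.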

User

#2 · edited 2024-04-20 00:30:54

Here is a simple approach (no XML parsing needed).

def parse_org(file_handle):
    """Yield each <TEXT> record of the original file, without its closing tag."""
    record = None
    for line in file_handle:
        if "<TEXT>" in line:
            record = line  # start a new record at the <TEXT> tag
        elif "</TEXT>" in line:
            yield record  # end the record at the </TEXT> tag
            record = None
        elif record is not None:
            record += line

def parse_rep(file_handle):
    """Yield the <replacement> line of each <TEXT> record."""
    record = None
    for line in file_handle:
        if "<TEXT>" in line:
            record = None
        elif "</TEXT>" in line:
            yield record
            record = None
        elif "<replacement>" in line:
            record = line

if __name__ == "__main__":
    # Assumes every tag sits on its own line in both files.
    original_file = open("filepath/yourfile.xml")
    replacement_file = open("filepath/yourfile.xml")
    a_new_file = open("result_file", "w")
    # Create each generator once; zip stops when the shorter file runs out.
    for org, rep in zip(parse_org(original_file), parse_rep(replacement_file)):
        a_new_file.write(org + rep + "</TEXT>\n")
    a_new_file.close()
    original_file.close()
    replacement_file.close()

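The code is written in Python and uses the keyword yield. If you want to learn Python, try http://www.codecademy.com/, and google "yield python" to learn how yield works. If you expect to process text files like this in the future, you should learn a scripting language, and Python is probably the easiest one. If you run into problems you can post them on this site, but don't show up having done nothing and just ask "write this program for me".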
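If yield is new to you, the following standalone sketch may help. It is not part of the script above; the function name records and the sample lines are invented for the illustration. A function containing yield returns a generator, which runs only as far as the next yield each time a value is requested.

```python
def records(lines):
    """Yield one joined string for each <TEXT> ... </TEXT> run of lines."""
    current = None
    for line in lines:
        if "<TEXT>" in line:
            current = [line]          # a new record starts here
        elif current is not None:
            current.append(line)      # still inside a record
        if current is not None and "</TEXT>" in line:
            yield "".join(current)    # hand back one finished record
            current = None

# Hypothetical sample input: one multi-line record, one single-line record.
sample = [
    "<TEXT>\n",
    "<alias>a</alias>\n",
    "</TEXT>\n",
    "<TEXT><alias>b</alias></TEXT>\n",
]

gen = records(sample)
first = next(gen)       # runs the function body up to the first yield
rest = list(gen)        # resumes after that yield and collects the rest
```

Nothing is read ahead of what you ask for, so the same pattern works on a file handle for a large file without loading it into memory. In Python 3 a generator is advanced with the built-in next(gen).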
