How do I remove the tags at the start and end of each sentence in the ml.transcript file and save the result to newml.transcript using Python?

<r>The quick brown fox jumps over a lazy dog </r> (umnle_001_001)
<r> I think we should go get it now </r> (umnle_001_002)
......
<r> When I travel, I prefer to travel by air </r> (umnle_001_129)
<r> The law was changed </r> (umtci_001_001)
<r> This soup needs more salt </r> (umtci_001_002)
......
......
<r> Tom sat two rows ahead of me </r> (umtci_001_197)

umnle_001_001 The quick brown fox jumps over a lazy dog
umnle_001_002 I think we should go get it now
......
umnle_001_129 When I travel, I prefer to travel by air
umtci_001_001 The law was changed
umtci_001_002 This soup needs more salt
......
......
umtci_001_197 Tom sat two rows ahead of me

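#!/usr/bin/env python
# Read the original transcript (each line: "<r> text </r> (id)").
fo = open("ml.transcription", "r")
y_list = []
for line in fo.readlines():
    a1 = line[-15:-2]            # 13-character id inside the trailing "(...)"
    a2 = line[4:]                # skip the first 4 characters, i.e. "<r> "
    y = str(a1) + " " + str(a2)
    a3 = y[:-22]                 # drop the trailing " </r> (id)" and newline (22 characters)
    y_list.append(a3)
    print(a3)
fo.close()

fo = open("newml.transcription", "w")
for lines in y_list:
    fo.write(lines + "\n")       # write() accepts a single string argument
fo.close()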

1 answer

User

#1 · Posted 2024-06-16 09:45:56

A crude approach:

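with open("input", "r") as infile:
    for line in infile:
        # Text after "</r> " is "(id)\n": drop "(" and the trailing ")\n".
        utt_id = line.split("</r> ")[1][1:-2]
        # Text between "<r>" and "</r>": drop the leading space and the trailing " </".
        text = line.split("r>")[1][1:-3]
        print(utt_id + " " + text)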

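The code above prints to the screen; you can redirect that output to a file. It assumes there is always exactly one space after <r> and one space both before and after </r>. It also assumes that every line ends with a newline.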
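If the spacing around the tags is not perfectly uniform, a regular expression may be more forgiving than fixed slices. The following is only a sketch under some assumptions: the names `LINE_RE`, `convert_line` and `convert_file` are made up for illustration, the files are assumed to be UTF-8, and lines that do not match the `<r> ... </r> (id)` shape are simply skipped.

```python
import re

# Matches "<r> text </r> (utterance_id)" with optional whitespace around each part.
LINE_RE = re.compile(r"^\s*<r>\s*(.*?)\s*</r>\s*\((\S+)\)\s*$")


def convert_line(line):
    """Return "utterance_id text", or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    text, utt_id = match.groups()
    return f"{utt_id} {text}"


def convert_file(src_path, dst_path):
    """Write the converted form of every matching line of src_path to dst_path."""
    # Assumption: both files are UTF-8 encoded.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            converted = convert_line(line)
            if converted is not None:  # non-matching lines are skipped
                dst.write(converted + "\n")
```

Calling `convert_file("ml.transcript", "newml.transcript")` would then produce the output file directly, without redirecting printed output. Because the pattern tolerates missing or extra spaces, it also handles a line such as the first sample, where no space follows `<r>`.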
