Creating an output file with multiple lines in Python

5 votes

4 answers

18386 views

Asked 2025-04-16 12:44

I have a file containing specific data that I want to extract.

The file looks like this:

DS User ID 1  
random garbage  
random garbage  
DS  N user name 1   
random garbage  
DS User ID 2   
random garbage  
random garbage  
DS  N user name 2

So far I have:

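import sys
import re

# Read the whole input file into one string
f = open(sys.argv[1])

strToSearch = ""

for line in f:
    strToSearch += line

patFinder1 = re.compile(r'DS\s+\d{4}|DS\s{2}\w\s{2}\w.*|DS\s{2}N', re.MULTILINE)
findPat1 = patFinder1.findall(strToSearch)  # collect every match as a list of strings

for i in findPat1:
    print(i)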

The output I see on screen looks like this:

DS user ID 1  
DS  N user name 1  
DS user ID 2  
DS  N user name 2

If I write to a file with the following code:

outfile = "test.dat"   
FILE = open(outfile,"a")  
FILE.writelines(line)  
FILE.close()

then everything is squashed onto a single line:

DS user ID 1DS  N user name 1DS user ID 2DS  N user name 2

I can live with the first form of the output. Ideally, though, I would like to strip 'DS' and 'DS N' from the output file and have the fields separated by commas:

User ID 1,user name 1  
User ID 2, username 2

Any good ideas on how to achieve this?

Tags: text processing, data extraction, file handling, data cleaning, comma-separated, output format, multi-line output

4 Answers

FILE.writelines(line)

does not add newline characters.

Just do this:

FILE.write(line + "\n")

Or:

FILE.write("\n".join(lines))
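As a minimal sketch of the point above: the helper name `write_matches`, the sample list `lines`, and the temporary output path are all illustrative assumptions, not part of the question's code. It appends the newline to each string before handing it to `writelines`.

```python
import os
import tempfile


def write_matches(path, lines):
    """Write each string in `lines` to `path` on its own line."""
    with open(path, "w") as fh:
        # writelines() writes the strings back to back, so add "\n" ourselves
        fh.writelines(line + "\n" for line in lines)


# Hypothetical sample data standing in for the regex matches
lines = ["DS User ID 1", "DS  N user name 1"]
path = os.path.join(tempfile.mkdtemp(), "test.dat")
write_matches(path, lines)

with open(path) as fh:
    content = fh.read()
print(repr(content))
```

Using `with` also closes the file automatically, so a separate `close()` call is not needed.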

Answered 2025-04-16 by Python大师


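print automatically appends a newline after whatever it outputs, so each call leaves the cursor on the next line. writelines does not add newlines for you, so you need to write something like this: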

file = open(outfile, "a")
file.writelines((i + '\n' for i in findPat1))
file.close()

You can also replace the writelines call with a loop:

for i in findPat1:
    file.write(i + '\n')
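Since print already supplies the newline, another option in Python 3 is to point print at the file through its `file` argument. This is only a sketch: the `findPat1` list below is a hypothetical stand-in for the regex matches, and the temporary path is chosen purely for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for the matches found by the regex
findPat1 = ["DS User ID 1", "DS  N user name 1"]

outfile = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(outfile, "a") as out:
    for i in findPat1:
        # print() appends "\n" itself, exactly as it does on screen
        print(i, file=out)

# Read the file back to confirm each match landed on its own line
with open(outfile) as fh:
    written = fh.read().splitlines()
```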

Answered 2025-04-16 by Python大师


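To give a reliable solution, we would first need to know the real format of the input data, how much variation it allows, and how the parsed data will be used.

Going only by the sample input and output above, here is a quick piece of working example code: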

out = open("test.dat", "a") # output file

for line in open("input.dat"):
    if line[:3] != "DS ": continue # skip "random garbage"

    keys = line.split()[1:] # split, remove "DS"
    if keys[0] != "N": # found ID, print with comma
        out.write(" ".join(keys) + ",")
    else: # found name, print and end line
        out.write(" ".join(keys[1:]) + "\n")

out.close() # flush buffered output to disk

The output file will be:

User ID 1,user name 1
User ID 2,user name 2

Of course, if the exact format requirements are known, this code can be made more robust with regular expressions. For example:

import re
pat_id = re.compile(r"DS\s+(User ID\s+\d+)")
pat_name = re.compile(r"DS\s+N\s+(.+\s+\d+)")
out = open("test.dat", "a")

for line in open("input.dat"):
    match = pat_id.match(line)
    if match: # found ID, print with comma
        out.write(match.group(1) + ",")
        continue
    match = pat_name.match(line)
    if match: # found name, print and end line
        out.write(match.group(1) + "\n")

out.close() # flush buffered output to disk

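Both examples above assume that "User ID X" always comes before "N user name X", which is why they end those fields with "," and "\n" respectively.

If the order is not fixed, you can store the values in a dictionary keyed by the numeric ID, then print the ID and name pairs once all the input has been parsed.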
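A rough sketch of that dictionary approach might look like the following. It assumes the trailing number is what links an ID line to its name line, which the sample data suggests but the question does not confirm; the names `PAT_ID`, `PAT_NAME`, and `pair_records` are made up for illustration.

```python
import re

# Patterns assume the sample format; the trailing number links ID and name
PAT_ID = re.compile(r"DS\s+(User ID\s+(\d+))")
PAT_NAME = re.compile(r"DS\s+N\s+(.+?\s+(\d+))\s*$")


def pair_records(lines):
    """Return 'id,name' strings, pairing entries by their trailing number."""
    ids, names = {}, {}
    for line in lines:
        m = PAT_ID.match(line)
        if m:
            ids[int(m.group(2))] = m.group(1)
            continue
        m = PAT_NAME.match(line)
        if m:
            names[int(m.group(2))] = m.group(1)
    # Emit only keys seen for both an ID and a name, in numeric order
    return [f"{ids[k]},{names[k]}" for k in sorted(ids) if k in names]


# Sample lines deliberately out of order
sample = [
    "DS  N user name 2",
    "random garbage",
    "DS User ID 1  ",
    "DS  N user name 1   ",
    "DS User ID 2   ",
]
result = pair_records(sample)
print("\n".join(result))
```

Entries that have an ID but no name (or the reverse) are silently dropped here; whether that is the right behavior depends on the real data.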

If you can provide more information, we may be able to offer better help.

Answered 2025-04-16 by Python大师

