How do I clean up a text file in Python?

2 votes

2 answers

5500 views

Asked 2025-04-16 16:27

I have a file whose text looks like this:

text1 5,000 6,000
text2 2,000 3,000
text3 
           5,000 3,000
text4 1,000 2000
text5
          7,000 1,000
text6 2,000 1,000

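Is there a way to tidy this up with Python? For example, when a line is missing its trailing numbers, the numbers on the next line should be moved up to the end of that line: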

text1 5,000 6,000
text2 2,000 3,000
text3 5,000 3,000
text4 1,000 2000
text5 7,000 1,000
text6 2,000 1,000

Thanks!

Tags: file operations, text processing, text cleaning, data wrangling

2 Answers

Assuming each line should contain exactly three "words", you can use:

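# Flatten the file into a single stream of whitespace-separated tokens.
tokens = (x for line in open("file") for x in line.split())
# zip() draws from the same generator three times, yielding consecutive triples.
for t in zip(tokens, tokens, tokens):
    print(" ".join(t))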
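The trick here is that `zip` is handed the same iterator three times, so each output tuple consumes the next three tokens in order. A minimal sketch of that idea on in-memory data, assuming the three-tokens-per-record layout; the `regroup` name, the `width` parameter, and the `sample` list are made up for illustration:

```python
def regroup(lines, width=3):
    """Re-split a stream of whitespace-separated tokens into rows of `width` tokens."""
    tokens = (tok for line in lines for tok in line.split())
    # The same generator object is passed `width` times, so every tuple
    # produced by zip() takes the next `width` tokens from the stream.
    return [" ".join(row) for row in zip(*[tokens] * width)]


# Hypothetical sample: the second record is split across two lines.
sample = ["text1 5,000 6,000", "text3 ", "           5,000 3,000"]
print("\n".join(regroup(sample)))
```

Note that `zip` stops at the shortest input, so a trailing group with fewer than `width` tokens is silently dropped.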

Edit: Since the assumption above evidently does not hold, here is an implementation that actually looks at the data:

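from itertools import groupby

tokens = (x for line in open("file") for x in line.split())
# Group consecutive tokens by whether they start with a digit.
for key, it in groupby(tokens, lambda x: x[0].isdigit()):
    if key:
        # Numbers: print them and finish the current output line.
        print(" ".join(it))
    else:
        # Labels: one per line; the last one is left open for its numbers.
        print("\n".join(it), end=" ")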
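If you would rather collect the records than print them as you go, the same `groupby` idea can be wrapped in a function. This is only a sketch under the same assumption that numeric tokens start with a digit and labels do not; `regroup_by_kind` is a hypothetical name:

```python
from itertools import groupby


def regroup_by_kind(lines):
    """Attach each run of numeric tokens to the label that precedes it."""
    tokens = (tok for line in lines for tok in line.split())
    records = []
    for is_number, group in groupby(tokens, key=lambda tok: tok[0].isdigit()):
        group = list(group)
        if is_number:
            numbers = " ".join(group)
            if records:
                # Numbers belong to the most recent label.
                records[-1] += " " + numbers
            else:
                # Numbers before any label: keep them as their own record.
                records.append(numbers)
        else:
            # Every label starts a new record; a label with no numbers stays alone.
            records.extend(group)
    return records
```

Unlike the fixed-width version, this copes with records that carry any number of values, including none.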

Answered 2025-04-16 by Python大师


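Assuming that a logical line "continues" on any following line that starts with whitespace (and that such lines may contain any number of records), you can use the following code: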

>>> collapse_space = lambda s: " ".join(s.split())
>>>
>>> logical_lines = []
>>> for line in open("text"):
...   if line[0].isspace():
...     logical_lines[-1] += line  # append the continuation to the last logical line
...   else:
...     logical_lines.append(line)  # start a new logical line
...
>>> l = map(collapse_space, logical_lines)
>>>
>>> print("\n".join(l))
text1 5,000 6,000
text2 2,000 3,000
text3 5,000 3,000
text4 1,000 2000
text5 7,000 1,000
text6 2,000 1,000
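For use outside the interactive prompt, the same approach could be packaged roughly as below. This is a sketch, not a definitive implementation: the function names are invented, and two behaviours are my own assumptions rather than part of the session above (blank lines are skipped, and a continuation line with nothing before it starts its own record instead of raising `IndexError`).

```python
def merge_continuations(lines):
    """Join lines that start with whitespace onto the preceding logical line."""
    merged = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no data
        if line[0].isspace() and merged:
            merged[-1] += " " + line  # continuation of the previous logical line
        else:
            merged.append(line)  # start a new logical line
    # Collapse every run of whitespace (including newlines) to a single space.
    return [" ".join(entry.split()) for entry in merged]


def clean_file(src, dst):
    """Read `src`, merge continuation lines, and write the result to `dst`."""
    with open(src, encoding="utf-8") as fin:
        cleaned = merge_continuations(fin)
    with open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(row + "\n" for row in cleaned)
```

Calling `clean_file("input.txt", "output.txt")` (both file names are placeholders) would then write the cleaned records to a new file and leave the original untouched.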

Answered 2025-04-16 by Python大师

