How to convert tab-delimited and pipe-delimited files to CSV format in Python

4 votes

5 answers

12324 views

Asked 2025-04-15 14:01

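I have a text file (.txt) that may be either tab-delimited or pipe-delimited, and I need to convert it to CSV format. I am using Python 2.6. Can anyone tell me how to identify the delimiter in the text file, read the data, and then convert it to a comma-separated file?

Thanks in advance!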

text parsing, text file processing, data format conversion, csv, delimiter detection

5 Answers

Something like this:

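from __future__ import with_statement
import csv
import re

input_path = "input.txt"    # placeholder: the tab- or pipe-delimited source file
output_path = "output.csv"  # placeholder: the CSV file to create

with open(input_path, "r") as source:
    with open(output_path, "wb") as destination:
        writer = csv.writer(destination)
        for line in source:
            # drop the trailing newline, then split on either a tab or a pipe
            writer.writerow(re.split(r"[\t|]", line.rstrip("\r\n")))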

Answered 2025-04-15 by Python大师


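Your strategy could go like this:

Parse the file twice: once with a CSV reader configured for tab-delimited input, and once with a reader configured for pipe-delimited input.
Compute some statistics on the two results to decide which result set you want to write out. One idea is to count the total number of fields in each result set: tabs and pipes rarely occur inside ordinary data, so the parse that uses the right delimiter should produce more fields. Another idea, if your data is well structured and you expect every row to have the same number of fields, is to compute the standard deviation of the number of fields per row and pick the result set with the smallest one.

The example below uses the simpler statistic (the total number of fields):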

import csv

piperows = []
tabrows = []

# parse with the | delimiter
f = open("file", "rb")
readerpipe = csv.reader(f, delimiter="|")
for row in readerpipe:
    piperows.append(row)
f.close()

# parse with the TAB delimiter
f = open("file", "rb")
readertab = csv.reader(f, delimiter="\t")
for row in readertab:
    tabrows.append(row)
f.close()

# in this example we use the total number of fields as the indicator
# (it is not guaranteed to work; it depends on the nature of your data)
totfieldspipe = sum(len(row) for row in piperows)
totfieldstab = sum(len(row) for row in tabrows)

if totfieldspipe > totfieldstab:
    yourrows = piperows
else:
    yourrows = tabrows

# the variable yourrows now holds the rows; write them out in any format you like
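
The standard-deviation idea mentioned above could be sketched roughly as follows. This is only an illustration, not part of the answer's original code: the helper names `field_count_stddev` and `pick_delimiter` are made up here, and the rule of skipping a candidate that never splits any line is an extra assumption (otherwise a delimiter that does not occur at all would trivially score zero).

```python
import csv
import math


def field_count_stddev(rows):
    """Population standard deviation of the number of fields per row."""
    counts = [len(row) for row in rows]
    mean = sum(counts) / float(len(counts))
    return math.sqrt(sum((c - mean) ** 2 for c in counts) / len(counts))


def pick_delimiter(lines, candidates=("\t", "|")):
    """Return the candidate whose parse gives the most uniform rows.

    Hypothetical helper: candidates that never split a line (every row
    has a single field) are skipped, because they would trivially score
    a standard deviation of zero.
    """
    best = None
    for delimiter in candidates:
        rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
        if not rows or max(len(row) for row in rows) < 2:
            continue
        score = field_count_stddev(rows)
        if best is None or score < best[0]:
            best = (score, delimiter)
    return best[1] if best else None


# A pipe-delimited sample in which one field happens to contain a tab.
sample = ["id|name|note", "1|Ann|likes\ttabs", "2|Bob|none"]
print(repr(pick_delimiter(sample)))  # prints '|'
```

In the sample, the tab parse gives rows of 1, 2 and 1 fields, while the pipe parse gives 3 fields on every row, so the pipe wins. Like the field-count heuristic, this can still be fooled by irregular data.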

Answered 2025-04-15 by Python大师


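I'm afraid you cannot identify the delimiter without knowing what it is. The problem with CSV (comma-separated values) is, as ESR puts it:

The Microsoft version of CSV is a textbook example of how not to design a textual file format.

If the delimiter can appear inside a field, it has to be escaped somehow. Detecting the delimiter automatically without knowing the escaping convention is hard. Escaping can be done the UNIX way, with a backslash '\', or the Microsoft way, with quotes, in which case the quotes themselves must be escaped as well. This is not a trivial task.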
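
To make the two escaping conventions concrete, here is a small sketch using the standard csv module. It is written for Python 3 (it relies on io.StringIO and the print function); the sample row is invented for illustration.

```python
import csv
import io

# One field contains the delimiter, one contains quote characters.
row = ["a|b", 'say "hi"', "plain"]

# Quote-based style (the csv module's default): a field containing the
# delimiter or a quote is wrapped in quotes, and embedded quotes are doubled.
quoted = io.StringIO()
csv.writer(quoted, delimiter="|", lineterminator="\n").writerow(row)

# Backslash-based style: quoting is switched off and special characters
# inside a field are prefixed with the escape character instead.
escaped = io.StringIO()
csv.writer(escaped, delimiter="|", quoting=csv.QUOTE_NONE,
           escapechar="\\", lineterminator="\n").writerow(row)

print(quoted.getvalue(), end="")   # "a|b"|"say ""hi"""|plain
print(escaped.getvalue(), end="")
```

Both lines encode the same three fields, but a reader has to know which convention was used in order to split them correctly, which is why guessing the delimiter alone is not enough.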

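So my suggestion is to get full documentation from whoever generates the file you want to convert. Then you can use one of the approaches suggested in the other answers, or some variant of them.

Edit:

Python provides csv.Sniffer, which can help you deduce the format of your delimiter-separated values (DSV) file. If your input looks like this (note the quoted delimiter in the first field of the second line):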

a|b|c
"a|b"|c|d
foo|"bar|baz"|qux

you can do this:

import csv

csvfile = open("csvfile.csv")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)

reader = csv.DictReader(csvfile, dialect=dialect)
for row in reader:
    print row,
# => {'a': 'a|b', 'c': 'd', 'b': 'c'} {'a': 'foo', 'c': 'qux', 'b': 'bar|baz'}
# write records using other dialect
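
Putting the pieces together, a full conversion might look roughly like the sketch below. Note the assumptions: it is written for Python 3 (under Python 2.6 you would open the files in "rb"/"wb" mode instead of passing newline=""), `convert_to_csv` and its parameters are names invented for this example, and the sniffer is restricted to tab and pipe because those are the only delimiters the question mentions.

```python
import csv


def convert_to_csv(src_path, dst_path, candidates="\t|", sample_size=4096):
    """Rewrite a tab- or pipe-delimited text file as comma-separated values.

    Hypothetical helper: returns the delimiter that csv.Sniffer detected.
    """
    with open(src_path, newline="") as src:
        # Only consider the delimiters we actually expect in the input.
        dialect = csv.Sniffer().sniff(src.read(sample_size),
                                      delimiters=candidates)
        src.seek(0)
        with open(dst_path, "w", newline="") as dst:
            # csv.writer defaults to the comma-separated "excel" dialect.
            csv.writer(dst).writerows(csv.reader(src, dialect))
    return dialect.delimiter
```

A call such as convert_to_csv("data.txt", "data.csv") (both file names are placeholders) would then write the comma-separated copy. Sniffer.sniff raises csv.Error when it cannot decide on a delimiter, so callers may want to catch that and fall back to one of the heuristics from the other answers.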

Answered 2025-04-15 by Python大师

