Reading a text file into a Python list, splitting on a string
I have a plain text file with the following contents:
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
@M00964: XXXXX
YYY
+
ZZZZ
I want to read this into a list, split on the ID code @M00964, i.e.:
['@M00964: XXXXX
YYY
+
ZZZZ'
'@M00964: XXXXX
YYY
+
ZZZZ'
'@M00964: XXXXX
YYY
+
ZZZZ']
I have tried using
in_file = open(fileName,"r")
sequences = in_file.read().split('@M00964')[1:]
in_file.close()
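but this removes the ID sequence @M00964. Is there a way to keep the ID sequence?
As an additional question, is there a way to keep the line breaks in the list as real whitespace (rather than as \n symbols)?
My overall aim is to read in this set of items, take, for example, the first two, and write them back to a text file with all of the original formatting preserved.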
3 Answers
0
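Just split on the @ symbol (this assumes @ only appears at the start of a record and ### never occurs in the file):
with open(fileName, "r") as in_file:
    # Tag each "@" so the split keeps it; drop the empty piece before the first "@"
    sequences = [s for s in in_file.read().replace("@", "###@").split("###") if s]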
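Another way to keep the ID, sketched here on the assumption that the string @M00964 only ever appears at the start of a record: split on the full ID as in the question, then put the ID back on each piece. The `text` sample and the `MARKER` name are made up for illustration and stand in for the contents of the real file.

```python
# Hypothetical sample standing in for in_file.read()
text = (
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
    "@M00964: XXXXX\nYYY\n+\nZZZZ\n"
)

MARKER = "@M00964"  # illustrative name for the ID prefix

# split() drops the marker, so prepend it to every piece again.
# [1:] skips whatever precedes the first marker (an empty string
# when the text starts with the marker).
sequences = [MARKER + chunk for chunk in text.split(MARKER)[1:]]

print(len(sequences))
print(sequences[0])
```

Because nothing else is removed, joining the pieces with `"".join(sequences)` gives back the original text when the file starts with the marker.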
3
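If your file is large and you don't want to hold the whole thing in memory, you can iterate over the records one at a time with this helper function:
def chunk_records(filepath):
    with open(filepath, 'r') as f:
        record = []
        for line in f:
            # could use regex for more complicated matching
            if line.startswith('@M00964') and record:
                # A new ID line closes the previous record
                yield ''.join(record)
                record = []
            # Always keep the current line, including the ID line itself
            record.append(line)
        if record:
            yield ''.join(record)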
Use it like this:
for record in chunk_records('/your/filename.txt'):
    ...
Or, if you do want the whole file in memory:
records = list(chunk_records('/your/filename.txt'))
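For the stated goal of taking the first two records and writing them back out, a generator like this can be combined with `itertools.islice`. The following is only a sketch: `iter_records`, its `marker` parameter, and the temporary file names are hypothetical, and the input file is generated on the spot so the example runs by itself.

```python
import itertools
import os
import tempfile


def iter_records(lines, marker):
    """Yield records as strings; a record starts at each line beginning with marker."""
    record = []
    for line in lines:
        if line.startswith(marker) and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


# Build a small throwaway input file so the sketch is self-contained.
sample = "@M00964: XXXXX\nYYY\n+\nZZZZ\n" * 3
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "input.txt")
out_path = os.path.join(workdir, "first_two.txt")
with open(in_path, "w") as f:
    f.write(sample)

# Stream the input and copy only the first two records to the output.
with open(in_path) as src, open(out_path, "w") as dst:
    for record in itertools.islice(iter_records(src, "@M00964"), 2):
        dst.write(record)

with open(out_path) as f:
    written = f.read()
print(written)
```

Each record keeps its own line endings, so writing the strings unchanged reproduces the original layout, and `islice` stops reading once two records have been produced.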
0
For your example, couldn't you just do something like this:
with open(fileName, 'r') as in_file:
    file = in_file.readlines()
# Each record is 4 lines long, so join the lines in groups of 4
new_list = [''.join(file[i*4:(i+1)*4]) for i in range(len(file) // 4)]
list_no_n = [item.replace('\n', '') for item in new_list]
print(new_list)
print(list_no_n)
[Expanded form]
new_list = []
for i in range(len(file) // 4):  # one iteration per group of 4 lines
    # Join four lines into one string and add it to new_list
    new_list.append(''.join(file[i*4:(i+1)*4]))
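[Writing to a new file]
# The items in new_list still end with '\n', so writing them unchanged keeps the
# original formatting; use new_list[:2] to write only the first two records
with open(filename, 'w') as output_file:
    output_file.writelines(new_list)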
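On the question of \n appearing in the list: the strings contain real newline characters, and \n is only how Python displays them when the list itself is printed. A small sketch with a hypothetical one-item list `records` shows the difference:

```python
records = ["@M00964: XXXXX\nYYY\n+\nZZZZ\n"]  # hypothetical one-item list

print(records)     # the list's repr shows each newline as the escape \n
print(records[0])  # printing the string itself shows real line breaks

# repr() is what the list display uses for each item
shown = repr(records[0])
```

So there is nothing to convert before writing: the items can go straight to a file and the line breaks come out as they were read.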