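How do I split in Python without removing the delimiter?
I have some code that inserts a delimiter, ~||~, after every semicolon or after 500 characters. That part works, but whenever it finds a semicolon it deletes the semicolon. I searched here and found an answer, but I can't work out how to apply it to my code.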
chunk_len = 100
split_char = ';'
delim = "~||~"
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx + chunk_len] for idx in range(0, length, chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
I found the solution below on this site, but I can't find a way to work it into my code. Sorry for the duplicate question.
d = ">"
for line in all_lines:
    s = [e+d for e in line.split(d) if e != ""]
3 Answers
0
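Change
lines = text.split(';')
to
lines = list(filter(None, re.split('([^;]+;)', text)))
This keeps the semicolons, because the capturing group makes re.split return the matched pieces themselves. It needs import re at the top of the script, and in Python 3 the filter result has to be wrapped in list() because your loop assigns to lines by index. Alternatively, you can add the semicolons back afterwards, as the other answers do.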
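To see what the capturing group does, here is a minimal sketch on a made-up string (the sample text is an assumption, not taken from the question). re.split returns the matched "text plus semicolon" pieces along with the empty strings that sit between them, and filter(None, ...) throws the empty strings away.

```python
import re

sample = "alpha;beta;gamma"  # hypothetical input, for illustration only

# The parentheses make re.split keep each matched "text;" piece in the result.
raw = re.split('([^;]+;)', sample)
# filter(None, ...) drops the empty strings left between consecutive matches.
pieces = list(filter(None, raw))

print(raw)     # ['', 'alpha;', '', 'beta;', 'gamma']
print(pieces)  # ['alpha;', 'beta;', 'gamma']
```

Joining the pieces back together with an empty string reproduces the original text, so nothing is lost in the split.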
1
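If I understand your question correctly, what you actually want is to insert your own delimiter after every semicolon and after every 500 characters. You could try doing it in two steps: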
with open(filename, "r") as fi:  # read the file using a "with" statement
    text = fi.read()
block_size = 500  # number of characters between inserted delimiters
old_delim = ";"  # character the new delimiter is appended to
new_delim = "~||~"  # inserted every block_size characters
del_length = len(new_delim)  # store the length to avoid recomputing it
for i in range(len(text) // block_size):
    # calculate the next index where the new delimiter should be inserted
    index = i * block_size + i * del_length + block_size
    # construct a new string with the new delimiter at the given index
    text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])
replacement_delim = old_delim + new_delim  # old_delim will be replaced with this
with open(outputfile, 'w') as fo:
    # write out the new string with the new delimiter appended to each semicolon
    fo.write(text.replace(old_delim, replacement_delim))
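If a semicolon happens to fall right at a multiple of 500 characters, you may find two of your special delimiters next to each other. Also, if the length of your string is an exact multiple of the block size, your delimiter will appear at the very end of the string.
In addition, this may not be the best approach for very large files, because the for loop creates a brand-new string every time it inserts a delimiter.
This approach never calls split(), so the way split() treats the delimiter no longer matters.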
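If the repeated string rebuilding is a concern, one possible variation is to slice the text into blocks and join them once. This is only a sketch: the helper names insert_every and add_delimiters are made up for illustration, and it works on a string rather than on the files from the question.

```python
def insert_every(text, block_size, new_delim):
    """Return text with new_delim placed between consecutive blocks of block_size characters."""
    blocks = [text[i:i + block_size] for i in range(0, len(text), block_size)]
    # A single join builds the result once instead of rebuilding the string per insertion.
    return new_delim.join(blocks)


def add_delimiters(text, block_size=500, old_delim=";", new_delim="~||~"):
    """Insert new_delim every block_size characters, then after every old_delim."""
    blocked = insert_every(text, block_size, new_delim)
    return blocked.replace(old_delim, old_delim + new_delim)


# Small block size so the effect is visible; the semicolon ends the first block.
print(add_delimiters("ab;cdefg", block_size=3))  # ab;~||~~||~cde~||~fg
```

Because join only places the delimiter between blocks, this version does not add a trailing delimiter when the length is an exact multiple of the block size. The doubled delimiter in the output shows that a semicolon sitting on a block boundary still produces two delimiters in a row.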
-2
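split() is a string method that cuts a string into pieces and discards the separator you specify. You just need to add the separator back after splitting. I did that on the line right under your for statement:
line = line + d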
chunk_len = 100
split_char = ';'
delim = "~||~"
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
last_idx = len(lines) - 1
for lines_idx, line in enumerate(lines):
    if lines_idx != last_idx:  # the last piece had no semicolon after it
        line = line + d  # NEW LINE ADDED HERE
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx + chunk_len] for idx in range(0, length, chunk_len)]
        line = delim.join(chunks)
    lines[lines_idx] = line  # store the piece back so short pieces keep their semicolon too
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
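As a quick check of the behaviour this answer describes, the sketch below splits a made-up string (the sample text is an assumption) and puts the semicolon back on every piece except the last, since the last piece was not followed by a semicolon in the original.

```python
text = "a;bb;ccc"  # hypothetical input, for illustration only

# str.split removes the separator from the returned pieces.
parts = text.split(";")

# Re-append the semicolon to every piece except the last one.
restored = [p + ";" for p in parts[:-1]] + parts[-1:]

print(parts)     # ['a', 'bb', 'ccc']
print(restored)  # ['a;', 'bb;', 'ccc']
```

Joining restored with an empty string gives back the original text, which is an easy way to confirm that no semicolon was dropped or duplicated.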