How do I keep the delimiter when splitting in Python?

-1 votes
3 answers
6772 views
Asked 2025-04-18 14:38

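I have some code that inserts a delimiter, ~||~, after every semicolon or after every 500 characters. It works, except that it removes the semicolon wherever it finds one. I searched here and found an answer, but I can't figure out how to apply it to my code.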

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len]for idx in range(0,length,chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()

I found this solution here, but I can't find a way to work it into my code. Sorry for the duplicate question.

d = ">"
for line in all_lines:
    s =  [e+d for e in line.split(d) if e != ""]

3 Answers

0

lines = text.split(';')

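Change this to

import re  # needed for re.split; put it at the top of the script
lines = list(filter(None, re.split('([^;]+;)', text)))  # list() so lines[idx] assignment still works in Python 3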

This keeps the semicolons. Alternatively, you can add the semicolon back afterwards, as the other answers do.
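
To see the difference on a small input, here is a minimal sketch comparing the two calls. The sample string is made up for illustration. The capturing group in the pattern is what makes `re.split` return the matched text, semicolon included.

```python
import re

text = "alpha;beta;gamma"  # made-up sample input

# str.split drops the separator
plain = text.split(';')

# A capturing group makes re.split return the matched text as well;
# filter(None, ...) discards the empty strings left between matches.
kept = list(filter(None, re.split('([^;]+;)', text)))

print(plain)  # ['alpha', 'beta', 'gamma']
print(kept)   # ['alpha;', 'beta;', 'gamma']
```

Joining `kept` with an empty string gives back the original text, so nothing is lost in the split.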

1

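If I understand your question correctly, you want to insert your own delimiter after every semicolon and after every 500 characters. Try doing it in two steps: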

with open(filename, "r") as fi: # read in file using "with" statement
    text = fi.read()

block_size = 500            # sets how many characters separate new_delim
old_delim = ";"             # character we are adding the new delimiter to
new_delim = "~||~"          # this will be inserted every block_size characters
del_length = len(new_delim) # store length to prevent repeated calculations

for i in range(len(text) // block_size):
    # calculate next index where the new delimiter should be inserted,
    # accounting for the delimiters already inserted before it
    index = i*block_size + i*del_length + block_size

    # construct new string with new delimiter at the given index
    text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])

replacement_delim = old_delim + new_delim # old_delim will be replaced with this

with open(outputfile, 'w') as fo:
    # write out new string with new delimiter appended to each semicolon
    fo.write(text.replace(old_delim, replacement_delim))

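If a semicolon happens to fall on a multiple of 500 characters, you may find your special delimiter appearing twice in a row. Also, if the length of your string is an exact multiple of the block size, your delimiter will appear at the end of the string as well.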

Also, this approach may not be the best choice for very large files, because the for loop creates an entirely new string every time it inserts a delimiter.

This approach sidesteps the way split() handles the delimiter, because it never calls split().
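
For large inputs, one way to avoid rebuilding the string on every insertion is to collect the pieces in a list and join them once at the end. The following is only a sketch under some assumptions: the function name `insert_delims` and its parameters are hypothetical, not from the original post, and `old_delim` is assumed to be a single character. It emits the new delimiter only once when a semicolon lands on a block boundary, and it does not add one at the very end of the text for a block boundary.

```python
def insert_delims(text, block_size=500, old_delim=";", new_delim="~||~"):
    """Hypothetical helper: add new_delim after each old_delim and after
    every block_size characters of the original text, in a single pass."""
    pieces = []
    start = 0  # start of the segment not yet copied into pieces
    for i, ch in enumerate(text):
        # block boundary, but not at the very end of the text
        at_block = (i + 1) % block_size == 0 and i + 1 < len(text)
        if ch == old_delim or at_block:
            pieces.append(text[start:i + 1])
            pieces.append(new_delim)  # emitted once even if both conditions hold
            start = i + 1
    pieces.append(text[start:])  # whatever is left after the last delimiter
    return "".join(pieces)


print(insert_delims("ab;cdefgh", block_size=4))  # ab;~||~c~||~defg~||~h
```

Note that block boundaries here are counted in characters of the original text, the same as in the two-step version above.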

-2

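split() is a method that splits a string, and it removes the separator you specify. You just need to add the separator back after splitting. I did that just below your for statement: line = line + d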

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    line = line + d  #NEW LINE ADDED HERE
    lines[lines_idx] = line  # store it back so lines shorter than chunk_len keep their ';'
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len]for idx in range(0,length,chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
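
Wrapped up as a function, the idea of splitting, re-appending the semicolon, and then chunking might look like the sketch below. The function name `add_delims` and the sample input are assumptions for illustration. Unlike the loop above, it does not append a semicolon to the last piece, since that piece was not followed by one in the input, and it skips the empty piece that `split` returns when the text ends with a semicolon.

```python
def add_delims(text, chunk_len=100, sep=";", delim="~||~"):
    """Hypothetical helper: keep each sep, put delim after it, and also
    break any piece longer than chunk_len into chunk_len-sized chunks."""
    parts = text.split(sep)
    out = []
    for idx, part in enumerate(parts):
        is_last = idx == len(parts) - 1
        # every piece except the last was followed by sep in the input
        piece = part if is_last else part + sep
        if not piece:
            continue  # text ended with sep, so nothing is left to emit
        chunks = [piece[i:i + chunk_len] for i in range(0, len(piece), chunk_len)]
        out.append(delim.join(chunks))
    return delim.join(out)


print(add_delims("abcdefg;hi;", chunk_len=4))  # abcd~||~efg;~||~hi;
```

Removing every `~||~` from the result gives back the original text, which is a quick way to check that no semicolon was dropped or duplicated.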
