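How do I limit the number of characters per line without breaking words?
I'm trying to open a text file that contains a few paragraphs and limit each line to a maximum number of characters. My algorithm has a flaw, though: it cuts words in half, which is not acceptable. I'm not sure how to handle that. I also don't know how to start a new line.
I looked at textwrap, but I'd rather not use it for now, because I want to improve my algorithmic skills.
My algorithm starts by opening the file:

    f = open("file.txt", "r", encoding="utf-8")
    lines = f.readlines()
    f.close()

Now I have a list of all the lines, and this is where I'm stuck. How do I limit the length of each line?
I'm really not sure how to go about it and would appreciate some help.
Thanks.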
5 Answers
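To find the right approach, you first need to decide what should happen to content that exceeds the limit. Assuming you want conventional wrapping, where the excess text flows onto the next line, you could use logic like the following (note that this is only pseudocode):

    for (int lineCount = 0; lineCount < totalLines; lineCount++) {
        currentLine = lines[lineCount];
        // only lines longer than the limit need to be cut
        if (currentLine.length > targetLength) {
            int snipStart = currentLine.find_whitespace_before_targetLength;
            // cut the overflow out of currentLine (assumed behaviour of snip)
            snip = currentLine.snip(snipStart, currentLine.length);
            if (lineCount < totalLines - 1) {
                lines[lineCount + 1].prepend(snip);
            } else {
                // Add snip to line array, since the last line is too long
            }
        }
    }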
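For reference, here is a rough Python sketch of the same idea. The function name `reflow` and its parameters are made up for illustration. It assumes lines are joined with a single space when overflow is carried forward, and that a word longer than the limit is left whole on its own line.

```python
def reflow(lines, target_length):
    """Trim each line to target_length, pushing the overflow onto the next line."""
    lines = [line.rstrip("\n") for line in lines]  # work on a copy
    i = 0
    while i < len(lines):
        current = lines[i]
        if len(current) > target_length:
            # Last space at or before the limit (-1 if there is none).
            cut = current.rfind(" ", 0, target_length + 1)
            if cut <= 0:
                # Assumption: an overlong word is kept whole; break after it.
                cut = current.find(" ", target_length)
            if cut > 0:
                overflow = current[cut + 1:]
                lines[i] = current[:cut]
                if i + 1 < len(lines):
                    # Carry the overflow to the front of the next line.
                    lines[i + 1] = f"{overflow} {lines[i + 1]}".rstrip()
                else:
                    # The last line was too long: grow the list.
                    lines.append(overflow)
        i += 1
    return lines


if __name__ == "__main__":
    sample = ["the quick brown fox jumps", "over the lazy dog"]
    print("\n".join(reflow(sample, 10)))
```

Note that carrying overflow forward also merges paragraphs, since an empty line simply receives the leftover text. If you want to keep blank lines as paragraph breaks, you would have to treat them as a special case.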
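As a programmer, it's important to be able to read and understand source code written by others. I know you don't want to use the textwrap module, but you can learn from its source. The point is to "reverse engineer" it, that is, to understand how someone else thought about the problem. That will also teach you how to write better code yourself.
You can find the implementation of textwrap at c:\Python34\Lib\textwrap.py. Copy it into your working directory and rename it so you can experiment with it.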
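If your Python lives somewhere else, a small sketch like this should tell you where the file is on your machine. It only uses `inspect.getsourcefile` and `inspect.getsource` from the standard library, and it assumes a regular install where the `.py` sources ship with the interpreter.

```python
import inspect
import textwrap

# Path of the textwrap source file for the interpreter running this script.
source_path = inspect.getsourcefile(textwrap)
print(source_path)

# Source text of the module-level wrap() helper, handy for a first read.
wrap_source = inspect.getsource(textwrap.wrap)
print(wrap_source)
```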
test.txt contains:
"""
What I'm trying to do is open up a text file with some paragraphs and give each line a maximum width of X number of characters.
However, I'm having a flaw in my algorithm as this will cut out words and it's not going to work.
I'm not really sure how to go about this. Also I'm not sure how to make it change line.
"""
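
    with open("test.txt") as f:
        lines = f.readlines()

    max_width = 25
    result = ""
    col = 0  # current column on the output line being built
    for line in lines:
        for word in line.split():
            end_col = col + len(word)
            if col != 0:
                end_col += 1  # room for the separating space
            if end_col > max_width:
                result += '\n'
                col = 0
            if col != 0:
                result += ' '
                col += 1
            result += word
            col += len(word)
        print(result)  # show the text wrapped so far

Output, printed after each line of the file has been processed: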
What I'm trying to do is
open up a text file with
some paragraphs and give
each line a maximum width
of X number of
characters.
What I'm trying to do is
open up a text file with
some paragraphs and give
each line a maximum width
of X number of
characters. However, I'm
having a flaw in my
algorithm as this will
cut out words and it's
not going to work.
What I'm trying to do is
open up a text file with
some paragraphs and give
each line a maximum width
of X number of
characters. However, I'm
having a flaw in my
algorithm as this will
cut out words and it's
not going to work. I'm
not really sure how to go
about this. Also I'm not
sure how to make it
change line.
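There are several ways to solve this. One is to find the last space before the right margin, split the string at that space, print the first part, and then repeat the find-and-split on the remainder.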
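A minimal sketch of that find-and-split approach might look like this. The name `wrap_at_last_space` is made up, and it rests on two assumptions: runs of whitespace (including newlines) are collapsed to single spaces first, and a word longer than the width is simply cut at the width.

```python
def wrap_at_last_space(text, width):
    """Split text into lines of at most `width` characters, breaking at spaces."""
    text = " ".join(text.split())  # collapse whitespace runs and newlines
    lines = []
    while len(text) > width:
        # Last space at or before the right margin (-1 if there is none).
        cut = text.rfind(" ", 0, width + 1)
        if cut <= 0:
            # Assumption: a word longer than `width` is hard-split.
            lines.append(text[:width])
            text = text[width:]
        else:
            lines.append(text[:cut])
            text = text[cut + 1:]  # drop the space we split on
    lines.append(text)
    return lines


if __name__ == "__main__":
    for row in wrap_at_last_space("the quick brown fox jumps over the lazy dog", 10):
        print(row)
```

`str.rfind` with a start and end index does the "last space before the margin" search in one call, so no manual backwards loop is needed.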
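Another approach is to split the text into words and add them one at a time to a line buffer. If the next word would push the line past the limit, print the line first, then clear the buffer and start over. (The code below also lets you set a left margin.)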

    def par(s, wrap=72, margin=0):
        """Print a word-wrapped paragraph with given width and left margin"""
        left = margin * " "
        line = ""
        for w in s.split():
            # flush the buffer if the next word (plus a space) would not fit
            if line and len(line) + len(w) >= wrap:
                print(left + line)
                line = ""
            if line:
                line += " "
            line += w
        print(left + line)
        print()

    par("""What I'm trying to do is open up a text file with some
        paragraphs and give each line a maximum width of X number of
        characters.""", 36)
    par("""However, I'm having a flaw in my algorithm as this
        will cut out words and it's not going to work. I'm not really
        sure how to go about this. Also I'm not sure how to make it
        change line.""", 36, 44)
    par("""I checked textwrap and I don't really want to use it at
        this point since I want to improve my algorithmic skills.""",
        64, 8)

Of course, instead of printing you could return a multi-line string containing newlines or, better still, a list of lines.
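As a sketch of that last variant, here is the same word-buffer logic returning a list instead of printing. The name `wrap_lines` is made up; as in `par`, the `wrap` width is assumed not to include the margin.

```python
def wrap_lines(s, wrap=72, margin=0):
    """Return the word-wrapped paragraph as a list of lines."""
    left = " " * margin
    lines = []
    line = ""
    for w in s.split():
        # +1 accounts for the space that would join the word to the line.
        if line and len(line) + 1 + len(w) > wrap:
            lines.append(left + line)
            line = ""
        if line:
            line += " "
        line += w
    if line:
        lines.append(left + line)  # flush whatever is left in the buffer
    return lines


if __name__ == "__main__":
    print("\n".join(wrap_lines("the quick brown fox jumps over the lazy dog", 10, 2)))
```

The caller can then decide what to do with the lines: join them with `"\n"`, write them to a file, or number them.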
You can use the standard textwrap module:

    import textwrap

    txt = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
    print('\n'.join(textwrap.wrap(txt, 20, break_long_words=False)))

To start with, you should read the file using the with statement:

    with open(filename, 'r') as f:
        lines = f.readlines()

    def wrap(line):
        broken = textwrap.wrap(line, 20, break_long_words=False)
        return '\n'.join(broken)

    wrapped = [wrap(line) for line in lines]

But you said you don't want to use the built-in textwrap and would rather implement it yourself, so here is a solution that needs no imports:

    lorem = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Phasellus ac commodo libero, at dictum leo. Nunc convallis est id purus porta,
    malesuada erat volutpat. Cras commodo odio nulla. Nam vehicula risus id lacus
    vestibulum. Maecenas aliquet iaculis dignissim. Phasellus aliquam facilisis
    pellentesque ultricies. Vestibulum dapibus quam leo, sed massa ornare eget.
    Praesent euismod ac nulla in lobortis.
    Sed sodales tellus non semper feugiat."""

    def wrapped_lines(line, width=80):
        whitespace = set(" \n\t\r")
        length = len(line)
        start = 0
        while start < (length - width):
            # we take the next 'width' characters, plus one to see what follows:
            chunk = line[start:start+width+1]
            # if there is a newline in it, let's return the first part
            if '\n' in chunk:
                end = start + chunk.find('\n')
                yield line[start:end]
                start = end + 1  # the new start is just past the newline
                continue
            # if no newline in chunk, let's find the first whitespace from the end
            for i, ch in enumerate(reversed(chunk)):
                if ch in whitespace:
                    end = start + width - i
                    yield line[start:end]
                    start = end + 1
                    break
            else:
                # no whitespace at all (a word longer than 'width'):
                # split mid-word so the loop always advances
                yield line[start:start+width]
                start += width
        yield line[start:]

    for line in wrapped_lines(lorem, 30):
        print(line)

Edit: I don't much like the version above; it's a bit ugly and not very Pythonic. Here is another version:

    def wrapped_lines(line, width=80):
        whitespace = set(" \n\t\r")
        length = len(line)
        start = 0
        while start < (length - width):
            end = start + width + 1
            chunk = line[start:end]
            skip = 1  # the separator we break on is dropped
            try:
                end = start + chunk.index('\n')
            except ValueError:  # no newline in chunk
                # we iterate over the characters from the end:
                for i, ch in enumerate(reversed(chunk)):
                    if ch in whitespace:
                        end -= i + 1  # end now sits on the last whitespace
                        break
                else:
                    # no whitespace at all: split mid-word, drop nothing
                    end, skip = start + width, 0
            yield line[start:end]
            start = end + skip
        yield line[start:]