尝试在多条固定长度的行上打印单个字符串并尽量减少成本

3 投票

3 回答

1723 浏览

提问于 2025-04-17 04:49

首先，给大家一些背景信息。我刚开始学习算法（现在觉得自己在逻辑和推理方面还不够强），我一直在尝试把“This is a sample text”这句话分成多行，每行最多7个字符。所以第一行应该是：

this is  (no spaces left in the end so cost 0)
a  
[cost=6*6*6(The spaces left at the end of each line are cubed which will be the cost) ]
sample [cost=1*1*1]
text [cost= 3*3*3]

(Total cost = 0+216+1+27=244)

现在，这个过程可以优化一下：

this [cost 3*3*3]
is a [cost 3*3*3]
sample [cost 1*1*1]
text [cost 3*3*3]

[Total cost = 27+27+1+27 = 82]

很明显，我们不能用贪心算法来解决这个问题，而是应该使用动态规划。不过，我的问题是我无法找到可以重复使用的子结构。我真的不知道怎么把成本条件和在Python中打印的过程联系起来。我可以给每个单词编号，也可以得到每个单词的长度，但接下来该怎么做我就卡住了。当我打印的时候，所有的内容都在一行上显示（这就是我目前的进展）。

如果这个问题听起来很傻，我很抱歉，但我真的卡住了，需要一些帮助。

谢谢！

这是我尝试实现的代码，虽然我对代码进行了测试，这些测试是我朋友写的，但我觉得我没有做到正确。任何帮助或建议都非常感谢。

print_test.py

 import os
 import sys
 from glob import glob

  #TODO -- replace this with your solution 
 from printing import print_neatly

 log = open('output.log', 'w')

 #This tests the code against my own text
 maxline = 80
 for source in glob('*.txt'):
 with open(source) as f:
    fulltext = f.read()

 words = fulltext.split()
 (cost, text) = print_neatly(words, maxline)

 #double check the cost
 #lines = text.split('\n')
 truecost = 0
 for line in text[0:-1]:
    truecost += (maxline - len(line))**3


   #print the output and cost
   print >>log, '----------------------'
   print >>log, source
   print >>log, '----------------------'
   print >>log, text
   print >>log, '----------------------'
   print >>log, 'cost = ', cost
   print >>log, 'true cost = ', truecost
   print >>log, '----------------------'


log.close()

#print the log
with open('output.log') as f: print f.read()

printing.py

def print_neatly(wordlist, max):
   #strings='This is a sample string'

   #splitting the string and taking out words from it 
   #wordlist=strings.split()
   (cost, dyn_print) = print_line(wordlist, len(wordlist), max)
   for dyn in dyn_print:
      print dyn
   return cost, dyn_print

 def cost(lines, max):

    return sum([(max-len(x)) ** 3 for x in lines])

 def print_line(wordlist, count, max, results = {}):
  results = [([],0)]
  for count in range(1, len(wordlist) + 1):
    best = wordlist[:count]               
    best_cost = cost(best, max)
    mycount = count - 1
    line = wordlist[mycount]       
    while len(line) <= max: 
        attempt, attempt_cost = results[mycount]
        attempt = attempt + [line]
        attempt_cost += cost([line],max)
        if attempt_cost < best_cost:
            best = attempt
            best_cost = attempt_cost
        if mycount > 0:
            mycount -= 1
            line = wordlist[mycount] + ' ' + line
        else:
            break
    results += [(best, best_cost)]

 #print best
 #print best_cost
 return (best_cost, best)


#print_neatly(0,7)

需要测试的文本文件给我这个输出，这里两个成本应该是相同的，但我没有得到，能有人指出我哪里出错了吗？

cost = 16036

true cost = 15911

代码优化字符串处理文本格式化算法逻辑推理动态规划成本计算贪心算法

3 个回答

这个算法的基础假设是，如果我们知道文本中最后 N-1、N-2、...、2、1 个单词的最佳解决方案，那么就很容易为 N 个单词构建出最佳解决方案。通过记忆化技术，我们可以避免对相同输入重复计算 best_partition() 的结果：

import functools

def wrap(text, width):
    """
    >>> wrap('This is a sample text', 7)
    ['This', 'is a', 'sample', 'text']
    """
    return [' '.join(line) for line in best_partition(
        tuple(text.split()), functools.partial(cost, width=width))]

def best_partition(words, cost):
    """The best partition of words into lines according to the cost function."""
    best = [words] # start with all words on a single line
    for i in reversed(range(1, len(words))): # reverse to avoid recursion limit
        lines = [words[:i]] + best_partition(words[i:], cost)
        if cost(lines) < cost(best):
            best = lines
    return best

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            ret = cache[args] = func(*args)
            return ret
    return wrapper

best_partition = memoize(best_partition)

其中 cost() 是：

def linelen(words):
    """Number of characters in a line created from words."""
    if not words: return 0
    # words + spaces between them
    return sum(map(len, words)) + len(words) - 1

def cost(lines, width):
    """
    - each line except last costs `(width - w)**3`, where `w` is the
      line width

    - cost is infinite if `w > width` and the line has more than one word

    >>> cost([['a'], ['b']], 1)
    0
    >>> cost([['a','b']], 1)
    inf
    >>> cost([['a'], ['b']], 3)
    8
    >>> cost([['a', 'b']], 2)
    inf
    """
    if not lines: return 0
    s = 0
    for i, words in enumerate(lines, 1):
        w = linelen(words)
        if width >= w:
            if i != len(lines): # last line has zero cost
                s += (width - w)**3
        elif len(words) != 1: # more than one word in the line
            return float("inf") # penalty for w > width
    return s

示例：

print('\n'.join(wrap("""
    In olden times when wishing still helped one, there lived a king whose
    daughters were all beautiful, but the youngest was so beautiful that
    the sun itself, which has seen so much, was astonished whenever it
    shone in her face. Close by the king's castle lay a great dark forest,
    and under an old lime-tree in the forest was a well, and when the day
    was very warm, the king's child went out into the forest and sat down
    by the side of the cool fountain, and when she was bored she took a
    golden ball, and threw it up on high and caught it, and this ball was
    her favorite plaything.
    """, int(sys.argv[1]) if len(sys.argv) > 1 else 70)))

输出

In olden times when wishing still helped one, there lived a king whose
daughters were all beautiful, but the youngest was so beautiful that
the sun itself, which has seen so much, was astonished whenever it
shone in her face. Close by the king's castle lay a great dark forest,
and under an old lime-tree in the forest was a well, and when the day
was very warm, the king's child went out into the forest and sat down
by the side of the cool fountain, and when she was bored she took a
golden ball, and threw it up on high and caught it, and this ball was
her favorite plaything.

回答于 2025-04-17 由 Python大师

分享举报

如果有一种“最佳”方式来把一个词、两个词等排成行，这种方式不会因为后面出现的行而改变。它可能会因为后面出现的词而改变，如果这些词足够小，可以和其他词放在同一行。但如果我们单独考虑这些词并试图把它们排成行，那么同一组解决方案总是最优的。（可能会有等效的答案；例如，按照标准，“cats in hats”在7个字符的行上有两个解决方案。两个都是“最佳”的，永远都是——我们可以选择其中一个并坚持使用，而不会影响正确性。）

"This" 总是最好放成 ["This"]。（注意，我并不是说它总是最好单独放在一行！我想说的是，如果你只有一个词，最好的方式就是把它放在一行。）
"This is" 可以排成 ["This", "is"] 或 ["This is"]。不过，后者是最好的。所以从现在开始，每当我们只考虑这两个词时，我们可以完全忽略 ["This", "is"]——它永远不会更好。
"This is a" 可以排成 ~~["This", "is", "a"],~~ ["This is", "a"] 或 ["This", "is a"]。（我们已经知道 ["This is"] 比 ["This", "is"] 更好——见上一个要点！）结果是 ["This", "is a"] 是最好的。所以从现在开始，我们可以忽略 ["This is", "a"]。
"This is a sample" 可以排成：
- ~~["This", "is", "a", "sample"]~~ （见要点#2——我们甚至不需要考虑这个）
- ~~["This is", "a", "sample"]~~ （见要点#3）
- ["This", "is a", "sample"]

我不懂Python；我只是把这个拼凑在一起。所以如果它看起来“不够Pythonic”或者其他什么，请多多包涵。:P

def cost(lines, limit):
    # figures the cost of the current arrangement of words in lines.
    return sum([(limit-len(x)) ** 3 for x in lines])


def lineify(words, limit):
    # splits up words into lines of at most (limit) chars.
    # should find an optimal solution, assuming all words are < limit chars long

    results = [([], 0)]

    for count in range(1, len(words) + 1):
        best = words[:count]         # (start off assuming one word per line)
        best_cost = cost(best, limit)
        mycount = count - 1
        line = words[mycount]        # start with one word

        while len(line) <= limit:
            # figure the optimal cost, assuming the other words are on another line
            attempt, attempt_cost = results[mycount]
            attempt = attempt + [line]
            attempt_cost += cost([line],limit)
            # print attempt
            if attempt_cost < best_cost:
                best = attempt
                best_cost = attempt_cost

            # steal another word.  if there isn't one, we're done
            if mycount > 0:
                mycount -= 1
                line = words[mycount] + ' ' + line
            else:
                break

        # once we have an optimal result for (count) words, save it for posterity
        results += [(best, best_cost)]

    return results[len(words)][0]


def wrap(phrase, limit):
    # helper function...so the caller doesn't have to pass an array of words.
    # they shouldn't need to know to do that
    words = phrase.split()
    return lineify(words, limit)

我最开始有一个递归的解决方案，但结果发现Python对递归有一些限制，这让它在处理较大文本和现实世界的长度限制时不太适用。（你必须回溯到开始，才能进行记忆化处理，如果我有超过1000个词，就会触及递归限制。这可以通过从足够的词开始填满最后一行来扩展，但仍然会限制最大值为原始限制的某个倍数。）我发现自己使用了一种技巧来逐步构建结果，直到递归限制不再是问题。不过，如果你必须这样做，那可能说明递归本身就是个问题。

回答于 2025-04-17 由 Python大师

分享举报

一种方法是列出所有可能的选择，然后挑选出成本最低的那个：

from functools import wraps

def cache(origfunc):
    d = {}
    @wraps(origfunc)
    def wrapper(*args):
        if args in d:
            return d[args]
        result = origfunc(*args)
        d[args] = result
        return result
    return wrapper

@cache
def alternatives(t, m=7):
    ''' Given a tuple of word lengths and a maximum line length,
        return a list of all possible line groupings
        showing the total length of each line.

        >>> alternatives((4, 2, 1, 3), 7)
        [[4, 2, 1, 3], [4, 2, 5], [4, 4, 3], [7, 1, 3], [7, 5]]

    '''
    if not t:
        return []
    alts = []
    s = 0
    for i, x in enumerate(t):
        s += x
        if s > m:
            break
        tail = t[i+1:]
        if not tail:
            alts.append([s])
            break
        for subalt in alternatives(tail, m):
            alts.append([s] + subalt)
        s += 1
    return alts

def cost(t, m=7):
    ''' Evaluate the cost of lines given to line lengths

            >>> cost((7, 1, 6, 4), m=7)  # 'this is', 'a', 'sample', 'text'
            244
            >>> cost((4, 4, 6, 4))       # 'this', 'is a', 'sample', 'text'
            82

    '''
    return sum((m - x) ** 3 for x in t)

def textwrap(s, m=7):
    ''' Given a string, result a list of strings with optimal line wrapping

        >>> print textwrap('This is a sample text', 7)
        ['This', 'is a', 'sample', 'text']

    '''
    words = s.split()
    t = tuple(map(len, words))
    lengths = min(alternatives(t, m), key=cost)
    result = []
    worditer = iter(words)
    for length in lengths:
        line = []
        s = 0
        while s < length:
            word = next(worditer)
            line.append(word)
            s += len(word) + 1
        result.append(' '.join(line))
    return result


if __name__ == '__main__':
    import doctest
    print doctest.testmod()

为了加快代码的运行速度，可以限制搜索的选择数量（比如每行只考虑三个最长的选择）。

回答于 2025-04-17 由 Python大师

分享举报

尝试在多条固定长度的行上打印单个字符串并尽量减少成本

true cost = 15911

3 个回答

示例：

输出

撰写回答