Wrapping text with textwrap using byte counts

2 answers

Anonymous user

Answer 1 · edited 2024-05-14 17:52:20

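The result depends on the encoding used, because the number of bytes per character depends on the encoding and, in many encodings, on the character as well. I'll assume we are using UTF-8, where '☺' is encoded as e298ba and is 3 bytes long; the given example is consistent with this assumption.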
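A quick way to check the encoding assumption in Python 3: the same character has a different byte length depending on the codec. `byte_lengths` is a hypothetical helper used only for illustration.

```python
def byte_lengths(ch):
    """Return the encoded length of `ch` under a few common encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print("☺".encode("utf-8").hex())  # e298ba
print(byte_lengths("☺"))  # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(byte_lengths("a"))  # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
```

So a byte-based width only makes sense once the encoding is fixed.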

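Everything in textwrap works on characters; it knows nothing about encodings. One workaround is to convert the input string into another format in which each character becomes a string whose length is proportional to its byte length. I use three characters per byte: two for the hex digits of the byte and one to control line breaking. Thus: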

'a' -> '61x'         non-breaking
' ' -> '20 '         breaking
'☺' -> 'e2x98xbax'   non-breaking

For simplicity, I assume we only break on spaces, not on tabs or any other character.


Anonymous user

Answer 2 · edited 2024-05-14 17:52:20

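In the end I rewrote part of textwrap so that it encodes the words after splitting the string.

Unlike Tom's solution, this code does not need to iterate over every character in Python.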

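import sys
import textwrap

def byteTextWrap(text, size, break_long_words=True):
    """Similar to textwrap.wrap(), but considers the size of strings (in bytes)
    instead of their length (in characters). Assumes UTF-8."""
    # _split_chunks/_split are private TextWrapper helpers, not public API.
    try:
        words = textwrap.TextWrapper()._split_chunks(text)
    except AttributeError:  # Python 2
        words = textwrap.TextWrapper()._split(text)
    words.reverse()  # use it as a stack
    if sys.version_info[0] >= 3:
        words = [w.encode('utf-8') for w in words]
    lines = [b'']
    while words:
        word = words.pop(-1)
        if len(word) > size and break_long_words:
            # Cut at `size` bytes, backing off so the cut never lands inside
            # a multi-byte UTF-8 sequence (continuation bytes are 0b10xxxxxx).
            data = bytearray(word)
            cut = size
            while cut > 0 and (data[cut] & 0xC0) == 0x80:
                cut -= 1
            if cut == 0:  # size is smaller than one character: keep it whole
                cut = size
                while cut < len(data) and (data[cut] & 0xC0) == 0x80:
                    cut += 1
            words.append(word[cut:])
            word = word[:cut]
        if len(lines[-1]) + len(word) <= size:
            lines[-1] += word
        else:
            lines.append(word)
    if sys.version_info[0] >= 3:
        return [l.decode('utf-8') for l in lines]
    return lines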
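Cutting a long word at exactly `size` bytes can land in the middle of a multi-byte UTF-8 character, which makes the final `decode()` fail. The hypothetical helper `utf8_safe_cut` below (not part of textwrap) shows the boundary check on its own: step back while the byte at the cut position is a UTF-8 continuation byte.

```python
def utf8_safe_cut(data, size):
    """Largest cut <= size that does not split a UTF-8 sequence in `data`.

    Continuation bytes have the bit pattern 0b10xxxxxx. A return value of 0
    means `size` is smaller than the first character."""
    cut = min(size, len(data))
    while 0 < cut < len(data) and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return cut

data = "ab☺".encode("utf-8")  # b'ab\xe2\x98\xba', 5 bytes
# data[:3].decode("utf-8") would raise UnicodeDecodeError here.
cut = utf8_safe_cut(data, 3)  # backs off to 2, the start of '☺'
print(data[:cut].decode("utf-8"), data[cut:].decode("utf-8"))  # ab ☺
```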
