Reading (compressed) fi

1 answer

User

#1 · Posted 2024-04-20 02:12:50

Your approach really isn't an efficient way to compress a text file; just use the existing zlib.
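For comparison, here is a minimal sketch of the standard-library route using Python 3's `zlib` module. `zlib` works on bytes, so the text is encoded as UTF-8 first; the sample text and compression level 9 are arbitrary choices for illustration.

```python
import zlib

# Repetitive sample text, so there is something to compress.
text = "the quick brown fox jumps over the lazy dog. " * 50

raw = text.encode("utf-8")       # zlib operates on bytes, not str
packed = zlib.compress(raw, 9)   # level 9 = best (slowest) compression
restored = zlib.decompress(packed).decode("utf-8")

print(len(raw), "->", len(packed), "bytes")
assert restored == text          # lossless round trip
```

For whole files, the standard `gzip` module applies the same DEFLATE algorithm through a file-like interface (`gzip.open(path, "wt")` to write text, `gzip.open(path, "rt")` to read it back).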

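As an academic exercise, though: you need to use pickle to store the dictionary keys, so that you get the same values back when you restore it. Because you want the "compressed" form to persist between invocations, so that a previously "compressed" file can still be decompressed, you need to assign an index to each word. If you want a "standard" Python approach, you can use OrderedDict from collections to build the index: new words are added at the end and, unlike with a traditional dict object, the old ones keep their positions (plain dicts only guarantee insertion order from Python 3.7 on). A better fit is an OrderedSet, but that is not part of standard Python; see this recipe.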
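A minimal sketch of the OrderedDict approach, assuming Python 3. `WordIndex`, its methods and the temporary `words.pkl` path are hypothetical names for illustration, not part of the answer's code. Each unseen word receives the next integer, and pickling the mapping lets a later run reuse exactly the same indexes.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class WordIndex:
    """Hypothetical helper: maps each word to a stable integer index and
    persists the mapping with pickle between runs."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.words = pickle.load(f)  # OrderedDict: word -> index
        else:
            self.words = OrderedDict()

    def index_of(self, word):
        # New words go at the end; existing words keep their index.
        if word not in self.words:
            self.words[word] = len(self.words)
        return self.words[word]

    def word_at(self, index):
        # Indexes follow insertion order (O(n) lookup in this sketch).
        return list(self.words)[index]

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.words, f)


path = os.path.join(tempfile.mkdtemp(), "words.pkl")
first = WordIndex(path)
ids = [first.index_of(w) for w in "to be or not to be".split()]
first.save()

second = WordIndex(path)            # a later run reloads the same mapping
print(ids)                          # [0, 1, 2, 3, 0, 1]
print(second.index_of("question"))  # 4
print(second.word_at(3))            # not
```

On Python 3.7+ a plain `dict` would behave the same here; `OrderedDict` just makes the ordering requirement explicit.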

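Case
You also have to decide whether "THIS", "this" and "This" are different words or the same word. One option is to give each word token a bit field that records whether each character is lowercase or uppercase: for example, "ThIs" gets token 15 but a bit field of 10 (binary 1010, first character = most significant bit), producing the tuple (15, 10) in the compressed file.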
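A sketch of that case bit field, assuming the first character maps to the most significant bit (the same bit order the example code later in this answer uses). `encode_case` and `decode_case` are hypothetical helper names.

```python
def encode_case(word):
    """Return (lowercased word, case bit field): one bit per character,
    first character = most significant bit, 1 = uppercase."""
    bits = "".join("1" if c.isupper() else "0" for c in word)
    return word.lower(), int(bits, 2) if bits else 0


def decode_case(lower, flag):
    """Reapply a case bit field produced by encode_case()."""
    bits = bin(flag)[2:].zfill(len(lower))  # restore leading zero bits
    return "".join(c.upper() if b == "1" else c for b, c in zip(bits, lower))


print(encode_case("ThIs"))          # ('this', 10), i.e. 0b1010
print(decode_case("this", 10))      # ThIs
print(decode_case("this", 0b1111))  # THIS
```

A word that is entirely lowercase always gets flag 0, so the common case costs nothing extra.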

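Punctuation
You also need to consider punctuation: when a word is followed by punctuation, you need a way to represent that punctuation symbol in the compressed form. But there is a catch. When you decompress, you have to reconstruct the original exactly, so punctuation needs care: e.g. "is this right?" -> [1,2,3,4] -> "is this right ?", or "is this right?" without the space? So for each punctuation mark you need to record how it is joined to the previous and next characters. Since a punctuation mark is only one character (i.e. an 8-bit number), you might consider storing the character as-is.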
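A sketch of the "how is it joined" idea, using hypothetical helpers `tokenize` and `detokenize`: each token carries a `conjoin` flag that is True when no space separated it from the previous token, which is enough to tell "right?" from "right ?". It assumes words are runs of ASCII letters and apostrophes and that tokens are separated by single spaces; newlines and runs of spaces are not preserved by this sketch.

```python
import re

# A word (ASCII letters and apostrophes) or any single other non-space char.
TOKEN = re.compile(r"[A-Za-z']+|[^A-Za-z'\s]")


def tokenize(text):
    """Split single-spaced text into (token, conjoin) pairs."""
    tokens = []
    prev_end = 0
    for m in TOKEN.finditer(text):
        # Conjoined if this token starts exactly where the previous one ended.
        conjoin = bool(tokens) and m.start() == prev_end
        tokens.append((m.group(), conjoin))
        prev_end = m.end()
    return tokens


def detokenize(tokens):
    """Rebuild the text: one space before each token unless conjoined."""
    out = []
    for i, (tok, conjoin) in enumerate(tokens):
        if i > 0 and not conjoin:
            out.append(" ")
        out.append(tok)
    return "".join(out)


toks = tokenize("is this right?")
print(toks)              # [('is', False), ('this', False), ('right', False), ('?', True)]
print(detokenize(toks))  # is this right?
```

In a real compressor the word tokens would be replaced by their dictionary indexes, while punctuation characters are stored as-is, as suggested above.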

Multiple spaces
You also need to handle multiple spaces.
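One simple way to keep runs of spaces, and the one the example code relies on, is to split on a single space with `str.split(" ")`: a run of n spaces then leaves n - 1 empty strings, which can be stored as tokens of their own. The helper names below are hypothetical.

```python
def encode_spaces(text):
    """Split on single spaces; each extra space survives as an empty string."""
    return text.split(" ")


def decode_spaces(chunks):
    """Joining with one space restores every original run of spaces."""
    return " ".join(chunks)


text = "two  spaces and   three, plus a trailing one "
chunks = encode_spaces(text)
print(chunks[:3])  # ['two', '', 'spaces']
print(decode_spaces(chunks) == text)  # True
```

Note that `str.split()` with no argument would not work here: it treats any run of whitespace as a single separator and drops leading and trailing whitespace.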

Example code
This code is incomplete, mostly untested and may not handle every use case, but it illustrates one possible solution.

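To use it, create a file called in.txt containing the text to compress, then run python dictcompress.py -c in.txt out.comp, or python dictcompress.py -d out.comp <outfile>, or python dictcompress.py --list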

from ordered_set import OrderedSet  # pip install ordered-set
import argparse
import os
import pickle
import string


class CompDecomp(object):
  DEFAULT_PICKLE_FN = "my.dict"

  def __init__(self, fn=None):
    # The word list is kept on disk so that files compressed in an earlier
    # run can still be decompressed later.
    self.fn = self.DEFAULT_PICKLE_FN if fn is None else fn
    self.dict = self.loaddict()

  def loaddict(self):
    if os.path.exists(self.fn):
      with open(self.fn, "rb") as pkl:
        return pickle.load(pkl)
    return OrderedSet()

  def savedict(self):
    with open(self.fn, "wb") as pkl:
      pickle.dump(self.dict, pkl)

  def compressword(self, word, conjoin=False):
    # Words are stored lowercase; the case is carried in a bit field.
    if word.lower() not in self.dict:
      self.dict.add(word.lower())
      print("New word: '%s'" % word)
    return self._caseflag(word, conjoin)

  def decompressword(self, index, caseflag=0, conjoin=False):
    # An int is an index into the word list; a str is a character stored as-is.
    word = self.dict[index] if isinstance(index, int) else index
    if caseflag == 0:
      return word, conjoin
    flag = bin(caseflag)[2:].zfill(len(word))
    res = "".join(c.upper() if f == "1" else c.lower() for f, c in zip(flag, word))
    return res, conjoin

  def _caseflag(self, word, conjoin):
    # One bit per character, first character = most significant bit, 1 = uppercase.
    index = self.dict.index(word.lower())
    if word.lower() == word:
      # Word is all lowercase
      return index, 0, conjoin
    if word.upper() == word:
      # Word is all uppercase
      return index, int("1" * len(word), 2), conjoin
    bits = "".join("1" if c.isupper() else "0" for c in word)
    return index, int(bits, 2), conjoin

  def compressfile(self, fileobj):
    data = fileobj.read()
    # Translation table that deletes letters and apostrophes, leaving only
    # the punctuation (and other non-letter) characters of a chunk.
    non_letters = str.maketrans("", "", string.ascii_letters + "'")

    compress = []
    for word in data.split(" "):
      # Handle multiple spaces: each extra space becomes an empty token.
      if word == "":
        compress.append(("", 0, False))
        continue

      # Handle punctuation; words with apostrophes are treated as new words.
      # Only the first token of a chunk is preceded by a space.
      first = True
      for c in word.translate(non_letters):
        subword, word = word.split(c, 1)
        if subword:
          compress.append(self.compressword(subword, not first))
          first = False
        compress.append((c, 0, not first))
        first = False

      # Handle the word itself, or whatever follows its last punctuation mark.
      if word:
        compress.append(self.compressword(word, not first))

    self.savedict()
    return compress

  def decompressfile(self, fileobj):
    data = pickle.load(fileobj)

    parts = []
    for v in data:
      if not isinstance(v, tuple) or not 0 < len(v) <= 3:
        print("Bad data %r" % (v,))
        continue
      d, conjoin = self.decompressword(*v)
      # Tokens are separated by a single space unless marked as conjoined.
      if parts and not conjoin:
        parts.append(" ")
      parts.append(d)
    return "".join(parts)


if __name__ == "__main__":
  parser = argparse.ArgumentParser(description="Test file compress / decompress")

  group = parser.add_mutually_exclusive_group()
  parser.add_argument("infile", nargs="?", default=None)
  parser.add_argument("outfile", nargs="?", default=None)
  group.add_argument("-c", "--compress", action="store_true")
  group.add_argument("-d", "--decompress", action="store_true")
  group.add_argument("-l", "--list", action="store_true")

  args = parser.parse_args()

  cd = CompDecomp()

  # Invocation
  # python dictcompress.py [-h|-c|-d|-l] [<infile>] [<outfile>]
  infile, outfile = args.infile, args.outfile

  if args.list:
    for k in cd.dict:
      print(k)
  elif args.compress or args.decompress:
    if infile is None or outfile is None:
      parser.error("both <infile> and <outfile> are required")
    if not os.path.exists(infile):
      parser.error("Input file missing: %s" % infile)

    if args.compress:
      print("Compress")
      # newline="" keeps line endings exactly as they appear in the input.
      with open(infile, "r", newline="") as inf, open(outfile, "wb") as of:
        pickle.dump(cd.compressfile(inf), of)
    else:
      print("Decompress")
      with open(infile, "rb") as inf, open(outfile, "w", newline="") as of:
        of.write(cd.decompressfile(inf))
