How can I optimize binary file operations?

Posted 2024-04-20 03:04:26


Here is my code:

def decode(filename):
    with open(filename, "rb") as binary_file:
        # Read the whole file at once
        data = bytearray(binary_file.read())

    for i in range(len(data)):
        data[i] = 0xff - data[i]

    with open("out.log", "wb") as out:
        out.write(data)

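I have a file of about 10 MB. I need to transform it by flipping every bit and save the result to a new file.

With my code, converting a 10 MB file takes about 1 second, whereas the same job in C takes less than 1 ms.

This is my first Python script, and I don't know whether using a bytearray is the right approach. The most time-consuming part is the loop over the bytearray.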


1 answer

Answer #1 · posted 2024-04-20 03:04:26

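If using the numpy library is an option, it will be much faster, because it can apply the operation to every byte with a single statement. Byte-level operations on relatively large amounts of data are comparatively slow in pure Python next to a module like numpy, which is implemented in C and optimized for array processing.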

The gap is somewhat smaller in Python 2 than in Python 3, though (see the results below).
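As a small illustration of the single-statement idea, here is a minimal sketch on four sample bytes. The sample values and variable names are made up for this example. It also shows that, for `uint8` data, subtracting from `0xff` and a bitwise NOT give the same result.

```python
import numpy as np

raw = bytes([0x00, 0x0f, 0xf0, 0xff])  # made-up sample data

# np.frombuffer gives a read-only view of the bytes, so each operation
# below returns a new array instead of modifying the input in place.
data = np.frombuffer(raw, dtype=np.uint8)

flipped_sub = 0xff - data            # subtraction form, as in the question
flipped_not = np.bitwise_not(data)   # equivalent bitwise form for uint8

print(flipped_not.tobytes().hex())   # prints: fff00f00
```

Either form runs the loop over the bytes inside numpy's compiled code rather than in the Python interpreter, which is where the speedup comes from.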

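Below is a framework I set up to benchmark this against the code in your question. It may look like a lot of code, but most of it is just scaffolding for the performance comparison.

I encourage others who answer this question to make use of it as well.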

from __future__ import print_function
from collections import namedtuple
import os
import sys
from random import randrange
from textwrap import dedent
from tempfile import NamedTemporaryFile
import timeit
import traceback


N = 1  # Number of executions of each "algorithm".
R = 3  # Number of repetitions of those N executions.

UNITS = 1024 * 1024  # MBs
FILE_SIZE = 10 * UNITS

# Create test files. Must be done here at module-level to allow file
# deletions at end.
with NamedTemporaryFile(mode='wb', delete=False) as inp_file:
    FILE_NAME_IN = inp_file.name
    print('Creating temp input file: "{}", length {:,d}'.format(FILE_NAME_IN, FILE_SIZE))
    inp_file.write(bytearray(randrange(256) for _ in range(FILE_SIZE)))

with NamedTemporaryFile(mode='wb', delete=False) as out_file:
    FILE_NAME_OUT = out_file.name
    print('Creating temp output file: "{}"'.format(FILE_NAME_OUT))


# Common setup for all testcases (executed prior to any Testcase specific setup).
COMMON_SETUP = dedent("""
    from __main__ import FILE_NAME_IN, FILE_NAME_OUT
""")

class Testcase(namedtuple('CodeFragments', ['setup', 'test'])):
    """ A test case is composed of separate setup and test code fragments. """
    def __new__(cls, setup, test):
        """ Dedent code fragment in each string argument. """
        return tuple.__new__(cls, (dedent(setup), dedent(test)))

testcases = {
    "user3181169": Testcase("""
        def decode(filename, out_filename):
            with open(filename, "rb") as binary_file:
                # Read the whole file at once
                data = bytearray(binary_file.read())

            for i in range(len(data)):
                data[i] = 0xff - data[i]

            with open(out_filename, "wb") as out:
                out.write(data)

        """, """
        decode(FILE_NAME_IN, FILE_NAME_OUT)
        """
    ),

    "using numpy": Testcase("""
        import numpy as np

        def decode(filename, out_filename):
            with open(filename, 'rb') as file:
                data = np.frombuffer(file.read(), dtype=np.uint8)

            # Applies mathematical operation to entire array.
            data = 0xff - data

            with open(out_filename, "wb") as out:
                out.write(data)
        """, """
        decode(FILE_NAME_IN, FILE_NAME_OUT)
        """,
    ),
}

# Collect timing results of executing each testcase multiple times.
try:
    results = [
        (label,
         min(timeit.repeat(testcases[label].test,
                           setup=COMMON_SETUP + testcases[label].setup,
                           repeat=R, number=N)),
        ) for label in testcases
    ]
except Exception:
    traceback.print_exc(file=sys.stdout)  # direct output to stdout
    sys.exit(1)

# Display results.
major, minor, micro = sys.version_info[:3]
bitness = 64 if sys.maxsize > 2**32 else 32
print('Fastest to slowest execution speeds using ({}-bit) Python {}.{}.{}\n'
      '({:,d} execution(s), best of {:d} repetition(s)'.format(
            bitness, major, minor, micro, N, R))
print()

longest = max(len(result[0]) for result in results)  # length of longest label
ranked = sorted(results, key=lambda t: t[1]) # ascending sort by execution time
fastest = ranked[0][1]
for result in ranked:
    print('{:>{width}} : {:9.6f} secs, relative speed: {:6,.2f}x, ({:8,.2f}% slower)'
          ''.format(
                result[0], result[1], round(result[1]/fastest, 2),
                round((result[1]/fastest - 1) * 100, 2),
                width=longest))

# Clean-up.
for filename in (FILE_NAME_IN, FILE_NAME_OUT):
    try:
        os.remove(filename)
    except OSError:  # FileNotFoundError is not defined in Python 2
        pass

Output (Python 3):

Creating temp input file: "T:\temp\tmpw94xdd5i", length 10,485,760
Creating temp output file: "T:\temp\tmpraw4j4qd"
Fastest to slowest execution speeds using (32-bit) Python 3.7.1
(1 execution(s), best of 3 repetition(s)

using numpy :  0.017744 secs, relative speed:   1.00x, (    0.00% slower)
user3181169 :  1.099956 secs, relative speed:  61.99x, (6,099.14% slower)

Output (Python 2):

Creating temp input file: "t:\temp\tmprk0njd", length 10,485,760
Creating temp output file: "t:\temp\tmpvcaj6n"
Fastest to slowest execution speeds using (32-bit) Python 2.7.15
(1 execution(s), best of 3 repetition(s)

using numpy :  0.017930 secs, relative speed:   1.00x, (    0.00% slower)
user3181169 :  0.937218 secs, relative speed:  52.27x, (5,126.97% slower)
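If numpy is not an option, one standard-library alternative (not part of the benchmark above, and not timed here) is `bytes.translate` with a 256-entry lookup table. It also avoids the per-byte Python loop. The names `FLIP_TABLE` and `flip_bits` are hypothetical, and this sketch assumes Python 3.

```python
# Hypothetical helper, not from the original answer. Python 3 only.
# Lookup table mapping each byte value b to 0xff - b (all bits flipped).
FLIP_TABLE = bytes(0xff - b for b in range(256))


def flip_bits(data):
    """Return a copy of `data` (bytes or bytearray) with every bit inverted."""
    return data.translate(FLIP_TABLE)


def decode(filename, out_filename):
    with open(filename, "rb") as binary_file:
        # Read the whole file at once
        data = binary_file.read()

    with open(out_filename, "wb") as out:
        out.write(flip_bits(data))
```

To compare it with the two versions above, it could be added as a third entry in the `testcases` dictionary of the benchmark.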
