Encrypting Python logs with a public key using pycrypto

Question

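I'm developing a web application (with gevent, but that is not significant) that needs to write some confidential information to its log. The obvious idea is to encrypt the confidential information with a public key hard-coded into my application. Reading it then requires the private key, and 2048-bit RSA seems to be safe enough. I chose pycrypto (I tried M2Crypto too, but found nearly no difference for my purpose) and implemented log encryption as a logging.Formatter subclass. However, I'm new to pycrypto and to cryptography, and I'm not sure that the way I chose to encrypt my data is reasonable. Is the PKCS1_OAEP module what I need? Or is there a simpler way to encrypt that does not require splitting the data into small chunks?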
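For context on the chunking concern: RSA-OAEP caps how much plaintext a single RSA operation accepts, which is why longer messages have to be split. A small sketch of that arithmetic, written for Python 3 with the standard library only. The helper names are illustrative and not part of pycrypto, and the sketch assumes SHA-1 as the OAEP hash (the PKCS1_OAEP default).

```python
import hashlib


def oaep_max_plaintext(key_bits, hash_name="sha1"):
    """Largest plaintext, in bytes, that one RSA-OAEP operation accepts."""
    key_bytes = (key_bits + 7) // 8
    hash_len = hashlib.new(hash_name).digest_size
    return key_bytes - 2 * hash_len - 2


def ciphertext_size(msg_len, key_bits, chunk_len):
    """Total ciphertext bytes when a message is split into chunk_len pieces."""
    key_bytes = (key_bits + 7) // 8
    chunks = -(-msg_len // chunk_len)  # ceiling division
    return chunks * key_bytes          # each chunk encrypts to key_bytes


print(oaep_max_plaintext(2048))            # 214
print(oaep_max_plaintext(2048, "sha256"))  # 190
print(ciphertext_size(300, 2048, 128))     # 3 chunks * 256 = 768
```

Every chunk, however short, costs a full 256 bytes of ciphertext with a 2048-bit key, so the overhead is largest for short log lines.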

Here is what I did:

import logging
import sys

from Crypto.Cipher import PKCS1_OAEP as pkcs1
from Crypto.PublicKey import RSA

PUBLIC_KEY = """ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDe2mtK03UhymB+SrIbJJUwCPhWNMl8/gA9d7jex0ciSuFfShDaqJ4wYWG4OOl\
VqKMxPrPcZ/PMSwtc021yI8TXfgewb65H/YQw4JzzGANq2+mFT8jWRDn+xUc6vcWnXIG3OPg5DvIipGQvIPNIUUP3qE7yDHnS5xdVdFrVe2bUUXmZJ9\
0xJpyqlTuRtIgfIfEQC9cggrdr1G50tXdXZjS0M1WXl5P6599oH/ykjpDFrCnh5fz9WDwUc0mNJ+11Qh+yfDp3k7AhzhRaROKLVWnfkklFaFm7LsdVX\
KPjp7dPRcTb84c2OnlIjU0ykL74Fy0K3eaPvM6TLe/K1XuD3933 pupkin@pupkin"""

PUBLIC_KEY = RSA.importKey(PUBLIC_KEY)

LOG_FORMAT = '[%(asctime)-15s - %(levelname)s: %(message)s]'

# RSA-OAEP accepts at most key_bytes - 2 * hash_len - 2 bytes per call.
# With a 2048-bit key and SHA-1 (the PKCS1_OAEP default) that is
# 256 - 2 * 20 - 2 = 214 bytes, so 128 is safely below the limit.
MAX_MSG_LEN = 128

# Ciphertext size of one encrypted chunk: always the key size in bytes
# (2048 / 8 = 256), regardless of the plaintext length.
ENCODED_CHUNK_LEN = 256


def encode_msg(msg):
    res = []
    k = pkcs1.new(PUBLIC_KEY)
    for i in xrange(0, len(msg), MAX_MSG_LEN):
        v = k.encrypt(msg[i : i+MAX_MSG_LEN])
        # There are nicer ways to make a readable line from data than using hex. However, using
        # hex representation requires no extra code, so let it be hex.
        res.append(v.encode('hex'))
        assert len(v) == ENCODED_CHUNK_LEN
    return ''.join(res)


def decode_msg(msg, private_key):
    msg = msg.decode('hex')
    res = []
    k = pkcs1.new(private_key)
    for i in xrange(0, len(msg), ENCODED_CHUNK_LEN):
        res.append(k.decrypt(msg[i : i+ENCODED_CHUNK_LEN]))
    return ''.join(res)


class CryptoFormatter(logging.Formatter):
    NOT_SECRET = ('CRITICAL',)
    def format(self, record):
        """
        If needed, I may encode only certain types of messages.
        """
        try:
            msg = logging.Formatter.format(self, record)
            if record.levelname not in self.NOT_SECRET:
                # Reuse the already formatted line instead of formatting twice.
                msg = encode_msg(msg)
            return msg
        except Exception:
            import traceback
            return traceback.format_exc()


def decrypt_file(key_fname, data_fname):
    """
    The function decrypts logs and never runs on server. In fact,
    server does not have a private key at all. The only key owner
    is server admin.
    """
    res = ''
    with open(key_fname, 'r') as kf:
        pkey = RSA.importKey(kf.read())
    with open(data_fname, 'r') as f:
        for l in f:
            l = l.strip()
            if l:
                try:
                    res += decode_msg(l, pkey) + '\n'
                except Exception: # A line may be unencrypted
                    res += l + '\n'
    return res

# dictConfig() can build a custom formatter through its special '()' factory
# key, but this demo code does not use dictConfig().


logger = logging.getLogger()
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(CryptoFormatter(LOG_FORMAT))
logger.handlers = []
logger.addHandler(handler)

logging.warning("This is secret")
logging.critical("This is not secret")
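On the dictConfig() remark in the code above: as far as I know, the logging configuration schema lets a formatter entry name a factory under the special '()' key, which is enough to plug in a Formatter subclass. A minimal sketch under that assumption, written for Python 3. HexFormatter is a hypothetical stand-in for CryptoFormatter (it only hex-encodes and does no encryption), so the example runs with the standard library alone.

```python
import binascii
import io
import logging
import logging.config


class HexFormatter(logging.Formatter):
    """Hypothetical stand-in for CryptoFormatter: hex-encodes the line."""

    def format(self, record):
        line = logging.Formatter.format(self, record)
        return binascii.hexlify(line.encode("utf-8")).decode("ascii")


stream = io.StringIO()  # captures output so it can be inspected

logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "secret": {
            "()": HexFormatter,  # factory, called with the other keys as kwargs
            "fmt": "[%(levelname)s: %(message)s]",
        },
    },
    "handlers": {
        "out": {
            "class": "logging.StreamHandler",
            "formatter": "secret",
            "stream": stream,
        },
    },
    "root": {"handlers": ["out"], "level": "WARNING"},
})

logging.warning("This is secret")
encoded = stream.getvalue().strip()
print(encoded)
print(binascii.unhexlify(encoded))  # b'[WARNING: This is secret]'
```

The value under '()' can also be a dotted import path given as a string, which keeps the configuration loadable from a file.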

Update: thanks to the accepted answer below, I now understand the following:

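My solution currently looks quite valid (there are very few log entries, no performance considerations, and the storage is more or less trusted). As for security, the best thing I can do right now is not to forget to deny the user that runs my daemon write access to the program's .py and .pyc files. :-) However, if that user's account is compromised, an attacker could still try to attach a debugger to my daemon process, so I should disable login for that user as well. These are obvious things, but very important ones.
Of course, there are far more scalable solutions. A very common technique is to encrypt an AES key with slow but reliable RSA and then encrypt the data with AES, which is much faster. The data encryption is then symmetric, but retrieving the AES key requires either breaking RSA or extracting the key from memory while my program is running. Stream encryption with a higher-level library and a binary log file format is also a way to go, although a stream-encrypted binary log would be very vulnerable to log corruption. Even a sudden reboot caused by a power outage could be a problem unless I do something at a lower level (at least rotate the log on every daemon start).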
I changed .encode('hex') to .encode('base64').replace('\n', '').replace('\r', ''). Fortunately, the base64 codec works fine without line breaks, which saves some space.
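Using untrusted storage may require signing the records, but that seems to be another topic.
Checking whether a string is encrypted by catching exceptions is fine because, unless the log has been tampered with by a malicious user, it is the base64 codec that raises the exception, not the RSA decryption.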
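The base64 change and the "encrypted or plain" check from the points above can be sketched as follows, written for Python 3 with the standard library only. The names to_line and from_line are illustrative, fake_ct is a placeholder for one RSA-2048 ciphertext block rather than real encrypted data, and the extra length check is my own assumption on top of the exception-based test.

```python
import base64
import binascii

CHUNK = 256  # ciphertext bytes per RSA-2048 block


def to_line(ciphertext):
    """Encode ciphertext as one ASCII line (b64encode adds no newlines)."""
    return base64.b64encode(ciphertext).decode("ascii")


def from_line(line):
    """Return ciphertext bytes, or None if the line does not look encrypted."""
    try:
        raw = base64.b64decode(line, validate=True)
    except (binascii.Error, ValueError):
        return None  # not strict base64, so treat it as a plain-text line
    if not raw or len(raw) % CHUNK:
        return None  # valid base64, but not a whole number of RSA blocks
    return raw


fake_ct = bytes(range(256))  # placeholder for one encrypted block
line = to_line(fake_ct)

print(len(binascii.hexlify(fake_ct)), len(line))  # 512 344
print(from_line(line) == fake_ct)                 # True
print(from_line("[2013-01-01 - CRITICAL: This is not secret]"))  # None
```

For one 256-byte block, base64 needs 344 characters where hex needs 512, which is where the space saving comes from.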

