MD5 hash returns different results in Python

0 votes
2 answers
3393 views
Asked 2025-04-18 03:10

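For a class assignment, I need to read the contents of a file, compute its MD5 hash, and store that hash in a separate file. I then need to check the file's integrity by comparing MD5 hashes. I'm fairly new to Python and JSON, so I wanted to use this assignment to learn them rather than stick with what I already know.

Anyway, my program can read a file, create the hash, and store the hash in a JSON file without any problem. The trouble is with the integrity check. When I compute the file's hash again, it differs from the value recorded in the JSON file, even though the file has not been modified. Below is an example of what happens, followed by my code. Thanks in advance for any help.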

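For example, these are the contents of my JSON file:

Content: b'I made a file to test the md5\n'

Hash: 1e8f4e6598be2ea2516102de54e7e48e

When I check the integrity of the exact same file (with no modifications), this is what comes back:

Content: b'I made a file to test the md5\n'

Hash: ef8b7bf2986f59f8a51aae6b496e8954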

import hashlib
import json
import os
import fnmatch
from codecs import open


#opens the file, reads/encodes it, and returns the contents (c)
def read_the_file(f_location):
    with open(f_location, 'r', encoding="utf-8") as f:
        c = f.read()

    f.close()
    return c


def scan_hash_json(directory_content):
    for f in directory_content:
        location = argument + "/" + f
        content = read_the_file(location)
        comp_hash = create_hash(content)
        json_obj = {"Directory": argument, "Contents": {"filename": str(f),
                                                        "original string": str(content), "md5": str(comp_hash)}}
        location = location.replace(argument, "")
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)


#reads a recorded json file and returns the parsed object
def read_the_json(f):
    f_location = "recorded" + "/" + f
    read_json = open(f_location, "r")
    json_obj = json.load(read_json)
    read_json.close()
    return json_obj


#check integrity of the file
def check_integrity(d_content):
    #d_content = directory content
    for f in d_content:
        json_obj = read_the_json(f)
        text = f.replace(".json", ".txt")
        result = find(text, os.getcwd())
        content = read_the_file(result)
        comp_hash = create_hash(content)
        print("content: " + str(content))
        print(result)
        print(json_obj)
        print()
        print("Json Obj: " + json_obj['Contents']['md5'])
        print("Hash: " + comp_hash)


#find the file being searched for
def find(pattern, path):
    result = ""
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result = os.path.join(root, name)
    return result


#create a hash for the file contents being passed in
def create_hash(content):
    h = hashlib.md5()
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    content = content.encode('utf-8')
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    return h.hexdigest()


#write the MD5 hash to the json file
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
    storage_location = "recorded/" + str(arg)
    write_file = open(storage_location, "w")
    json.dump(json_obj, write_file, indent=4, sort_keys=True)
    write_file.close()

#variable to hold status of user (whether they are done or not)
working = 1
#while the user is not done, continue running the program
while working == 1:
    print("Please input a command. For help type 'help'. To exit type 'exit'")

    #grab input from user, divide it into words, and grab the command/option/argument
    request = input()
    request = request.split()

    if len(request) == 1:
        command = request[0]
    elif len(request) == 2:
        command = request[0]
        option = request[1]
    elif len(request) == 3:
        command = request[0]
        option = request[1]
        argument = request[2]
    else:
        print("I'm sorry that is not a valid request.\n")
        continue

    #if user inputs command 'icheck'...
    if command == 'icheck':
        if option == '-l':
            if argument == "":
                print("For option -l, please input a directory name.")
                continue

            try:
                dirContents = os.listdir(argument)
                scan_hash_json(dirContents)

            except OSError:
                print("Directory not found. Make sure the directory name is correct or try a different directory.")

        elif option == '-f':
            if argument == "":
                print("For option -f, please input a file name.")
                continue

            try:
                contents = read_the_file(argument)
                computedHash = create_hash(contents)
                jsonObj = {"Directory": "Default", "Contents": {
                    "filename": str(argument), "original string": str(contents), "md5": str(computedHash)}}

                write_to_json(argument, jsonObj)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-t':
            try:
                dirContents = os.listdir("recorded")
                check_integrity(dirContents)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-u':
            print("gonna update stuff")
        elif option == '-r':
            print("gonna remove stuff")

    #if user inputs command 'help'...
    elif command == 'help':
        #display help screen
        print("Integrity Checker has a few options you can use. Each option "
              "must begin with the command 'icheck'. The options are as follows:")
        print("\t-l <directory>: Reads the list of files in the directory and computes the md5 for each one")
        print("\t-f <file>: Reads a specific file and computes its md5")
        print("\t-t: Tests integrity of the files with recorded md5s")
        print("\t-u <file>: Update a file that you have modified after its integrity has been checked")
        print("\t-r <file>: Removes a file from the recorded md5s\n")

    #if user inputs command 'exit'
    elif command == 'exit':
        #set working to zero and exit program loop
        working = 0

    #if anything other than 'icheck', 'help', and 'exit' are input...
    else:
        #display error message and start over
        print("I'm sorry that is not a valid command.\n")

2 Answers

0

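I see two problems you may be running into:

  1. The hash is computed over the binary representation of the string.
  2. Unless you only use ASCII, the same international character, such as č, can have different binary representations, for example in UTF-8 versus other Unicode encodings.

Things to consider:

  1. If you need UTF-8 or Unicode, normalize the content before saving it or computing the hash.
  2. For testing, compare the binary representations of the content.
  3. Use UTF-8 only for I/O operations; codecs.open handles all the conversion for you.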

    Sample code:

    from codecs import open
    with open('yourfile', 'r', encoding="utf-8") as f:
        decoded_content = f.read()
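
To illustrate the normalization point, here is a minimal sketch. It assumes NFC is an acceptable normalization form for your data, and the helper name md5_of_text is hypothetical (it is not part of the asker's program). It shows that the same visible character č can be two different byte sequences, and that normalizing before hashing makes the digests agree.

```python
import hashlib
import unicodedata


def md5_of_text(text, form="NFC"):
    """Normalize text to one Unicode form, encode it as UTF-8, and return the MD5 hex digest."""
    normalized = unicodedata.normalize(form, text)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


composed = "\u010d"      # 'č' as a single code point
decomposed = "c\u030c"   # 'c' followed by a combining caron

# The two strings look the same but encode to different bytes,
# so hashing them without normalization gives different digests.
raw_composed = hashlib.md5(composed.encode("utf-8")).hexdigest()
raw_decomposed = hashlib.md5(decomposed.encode("utf-8")).hexdigest()
print(raw_composed == raw_decomposed)  # False

# After normalization both forms hash to the same value.
print(md5_of_text(composed) == md5_of_text(decomposed))  # True
```

If the goal is only to detect changes to a file on disk, another option is to open the file in binary mode ('rb') and hash the raw bytes, which avoids decoding and re-encoding altogether.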
0

Where in this method do you define h, the md5 object being used?

 #create a hash for the file contents being passed in
 def create_hash(content):
     key_before = "reallyBad".encode('utf-8')
     key_after = "hashKeyAlgorithm".encode('utf-8')
     print("Content: " + str(content))
     h.update(key_before)
     h.update(content)
     h.update(key_after)
     print("digest: " + str(h.hexdigest()))
     return h.hexdigest()

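I suspect that you are actually calling create_hash twice, but with the same md5 object. In that case, the second call is really hashing "reallyBad*file contents*hashKeyAlgorithmreallyBad*file contents*hashKeyAlgorithm". You should create a new md5 object inside create_hash to avoid this.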
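
Here is a small sketch of that effect. The function names create_hash_shared and create_hash_fresh are made up for the comparison, and the module-level object is an assumption about how the earlier version of the code may have looked. An md5 object keeps accumulating input across update calls, even after hexdigest has been called.

```python
import hashlib

KEY_BEFORE = b"reallyBad"
KEY_AFTER = b"hashKeyAlgorithm"

# One object shared by every call: its internal state keeps growing.
shared = hashlib.md5()


def create_hash_shared(content):
    """Buggy variant: reuses the same md5 object on every call."""
    shared.update(KEY_BEFORE)
    shared.update(content)
    shared.update(KEY_AFTER)
    return shared.hexdigest()


def create_hash_fresh(content):
    """Fixed variant: builds a new md5 object for each call."""
    h = hashlib.md5()
    h.update(KEY_BEFORE)
    h.update(content)
    h.update(KEY_AFTER)
    return h.hexdigest()


data = b"this is a test\n"
first = create_hash_shared(data)
second = create_hash_shared(data)

print(first == second)                                      # False
print(create_hash_fresh(data) == create_hash_fresh(data))   # True
```

The second shared call returns the digest of the keyed content concatenated twice, which is why the recorded hash and the recomputed hash disagree.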

Edit: After I made that change, this is how your program runs:

 Please input a command. For help type 'help'. To exit type 'exit'
 icheck -f ok.txt
 Content: this is a test

 digest: 1f0d0fd698dfce7ce140df0b41ec3729
 Please input a command. For help type 'help'. To exit type 'exit'
 icheck -t
 Content: this is a test

 digest: 1f0d0fd698dfce7ce140df0b41ec3729
 Please input a command. For help type 'help'. To exit type 'exit'

Edit #2: Your scan_hash_json function also has a bug at the end. You strip the .txt suffix from the file name and then call write_to_json:

def scan_hash_json(directory_content):
        ...
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)

However, write_to_json expects the file name to end in .txt:

def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")

If you fix that, I think it should work as expected...
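
One possible way to make the naming robust, sketched under the assumption that each recorded file should be named after the source file with its extension swapped for .json. The helper json_name_for is hypothetical and not part of the original program. It derives the name in one place with os.path.splitext rather than chaining str.replace calls across two functions.

```python
import os


def json_name_for(path):
    """Return the .json file name matching a source file path (hypothetical helper)."""
    base = os.path.basename(path)          # drop any directory part
    stem, _ext = os.path.splitext(base)    # split off the final extension, if any
    return stem + ".json"


print(json_name_for("docs/ok.txt"))  # ok.json
print(json_name_for("ok"))           # ok.json
```

With a helper like this, both scan_hash_json and the -f branch could pass the original path through unchanged and let write_to_json decide the output name.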
