MD5 hash returns different results in Python
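For a class assignment, I need to read the contents of a file, compute its MD5 hash, and store that hash in a separate file. After that, I need to check the file's integrity by comparing MD5 hashes. I'm fairly new to Python and JSON, so I decided to use this assignment to learn them instead of sticking with what I already know.
Anyway, my program reads from a file, creates the hash, and stores that hash in a JSON file without any trouble. The problem shows up in the integrity check. When I return the hash computed for the file, it differs from the value recorded in the JSON file, even though the file has not been modified. Below is an example of what happens, followed by my code. Thanks in advance for any help.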
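For example, these are the contents of my JSON file:
Content: b'I made a file to test the md5\n'
Hash: 1e8f4e6598be2ea2516102de54e7e48e
When I try to check the integrity of the exact same file (with no changes made to it), this is what gets returned:
Content: b'I made a file to test the md5\n'
Hash: ef8b7bf2986f59f8a51aae6b496e8954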
import hashlib
import json
import os
import fnmatch
from codecs import open
#opens the file, reads/decodes it, and returns the contents (c)
def read_the_file(f_location):
    # the with-block closes the file on exit, so no explicit close() is needed
    with open(f_location, 'r', encoding="utf-8") as f:
        c = f.read()
    return c
#scans each file in the directory, creates the hash, and writes it to a json file
def scan_hash_json(directory_content):
    for f in directory_content:
        location = argument + "/" + f
        content = read_the_file(location)
        comp_hash = create_hash(content)
        json_obj = {"Directory": argument, "Contents": {"filename": str(f),
                    "original string": str(content), "md5": str(comp_hash)}}
        location = location.replace(argument, "")
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)
#reads a recorded json file and returns the parsed object
def read_the_json(f):
    f_location = "recorded" + "/" + f
    read_json = open(f_location, "r")
    json_obj = json.load(read_json)
    read_json.close()
    return json_obj
#check integrity of the file
def check_integrity(d_content):
    #d_content = directory content
    for f in d_content:
        json_obj = read_the_json(f)
        text = f.replace(".json", ".txt")
        result = find(text, os.getcwd())
        content = read_the_file(result)
        comp_hash = create_hash(content)
        print("content: " + str(content))
        print(result)
        print(json_obj)
        print()
        print("Json Obj: " + json_obj['Contents']['md5'])
        print("Hash: " + comp_hash)
#find the file being searched for
def find(pattern, path):
    result = ""
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result = os.path.join(root, name)
    return result
#create a hash for the file contents being passed in
def create_hash(content):
    h = hashlib.md5()
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    content = content.encode('utf-8')
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    return h.hexdigest()
#write the MD5 hash to the json file
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
    storage_location = "recorded/" + str(arg)
    write_file = open(storage_location, "w")
    json.dump(json_obj, write_file, indent=4, sort_keys=True)
    write_file.close()
#variable to hold status of user (whether they are done or not)
working = 1
#while the user is not done, continue running the program
while working == 1:
    print("Please input a command. For help type 'help'. To exit type 'exit'")
    #grab input from user, divide it into words, and grab the command/option/argument
    request = input()
    request = request.split()
    if len(request) == 1:
        command = request[0]
    elif len(request) == 2:
        command = request[0]
        option = request[1]
    elif len(request) == 3:
        command = request[0]
        option = request[1]
        argument = request[2]
    else:
        print("I'm sorry that is not a valid request.\n")
        continue
    #if user inputs command 'icheck'...
    if command == 'icheck':
        if option == '-l':
            if argument == "":
                print("For option -l, please input a directory name.")
                continue
            try:
                dirContents = os.listdir(argument)
                scan_hash_json(dirContents)
            except OSError:
                print("Directory not found. Make sure the directory name is correct or try a different directory.")
        elif option == '-f':
            if argument == "":
                print("For option -f, please input a file name.")
                continue
            try:
                contents = read_the_file(argument)
                computedHash = create_hash(contents)
                jsonObj = {"Directory": "Default", "Contents": {
                    "filename": str(argument), "original string": str(contents), "md5": str(computedHash)}}
                write_to_json(argument, jsonObj)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")
        elif option == '-t':
            try:
                dirContents = os.listdir("recorded")
                check_integrity(dirContents)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")
        elif option == '-u':
            print("gonna update stuff")
        elif option == '-r':
            print("gonna remove stuff")
    #if user inputs command 'help'...
    elif command == 'help':
        #display help screen
        print("Integrity Checker has a few options you can use. Each option "
              "must begin with the command 'icheck'. The options are as follows:")
        print("\t-l <directory>: Reads the list of files in the directory and computes the md5 for each one")
        print("\t-f <file>: Reads a specific file and computes its md5")
        print("\t-t: Tests integrity of the files with recorded md5s")
        print("\t-u <file>: Update a file that you have modified after its integrity has been checked")
        print("\t-r <file>: Removes a file from the recorded md5s\n")
    #if user inputs command 'exit'
    elif command == 'exit':
        #set working to zero and exit program loop
        working = 0
    #if anything other than 'icheck', 'help', and 'exit' are input...
    else:
        #display error message and start over
        print("I'm sorry that is not a valid command.\n")
2 Answers
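I see two possible problems you may be facing:
- The hash is computed over the binary representation of the string.
- Unless you stick to ASCII, the same international character, such as č, can be represented by different byte sequences depending on the encoding (for example UTF-8 versus UTF-16) and on whether it is stored in composed or decomposed form.
Things to consider:
- If you need UTF-8 or Unicode, normalize the content before saving it or computing the hash.
- For testing, compare the binary representation of the content.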
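As a minimal sketch of that last point (the sample string and the helper name `md5_of_bytes` are assumptions of mine, not taken from the question), printing the raw bytes next to their digests makes an encoding mismatch visible:

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes, using a fresh hash object."""
    return hashlib.md5(data).hexdigest()


# The same text, encoded in two different ways (sample data, chosen for illustration).
text = "caf\u00e9"
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# repr() of the bytes shows the difference that print(text) would hide.
print(repr(as_utf8), md5_of_bytes(as_utf8))
print(repr(as_latin1), md5_of_bytes(as_latin1))
print("same bytes:", as_utf8 == as_latin1)
```

If the byte sequences differ, the digests will differ as well, even though the decoded text looks identical on screen.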
Use only UTF-8 for I/O operations; codecs.open will handle all the conversions for you. Example code:
from codecs import open
with open('yourfile', 'r', encoding="utf-8") as f:
    decoded_content = f.read()
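The normalization step suggested above could look roughly like this. It is only a sketch: the helper name `normalized_md5` and the choice of the NFC form are assumptions, and the question's own code does not do this.

```python
import hashlib
import unicodedata


def normalized_md5(text: str, form: str = "NFC") -> str:
    """Normalize the text to one Unicode form, encode as UTF-8, then hash."""
    normalized = unicodedata.normalize(form, text)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


composed = "\u010d"      # č as a single code point
decomposed = "c\u030c"   # c followed by a combining caron

# Without normalization the UTF-8 bytes differ, and so do the digests.
raw_composed = hashlib.md5(composed.encode("utf-8")).hexdigest()
raw_decomposed = hashlib.md5(decomposed.encode("utf-8")).hexdigest()
print("raw digests equal:", raw_composed == raw_decomposed)

# After normalization both spellings hash to the same value.
print("normalized digests equal:",
      normalized_md5(composed) == normalized_md5(decomposed))
```

Whichever form is picked, the same one has to be used both when the hash is recorded and when it is checked.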
Where in this method do you define h, the md5 object being used?
#create a hash for the file contents being passed in
def create_hash(content):
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    print("Content: " + str(content))
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    print("digest: " + str(h.hexdigest()))
    return h.hexdigest()
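My suspicion is that you call create_hash twice, but with the same md5 object both times. In that case, on the second call you are actually hashing "reallyBad*file contents*hashKeyAlgorithmreallyBad*file contents*hashKeyAlgorithm", because update() keeps appending to whatever the object has already consumed. You should create a new md5 object inside create_hash to avoid this.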
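To illustrate the suspected effect, here is a small self-contained sketch. The names `shared`, `create_hash_shared` and `create_hash_fresh` are made up for the comparison, and the functions take bytes directly for brevity; only the two key strings come from the question.

```python
import hashlib

KEY_BEFORE = b"reallyBad"
KEY_AFTER = b"hashKeyAlgorithm"

# One md5 object shared by every call: the suspected bug.
shared = hashlib.md5()


def create_hash_shared(content: bytes) -> str:
    """Keeps feeding the same object, so earlier input is never forgotten."""
    shared.update(KEY_BEFORE)
    shared.update(content)
    shared.update(KEY_AFTER)
    return shared.hexdigest()


def create_hash_fresh(content: bytes) -> str:
    """Creates a new md5 object per call, so each digest is independent."""
    h = hashlib.md5()
    h.update(KEY_BEFORE)
    h.update(content)
    h.update(KEY_AFTER)
    return h.hexdigest()


data = b"I made a file to test the md5\n"
first_shared = create_hash_shared(data)
second_shared = create_hash_shared(data)
first_fresh = create_hash_fresh(data)
second_fresh = create_hash_fresh(data)

print("shared object, same digest twice:", first_shared == second_shared)
print("fresh object, same digest twice:", first_fresh == second_fresh)
```

With the shared object, the second digest covers the keyed input twice over, which is why it no longer matches the first one.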
Edit: After I made this change, this is how your program runs:
Please input a command. For help type 'help'. To exit type 'exit'
icheck -f ok.txt
Content: this is a test
digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'
icheck -t
Content: this is a test
digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'
Edit #2: Your scan_hash_json function also has a bug at the end. You strip the .txt suffix from the file name and then call write_to_json:
def scan_hash_json(directory_content):
    ...
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)
However, write_to_json expects the file name to end in .txt:
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
If you fix that, I think it should work as expected...
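One possible way to keep the two name conversions consistent is to derive both from the file's stem with os.path.splitext instead of chained replace() calls. This is only a sketch; the helpers `json_path_for` and `source_name_for` are hypothetical and not part of the original program.

```python
import os


def json_path_for(source_path: str, record_dir: str = "recorded") -> str:
    """Map e.g. 'docs/ok.txt' to '<record_dir>/ok.json'."""
    stem, _ext = os.path.splitext(os.path.basename(source_path))
    return os.path.join(record_dir, stem + ".json")


def source_name_for(json_name: str, ext: str = ".txt") -> str:
    """Map a recorded file name such as 'ok.json' back to 'ok.txt'."""
    stem, _ext = os.path.splitext(os.path.basename(json_name))
    return stem + ext


print(json_path_for("docs/ok.txt"))
print(source_name_for("ok.json"))
```

Because both directions go through the same stem, a recorded file always ends in .json and can be mapped back to the .txt name that the integrity check searches for.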