MD5 of the same HTML gives two different results

Posted 2022-09-28 21:09:43


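Can someone explain why this happens? If I fetch HTML from a site with the requests module and compute its MD5 checksum with hashlib, I get one answer. If I then save that HTML to an .html file, open it, and compute the MD5 checksum the same way, I get a different checksum.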

import requests
import hashlib

resp = requests.post("http://casesearch.courts.state.md.us/", timeout=120)
html = resp.text
print("CheckSum 1: " + hashlib.md5(html.encode('utf-8')).hexdigest())

f = open("test.html", "w+")
f.write(html)
f.close()

with open('test.html', "r", encoding='utf-8') as f:
    html2 = f.read()
print("CheckSum 2: " + hashlib.md5(html2.encode('utf-8')).hexdigest())

The output is:

CheckSum 1: e0b253903327c7f68a752c6922d8b47a
CheckSum 2: 3aaf94e0df9f1298d61830d99549ddb0

1 answer

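When a file is read in text mode, Python may translate line endings, depending on the value passed to the newline parameter of open(). From the documentation: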

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.

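Your read call leaves newline at its default of None, so any '\r\n' or '\r' in the saved file comes back as '\n'. The string you read is therefore not the string you downloaded, its encoded bytes differ, and so does the resulting hash.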
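The effect can be reproduced without a network request. The following is a minimal sketch, not the asker's exact setup: the `html` string is a made-up stand-in for `resp.text` that is assumed to contain '\r\n' line endings, and `md5_hex` is a hypothetical helper. It writes the text with `newline=""` so nothing is translated on the way out, then reads it back three ways.

```python
import hashlib
import os
import tempfile


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a bytes object (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Made-up stand-in for resp.text; assumed to use CRLF line endings.
html = "<html>\r\n<body>hello</body>\r\n</html>\r\n"
checksum_original = md5_hex(html.encode("utf-8"))

path = os.path.join(tempfile.mkdtemp(), "test.html")

# newline="" on write: line endings are written exactly as given.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(html)

# Default newline=None on read: '\r\n' is translated to '\n'.
with open(path, "r", encoding="utf-8") as f:
    translated = f.read()

# newline="" on read: line endings are returned untranslated.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.read()

# Binary mode: no decoding and no newline handling at all.
with open(path, "rb") as f:
    raw = f.read()

checksum_translated = md5_hex(translated.encode("utf-8"))
checksum_untranslated = md5_hex(untranslated.encode("utf-8"))
checksum_raw = md5_hex(raw)

print("original:    ", checksum_original)
print("translated:  ", checksum_translated)
print("untranslated:", checksum_untranslated)
print("raw bytes:   ", checksum_raw)
```

Only the default text-mode read should produce a different checksum. If the goal is a stable checksum of a downloaded page, one option is to skip text handling altogether: hash `resp.content` (the raw response bytes in requests) and write and read the file in binary mode ("wb" / "rb").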