哈希算法本质来说就是将一个元素映射成另一个元素,可以分为加密哈希函数 和 非加密哈希函数
加密哈希函数:
加密哈希函数旨在保证一系列的安全属性。它们大部分都很难发生碰撞或是被找出加密的原文,而且哈希值看起来是随机的。
加密哈希,如MD5,SHA256等,
非加密哈希函数:
只是试图避免非恶意输入的冲突。作为较弱担保的交换,它们通常更快。如果数据量小,或者不太在意哈希碰撞的频率,甚至可以选择生成哈希值小的哈希算法,占用更小的空间。
非加密哈希,如MurMurHash,CRC32,DJB等。
Smhasher-评价哈希算法的函数
评价一个哈希算法的好坏,人们通常会引用 SMHasher 测试集的运行结果。
Smhasher 测试Hash函数的功能,测试包括以下几个方面:
Sanity 是不是可以使用的
Performance 完成一个散列需要多长时间
Differentials 产生相同哈希的概率,可能导致相同的的最小差异
Keysets 分布均匀程度
一系列的测试方式具体可参考:https://github.com/aappleby/smhasher/wiki/SMHasher
是否通过了卡方检验和雪崩测试
1、Avalanche Test(雪崩测试)
这意味着输入的微小变化会导致输出发生显著变化,使其统计上看起来与随机变化没有差别。例如:MurmurHash3(“abd”,123)=454173339;MurmurHash3(“abe”,123)=4085872068
2、Chi-Squared Test(卡方检验)
均匀性:一般期望设计的哈希函数的哈希值均匀落入哈希空间。将哈希空间 n 等分, 得到 p个 哈希值, 那么平均落入每个哈希子空间的哈希值是 𝑓_0= p /n, 落入第 i个子空间的哈希值个数是𝑓_𝑖 。统计量 x^2表示𝑓_𝑖到均匀分布的偏离度。哈希函数均匀性可用卡方拟合优度检验来判断。
具体可参考:
1、https://crypto.stackexchange.com/questions/3690/why-is-sha-1-considered-broken
2、https://security.stackexchange.com/questions/11839/what-is-the-difference-between-a-hash-function-and-a-cryptographic-hash-function
Every cryptographic hash function is a hash function. But not every hash function is a cryptographic hash.
A cryptographic hash function aims to guarantee a number of security properties. Most of that it's hard to find collisions or pre-images and that the output appears random. (There are a few more properties, and "hard" has well defined bounds in this Context, but that's not important here.)
Non cryptographic hash functions just try to avoid collisions for non malicious input.Some aim to detect accidental changes in data (CRCs), others try to put objects into different buckets in a hash table with as few collisions as possible.
In exchange for weaker stories they are typically very much.
I'd still call md5 a cryptographic hashfunction, since it aimed to prevent security. But it's broken, and thus no longer usable as cryptographic hash. On the other hand when you have a non cryptographic hash function, you can not really call it "Broken", since it never tried to be secure in the first place.