[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)

2023-08-25 16:11 作者:nicke_lin 0人读过 | 我要投稿

源码如下：

为什么不是直接使用key.hashCode()，而是(h = key.hashCode()) ^ (h >>> 16)?

官方的解释：Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

翻译过来：table的容量都是2的N次方，如果有一组哈希值总是在当前table容量之上的bit位变化，则总是会发生hash碰撞。在同时权衡速度、效用和性能的情况下，将高位和低位进行异或让高位的变化能影响到table索引的计算

举例说明：

比如：table容量是2的4次方，即16。那么总是在table容量之上的bit位变化，就应该满足公司16 * N + X ，其中N变，X不变

如上表发生了hash碰撞，那么(h = key.hashCode()) ^ (h >>> 16)就能解决问题吗？

从上表可看出结果没有变化，原因是什么？

首先，我们需要了解h>>>16是将哈希值进行无符号右移16位

比如 h（二进制表示）= 0101 1010 1111 0000 1111 1011 0101 1001

h >>> 16之后的值：0000 0000 0000 0000 0101 1010 1111 0000

其次，我们从异或的计算规则

0 ^ 0 = 0， 1 ^ 0 = 1, 0 ^ 1 = 1, 1 ^ 1 - 0

可以得出

X ^ 0 = X

进而知道，如果h（二进制表示）的高16位都是0，那么h>>>16之后的值还是h，即只有高16位不全是0，才会有效果

个人总结：(h = key.hashCode()) ^ (h >>> 16) 只能在尽量少的影响运算速度下，一小部分情况下减少hash碰撞，同时也有可能在另一小部分情况下增加hash碰撞。直观感觉上，有点鸡肋。那么如果官方是基于现实的大量数据测试结果之后，选择这个方案，就另当别论了

标签：源码 Java HashMap

[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)

[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)的评论 (共条)

你可能也喜欢这些文章

最新发布的文章

[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)

本文作者的其他文章

[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)的评论 (共 条)

你可能也喜欢这些文章

最新发布的文章

[Java8源码]HashMap的key的哈希值为什么是(h = key.hashCode()) ^ (h >>> 16)的评论 (共条)