HashMap源码解析

容量和扩容

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* The default initial capacity ‐ MUST be a power of two.
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
*/
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The load factor used when none specified in constructor.
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

我们这里能看到,和 HashTable 一样有 Capacity 和 Factor 的设定,这里面我们的默认值也是 16 和 0.75 的值。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
*/
static final int TREEIFY_THRESHOLD = 8;
/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
*/
static final int UNTREEIFY_THRESHOLD = 6;
/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
*/
static final int MIN_TREEIFY_CAPACITY = 64;

这个就是我们说的链表成树的门限,我们在大小超过 THRESHOLD(8) 的时候会把列表变成一棵树,还有 UNTREEIFY_THRESHOLD 就是树返回链表的门限,MIN_TREEIFY_CAPACITY 这个是树化的最小的 HashMap 容量。

数据存储

1
2
3
4
5
6
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
}

我们每一个 Node 都是这样的一个数据节点,包含K-V和子节点,这个和 Hashtable 没啥区别。

hash方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power‐of‐two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit‐spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

JDK7、JDK8 的 HashMap 的 hash 方法是不同的,但是实现的原理其实是一样的东西,我们在程序中实现了一个扰动函数,在这里面我们解释一下这个扰动函数的具体实现细节:

我们先使用默认的HashCode方法求出了返回的Int类型的hash,但是我们的 hash 本身是一个 32bit 的数据,二次幂的话,会有 40亿的 hash 空间,但是 40亿 的 hash 空间是没办法存储的开的,所以我们本身就用通过掩模的方式去的降低我们的数位。

我们先右移 16 位和自己的 hash 本身异或一下,能让我们的数据更为分散一些。这也揭示了我们的 HashMap 的大小为什么都是取 2 的次幂进行处理,那样我们的 (table.length ‐ 1) 能成为我们的 低位掩模 。我们最后在进行一次 (table.length - 1) & hash 之后我们就能求出位数较小的 hash 值。我们通过 >>> 的方式,让我们的低位值也能保留出高位数据的值,能让我们的 hash 更加平均。

Field数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
*/
transient Node<K,V>[] table;
/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key‐value mappings contained in this map.
*/
transient int size;
/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection‐views of
* the HashMap fail‐fast. (See ConcurrentModificationException).
*/
transient int modCount;
/**
* The next size value at which to resize (capacity * load factor).
*
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;
/**
* The load factor for the hash table.
*
* @serial
*/
final float loadFactor;

这些数据我们都很熟悉,各种的存储的 table ,用作缓存的 entrySet,还有用作判断同步的 modCount 参数,还有门限和加载因子。

1
2
3
4
5
6
7
8
9
10
11
12
/**
* Returns a power of two size for the given target capacity.
*/
static final int tableSizeFor(int cap) {
int n = cap ‐ 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

我们在构造函数中通过这个算法找到了大于等于我们给定参数的最小的二次幂方的参数,这个算法的分析很多人节写的比较麻烦,我们可以简单地理解为我们收到的结果参数都是将数位的所有的位数都变成 1,这样子二进制再加以就会进位,那就肯定是一个 1 全 0 的情况,然后我们拿到值就全是 2 的幂了,这样和 之前介绍的 hash 方法的结合,我们的 HashMap 的容量就能被控制在 2 的幂次了。

0%