什么是Java中的“内部地址”?

在Javadoc for Object.hashCode()中声明

尽可能多地合理实用,由类Object定义的hashCode方法确实为不同的对象返回不同的整数。 (这通常通过将对象的内部地址转换为整数来实现,但Java™编程语言不需要此实现技术。)

这是一个常见的miconception这与内存地址有关,但它不会改变,恕不另行通知,并且hashCode()不会也不能改变对象。

@ Neet提供了一个很好的答案https://stackoverflow.com/a/565416/57695但我正在寻找更多的细节。


这里举一个例子来说明我的担忧

 Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); for (int t = 0; t < 10; t++) { System.gc(); Object[] objects = new Object[10]; for (int i = 0; i < objects.length; i++) objects[i] = new Object(); for (int i = 0; i < objects.length; i++) { if (i > 0) System.out.print(", "); int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i); System.out.printf("%08x: hc= %08x", location, objects[i].hashCode()); } System.out.println(); } 

版画

 eac00038: hc= 4f47e0ba, eac00048: hc= 2342d884, eac00058: hc= 7994d431, eac00068: hc= 19f71b53, eac00078: hc= 2e22f376, eac00088: hc= 789ddfa3, eac00098: hc= 44c58432, eac000a8: hc= 036a11e4, eac000b8: hc= 28bc917c, eac000c8: hc= 73f378c8 eac00038: hc= 30813486, eac00048: hc= 729f624a, eac00058: hc= 3dee2310, eac00068: hc= 5d400f33, eac00078: hc= 18a60d19, eac00088: hc= 3da5f0f3, eac00098: hc= 596e0123, eac000a8: hc= 450cceb3, eac000b8: hc= 4bd66d2f, eac000c8: hc= 6a9a4f8e eac00038: hc= 711dc088, eac00048: hc= 584b5abc, eac00058: hc= 3b3219ed, eac00068: hc= 564434f7, eac00078: hc= 17f17060, eac00088: hc= 6c08bae7, eac00098: hc= 3126cb1a, eac000a8: hc= 69e0312b, eac000b8: hc= 7dbc345a, eac000c8: hc= 4f114133 eac00038: hc= 50c8c3b8, eac00048: hc= 2ca98e77, eac00058: hc= 2fc83d89, eac00068: hc= 034005e1, eac00078: hc= 6041f871, eac00088: hc= 0b1df416, eac00098: hc= 5b83d60d, eac000a8: hc= 2c5a1e6b, eac000b8: hc= 5083198c, eac000c8: hc= 4f025f9f eac00038: hc= 00c5eb8a, eac00048: hc= 41eab16b, eac00058: hc= 1726099c, eac00068: hc= 4240eca3, eac00078: hc= 346fe350, eac00088: hc= 1db4b415, eac00098: hc= 429addef, eac000a8: hc= 45609812, eac000b8: hc= 489fe953, eac000c8: hc= 7a8f6d64 eac00038: hc= 7e628e42, eac00048: hc= 7869cfe0, eac00058: hc= 6aceb8e2, eac00068: hc= 29cc3436, eac00078: hc= 1d77daaa, eac00088: hc= 27b4de03, eac00098: hc= 535bab52, eac000a8: hc= 274cbf3f, eac000b8: hc= 1f9fd541, eac000c8: hc= 3669ae9f eac00038: hc= 772a3766, eac00048: hc= 749b46a8, eac00058: hc= 7e3bfb66, eac00068: hc= 13f62649, eac00078: hc= 054b8cdc, eac00088: hc= 230cc23b, eac00098: hc= 1aa3c177, eac000a8: hc= 74f2794a, eac000b8: hc= 5af92541, eac000c8: hc= 1afcfd10 eac00038: hc= 396e1dd8, eac00048: hc= 6c696d5c, eac00058: hc= 7d8aea9e, eac00068: hc= 2b316b76, eac00078: hc= 39862621, eac00088: hc= 16315e08, eac00098: hc= 03146a9a, eac000a8: hc= 3162a60a, eac000b8: hc= 4382f3da, eac000c8: hc= 4a578fd6 eac00038: hc= 225765b0, eac00048: hc= 17d5176d, eac00058: hc= 26f50154, eac00068: hc= 1f2a45c7, eac00078: hc= 104b1bcd, eac00088: hc= 330e3816, eac00098: hc= 6a844689, eac000a8: hc= 12330301, eac000b8: hc= 530a3ffc, eac000c8: hc= 45eee3fb eac00038: hc= 3f9432e0, eac00048: hc= 1a9830bc, eac00058: hc= 7da79447, eac00068: hc= 04f801c4, eac00078: hc= 363bed68, eac00088: hc= 185f62a9, eac00098: hc= 1e4651bf, eac000a8: hc= 1aa0e220, eac000b8: hc= 385db088, eac000c8: hc= 0ef0cda1 

作为一个附注; 如果你看这个代码

 if (value == 0) value = 0xBAD ; 

看起来,0xBAD是​​正常的两倍,因为任何hashCode作为0被映射到这个值。 如果你跑得这么久,你就会看到

 long count = 0, countBAD = 0; while (true) { for (int i = 0; i < 200000000; i++) { int hc = new Object().hashCode(); if (hc == 0xBAD) countBAD++; count++; } System.out.println("0xBAD ratio is " + (double) (countBAD << 32) / count + " times expected."); } 

版画

 0xBAD ratio is 2.0183116992481205 times expected. 

这显然是特定于实施的。

下面我包含OpenJDK 7中使用的Object.hashCode()实现。

该函数支持六种不同的计算方法,其中只有两种计算方法需要注意对象的地址(“地址”是将C ++ oopintptr_t )。 两个方法之一使用地址原样,而另一个方法有点twiddling,然后用一个不经常更新的随机数混合结果。

剩下的方法中,返回一个常量(大概是为了testing),一个返回连续的数字,其余的则是基于伪随机序列的。

看起来这个方法可以在运行时select,而默认的方法是os::random()方法0。 后者是一个线性同余生成器 ,引发了所谓的竞争条件。:-)竞争条件是可以接受的,因为在最坏的情况下会导致两个对象共享相同的哈希码; 这不会破坏任何不variables。

计算是在第一次需要散列码时执行的。 为了保持一致性,结果将存储在对象的头文件中,并在随后调用hashCode() 。 caching是在这个函数之外完成的。

总之, Object.hashCode()是基于对象地址的概念在很大程度上是一个历史的人为现象,被现代垃圾收集器的属性所淘汰。

 // hotspot/src/share/vm/runtime/synchronizer.hpp // hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: // 2654435761 = 2^32 * Phi (golden ratio) // HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result // in undesirable regularity in the hashCode values of adjacent objects // (objects allocated back-to-back, in particular). This could potentially // result in hashtable collisions and reduced hashtable efficiency. // There are simple ways to "diffuse" the middle address bits over the // generated hashCode values: // static inline intptr_t get_next_hash(Thread * Self, oop obj) { intptr_t value = 0 ; if (hashCode == 0) { // This form uses an unguarded global Park-Miller RNG, // so it's possible for two threads to race and generate the same RNG. // On MP system we'll have lots of RW access to a global, so the // mechanism induces lots of coherency traffic. value = os::random() ; } else if (hashCode == 1) { // This variation has the property of being stable (idempotent) // between STW operations. This can be useful in some of the 1-0 // synchronization schemes. intptr_t addrBits = intptr_t(obj) >> 3 ; value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; } else if (hashCode == 2) { value = 1 ; // for sensitivity testing } else if (hashCode == 3) { value = ++GVars.hcSequence ; } else if (hashCode == 4) { value = intptr_t(obj) ; } else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation -- we'll // likely make this the default in future releases. unsigned t = Self->_hashStateX ; t ^= (t << 11) ; Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ; unsigned v = Self->_hashStateW ; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask; if (value == 0) value = 0xBAD ; assert (value != markOopDesc::no_hash, "invariant") ; TEVENT (hashCode: GENERATE) ; return value; } 

它通常对象的内存地址。 但是,首次在对象上调用hashcode方法时,整数将存储在该对象的头部,以便下一次调用将返回相同的值(如您所说,压缩垃圾回收可以更改地址)。 据我所知,它是如何在Oracle JVM中实现的。

编辑:挖掘到JVM源代码,这是什么显示(synchronizer.cpp):

 // hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: // 2654435761 = 2^32 * Phi (golden ratio) // HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result // in undesirable regularity in the hashCode values of adjacent objects // (objects allocated back-to-back, in particular). This could potentially // result in hashtable collisions and reduced hashtable efficiency. // There are simple ways to "diffuse" the middle address bits over the // generated hashCode values: // static inline intptr_t get_next_hash(Thread * Self, oop obj) { intptr_t value = 0 ; if (hashCode == 0) { // This form uses an unguarded global Park-Miller RNG, // so it's possible for two threads to race and generate the same RNG. // On MP system we'll have lots of RW access to a global, so the // mechanism induces lots of coherency traffic. value = os::random() ; } else if (hashCode == 1) { // This variation has the property of being stable (idempotent) // between STW operations. This can be useful in some of the 1-0 // synchronization schemes. intptr_t addrBits = intptr_t(obj) >> 3 ; value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; } else if (hashCode == 2) { value = 1 ; // for sensitivity testing } else if (hashCode == 3) { value = ++GVars.hcSequence ; } else if (hashCode == 4) { value = intptr_t(obj) ; } else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation -- we'll // likely make this the default in future releases. unsigned t = Self->_hashStateX ; t ^= (t << 11) ; Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ; unsigned v = Self->_hashStateW ; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask; if (value == 0) value = 0xBAD ; assert (value != markOopDesc::no_hash, "invariant") ; TEVENT (hashCode: GENERATE) ; return value; } 

所以在Oracle JVM中有6种不同的方式,其中之一就是我所说的…这个值通过调用get_next_hash方法存储在对象的头文件中(称为FastHashCode并从Object.hashCode()本地版本。

恕我直言,虽然依赖于实现,但JVM中的对象引用永远不是实际的内存地址,而是指向实际地址的内部JVM引用。 这些内部引用最初基于内存地址生成,但是它们仍然与对象关联,直到被丢弃。

我之所以这么说,是因为Java HotSpot GC正在复制某种forms的收集器,而他们通过遍历live objects并将其从一个堆复制到另一个堆,随后摧毁old heap 。 因此,当GC发生时,JVM不必更新所有对象中的引用,而只需更改映射到内部引用的实际内存地址。

Javadoc对Object.hashCode()的build议是通过从内存地址派生哈希码来实现的,这个build议已经过时了。

也许没有人打扰(或注意到)这个实现path不再是可能的whith复制垃圾回收器(因为它会改变哈希码,当对象被复制到另一个内存位置)。

存在复制垃圾回收器之前 ,以这种方式实现散列码是非常有意义的,因为这样可以节省堆空间。 非复制的GC(CMS)仍然可以在今天实现散列码。