A Look at Java References

Demon.Lee, December 11, 2023

Environment used in this article:

Operating System: macOS 14.1.1
Kernel: Darwin 23.1.0
Architecture: arm64

Java version: 17.0.8


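Any discussion of JVM garbage collection eventually brings up references: strong, soft, weak, and phantom. In day-to-day work, especially business development, we usually deal only with strong references. To get these basic concepts straight, I decided to write them down.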

Basic Concepts

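First, the basic concepts. Let's look at how the Java Docs define these references. Before Java 1.2 there was only one kind, the strong reference; Java 1.2 introduced soft, weak, and phantom references.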

Strong References

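As mentioned above, this is the most common kind. As developers we generally do not need to care about how the referenced objects are allocated and reclaimed, because the GC handles that as needed.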

Student stu = new Student("10001", "LiLei");

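In the line above, the variable stu is assigned a reference to a Student instance, which makes it a strong reference. As long as stu keeps referring to it, the Student object is not reclaimed. It becomes eligible for collection only once stu goes out of scope (a method or code block, for example) or is explicitly set to null (stu = null), and no other strong reference to it remains.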

Soft References

Soft reference objects, which are cleared at the discretion of the garbage collector in response to memory demand. Soft references are most often used to implement memory-sensitive caches.

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.

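A soft reference is somewhat weaker than a strong reference and is typically used to implement memory-sensitive caches. The JVM reclaims softly referenced objects as needed, depending on memory usage; specifically, they are guaranteed to be cleared when memory runs short, before an OutOfMemoryError is thrown. For example, suppose we pull some business records from a third-party API. After using them we are in no hurry to discard them; we keep them around so that the next use can read them directly. And if they get cleared because memory runs short? The business layer does not notice: we simply fetch the data again.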

public void usage() {
  Student stu = new Student("10001", "LiLei");
  SoftReference<Student> studentSoftRef = new SoftReference<>(stu);
  Student cacheStu = studentSoftRef.get();
  if (Objects.isNull(cacheStu)) {
    // Reclaimed by the GC: the value has to be loaded again
  }

  List<Student> students = new ArrayList<>(List.of(stu));
  SoftReference<List<Student>> studentsSoftRef = new SoftReference<>(students);
  List<Student> cacheStudents = studentsSoftRef.get();
  if (Objects.isNull(cacheStudents)) {
    // Reclaimed by the GC: the value has to be loaded again
  }
}

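The code above shows the typical usage. Next, I use two small examples, one with a strong reference and one with a soft reference, to verify when the object is reclaimed.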

// Strong reference example
@SneakyThrows
public void outOfMemoryError4HardReference() {
  // 200MB
  byte[] hardRef = new byte[1024 * 1024 * 200];
  log.info("Hard Reference created: {}", Arrays.hashCode(hardRef));

  // Suggest that the JVM run a garbage collection
  System.gc();
  // Give the collection time to finish
  Thread.sleep(3000);

  log.info("After GC: {}", Arrays.hashCode(hardRef));

  // Allocate objects to create memory pressure
  List<byte[]> memoryPressures = new ArrayList<>();
  try {
    for (int i = 1; ; i++) {
      // Allocate 10MB per iteration
      memoryPressures.add(new byte[1024 * 1024 * 10]);
      log.info("{}, Hard Reference under pressure: {}", i, Arrays.hashCode(hardRef));
    }
  } catch (OutOfMemoryError error) {
    log.error("After running out of memory: {}", Arrays.hashCode(hardRef), error);
  }
}

First, the strong reference example. Here is the log output:

16:35:52.842 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - Hard Reference created: 1879048193
16:35:56.068 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - After GC: 1879048193
16:35:56.279 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 1, Hard Reference under pressure: 1879048193
16:35:56.491 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 2, Hard Reference under pressure: 1879048193
16:35:56.702 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 3, Hard Reference under pressure: 1879048193
...
...
16:36:01.303 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 25, Hard Reference under pressure: 1879048193
16:36:01.505 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 26, Hard Reference under pressure: 1879048193
16:36:01.707 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 27, Hard Reference under pressure: 1879048193
16:36:01.919 [Test worker] ERROR jvm.tech.demonlee.gc.reference.SoftReferenceUsage - After running out of memory: 1879048193
java.lang.OutOfMemoryError: Java heap space
	at jvm.tech.demonlee.gc.reference.SoftReferenceUsage.outOfMemoryError4HardReference(SoftReferenceUsage.java:66) ~[main/:?]
	at jvm.tech.demonlee.gc.reference.SoftReferenceUsageTest.outOfMemoryError4HardReference(SoftReferenceUsageTest.java:13) ~[test/:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
...
...

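The log shows that neither an explicitly requested GC nor running out of memory makes the collector touch a strongly referenced object. Next, let's swap the strong reference for a soft one: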

// Soft reference example
@SneakyThrows
public void outOfMemoryError() {
  // 200MB
  SoftReference<byte[]> softRef = new SoftReference<>(new byte[1024 * 1024 * 200]);
  log.info("Soft Reference created: {}", getSoftReferenceHashCode(softRef));

  // Suggest that the JVM run a garbage collection
  System.gc();
  // Give the collection time to finish
  Thread.sleep(3000);

  log.info("After GC: {}", getSoftReferenceHashCode(softRef));

  // Allocate objects to create memory pressure
  List<byte[]> memoryPressures = new ArrayList<>();
  try {
    for (int i = 0; ; i++) {
      // Allocate 10MB per iteration
      memoryPressures.add(new byte[1024 * 1024 * 10]);
      log.info("{}, Soft Reference under pressure: {}", i, getSoftReferenceHashCode(softRef));
    }
  } catch (OutOfMemoryError error) {
    // When memory runs out, the softly referenced object should already have been reclaimed
    log.error("After running out of memory: {}", getSoftReferenceHashCode(softRef), error);
  }
}

private Integer getSoftReferenceHashCode(SoftReference<byte[]> softRef) {
  return Optional.ofNullable(softRef.get()).map(Object::hashCode).orElse(null);
}

The log output:

16:35:49.519 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - Soft Reference created: 798622145
16:35:52.534 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - After GC: 798622145
16:35:52.539 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 0, Soft Reference under pressure: 798622145
16:35:52.544 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 1, Soft Reference under pressure: 798622145
16:35:52.546 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 2, Soft Reference under pressure: 798622145
...
...
16:35:52.572 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 23, Soft Reference under pressure: 798622145
16:35:52.572 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 24, Soft Reference under pressure: 798622145
16:35:52.584 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 25, Soft Reference under pressure: null
16:35:52.585 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 26, Soft Reference under pressure: null
16:35:52.586 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 27, Soft Reference under pressure: null
16:35:52.586 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 28, Soft Reference under pressure: null
...
...
16:35:52.598 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 41, Soft Reference under pressure: null
16:35:52.600 [Test worker] INFO  jvm.tech.demonlee.gc.reference.SoftReferenceUsage - 42, Soft Reference under pressure: null
16:35:52.610 [Test worker] ERROR jvm.tech.demonlee.gc.reference.SoftReferenceUsage - After running out of memory: null
java.lang.OutOfMemoryError: Java heap space
	at jvm.tech.demonlee.gc.reference.SoftReferenceUsage.outOfMemoryError(SoftReferenceUsage.java:35) ~[main/:?]
	at jvm.tech.demonlee.gc.reference.SoftReferenceUsageTest.outOfMemoryError4SoftReference(SoftReferenceUsageTest.java:18) ~[test/:?]
...
...

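As the log shows, at loop index 25 the JVM reclaimed the object behind the soft reference, that is, the 200MB array. That is why another dozen or so 10MB allocations succeeded afterwards before the heap finally ran out.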
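To connect this back to the caching scenario described earlier, here is a minimal sketch of a memory-sensitive cache. SoftCache, its loader function, and simulateClear are names I made up for illustration; simulateClear calls clear() by hand only to imitate what the collector would do under memory pressure.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of the original project: a tiny soft-reference cache.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private int loadCount = 0;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value; reloads it if it was never loaded or has been cleared.
    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key); // held strongly by this local from here on
            loadCount++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Imitates the collector clearing the referent under memory pressure.
    synchronized void simulateClear(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref != null) {
            ref.clear();
        }
    }

    synchronized int loadCount() {
        return loadCount;
    }
}
```

The caller always works with the strong reference returned by get(), so a value can only disappear between two calls, never while it is in use.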

Weak References

Weak reference objects, which do not prevent their referents from being made finalizable, finalized, and then reclaimed. Weak references are most often used to implement canonicalizing mappings.
Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references. At the same time it will declare all of the formerly weakly-reachable objects to be finalizable. At the same time or at some later time it will enqueue those newly-cleared weak references that are registered with reference queues.

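A weak reference is weaker still than a soft reference. Like a soft reference, it can be used for memory-sensitive caching logic. Unlike a soft reference, though, it does not shield the object from collection at all: an object that is only weakly reachable survives only until the next garbage collection. Here is an example: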

public static void main(String[] args) {
  WeakReference<Student> weakRef = new WeakReference<>(new Student("10001", "LiLei"));
  checkExists(weakRef);

  System.gc();

  checkExists(weakRef);
}

private static void checkExists(WeakReference<Student> weakRef) {
  Student cacheStu = weakRef.get();
  if (Objects.isNull(cacheStu)) {
    System.out.println("stu is killed.");
    return;
  }
  System.out.println("stu is alive: " + cacheStu);
}

Output:

stu is alive: Student{id='10001', name='LiLei'}
stu is killed.

As we can see, after a single GC the Student instance has already been reclaimed. What if we use a soft reference instead?

public static void main(String[] args) {
  SoftReference<Student> studentSoftRef = new SoftReference<>(new Student("10001", "LiLei"));
  checkExists(studentSoftRef);

  System.gc();

  checkExists(studentSoftRef);
}

private static void checkExists(SoftReference<Student> studentSoftRef) {
  Student cacheStu = studentSoftRef.get();
  if (Objects.isNull(cacheStu)) {
    System.out.println("stu is killed.");
    return;
  }
  System.out.println("stu is alive: " + cacheStu);
}

Output:

stu is alive: Student{id='10001', name='LiLei'}
stu is alive: Student{id='10001', name='LiLei'}

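Mappings are a common use case for weak references. Take a URL and its page snapshot (Snapshot): once a URL is judged "dead", its Snapshot should be reclaimed as well, since the two share the same lifetime. How can this be implemented? We can put both in the same Page object and have Page hold the URL through a weak reference. When the URL is reclaimed, the Page can find out in some way (a listener, for example) and set its Snapshot reference to null at the same time.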

Here is an example I wrote:

@Log4j2
@Getter
public class Page {

    private String id;
    private WeakReference<Url> url;
    private Snapshot snapshot;

    public Page(String id, WeakReference<Url> url, Snapshot snapshot) {
        this.id = id + "-" + id.hashCode();
        this.url = url;
        this.snapshot = snapshot;
    }

    public void clearSnapshot() {
        log.info("clear snapshot now: {}", this);
        this.snapshot = null;
    }

    @Getter
    public static class Url {
        private final String path;

        public Url(String path) {
            this.path = path;
        }

        @Override
        public String toString() {
            return "Url{" +
                    "path='" + path + '\'' +
                    '}';
        }
    }

    public static class Snapshot {
        private final String name;
        private final String value;

        public Snapshot(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return "Snapshot{" +
                    "name='" + name + '\'' +
                    ", value='" + value + '\'' +
                    '}';
        }
    }

    @Override
    public String toString() {
        return "Page{" +
                "id=" + id +
                ",url=" + url.get() +
                ", snapshot=" + snapshot +
                '}';
    }
}

@Log4j2
public class WeakReferenceUsage {

    @SneakyThrows
    public void usage() {
        Page.Url url1 = new Page.Url("/index");
        Page.Url url2 = new Page.Url("/hi");
        ReferenceQueue<Page.Url> refQueue = new ReferenceQueue<>();
        List<Page> pages = mockPages(refQueue, url1, url2);
        checkAndCollectSnapshot(pages, refQueue);

        // Simulate url1 becoming unreachable
        url1 = null;
        TimeUnit.SECONDS.sleep(1);
        System.gc();

        // Simulate url2 becoming unreachable
        url2 = null;
        TimeUnit.SECONDS.sleep(1);
        System.gc();

        TimeUnit.SECONDS.sleep(5);
    }

    private List<Page> mockPages(ReferenceQueue<Page.Url> refQueue, Page.Url... urls) {
        return Stream.of(urls).map(u -> mockPage(refQueue, u)).collect(Collectors.toList());
    }

    private Page mockPage(ReferenceQueue<Page.Url> urlRefQueue, Page.Url url) {
        WeakReference<Page.Url> urlRef = new WeakReference<>(url, urlRefQueue);
        Page.Snapshot snapshot = new Page.Snapshot("Test" + url.hashCode(), UUID.randomUUID().toString());
        return new Page(url.getPath(), urlRef, snapshot);
    }

    // Detect whether a url has been reclaimed (runs on its own thread)
    private void checkAndCollectSnapshot(List<Page> pages, ReferenceQueue<Page.Url> urlRefQueue) {
        Map<Reference<Page.Url>, Page> urlPageMapping = pages.stream().collect(Collectors.toMap(Page::getUrl, p -> p));
        Runnable collectionTask = () -> {
            while (true) {
                log.info("begin collect snapshot...");
                try {
                    // remove() dequeues an element and blocks until one is available
                    // Reference<? extends Page.Url> url = urlRefQueue.remove(1000);
                    Reference<? extends Page.Url> url = urlRefQueue.remove();
                    if (urlPageMapping.containsKey(url)) {
                        Page page = urlPageMapping.get(url);
                        page.clearSnapshot();
                    }
                } catch (InterruptedException e) {
                    log.error("get url from queue failed: ", e);
                }
            }
        };
        new Thread(collectionTask).start();
    }
}

@Log4j2
public class WeakReferenceUsageTest {
    @Test
    void usageForMappingClear() {
        WeakReferenceUsage weakReferenceUsage = new WeakReferenceUsage();
        weakReferenceUsage.usage();
    }
}

We detect that a Url object has been reclaimed by dequeuing from the reference queue (ReferenceQueue, introduced below) and then release the corresponding Snapshot. The test log:

08:15:30.592 [Thread-3] INFO  jvm.tech.demonlee.gc.reference.WeakReferenceUsage - begin collect snapshot...
08:15:31.611 [Thread-3] INFO  jvm.tech.demonlee.common.model.Page - clear snapshot now: Page{id=/index-1445916163,url=null, snapshot=Snapshot{name='Test2117099736', value='3ff7a705-820e-4ddf-9212-7c3d4ceea90a'}}
08:15:31.611 [Thread-3] INFO  jvm.tech.demonlee.gc.reference.WeakReferenceUsage - begin collect snapshot...
08:15:32.622 [Thread-3] INFO  jvm.tech.demonlee.common.model.Page - clear snapshot now: Page{id=/hi-48496,url=null, snapshot=Snapshot{name='Test2000648320', value='0446a67a-fbfe-44e6-8020-0066c1c59dcc'}}
08:15:32.622 [Thread-3] INFO  jvm.tech.demonlee.gc.reference.WeakReferenceUsage - begin collect snapshot...
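The Javadoc quoted at the start of this section mentions canonicalizing mappings. As a rough sketch of that idea, the class below hands out one shared instance per equal value without keeping unused instances alive. Canonicalizer is a hypothetical name; wrapping the map value in a WeakReference follows the advice in the WeakHashMap documentation, since a value that strongly referenced its own key would prevent the entry from ever being discarded.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper: returns one canonical instance per equal value.
class Canonicalizer<T> {
    // Keys are held weakly by WeakHashMap; values are wrapped in WeakReference
    // so that they do not strongly reference the key they are stored under.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    synchronized T canonical(T candidate) {
        WeakReference<T> ref = pool.get(candidate);
        T existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // an equal instance is already registered
        }
        pool.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }
}
```

Once no caller holds the canonical instance any more, it is only weakly reachable and the map entry can be dropped by the collector.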

Phantom References

Phantom reference objects, which are enqueued after the collector determines that their referents may otherwise be reclaimed. Phantom references are most often used to schedule post-mortem cleanup actions.
Suppose the garbage collector determines at a certain point in time that an object is phantom reachable. At that time it will atomically clear all phantom references to that object and all phantom references to any other phantom-reachable objects from which that object is reachable. At the same time or at some later time it will enqueue those newly-cleared phantom references that are registered with reference queues.
In order to ensure that a reclaimable object remains so, the referent of a phantom reference may not be retrieved: The get method of a phantom reference always returns null. The refersTo method can be used to test whether some object is the referent of a phantom reference.

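The last kind is the phantom reference. The referent cannot be obtained through it (get() always returns null). Its main purpose is to let the application receive a notification from the system once the object has been finalized (covered below) but its memory has not yet been reclaimed, that is, once the object has been judged "dead". Based on that notification we can perform cleanup work or write log entries. Note that a phantom reference is not useful on its own; it has to be used together with a reference queue (ReferenceQueue).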

How is the notification received? Let's get familiar with it through a code example first:

public static void main(String[] args) {
  PhantomReferenceUsage usage = new PhantomReferenceUsage();
  usage.usageViaNotify(new Student("10001", "LiLei"));
}

@SneakyThrows
private void usageViaNotify(Student stu) {
  ReferenceQueue<Student> referenceQueue = new ReferenceQueue<>();
  PhantomReference<Student> studentRef = new PhantomReference<>(stu, referenceQueue);
  log.info("student[{}] is: {}", stu.getName(), studentRef.get());

  // Set stu to null and suggest that the JVM run a garbage collection
  stu = null;
  System.gc();

  // remove blocks until the queue has an element or the timeout elapses
  Reference<? extends Student> cacheRef = referenceQueue.remove(3000L);
  if (Objects.isNull(cacheRef)) {
    log.info("no ref in Reference Queue...");
    return;
  }
  if (cacheRef == studentRef) {
    log.info("student was killed by GC, you can do sth now...");
    // do sth here
  }
  log.info("student is: {}", cacheRef.get());
}

Output:

21:52:44.536 [main] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - student[LiLei] is: null
21:52:44.543 [main] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - student was killed by GC, you can do sth now...
21:52:44.543 [main] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - student is: null

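Once the collector determines that an object can be reclaimed, any phantom reference to it is added to the reference queue it was registered with, namely the ReferenceQueue. That is the notification. Also, as noted earlier, get() on a phantom reference always returns null, and the log output confirms this.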

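To verify the collection further, let's also print the GC log (add -Xlog:gc:file=./gc.log:time when starting the program; this is the unified logging option introduced in Java 9. On Java 8, use -XX:+PrintGC, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, and so on):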

// `-Xlog:gc::time` or `-Xlog:gc*::time`
@SneakyThrows
public void checkGC() {
  byte[] _200MB = new byte[1024 * 1024 * 200];
  ReferenceQueue<byte[]> referenceQueue = new ReferenceQueue<>();
  PhantomReference<byte[]> objRef = new PhantomReference<>(_200MB, referenceQueue);
  log.info("object is: {}", objRef.get());

  _200MB = null;
  System.gc();

  Reference<?> cacheRef = referenceQueue.remove(3000L);
  if (Objects.isNull(cacheRef)) {
    log.info("no ref in Reference Queue...");
    return;
  }

  if (cacheRef == objRef) {
    log.info("object was killed by GC, you can clean up some resource now...");
    // do sth here
  }
  log.info("student is: {}", cacheRef.get());
}

Output:

[2023-11-26T11:38:17.866+0800] Using G1
11:38:19.361 [Test worker] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - object is: null
[2023-11-26T11:38:18.129+0800] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 25M->4M(512M) 2.176ms
[2023-11-26T11:38:19.369+0800] GC(1) Pause Full (System.gc()) 229M->6M(44M) 6.035ms
11:38:19.370 [Test worker] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - object was killed by GC, you can clean up some resource now...
11:38:19.370 [Test worker] INFO  jvm.tech.demonlee.gc.reference.PhantomReferenceUsage - student is: null
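Since Java 9 the JDK also ships java.lang.ref.Cleaner, which wraps this queue-polling pattern: it triggers a registered action once an object becomes phantom reachable. The sketch below is only an illustration under that assumption; TempBuffer, State, and the release counter are hypothetical names, and the example calls clean() explicitly so that it does not depend on GC timing.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource wrapper illustrating java.lang.ref.Cleaner (Java 9+).
class TempBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must not reference the TempBuffer itself; otherwise the
    // buffer would stay strongly reachable and never become phantom reachable.
    private static final class State implements Runnable {
        private final AtomicInteger releases;

        State(AtomicInteger releases) {
            this.releases = releases;
        }

        @Override
        public void run() {
            releases.incrementAndGet(); // stands in for freeing a real resource
        }
    }

    private final Cleaner.Cleanable cleanable;

    TempBuffer(AtomicInteger releases) {
        this.cleanable = CLEANER.register(this, new State(releases));
    }

    // Explicit release; the cleaning action runs at most once in total.
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

If close() is never called, the same action runs some time after the buffer becomes phantom reachable, with no guarantee about when.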

More on References

finalize

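Any discussion of these non-strong references inevitably involves finalizers, which brings us to the finalize method of java.lang.Object. I had planned to cover it here, but there turned out to be quite a lot to say, so I wrote a separate article: 《Java 之 finalize 终结》.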

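So how does finalize differ from the non-strong references introduced above? Both are ways of intervening in (or managing) an object's lifecycle. The difference is that finalize manages it from inside the object, since it is a method "within" the object itself, whereas non-strong references intervene from the outside, with the referent held as a field of the reference object. This is easy to see: to find out whether an object is still alive or has been judged "dead", something outside it must point to it so that the question can be asked. Conversely, an object that is already gone cannot tell anyone that it is gone; that would be self-contradictory.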

So why do we need soft, weak, and phantom references? In one sentence:

They manage an object's lifecycle using methods "outside" the object.

Object Lifecycle

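With a declaration like the one below, the object that obj points to is strongly reachable: it can be reached by one or more threads without going through any of the reference objects described above.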

Object obj = new Object();

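With non-strong references in the picture, the notion of object lifecycle is extended with further levels of reachability that sit below strong reachability: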

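  • Softly reachable: an object that is not strongly reachable but can be reached through at least one path that contains a soft reference is softly reachable. In the example below, because stu is set to null, stuSoftRef is the only remaining path to the object created by new Student("10001", "LiLei"):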

    Student stu = new Student("10001", "LiLei");
    SoftReference<Student> stuSoftRef = new SoftReference<>(stu);
    stu = null;
    
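  • Weakly reachable: an object that is neither strongly nor softly reachable but can be reached through at least one path that contains a weak reference is weakly reachable.

  • Phantom reachable: an object that is neither strongly, softly, nor weakly reachable but can be reached through at least one path that contains a phantom reference is phantom reachable. A phantom reachable object has been finalized but not yet reclaimed.

  • Unreachable: no path of references leads to the object, which means it can be reclaimed.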

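An object's lifecycle differs depending on whether it overrides finalize (that is, whether it has a non-default finalizer). Below is a rough state machine diagram: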

Here is a brief explanation of some of the states:

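  • A: the object has just been created with new; it may have a non-default finalizer, that is, it overrides finalize;
  • B: the object has finished initialization;
  • C: a reference to the object is stored in the application context, so it is strongly referenced;
  • F: all softly reachable paths to the object have been cleared, but it is still reachable through a weak reference;
  • H: all weakly reachable paths to the object have been cleared; the object has a non-default finalizer and is ready to be finalized;
  • J: during finalization the object can rescue itself; see the code example in 《Java 之 finalize 终结》;
  • K: the object has been finalized, and the application can no longer manipulate it directly;
  • L: the counterpart of J: an object that resurrected itself during finalization goes straight to the finalized state the next time it is collected;
  • M: an object the application can no longer access becomes phantom reachable through a phantom reference.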

Reference

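As the basic concepts above show, every kind of reference except the strong reference shares a parent class, Reference. Based on related material, I drew a simple diagram: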

Let's look at the Reference class first:

1) It has two constructors, one that registers a reference queue and one that does not, as shown below.

Reference(T referent) {
  this(referent, null);
}

Reference(T referent, ReferenceQueue<? super T> queue) {
  this.referent = referent;
  this.queue = (queue == null) ? ReferenceQueue.NULL : queue;
}

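2) The referent field: this is the object actually being referred to. The Java Docs in the JDK source say a great deal about it and mainly describe two kinds of state: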

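  • The first is the activation state of the reference itself, one of active, pending, and inactive. active is the initial state: the referent has not yet been marked as reclaimable by the collector. When the collector decides to reclaim the referent, it moves the reference to pending and adds it to an internal pending-Reference list, where it waits for the ReferenceHandler thread. If clear() or enqueue() is called on the reference (both are discussed further in the ReferenceQueue section), or once the collector has processed it, the state may become inactive.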

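    ReferenceHandler is a high-priority thread that takes references from the pending-Reference list and adds each one to the reference queue it is registered with, which notifies the owners so that they can act. The Zhihu article 《Reference Handler线程的作用》 covers this in more detail.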

    /* High-priority thread to enqueue pending References
    */
    private static class ReferenceHandler extends Thread {
        // ...
        // ...
    }
    
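  • The second is the state of the reference with respect to its reference queue: registered, enqueued, dequeued, and unregistered. A reference created without a queue is unregistered. One created with a queue starts as registered; after the referent is marked as reclaimable, the reference is added to the queue and becomes enqueued, ready for the application to inspect it and act (release resources, for example). When the application takes it from the queue with poll() or remove(), it is removed from the queue and becomes dequeued.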

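  • Note that referring to the referent through this field does not keep the referent alive. In other words, the field is only a pointer that helps the application determine whether the referent is still alive; the collector does not treat it as a strong reference, and once the collector clears it, get() returns null.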

  • The source also documents the state transitions:

         * Initial states:
         *   [active/registered]
         *   [active/unregistered] [1]
         *
         * Transitions:
         *                            clear [2]
         *   [active/registered]     ------->   [inactive/registered]
         *          |                                 |
         *          |                                 | enqueue
         *          | GC              enqueue [2]     |
         *          |                -----------------|
         *          |                                 |
         *          v                                 |
         *   [pending/registered]    ---              v
         *          |                   | ReferenceHandler
         *          | enqueue [2]       |--->   [inactive/enqueued]
         *          v                   |             |
         *   [pending/enqueued]      ---              |
         *          |                                 | poll/remove
         *          | poll/remove                     | + clear [4]
         *          |                                 |
         *          v            ReferenceHandler     v
         *   [pending/dequeued]      ------>    [inactive/dequeued]
         *
         *
         *                           clear/enqueue/GC [3]
         *   [active/unregistered]   ------
         *          |                      |
         *          | GC                   |
         *          |                      |--> [inactive/unregistered]
         *          v                      |
         *   [pending/unregistered]  ------
         *                           ReferenceHandler
         *
         * Terminal states:
         *   [inactive/dequeued]
         *   [inactive/unregistered]
         *
         * Unreachable states (because enqueue also clears):
         *   [active/enqeued]
         *   [active/dequeued]
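The transitions that application code can trigger itself can be traced without involving the GC at all. The following sketch assumes Java 9 or later, where enqueue() also clears the reference; ReferenceStates and walk are illustrative names of my own. The referent is kept strongly reachable so that only explicit calls change the state.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceStates {
    // Walks an unregistered and a registered reference through the transitions
    // driven by enqueue() and poll(), recording each observation.
    static String walk() {
        StringBuilder trace = new StringBuilder();
        Object referent = new Object(); // strongly held, so the GC does not interfere
        ReferenceQueue<Object> queue = new ReferenceQueue<>();

        // [active/unregistered]: there is no queue, so enqueue() reports false
        WeakReference<Object> unregistered = new WeakReference<>(referent);
        trace.append(unregistered.enqueue()).append(',');

        // [active/registered]: the referent is still available through get()
        WeakReference<Object> registered = new WeakReference<>(referent, queue);
        trace.append(registered.get() == referent).append(',');

        // enqueue() clears and enqueues: [inactive/enqueued]
        trace.append(registered.enqueue()).append(',');
        trace.append(registered.get() == null).append(',');

        // poll() hands the reference back: [inactive/dequeued], a terminal state
        trace.append(queue.poll() == registered).append(',');

        // a dequeued reference cannot be enqueued again
        trace.append(registered.enqueue());
        return trace.toString();
    }
}
```

Calling ReferenceStates.walk() on Java 9 or later should yield false,true,true,true,true,false, matching the diagram: enqueue() succeeds exactly once, and only for a reference that was registered with a queue.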
    

ReferenceQueue

ReferenceQueue has come up repeatedly. Why is this queue needed?

Reference queues, to which registered reference objects are appended by the garbage collector after the appropriate reachability changes are detected.

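In other words, once the garbage collector detects the relevant change in reachability, it appends the registered reference objects to the queue.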

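Take phantom references: get() always returns null, so without a reference queue there is essentially nothing you can do with them. A reference queue lets the application learn that an object it cares about has become unreachable and then take the appropriate action.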

Now let's revisit the clear() and enqueue() methods of the Reference class together with ReferenceQueue.

public abstract class Reference<T> {
    // some code omitted

    /**
     * Clears this reference object.  Invoking this method will not cause this
     * object to be enqueued.
     *
     * <p> This method is invoked only by Java code; when the garbage collector
     * clears references it does so directly, without invoking this method.
     */
    public void clear() {
        clear0();
    }

    /* Implementation of clear(), also used by enqueue().  A simple
     * assignment of the referent field won't do for some garbage
     * collectors.
     */
    private native void clear0();

    /**
     * Clears this reference object and adds it to the queue with which
     * it is registered, if any.
     *
     * <p> This method is invoked only by Java code; when the garbage collector
     * enqueues references it does so directly, without invoking this method.
     *
     * @return   {@code true} if this reference object was successfully
     *           enqueued; {@code false} if it was already enqueued or if
     *           it was not registered with a queue when it was created
     */
    public boolean enqueue() {
        clear0();               // Intentionally clear0() rather than clear()
        return this.queue.enqueue(this);
    }
  
    // some code omitted
}

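First, note that user code may override both methods in a subclass. If the overriding logic is complex, it could lead to unexpected behavior in the garbage collector, a risk the GC is clearly unwilling to take. So the GC never calls our code and instead performs the corresponding operations internally. This is why the comments above say: ..., when the garbage collector enqueues/clears references it does so directly, without invoking this method.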

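Now for the logic itself. When the referent of a soft (or weak) reference object (that is, the referent field in the Reference class) becomes inaccessible to the application, the JVM clears the referent and enqueues the reference object in its ReferenceQueue. Phantom references used to differ slightly: before Java 9 they were enqueued without being cleared first, whereas since Java 9 they are cleared as well, as the Javadoc quoted earlier states. Either way, get() returns null once the reference has been enqueued.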

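Finally, as with finalize, Java gives no specific point in time or deadline by which the collector enqueues a reference. So if a critical resource has to be released, do not rely on ReferenceQueue for it. Also, once a reference object has been added to the queue, when it is processed is up to the application. Typically the application dequeues the reference (with poll() or remove()), does its processing, and then drops the reference object so that the GC can reclaim it.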
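The dequeue, process, discard cycle described above is often written as a non-blocking drain loop. Below is a minimal sketch with made-up names (OwnerRegistry, OwnerRef, resourceName). Because get() returns null once a reference has been enqueued, the reference subclass carries the information needed for cleanup.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry that ties a named resource to the lifetime of its owner.
class OwnerRegistry {
    // The subclass stores the cleanup key, since the referent itself is gone by then.
    static final class OwnerRef extends WeakReference<Object> {
        final String resourceName;

        OwnerRef(Object owner, String resourceName, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.resourceName = resourceName;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, OwnerRef> resources = new HashMap<>();

    OwnerRef register(Object owner, String resourceName) {
        OwnerRef ref = new OwnerRef(owner, resourceName, queue);
        resources.put(resourceName, ref);
        return ref;
    }

    // Non-blocking drain: poll() returns null as soon as the queue is empty.
    int drain() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            OwnerRef ownerRef = (OwnerRef) ref; // only OwnerRef instances use this queue
            resources.remove(ownerRef.resourceName); // dropping the ref lets the GC reclaim it
            released++;
        }
        return released;
    }

    int size() {
        return resources.size();
    }
}
```

Calling drain() at the start of every public operation, which is roughly how WeakHashMap expunges its stale entries, avoids the dedicated polling thread used in the Page example.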

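Combining the state descriptions in the Java Doc above, and borrowing a figure from the book 《虚拟机设计与实现》 (which I redrew), here is a summary of the state transitions of reference objects. Since it is not particularly complicated, I will not explain it further.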

Summary

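By looking at the concepts, lifecycle, and use cases of references, we have gained some understanding of non-strong references. Together with reference queues, they let us trigger the appropriate handling once the referent is no longer visible to the application. These references make the object lifecycle considerably more complex, and the JVM has to do extra work to process the reference objects themselves, including during garbage collection. That is the price to pay for giving applications more options.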

So how can we inspect the various references while a program is running? The following JVM options help:

// Java 8
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintReferenceGC

// Java 9 and later (trace level is used here; switch to debug if you do not need that much detail)
-Xlog:gc,gc+ref*=trace::time

Below are GC log excerpts from a simple program run on Java 8 and on Java 17, respectively:

// java 8
0.550: [GC (System.gc())
  0.575: [SoftReference, 0 refs, 0.0000192 secs]
  0.575: [WeakReference, 272 refs, 0.0001333 secs]
  0.575: [FinalReference, 1175 refs, 0.0005849 secs]
  0.576: [PhantomReference, 0 refs, 0.0000035 secs]
  0.576: [JNI Weak Reference, 0.0000038 secs]
  [PSYoungGen: 126111K->2927K(153088K)] 330911K->310135K(502784K), 0.0263525 secs]
[Times: user=0.09 sys=0.03, real=0.03 secs]

0.576: [Full GC (System.gc())
  0.578: [SoftReference, 0 refs, 0.0000111 secs]
  0.578: [WeakReference, 50 refs, 0.0000235 secs]
  0.578: [FinalReference, 0 refs, 0.0000042 secs]
  0.578: [PhantomReference, 1 refs, 0.0000030 secs]
  0.578: [JNI Weak Reference, 0.0000028 secs]
  [PSYoungGen: 2927K->0K(153088K)] [ParOldGen: 307208K->105119K(349696K)] 310135K->105119K(502784K), [Metaspace: 7196K->7196K(1056768K)], 0.0117018 secs]
[Times: user=0.02 sys=0.01, real=0.01 secs]


// java 17
[2023-12-11T13:52:18.342+0800] Using G1
[2023-12-11T13:52:18.633+0800] GC(0) Skipped phase 1 of Reference Processing: no references
[2023-12-11T13:52:18.633+0800] GC(0) Phase 2 Soft before0 0 0 0 0 0 0 0 0 (0)
[2023-12-11T13:52:18.633+0800] GC(0) Phase 2 Weak before4 0 85 0 42 20 0 343 2 (496)
[2023-12-11T13:52:18.633+0800] GC(0) Phase 2 Final before0 0 0 0 0 0 0 0 0 (0)
[2023-12-11T13:52:18.633+0800] GC(0) ReferenceProcessor::execute queues: 1, RefProcThreadModel::Single, marks_oops_alive: false
[2023-12-11T13:52:18.633+0800] GC(0) Phase 2 Final after0 0 0 0 0 0 0 0 0 (0)
[2023-12-11T13:52:18.633+0800] GC(0) Skipped phase 3 of Reference Processing: no references
[2023-12-11T13:52:18.633+0800] GC(0) Phase 4 Phantom before0 0 0 148 0 0 0 0 0 (148)
[2023-12-11T13:52:18.633+0800] GC(0) ReferenceProcessor::execute queues: 1, RefProcThreadModel::Single, marks_oops_alive: false
[2023-12-11T13:52:18.634+0800] GC(0)     Reference Processing: 0.1ms
[2023-12-11T13:52:18.634+0800] GC(0)       Reconsider SoftReferences: 0.0ms
[2023-12-11T13:52:18.634+0800] GC(0)         SoftRef (ms):                  skipped
[2023-12-11T13:52:18.634+0800] GC(0)       Notify Soft/WeakReferences: 0.1ms
[2023-12-11T13:52:18.634+0800] GC(0)         SoftRef (ms):                  Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 9
[2023-12-11T13:52:18.634+0800] GC(0)         WeakRef (ms):                  Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 9
[2023-12-11T13:52:18.634+0800] GC(0)         FinalRef (ms):                 Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 9
[2023-12-11T13:52:18.634+0800] GC(0)         Total (ms):                    Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 9
[2023-12-11T13:52:18.634+0800] GC(0)       Notify and keep alive finalizable: 0.0ms
[2023-12-11T13:52:18.634+0800] GC(0)         FinalRef (ms):                 skipped
[2023-12-11T13:52:18.634+0800] GC(0)       Notify PhantomReferences: 0.0ms
[2023-12-11T13:52:18.634+0800] GC(0)         PhantomRef (ms):               Min:  0.0, Avg:  0.0, Max:  0.0, Diff:  0.0, Sum:  0.0, Workers: 9
[2023-12-11T13:52:18.634+0800] GC(0)       SoftReference:
[2023-12-11T13:52:18.634+0800] GC(0)         Discovered: 0
[2023-12-11T13:52:18.634+0800] GC(0)         Cleared: 0
[2023-12-11T13:52:18.634+0800] GC(0)       WeakReference:
[2023-12-11T13:52:18.634+0800] GC(0)         Discovered: 496
[2023-12-11T13:52:18.634+0800] GC(0)         Cleared: 346
[2023-12-11T13:52:18.634+0800] GC(0)       FinalReference:
[2023-12-11T13:52:18.634+0800] GC(0)         Discovered: 0
[2023-12-11T13:52:18.634+0800] GC(0)         Cleared: 0
[2023-12-11T13:52:18.634+0800] GC(0)       PhantomReference:
[2023-12-11T13:52:18.634+0800] GC(0)         Discovered: 148
[2023-12-11T13:52:18.634+0800] GC(0)         Cleared: 23
[2023-12-11T13:52:18.634+0800] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 27M->6M(520M) 2.118ms
// some log lines omitted
[2023-12-11T13:52:18.893+0800] GC(3) Pause Full (System.gc()) 320M->110M(400M) 5.324ms

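Of course, you can also configure a flight recording with options such as -XX:StartFlightRecording=dumponexit=true,filename=./gc.jfr and then analyze it with visual tools such as JMC or VisualVM. The figure below shows an example of inspecting references in VisualVM: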

For reasons of length, I did not get to memory leaks and related problems; I will cover them with examples in the next article.

See you next time.