Java 理解 ThreadPoolExecutor 实现原理

使用线程池（ThreadPoolExecutor）的好处是减少在创建和销毁线程上所花的时间以及系统资源的开销，解决资源不足的问题。如果不使用线程池，有可能造成系统创建大量同类线程而导致消耗完内存或者“过度切换”的问题。 – 阿里Java开发手册

版本

JDK 1.8

本节目标

理解线程池核心参数
理解线程池工作原理
理解线程池核心方法

线程池的核心参数和构造方法

ctl

// 线程池核心变量，包含线程池的运行状态和有效线程数，利用二进制的位掩码实现
private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));
private static final int COUNT_BITS = Integer.SIZE - 3;
private static final int CAPACITY   = (1 << COUNT_BITS) - 1;

// runState is stored in the high-order bits
// 线程池状态
private static final int RUNNING    = -1 << COUNT_BITS;
private static final int SHUTDOWN   =  0 << COUNT_BITS;
private static final int STOP       =  1 << COUNT_BITS;
private static final int TIDYING    =  2 << COUNT_BITS;
private static final int TERMINATED =  3 << COUNT_BITS;

// Packing and unpacking ctl
// 获取当前线程池运行状态
private static int runStateOf(int c)     { return c & ~CAPACITY; }
// 获取当前线程池有效线程数
private static int workerCountOf(int c)  { return c & CAPACITY; }
// 打包ctl变量
private static int ctlOf(int rs, int wc) { return rs | wc; }

/*
 * Bit field accessors that don't require unpacking ctl.
 * These depend on the bit layout and on workerCount being never negative.
 */

private static boolean runStateLessThan(int c, int s) {
    return c < s;
}

private static boolean runStateAtLeast(int c, int s) {
    return c >= s;
}

private static boolean isRunning(int c) {
    return c < SHUTDOWN;
}

JDK7 以后，线程池的状态和有效线程数通过 ctl 这个变量表示（使用二进制的位掩码来实现，这里我们不深究），理解上述几个方法作用即可，不影响下面的源码阅读

关于线程池的五种状态

RUNNING：接受新任务并处理队列中的任务
SHUTDOWN ：不接受新任务，但处理队列中的任务
STOP ：不接受新任务，不处理队列中的任务，并中断正在进行的任务（中断并不是强制的，只是修改了Thread的状态，是否中断取决于Runnable 的实现逻辑）
TIDYING ：所有任务都已终止，workerCount为0时，线程池会过度到该状态，并即将调用 terminate()
TERMINATED ：terminated() 调用完成；线程池中止

线程池状态的转换

RUNNING => SHUTDOWN ：调用 shutdown()
RUNNING / SHUTDOWN => STOP ：调用 shutdownNow() （该方法会返回队列中未执行的任务）
SHUTDOWN => TIDYING：当线程池和队列都为空时
STOP => TIDYING：当线程池为空时
TIDYING => TERMINATED：当 terminated() 调用完成时

构造方法

线程池最终都是调用如下构造方法

public ThreadPoolExecutor(int corePoolSize,
						  int maximumPoolSize,
						  long keepAliveTime,
						  TimeUnit unit,
						  BlockingQueue<Runnable> workQueue,
						  ThreadFactory threadFactory,
						  RejectedExecutionHandler handler) {
  // 省略
}

核心参数

我们来看一下线程池中的核心参数都是什么作用

private final BlockingQueue<Runnable> workQueue; // 阻塞队列，用于缓存任务

private final ReentrantLock mainLock = new ReentrantLock(); // 线程池主锁

private final HashSet<Worker> workers = new HashSet<Worker>(); // 工作线程集合

private final Condition termination = mainLock.newCondition(); // awaitTermination() 方法的等待条件

private int largestPoolSize; // 记录最大线程池大小

private long completedTaskCount; //用来记录线程池中曾经出现过的最大线程数

private volatile ThreadFactory threadFactory; // 线程工厂，用于创建线程

private volatile RejectedExecutionHandler handler; // 任务拒绝时的策略

private volatile long keepAliveTime; // 线程存活时间
									 // 当线程数超过核心池数时，或允许核心池线程超时，该参数会起作用。否则一直会等待新的任务

private volatile boolean allowCoreThreadTimeOut; // 是否允许核心池线程超时

private volatile int corePoolSize; // 核心线程池数量

private volatile int maximumPoolSize; // 最大线程池数量

workQueue

这个队列的作用，和之前的 Java 理解生产者-消费者设计模式中讲到的缓冲队列，作用很相似，或者说线程池就是生产者消费者模式的一种实现。

关于 handler

ThreadPoolExecutor.AbortPolicy:丢弃任务并抛出RejectedExecutionException异常。
ThreadPoolExecutor.DiscardPolicy：也是丢弃任务，但是不抛出异常。
ThreadPoolExecutor.DiscardOldestPolicy：丢弃队列最前面的任务，然后重新尝试执行任务（重复此过程）
ThreadPoolExecutor.CallerRunsPolicy：当前任务自己决定

corePoolSize 和 maximumPoolSize

如果你对这两个参数有疑问，看完下面的栗子你会清晰很多

下面我们来举个栗子来更好的理解一下线程池

理解线程池工作原理

假如有一个工厂，工厂里面有10个工人，每个工人同时只能做一件任务。

因此只要当10个工人中有工人是空闲的，来了任务就分配给空闲的工人做；

当10个工人都有任务在做时，如果还来了任务，就把任务进行排队等待；

每个工人做完自己的任务后，会去任务队列中领取新的任务；

如果说新任务数目增长的速度远远大于工人做任务的速度（任务累积过多时），那么此时工厂主管可能会想补救措施，比如重新招4个临时工人进来；

然后就将任务也分配给这4个临时工人做；

如果说着14个工人做任务的速度还是不够，此时工厂主管可能就要考虑不再接收新的任务或者抛弃前面的一些任务了。

当这14个工人当中有人空闲时，而新任务增长的速度又比较缓慢，工厂主管可能就考虑辞掉4个临时工了，只保持原来的10个工人，毕竟请额外的工人是要花钱的。

开始工厂的10个工人，就是 corePoolSize (核心池数量)；
当10个人都在工作时 (核心池达到 corePoolSize)，任务排队等待时，会缓存到 workQueue 中；
当任务累积过多时(达到 workQueue 最大值时)，找临时工；
14个临时工，就是 maximumPoolSize (数量)；
如果此时工作速度还是不够，线程池这时会考虑拒绝任务，具体由拒绝策略决定

理解线程池核心方法

execute()

线程池中所有执行任务的方法有关的方法，都会调用 execute()。如果你理解了上述的小例子，再来看这个会清晰很多

public void execute(Runnable command) {
	if (command == null)
		throw new NullPointerException();
	/*
	 * Proceed in 3 steps:
	 *
	 * 1. If fewer than corePoolSize threads are running, try to
	 * start a new thread with the given command as its first
	 * task.  The call to addWorker atomically checks runState and
	 * workerCount, and so prevents false alarms that would add
	 * threads when it shouldn't, by returning false.
	 *
	 * 2. If a task can be successfully queued, then we still need
	 * to double-check whether we should have added a thread
	 * (because existing ones died since last checking) or that
	 * the pool shut down since entry into this method. So we
	 * recheck state and if necessary roll back the enqueuing if
	 * stopped, or start a new thread if there are none.
	 *
	 * 3. If we cannot queue task, then we try to add a new
	 * thread.  If it fails, we know we are shut down or saturated
	 * and so reject the task.
	 */
	int c = ctl.get();
	if (workerCountOf(c) < corePoolSize) {
		if (addWorker(command, true))
			return;
		c = ctl.get();
	}
	if (isRunning(c) && workQueue.offer(command)) {
		int recheck = ctl.get();
		if (! isRunning(recheck) && remove(command))
			reject(command);
		else if (workerCountOf(recheck) == 0)
			addWorker(null, false);
	}
	else if (!addWorker(command, false))
		reject(command);
}

分析execute()

step 1

1）首先检查当前有效线程数 是否小于 核心池数量
if (workerCountOf(c) < corePoolSize)

2）如果满足上述条件，则尝试向核心池添加一个工作线程（addWorker() 第二个参数决定了是添加核心池，还是最大池）
if (addWorker(command, true))

3）如果成功则退出方法，否则将执行 step2

step 2

1）如果当前线程池处于运行状态 && 尝试向缓冲队列添加任务
if (isRunning(c) && workQueue.offer(command))

2）如果线程池正在运行并且缓冲队列添加任务成功，进行 double check（再次检查）

3）如果此时线程池非运行状态 => 移除队列 => 拒绝当前任务，退出方法
（这么做是为了，当线程池不可用时及时回滚）

1 2	if (! isRunning(recheck) && remove(command)) reject(command);

4）如果当前有效线程数为0，则创建一个无任务的工作线程（此时这个线程会去队列中获取任务）

step 3

1）当无法无法向核心池和队列中添加任务时，线程池会再尝试向最大池中添加一个工作线程，如果失败则拒绝该任务

1 2	else if (!addWorker(command, false)) reject(command);

图解execute()

根据上述的步骤画了如下的这个图，希望能帮助大家更好的理解

addWorker()

在分析execute() 方法时，我们已经知道了 addWorker() 的作用了，可以向核心池或者最大池添加一个工作线程。我们来看一下这个方法都做了什么

private boolean addWorker(Runnable firstTask, boolean core) {
	retry:
	for (;;) {
		int c = ctl.get();
		int rs = runStateOf(c);

		// Check if queue empty only if necessary.
		if (rs >= SHUTDOWN &&
			! (rs == SHUTDOWN &&
			   firstTask == null &&
			   ! workQueue.isEmpty()))
			return false;

		for (;;) {
			int wc = workerCountOf(c);
			if (wc >= CAPACITY ||
				wc >= (core ? corePoolSize : maximumPoolSize))
				return false;
			if (compareAndIncrementWorkerCount(c))
				break retry;
			c = ctl.get();  // Re-read ctl
			if (runStateOf(c) != rs)
				continue retry;
			// else CAS failed due to workerCount change; retry inner loop
		}
	}

	boolean workerStarted = false;
	boolean workerAdded = false;
	Worker w = null;
	try {
		w = new Worker(firstTask);
		final Thread t = w.thread;
		if (t != null) {
			final ReentrantLock mainLock = this.mainLock;
			mainLock.lock();
			try {
				// Recheck while holding lock.
				// Back out on ThreadFactory failure or if
				// shut down before lock acquired.
				int rs = runStateOf(ctl.get());

				if (rs < SHUTDOWN ||
					(rs == SHUTDOWN && firstTask == null)) {
					if (t.isAlive()) // precheck that t is startable
						throw new IllegalThreadStateException();
					workers.add(w);
					int s = workers.size();
					if (s > largestPoolSize)
						largestPoolSize = s;
					workerAdded = true;
				}
			} finally {
				mainLock.unlock();
			}
			if (workerAdded) {
				t.start();
				workerStarted = true;
			}
		}
	} finally {
		if (! workerStarted)
			addWorkerFailed(w);
	}
	return workerStarted;
}

这个方法代码看似很复杂，没关系，我们一步一步来分析

step 1
先看第一部分

retry:
for (;;) {
    int c = ctl.get();
    int rs = runStateOf(c);

    // Check if queue empty only if necessary.
    if (rs >= SHUTDOWN &&
        ! (rs == SHUTDOWN &&
           firstTask == null &&
           ! workQueue.isEmpty()))
        return false;

    for (;;) {
        int wc = workerCountOf(c);
        if (wc >= CAPACITY ||
            wc >= (core ? corePoolSize : maximumPoolSize))
            return false;
        if (compareAndIncrementWorkerCount(c))
            break retry;
        c = ctl.get();  // Re-read ctl
        if (runStateOf(c) != rs)
            continue retry;
        // else CAS failed due to workerCount change; retry inner loop
    }
}

这一部分代码，主要是判断，是否可以添加一个工作线程。

在execute()中已经判断过if (workerCountOf(c) < corePoolSize)了，为什么还要再判断？

因为在多线程环境中，当上下文切换到这里的时候，可能线程池已经关闭了，或者其他线程提交了任务，导致workerCountOf(c) > corePoolSize

1）首先进入第一个无限for循环，获取ctl对象，获取当前线程的运行状态，然后判断

if (rs >= SHUTDOWN &&
	! (rs == SHUTDOWN &&
	   firstTask == null &&
	   ! workQueue.isEmpty()))
	return false;

这个判断的意义为，当线程池运行状态 >= SHUTDOWN 时，向添加一个工作线程必须同时满足

rs == SHUTDOWN
firstTask == null
! workQueue.isEmpty()
三个条件，否则添加线程失败

所以当线程状态为SHUTDOWN时，线程池允许添加一个无任务的工作线程去执行队列中的任务。

2）进入第二个无限for循环

for (;;) {
	int wc = workerCountOf(c);
	if (wc >= CAPACITY ||
		wc >= (core ? corePoolSize : maximumPoolSize))
		return false;
	if (compareAndIncrementWorkerCount(c))
		break retry;
	c = ctl.get();  // Re-read ctl
	if (runStateOf(c) != rs)
		continue retry;
	// else CAS failed due to workerCount change; retry inner loop
}

获取当前有效线程数，if 有效线程数 >= 容量 || 有效线程数 >= 核心池数量/最大池数量，则return false; 添加线程失败

如果有效线程数在合理范围之内，尝试使用 CAS 自增有效线程数（CAS 是Java中的乐观锁，不了解的小伙伴可以Google一下），乐观锁自增成功，代表当前无其他线程竞争，相当于获取到锁了

如果自增成功，break retry; 跳出这两个循环，执行下面的代码

自增失败，检查线程池状态，如果线程池状态发生变化，回到第一个for 继续执行；否则继续在第二个for 中；

step 2
下面这部分就比较简单了

boolean workerStarted = false;
boolean workerAdded = false;
Worker w = null;
try {
	w = new Worker(firstTask);
	final Thread t = w.thread;
	if (t != null) {
		final ReentrantLock mainLock = this.mainLock;
		mainLock.lock();
		try {
			// Recheck while holding lock.
			// Back out on ThreadFactory failure or if
			// shut down before lock acquired.
			int rs = runStateOf(ctl.get());

			if (rs < SHUTDOWN ||
				(rs == SHUTDOWN && firstTask == null)) {
				if (t.isAlive()) // precheck that t is startable
					throw new IllegalThreadStateException();
				workers.add(w);
				int s = workers.size();
				if (s > largestPoolSize)
					largestPoolSize = s;
				workerAdded = true;
			}
		} finally {
			mainLock.unlock();
		}
		if (workerAdded) {
			t.start();
			workerStarted = true;
		}
	}
} finally {
	if (! workerStarted)
		addWorkerFailed(w);
}
return workerStarted;

1）创建工作线程对象Worker；

2）加锁，判断当前线程池状态是否允许启动线程；
如果可以，将线程加入workers（这个变量在需要遍历所有工作线程时会用到），记录最大值，启动线程；

3）如果线程启动失败，执行addWorkerFailed（从workers中移除该对象，有效线程数减一，尝试中止线程池）

Worker

Worker对象是线程池中的内部类，线程的复用、线程超时都是在这实现的

private final class Worker
	extends AbstractQueuedSynchronizer
	implements Runnable
{
	// 这里我们只关心Run()，省略了其他源码，感兴趣的同学可以自己看一下源码
	public void run() {
		runWorker(this);
	}
}

Worker 实现了 Runnable，我们这里只关心 Worker 的run方法中做了什么，关于 AbstractQueuedSynchronizer 有关的不在本文讨论

下面我们分析一下runWorker()

final void runWorker(Worker w) {
	Thread wt = Thread.currentThread();
	Runnable task = w.firstTask;
	w.firstTask = null;
	w.unlock(); // allow interrupts
	boolean completedAbruptly = true;
	try {
		while (task != null || (task = getTask()) != null) {
			w.lock();
			// If pool is stopping, ensure thread is interrupted;
			// if not, ensure thread is not interrupted.  This
			// requires a recheck in second case to deal with
			// shutdownNow race while clearing interrupt
			if ((runStateAtLeast(ctl.get(), STOP) ||
				 (Thread.interrupted() &&
				  runStateAtLeast(ctl.get(), STOP))) &&
				!wt.isInterrupted())
				wt.interrupt();
			try {
				beforeExecute(wt, task);
				Throwable thrown = null;
				try {
					task.run();
				} catch (RuntimeException x) {
					thrown = x; throw x;
				} catch (Error x) {
					thrown = x; throw x;
				} catch (Throwable x) {
					thrown = x; throw new Error(x);
				} finally {
					afterExecute(task, thrown);
				}
			} finally {
				task = null;
				w.completedTasks++;
				w.unlock();
			}
		}
		// 通过该变量判断是用户任务抛出异常结束，还是线程池自然结束
		completedAbruptly = false;
	} finally {
		processWorkerExit(w, completedAbruptly);
	}
}

1）

1
2
3

 while (task != null || (task = getTask()) != null) {
// ...
}

不对地通过getTask() 从队列中获取任务，可以间接通过getTask()的返回值控制线程的结束

2）

// If pool is stopping, ensure thread is interrupted;
// if not, ensure thread is not interrupted.  This
// requires a recheck in second case to deal with
// shutdownNow race while clearing interrupt
if ((runStateAtLeast(ctl.get(), STOP) ||
	 (Thread.interrupted() &&
	  runStateAtLeast(ctl.get(), STOP))) &&
	!wt.isInterrupted())
	wt.interrupt();

接下来这个判断，其实我是没有太理解的，暂且认为是保证当线程池STOP时，线程一定会被打断

3）执行Runnable

try {
	beforeExecute(wt, task);
	Throwable thrown = null;
	try {
		task.run();
	} catch (RuntimeException x) {
		thrown = x; throw x;
	} catch (Error x) {
		thrown = x; throw x;
	} catch (Throwable x) {
		thrown = x; throw new Error(x);
	} finally {
		afterExecute(task, thrown);
	}
} finally {
	task = null;
	w.completedTasks++;
	w.unlock();
}

beforeExecute(wt, task); 和 afterExecute(task, thrown); 默认是没有实现的，我们可以自己扩展

4）最后是当跳出while循环后（getTask() == null或者用户任务抛出异常），会去执行processWorkerExit(w, completedAbruptly);线程退出工作（该方法会根据线程池状态，尝试中止线程池。然后会考虑是结束当前线程，还是再新建一个工作线程，这里就不细说了）

我们再来看一下 getTask() 方法

private Runnable getTask() {
	boolean timedOut = false; // Did the last poll() time out?

	for (;;) {
		int c = ctl.get();
		int rs = runStateOf(c);

		// Check if queue empty only if necessary.
		if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
			decrementWorkerCount();
			return null;
		}

		int wc = workerCountOf(c);

		// Are workers subject to culling?
		boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

		if ((wc > maximumPoolSize || (timed && timedOut))
			&& (wc > 1 || workQueue.isEmpty())) {
			if (compareAndDecrementWorkerCount(c))
				return null;
			continue;
		}

		try {
			Runnable r = timed ?
				workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
				workQueue.take();
			if (r != null)
				return r;
			timedOut = true;
		} catch (InterruptedException retry) {
			timedOut = false;
		}
	}
}

1）第一段不做解释，满足该条件时，return null; 退出线程

// Check if queue empty only if necessary.
if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
	decrementWorkerCount();
	return null;
}

2）下面这段很有意思

int wc = workerCountOf(c);

// Are workers subject to culling?
// 是否允许线程超时
// 当我们设置了允许核心池超时 或者 有效线程数 > 核心池数量的时候
// 线程池会考虑为我们清除掉一些线程
boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

// (有效线程数 > 最大线程池数量 || (允许超时 && 超时) ) 
//	&& (有效线程数 > 1 || 或者队列为空时)
if ((wc > maximumPoolSize || (timed && timedOut)) // timedOut 表示当前线程超时，下文会说到
	&& (wc > 1 || workQueue.isEmpty())) {
	if (compareAndDecrementWorkerCount(c))
		return null;
	continue;
}

我在第一次看这段代码的时候，傻傻的以为 timedOut 不是永远为false吗，我以为JDK源码怎么写出这么个Bug。别忘了当前的getTask()方法也是在一个无限循环里

3）

try {
	Runnable r = timed ?
		workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
		workQueue.take();
	if (r != null)
		return r;
	timedOut = true;
} catch (InterruptedException retry) {
	timedOut = false;
}

根据 timed，决定调用使用poll() 或者 take()。

poll 在队列为空时会等待指定时间，如果这期间没有获取到元素，则return null
take 则在队列为空时会一直等待，直至队列中被添加新的任务，或者被打断；
这两个方法都会被shutdown() 或者 shutdownNow的 thread.interrupt()打断；
如果被打断则回到第一步

至此 execute() 方法所涉及的逻辑我们差不多分析完了

备注

线程池使用

public class Test {

    public static void main(String[] args) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(5));

        executor.execute(() -> {
            // 业务逻辑
        });

        executor.shutdown();
    }

}

合理配置线程池的大小

一般需要根据任务的类型来配置线程池大小：

　　如果是CPU密集型任务，参考值可以设为 N+1 （N 为CPU核心数）

　　如果是IO密集型任务，参考值可以设置为2*N

　　当然，这只是一个参考值，具体的设置还需要根据实际情况进行调整，比如可以先将线程池大小设置为参考值，再观察任务运行情况和系统负载、资源利用率来进行适当调整。