This post, the third in our production scheduling series, briefly describes queues in production and industry-tested methods of mitigating their impact, sprinkled with gratuitous Top Gun animated GIFs for no reason.

Why do queues happen?

In short, queues happen because the arrival rate of work-in-progress (WIP) exceeds the capacity of a process to handle that WIP.

Requesting permission for a fly-by? Too bad. F-14 Tomcat arrivals currently exceed the pattern's capacity.

Queues are caused by variability and high capacity utilization. Variability increases queues linearly, while queue size grows nonlinearly, and ever more steeply, as capacity utilization approaches 100%. In other words, high capacity utilization is the largest driver of queues, which are the root of adverse economic outcomes in manufacturing.

Running at higher-than-optimal capacity utilization is like flying in the Danger Zone, but not in the cool Top Gun way.

What's the big deal?

Manufacturers know queues matter, but queues may matter more than they think. Mismanaged queues lengthen cycle time and erode quality and efficiency, with considerable bottom-line impact. Optimizing queue size is a critical task.

Mismanaging queues adversely affects your Need For Speed.

So what can we do about it?

Thankfully, smart people created Queuing Theory. Certain types of queues can be described mathematically, and those mathematical models can be put together to simulate real-world operations.

Before we dive in, let's define a few terms:

  • Inter-arrival time is the amount of time between two arrivals at a process.
  • Stationary arrival processes are those where the number of arrivals in any sub-interval depends only on the length of that sub-interval, not on when it occurs. For example, an arrival process is stationary over a day if the expected number of arrivals over an hour (or any interval) is the same for any hour throughout the day.
  • Non-stationary arrival processes are the opposite of stationary ones. Arrival processes tend to be non-stationary over longer intervals (e.g., hours, days) and stationary over short intervals (e.g., minutes).

Modeling inter-arrival times

We'll use two measures to model inter-arrival times: the average inter-arrival time and its standard deviation, which is a rough measure of variability.

Take the example shown below. While Process A and Process B both have the same average inter-arrival time, Process A has a higher standard deviation. Looking at the graph on the right, we can quickly tell that Process A's inter-arrival times have greater variability.


But of course, it's never that simple. Standard deviation is an absolute measure of variability, so two processes can have the same standard deviation while one is far more variable relative to its average. Take Process A and B below. They both have the same standard deviation, but Process A is obviously more variable than Process B. This is especially evident when we plot both relative to their averages.

Credit: DeHoratius (2015).

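That's why we use the coefficient of variation instead of standard deviation:

$$CV_a = \frac{\text{standard deviation of inter-arrival time}}{\text{average inter-arrival time}}$$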

In the example above, Process A has a CVa of 10/10 = 1, while Process B has a CVa of 10/100 = 0.1. Process A exhibits higher arrival variability.
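To make the comparison concrete, here is a minimal Python sketch that computes the coefficient of variation from a list of observed inter-arrival times. The sample values are hypothetical, picked only so that both processes share a standard deviation of 10 while averaging 10 and 100; the function name is ours as well.

```python
from statistics import mean, pstdev


def coefficient_of_variation(inter_arrival_times):
    """Standard deviation divided by the average of the observed times."""
    return pstdev(inter_arrival_times) / mean(inter_arrival_times)


# Hypothetical inter-arrival times in seconds (0 means two units arrived together).
process_a = [0, 20, 0, 20]        # average 10, standard deviation 10
process_b = [90, 110, 90, 110]    # average 100, standard deviation 10

cv_a = coefficient_of_variation(process_a)
cv_b = coefficient_of_variation(process_b)

print(f"Process A: CVa = {cv_a:.2f}")  # Process A: CVa = 1.00
print(f"Process B: CVa = {cv_b:.2f}")  # Process B: CVa = 0.10
```

Same standard deviation, very different relative variability, which is exactly what the coefficient of variation is meant to expose.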

Measuring a queueing system

First, let's define some variables:

  • a is the average inter-arrival time.
  • p is the average activity time that the unit is being worked on.
  • m is the number of servers (i.e., the things doing the work). These can be machines or people.
  • Tq is the time in the queue.

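Putting everything together gives us this powerful model:

$$T_q = \frac{p}{m} \times \frac{u^{\sqrt{2(m+1)}-1}}{1-u} \times \frac{CV_a^2 + CV_p^2}{2}$$

Here u is utilization, CVa is the coefficient of variation of inter-arrival times, and CVp is the coefficient of variation of activity times.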

Let's walk through an example. Suppose average inter-arrival time is 45 seconds, average activity time is 120 seconds, there are 6 servers, utilization is 85%, CVa is 2, and CVp is 1.


Since flow time includes time in the queue and time being processed, the flow time is 213.5 seconds in the queue + 120 seconds average activity time = 333.5 seconds.
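If you'd rather let a computer crunch the numbers, here is a small Python sketch of the same calculation. It assumes the model takes the common multi-server approximation form (activity time per server, times a utilization factor, times a variability factor); that form is our assumption here, checked only against the 213.5-second result above. Utilization is passed in directly, as in the example, and the function name is hypothetical.

```python
from math import sqrt


def time_in_queue(p, m, u, cv_a, cv_p):
    """Approximate average time in queue, in the same time unit as p.

    Assumed form: (p / m) * u^(sqrt(2(m+1)) - 1) / (1 - u) * (CVa^2 + CVp^2) / 2
    """
    capacity_term = p / m
    utilization_term = u ** (sqrt(2 * (m + 1)) - 1) / (1 - u)
    variability_term = (cv_a ** 2 + cv_p ** 2) / 2
    return capacity_term * utilization_term * variability_term


p = 120  # average activity time in seconds
tq = time_in_queue(p=p, m=6, u=0.85, cv_a=2, cv_p=1)
flow_time = tq + p  # time waiting plus time being worked on

print(f"Time in queue: {tq:.1f} s")         # Time in queue: 213.5 s
print(f"Flow time:     {flow_time:.1f} s")  # Flow time:     333.5 s
```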

What happens if utilization is increased to 90%? Let's crunch the numbers.


Time in queue increases dramatically, to roughly 374.6 seconds! Interestingly, if we calculate the time in queue for utilizations from 85% to 99%, using the same inputs from the example above, and plot Tq against utilization, it looks like this:
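You can reproduce the numbers behind a curve like this yourself. The sketch below sweeps utilization from 85% to 99% with the example's other inputs held fixed. As before, the formula's form is our assumption (the standard multi-server approximation) and the helper name is hypothetical.

```python
from math import sqrt


def time_in_queue(p, m, u, cv_a, cv_p):
    """Approximate average time in queue (assumed multi-server approximation)."""
    return (p / m) * u ** (sqrt(2 * (m + 1)) - 1) / (1 - u) * (cv_a ** 2 + cv_p ** 2) / 2


# Utilization from 85% to 99% in 1-point steps; other inputs as in the example.
utilizations = [percent / 100 for percent in range(85, 100)]
curve = [(u, time_in_queue(p=120, m=6, u=u, cv_a=2, cv_p=1)) for u in utilizations]

for u, tq in curve:
    print(f"{u:.0%}  {tq:8.1f} s")
```

Under these assumptions, the wait at 99% utilization comes out at well over an hour, compared with a few minutes at 85%.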


As mentioned above, time in queue climbs ever more steeply as capacity utilization approaches 100%, which is why high capacity utilization is the largest driver of queues.

Pattern at capacity? You might get an unauthorized flyby.

It's worth noting that utilization less than 100% means the system is stable and capacity can service all demand over time. Systems with utilization over 100% are unstable and queues will continue to grow over time.
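The stability rule can be checked with a few lines of arithmetic. This sketch assumes the usual definition of utilization as demand rate divided by capacity, p / (a × m), which this post doesn't spell out, and it uses a deliberately crude no-variability ("fluid") estimate of the backlog. The station's numbers and all function names are hypothetical.

```python
def utilization(a, p, m):
    """Demand rate over capacity: (1 / a) / (m / p) = p / (a * m)."""
    return p / (a * m)


def is_stable(a, p, m):
    """True when capacity can service all demand over time."""
    return utilization(a, p, m) < 1


def backlog_after(hours, a, p, m):
    """Fluid estimate of units waiting after `hours`, ignoring variability."""
    arrivals_per_hour = 3600 / a
    capacity_per_hour = m * 3600 / p
    return max(0.0, (arrivals_per_hour - capacity_per_hour) * hours)


# Hypothetical station: a unit arrives every 30 s and takes 120 s to process.
for servers in (3, 5):
    u = utilization(30, 120, servers)
    stable = is_stable(30, 120, servers)
    backlog = backlog_after(8, 30, 120, servers)
    print(f"{servers} servers: utilization {u:.0%}, stable={stable}, "
          f"backlog after 8 h = {backlog:.0f} units")
```

With three servers the station runs at 133% utilization and the fluid estimate piles up 240 units over an eight-hour shift; with five servers it runs at 80% and keeps up.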

So how can we manage queues?

In his book The Principles of Product Development Flow: Second Generation Lean Product Development, Donald Reinertsen outlines the relationships between capacity utilization, cycle time, and queue size, and elegantly presents two queue control principles.

The first queue size control principle

Don't control capacity utilization, control queue size. In manufacturing environments where capacity may be difficult to measure and track, using queue size as a proxy for capacity utilization can be more straightforward.

Small changes in capacity utilization produce large changes in queue size and cycle time. Conversely, as Reinertsen states, a "wide control band of queue size will force the system into a very tight range of capacity utilization."

The second queue size control principle

Don't control cycle time, control queue size. Cycle time is measured after a job leaves the system, making it a lagging indicator. Queues are a leading indicator; monitoring them helps identify potential problems more quickly.

Credit: Reinertsen (2009).

Adding servers

Adding servers, if that is a viable option, decreases utilization and time in queue. It doesn't change inter-arrival time, activity time, or the variability of either. Flow time decreases dramatically at first, then by less and less with each server you add.
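Here is a rough Python sketch of that diminishing return. Two assumptions to flag: the time-in-queue formula uses the standard multi-server approximation form, and utilization is derived as p / (a × m) from the inputs rather than supplied separately, so the figures are illustrative rather than a continuation of the worked example. The function name is hypothetical.

```python
from math import sqrt


def flow_time(a, p, m, cv_a, cv_p):
    """Time in queue plus activity time, with utilization derived from a, p, m."""
    u = p / (a * m)  # assumed definition: demand rate over capacity
    if u >= 1:
        raise ValueError("unstable: utilization must be below 100%")
    tq = (p / m) * u ** (sqrt(2 * (m + 1)) - 1) / (1 - u) * (cv_a ** 2 + cv_p ** 2) / 2
    return tq + p


# Inter-arrival time 45 s, activity time 120 s, CVa 2, CVp 1; vary servers.
results = {m: flow_time(a=45, p=120, m=m, cv_a=2, cv_p=1) for m in range(3, 9)}

for m, t in results.items():
    print(f"{m} servers: flow time {t:7.1f} s")
```

With these inputs, going from three to four servers cuts flow time from roughly 850 seconds to roughly 210, while going from five to six saves less than 20 seconds.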


Pooling queues into a single queue reduces the overall time spent in the queue. Check out the graphic below from the 2011 WSJ article "The Science of Lines": a single-file line leading to three cashiers is about three times as fast as having one line for each cashier.
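The pooling effect shows up in a quick back-of-the-envelope comparison. This sketch assumes the standard multi-server approximation for time in queue, utilization computed as p / (a × m), arrivals that split evenly across separate lines, and a coefficient of variation of 1 for both arrivals and service. The checkout numbers are hypothetical and are not taken from the WSJ article.

```python
from math import sqrt


def time_in_queue(a, p, m, cv_a=1.0, cv_p=1.0):
    """Approximate average wait, with utilization derived as p / (a * m)."""
    u = p / (a * m)
    return (p / m) * u ** (sqrt(2 * (m + 1)) - 1) / (1 - u) * (cv_a ** 2 + cv_p ** 2) / 2


# Hypothetical checkout: one customer every 40 s overall, 100 s per customer.
separate = time_in_queue(a=3 * 40, p=100, m=1)  # each cashier gets 1/3 of arrivals
pooled = time_in_queue(a=40, p=100, m=3)        # one line feeding three cashiers

print(f"Separate lines: {separate:.0f} s average wait")
print(f"Pooled line:    {pooled:.0f} s average wait")
print(f"Ratio:          {separate / pooled:.1f}x")
```

Both setups run at the same utilization, yet under these assumptions the pooled line's average wait is between three and four times shorter.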

Credit: Sudal and The Wall Street Journal (2011).

Cumulative Flow Diagram (CFD)

A CFD is a powerful tool that graphically depicts the flow of work as it passes through a manufacturing system. It provides information on lead time, cycle time, WIP, and queues, all key performance indicators of your production scheduling processes. We'll provide a separate blog post that walks through how to set up a CFD.
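As a small taste of the data behind a CFD, here is a minimal Python sketch, not a full setup guide. It assumes you log how many jobs arrive at and depart from a process each hour; the counts are hypothetical. The two cumulative series are the lines you would plot, and the vertical gap between them at any hour is the WIP at that point.

```python
from itertools import accumulate

# Hypothetical hourly counts for one shift.
arrivals = [5, 8, 6, 7, 4, 3]
departures = [3, 5, 6, 6, 6, 5]

# Running totals: these are the two lines drawn on the diagram.
cumulative_arrivals = list(accumulate(arrivals))
cumulative_departures = list(accumulate(departures))

# Vertical distance between the lines = work in progress at the end of each hour.
wip = [a - d for a, d in zip(cumulative_arrivals, cumulative_departures)]

for hour, (a, d, w) in enumerate(zip(cumulative_arrivals, cumulative_departures, wip), start=1):
    print(f"hour {hour}: arrived {a:2d}, departed {d:2d}, WIP {w}")
```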

Until next time

Well, it's been real. Leave a comment with a topic you'd like us to cover and we'll jump on it!

Your partner in production scheduling,


