Box Plots

Box-and-whisker diagrams, or box plots, use the concept of breaking a data set into fourths, or quartiles, to create a display. The box part of the diagram is based on the middle of the data set (the second and third quartiles). The whiskers are lines that extend from either side of the box. The maximum length of the whiskers is calculated from the length of the box. The actual length of each whisker is determined after considering the data points in the first and fourth quartiles.
Although box-and-whisker diagrams present less information than histograms or dot plots, they say a lot about the distribution, location and spread of the represented data. They are particularly valuable because several box plots can be placed next to each other in a single diagram for easy comparison of multiple data sets.

What can it do for you?
If your improvement project involves a relatively limited amount of individual quantitative data, a box-and-whisker diagram can give you an instant picture of the shape of variation in your process. Often this provides immediate insight into the search strategies you could use to find the cause of that variation.
Box-and-whisker diagrams are especially valuable for comparing the output of two processes that produce the same characteristic, or for tracking improvement in a single process. They can be used throughout the phases of the Lean Six Sigma methodology, but you will find them particularly useful in the Analyze phase.

How do you do it?
1. Decide which Critical-To-Quality (CTQ) characteristic you wish to examine. This CTQ must be measurable on a linear scale; that is, the incremental value between units of measurement must be the same. For example, time, temperature, dimension and spatial relationships can usually be measured in consistent incremental units.
2. Measure the characteristic and record the results. If the characteristic is continually being produced, such as voltage in a line or temperature in an oven, or if there are too many items being produced to measure all of them, you will have to sample. Take care to ensure that your sampling is random.
3. Count the number of individual data points.
4. List the data points in ascending order.
5. Find the median value. If there is an odd number of data points, the median is the data point in the middle position of the ordered list. (For example, if there are 35 data points, the median is the value of the 18th data point from either the top or the bottom of the list.) If there is an even number of points, the median is halfway between the two points that occupy the centermost positions. (If there were 36 points, the median would be halfway between point 18 and point 19: add the values of points 18 and 19 and divide the result by 2.) If you think of the list of data points as being divided into quarters (quartiles), the median is the boundary between the second and third quartiles.

Order   Value    Boundary
1       27.75
2       37.35
3       38.35
4       38.35
5       38.75
        39.250   Q1 boundary (start of second quartile)
6       39.75
7       40.50
8       41.00
9       41.15
10      42.55
        42.725   Median (start of third quartile)
11      42.90
12      43.60
13      43.85
14      47.30
15      47.90
        48.025   Q3 boundary (start of fourth quartile)
16      48.15
17      49.86
18      51.25
19      51.60
20      56.00
Data table divided into quartiles

6. Next, find the boundaries between the first and second quartiles and between the third and fourth quartiles. The first quartile boundary is halfway between the last data point in the first quartile and the first data point in the second quartile. (If one data point falls on the median, that data point is considered to be both the last point in the second quartile and the first point in the third quartile.) In a similar way, find the third quartile boundary: the halfway point between the last value in the third quartile and the first value in the fourth quartile.
7. Draw and label a scale line with values. The scale should begin lower than your lowest value and extend higher than your highest value. The scale line may be either vertical or horizontal.
8. Using the scale as a guideline, draw a box above the scale (if it is horizontal) or to the right of it (if it is vertical). One end of the box is the first quartile boundary; the other is the third quartile boundary. (The width of the box is somewhat arbitrary; boxes tend to be long and thin. As an option, if you have multiple data sets with different numbers of data points, make the widths of the boxes correspond roughly to the relative quantity of data each box represents.)
9. Draw a line through the box to represent the median (second quartile boundary).
10. Next, draw the whiskers on the ends of the box. Find the inter-quartile range (IQR) by subtracting the value of the first quartile boundary from that of the third quartile boundary. The whiskers follow three rules:
   1. The lower whisker ends at the smallest data point that is greater than or equal to Q1 - 1.5 IQR.
   2. The upper whisker ends at the largest data point that is less than or equal to Q3 + 1.5 IQR.
   3. Any points outside the interval [Q1 - 1.5 IQR, Q3 + 1.5 IQR] are plotted separately.
11. Multiply the IQR by 1.5. (The use of 1.5 as a multiplier is a convention that has no exact statistical basis. Multiplying by this constant takes into account the fact that the first and fourth quartiles naturally have a somewhat wider dispersion than the second and third quartiles.)
12. Subtract the value of 1.5(IQR) from the value of the first quartile boundary. Find the smallest data point in your list that is equal to or larger than this value. Make a tick mark representing this data point to the left of your box (or below it, if you used a vertical scale with lower values at the bottom). Draw a line, the first whisker, from the side of the box to the tick mark.
13. Add the value of 1.5(IQR) to the value of the third quartile boundary. Find the largest data point in your list that is equal to or smaller than this value. Make a tick mark representing this data point to the right of your box (or above it, if you used a vertical scale). Draw another whisker to this tick mark.
14. Some data points in your list may lie outside the ends of the whiskers you determined in steps 12 and 13. These points are called outliers. Plot any outliers as dots beyond the whiskers.
[Note: Steps 3 through 14 happen automatically if you use Excel, Minitab or JMP to create your box-and-whisker diagram. If you are familiar with these software packages, they can greatly simplify the process of making effective box-and-whisker diagrams.]
15. Title and label your box-and-whisker diagram.

Now what?
The shape of your box-and-whisker diagram tells you a lot about your process.
One way to interpret box plots is to imagine that a data set's histogram is like a mountain viewed from ground level, while its box-and-whisker diagram is like a contour map of that mountain viewed from above.

[Figure: Skewed histogram and box plot compared]
In a skewed data set, the second-quartile box is considerably larger than the third-quartile box, and the whisker associated with the first quartile extends almost to the end of the 1.5 IQR limit. An outlier beyond the 1.5 IQR limit of that whisker further emphasizes that the data is strongly skewed in this direction. On the other side of the distribution, the whisker associated with the fourth quartile is well within the 1.5 IQR limit; in fact, it is shorter than the third-quartile box. A histogram of this data would show a strongly skewed distribution that ends in a precipice at the high end of the values. This kind of data set often occurs when there is a natural limit at one end of the distribution or when 100% screening is done against one specification limit.
Although box-and-whisker diagrams can be oriented horizontally, they are more often displayed vertically, with lower values at the bottom of the scale.

[Figure: Normal distribution curve and box plot compared]
For a normal distribution, the second- and third-quartile boxes are approximately the same size. The whiskers are similar to each other in length and extend close to the 1.5 IQR limit.
If the data set were actually a combination of two different distributions (for example, material from two suppliers or two machines), it might form a histogram that looks like a plateau or a mountain with twin peaks.

[Figure: Plateau histogram and box plot compared]
The box plot would look balanced, but would have relatively large boxes and relatively short whiskers.
If a small amount of data from a different distribution were included in the data set (for example, because of a short-term process abnormality or a data collection error), the histogram would look like a mountain with a small isolated peak.

[Figure: Isolated peak histogram and box plot compared]
The box plot for that data set would look like one for a normal distribution, but with a number of outliers beyond one whisker.

Some final tips
A box-and-whisker diagram is an easy way to compare processes or to chart improvement in a single process. Box-and-whisker diagrams quickly give you a comparative feel for the distributions of several data sets. They show spread through the lengths of the box and the whiskers. You can also judge the symmetry of a distribution by comparing the two segments of the box and the relative lengths of the whiskers. The presence and position of outliers give some indication of the level of control in the process.
Two or more box-and-whisker diagrams drawn side by side on the same scale are an effective way to compare samples compactly and without clutter. Many box plots can be added to one diagram without creating visual overload.
Box-and-whisker diagrams not only help you see which processes need improvement; by comparing initial diagrams with later ones, they can also help you track that improvement. If specification limits or improvement targets apply to your process, they can be added to the diagram to help visualize progress.
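You can cross-check the manual procedure in steps 3 through 14 with a short script. The following is a minimal Python sketch under stated assumptions. It is not how Excel, Minitab or JMP work internally, and those packages may use a different quartile formula, so their box edges can differ slightly from a hand calculation.

The sketch follows the halving rule of steps 5 and 6: when the count is odd, the point on the median counts in both halves. For a case the text does not cover, a half with an odd number of points, the sketch assumes the middle point of that half is the boundary. The names `TABLE_DATA`, `quartile_boundaries` and `box_plot_summary` are hypothetical.

```python
from statistics import median

# Values from the "Data table divided into quartiles" example (20 points).
TABLE_DATA = [
    27.75, 37.35, 38.35, 38.35, 38.75, 39.75, 40.50, 41.00, 41.15, 42.55,
    42.90, 43.60, 43.85, 47.30, 47.90, 48.15, 49.86, 51.25, 51.60, 56.00,
]


def quartile_boundaries(values):
    """Return (Q1, median, Q3) using the halving rule from steps 5 and 6.

    Assumption: each quartile boundary is the median of its half of the
    ordered list, which gives "halfway between neighbouring points" when
    the half has an even number of points.
    """
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 0:
        lower, upper = data[:half], data[half:]
    else:
        # With an odd count, the point on the median belongs to both halves.
        lower, upper = data[: half + 1], data[half:]
    return median(lower), median(data), median(upper)


def box_plot_summary(values, k=1.5):
    """Return box edges, whisker ends and outliers (steps 10 to 14).

    k is the IQR multiplier; 1.5 is the convention described in step 11.
    """
    data = sorted(values)
    q1, med, q3 = quartile_boundaries(data)
    iqr = q3 - q1
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),  # smallest point >= Q1 - k*IQR
        "upper_whisker": max(inside),  # largest point <= Q3 + k*IQR
        "outliers": [x for x in data if x < low_limit or x > high_limit],
    }


if __name__ == "__main__":
    summary = box_plot_summary(TABLE_DATA)
    for name, value in summary.items():
        # Round floats for display only; lists (outliers) print as-is.
        shown = round(value, 4) if isinstance(value, float) else value
        print(f"{name}: {shown}")
```

Applied to the example table, the sketch reproduces the boundaries shown there: 39.25, 42.725 and 48.025. The IQR is 8.775, so the 1.5 IQR limits are 26.0875 and 61.1875. These limits enclose every point, so both whiskers reach the extreme values 27.75 and 56.00 and no outliers are reported. Passing a different `k` shows how the multiplier from step 11 changes which points are flagged.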