What is Boxplot | Box and Whisker Plot | 5 Advantages of Boxplot | Create Boxplot in Excel & R

In the 21st century, the box and whisker plot, or boxplot, is probably one of the most widely used graphs in descriptive analytics, not only in statistics but also in other fields of data science. If you want a clear understanding of the boxplot, its uses, and its benefits, you are in the right place. Creating a boxplot in Excel, R, and Python is also covered.

What is Boxplot or Box and Whisker Plot?

    Among the various analytical techniques, the boxplot is one of the most frequently used for descriptive statistics. A boxplot displays the five-number summary of a data set: the median, 1st quartile, 3rd quartile, maximum, and minimum can all be read from a single diagram. Outliers and skewness in the data set can also be identified easily.
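To make the five numbers concrete, here is a minimal Python sketch that computes them with the standard library. The helper name five_number_summary and the sample values are invented for this illustration, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the minimum, Q1, median, Q3 and maximum of a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points and treats
    # the sample minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Hypothetical sample data, invented for this illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]
summary = five_number_summary(sample)
print(summary)
```

A boxplot of the same values would place the box between q1 and q3, with the median line inside it.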

    Use case: Suppose we have a data set and need to build a regression model. A common first step is to check for outliers so that a few extreme points do not distort the fitted regression line. A boxplot is one of the simplest ways to check whether the data set contains outliers, which can then be examined and, where appropriate, removed.
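The use case above can be sketched in Python as follows. This assumes the common rule of flagging values more than 1.5 times the interquartile range (IQR) beyond the quartiles, which the text above does not specify. The function name drop_outlier_rows and the (x, y) observations are hypothetical.

```python
from statistics import quantiles


def drop_outlier_rows(rows, k=1.5):
    """Keep only the (x, y) rows whose y value lies within the k * IQR fences."""
    ys = [y for _, y in rows]
    q1, _, q3 = quantiles(ys, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(x, y) for x, y in rows if lower <= y <= upper]


# Hypothetical (x, y) observations; the last y value is deliberately extreme.
rows = [(1, 2), (2, 4), (3, 4), (4, 5), (5, 6),
        (6, 7), (7, 8), (8, 9), (9, 10), (10, 25)]
cleaned = drop_outlier_rows(rows)
print(len(rows), "rows before,", len(cleaned), "rows after")
```

Whether a flagged point should really be dropped is a judgment call. The filter only applies the rule mechanically.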

Why is a Boxplot called a Box and Whisker Plot?

    The boxplot is also called a box and whisker plot because of its structure. The box spans Q1 (1st quartile) to Q3 (3rd quartile), with the median marked inside it. Straight lines connect Q1 to the minimum value and Q3 to the maximum value, and these lines look like whiskers. Because the plot combines a box with whiskers, it is called a box and whisker plot.

Interpretation of Box and Whisker Plot/ How to read Boxplot?

Box and Whisker Plot

    As discussed earlier, a box and whisker plot shows the five-number summary of a data set. Let’s go through each element in turn, using the attached boxplot diagram.

1st Quartile (25th percentile): The lower edge of the box in the diagram represents the 1st quartile, or 25th percentile.

3rd Quartile (75th percentile): The upper edge of the box represents the 3rd quartile, or 75th percentile.

Median: The white line inside the box represents the median, or 2nd quartile.

Maximum value: The end of the upper whisker marks the maximum value, which helps in understanding the range of the data set.

Minimum value: Likewise, the end of the lower whisker marks the minimum value.

    Apart from these five points, the upper and lower whiskers connect the maximum and minimum values to the 3rd and 1st quartiles respectively. Outliers appear as dots beyond the whisker ends. When outliers are drawn this way, the “maximum” and “minimum” at the whisker ends are the largest and smallest values that are not flagged as outliers. In the attached diagram, one outlier is clearly visible above the upper whisker (the red dot).
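The positions of the whisker ends and the outlier dots can be worked out by hand. The sketch below assumes the widely used 1.5 × IQR convention for flagging outliers. The diagram above may have been drawn with a different rule, and the helper name and data are hypothetical.

```python
from statistics import quantiles


def whiskers_and_outliers(values, k=1.5):
    """Return (lower whisker end, upper whisker end, list of outliers)."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Anything outside the fences would be drawn as a separate dot.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return inside[0], inside[-1], outliers


# Hypothetical sample data, invented for this illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]
result = whiskers_and_outliers(sample)
print(result)
```

Here the upper whisker ends at 10, not at 25, because 25 lies beyond the upper fence and is shown as a dot instead.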

What are the similarities and differences between Boxplot and Bell Curve?

Similarities: Both the bell curve and the boxplot show the spread of the data, its skewness, and outliers.

Differences: A boxplot shows the five-number summary clearly, whereas in a bell curve these values cannot be read directly from the plot.

What are the advantages of Boxplot?

The boxplot is used in descriptive analytics because of the following advantages.

1. Identifies outliers

2. Shows the five-number summary of the data: median, 1st quartile (25th percentile), 3rd quartile (75th percentile), minimum value, and maximum value

3. Visualizes the spread of the data

4. Reveals the skewness of the data

5. Side-by-side boxplots make it easy to compare the spread of different variables
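Skewness is read from a boxplot by looking at where the median sits inside the box. One way to put a number on that is the quartile-based (Bowley) skewness coefficient. This measure is not mentioned in the text above and is offered only as an illustration. The function name and both data sets are invented.

```python
from statistics import quantiles


def quartile_skewness(values):
    """Quartile-based skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return 0.0  # the box has zero height, so no skew can be read from it
    return (q3 + q1 - 2 * median) / iqr


# Hypothetical data sets, invented for this illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 6, 9, 14]

print(quartile_skewness(symmetric))     # median in the middle of the box
print(quartile_skewness(right_skewed))  # median closer to Q1 than to Q3
```

A value near zero corresponds to a median line in the middle of the box. A positive value means the median sits closer to Q1, which is how right skew shows up in the plot.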

History of Boxplot

The boxplot was first created in 1970 by the statistician John Tukey. The methodology was published seven years later, and the plot grew popular from 1977 onwards. A more basic precursor, which showed the range of the data, had already appeared in the early 1950s. When John Tukey first created the boxplot, it was drawn by hand after calculating the various parameters manually. At that time, it was a challenge for statisticians to summarize data in a small space, and the boxplot solved that problem. It can summarize and compare not just one variable but several at once.

How to create Boxplot in Excel?

MS Excel (2016 and later) has a built-in boxplot chart option. Follow the steps below to create a boxplot in Excel.

1. Select the data. To create a single boxplot, select one column; to create side-by-side boxplots for several variables, select multiple columns.

2. Go to the “Insert” tab.

3. Click “Insert Statistic Chart” and select “Box and Whisker”; the boxplot will appear in the worksheet.

How to create Boxplot in R?

A boxplot can be created in R with the built-in “boxplot” function, without loading any library. Alternatively, the “ggplot2” package can be used. The examples here use “airquality”, a data set that ships with R.

#### Normal Method ####

boxplot(airquality$Temp, col = 'red')

[The Temp column of the airquality data set is used to create the boxplot, and the colour is set to red.]

Boxplot in R (Normal Method)

#### By using ggplot2 library ####

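library(ggplot2)

d <- ggplot(airquality, aes(x = Temp, y = Wind)) + geom_boxplot()

print(d)

[From the airquality data set, the Temp (x) and Wind (y) columns are used to create the boxplot, and print(d) displays the stored plot. Because Temp is continuous, ggplot2 treats all rows as one group and draws a single box for Wind, typically with a warning about the continuous x aesthetic. Mapping a categorical variable such as factor(Month) to x gives one box per group.]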

Boxplot in R (ggplot2)

Notes on Boxplot in Python

To create a boxplot in Python, matplotlib is a widely used choice. Because matplotlib is a general visualization library, the boxplot can be customized extensively, including its colour, size, and overall look. A boxplot can also be created with the seaborn library.

Follow the steps below to create a boxplot in Python.

1. The first step is to import the required libraries, in this case numpy and matplotlib, as shown below.

import numpy as np

import matplotlib.pyplot as plt

2. The next step is to import the data set and assign it to a name.

3. To create the boxplot, use the plt.boxplot() function as shown below, passing the column to plot.

plt.boxplot(column1)

Additionally, to show a custom name as the label under the box, use the structure below, passing the label as a string inside a list. (From matplotlib 3.9 onwards this parameter is named tick_labels; labels is the older name.)

plt.boxplot(column1, labels=['column1'])

4. To display the boxplot, call the plt.show() function.

plt.show()
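Putting the steps together, a minimal end-to-end sketch might look like the one below. No data set is named in the steps, so column1 and column2 are random numbers generated purely for illustration, and draw_boxplots is a hypothetical helper. The labels are set with set_xticklabels so the code does not depend on which matplotlib version is installed.

```python
import numpy as np
import matplotlib.pyplot as plt


def draw_boxplots(columns, names):
    """Draw one box per column on a shared axis; return (figure, artists)."""
    fig, ax = plt.subplots()
    # boxplot returns a dict of the drawn lines (boxes, medians, whiskers, fliers).
    artists = ax.boxplot(columns)
    # Boxes are placed at x = 1, 2, ...; label each position with its column name.
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_ylabel("Value")
    return fig, artists


# Synthetic stand-in data, assumed here because the steps do not name a data set.
rng = np.random.default_rng(0)
column1 = rng.normal(loc=50, scale=10, size=100)
column2 = rng.normal(loc=60, scale=5, size=100)

fig, artists = draw_boxplots([column1, column2], ["column1", "column2"])
```

Calling plt.show() afterwards displays the figure, and fig.savefig("boxplot.png") writes it to a file instead.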
