In the 21^{st} century, the box and whisker plot, or boxplot, is probably one of the most widely used graphs in descriptive analytics, not only in statistics but also in other fields of data science. If you want a clear understanding of the boxplot, its uses, its benefits and other details, you are in the right place. Creating a boxplot in Excel, R and Python is also discussed.


**What is Boxplot or Box and Whisker Plot?**

Among the various analytical techniques, the boxplot is one of the most frequently used for descriptive statistics. A boxplot visualizes the five-number summary: the median, 1^{st} quartile, 3^{rd} quartile, maximum and minimum are all shown in a single diagram. Outliers and skewness in the data set can also be identified easily.
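To make the five-number summary concrete, here is a minimal Python sketch that computes it with the standard library. The function name five_number_summary and the sample values are made up for illustration. Quartiles can be defined in several slightly different ways, so other tools may report slightly different Q1 and Q3 values for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates between the observed values, so the
    # quartiles always lie between the smallest and largest observation.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical sample data, used only for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(five_number_summary(sample))
```

These are the five values that a boxplot draws as the two whisker ends, the two box edges and the line inside the box.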

**Use case**– Suppose we have a data set and need to build a regression model. A common first step is to check for outliers so that they do not distort the regression line. In such cases, a boxplot is one of the simplest ways to check whether the data set contains outliers, which can then be examined and removed if appropriate.

**Why is Boxplot called Box and Whisker Plot?**

A boxplot is also called a box and whisker plot because of its structure. The median, Q1 (1^{st} quartile) and Q3 (3^{rd} quartile) are represented by a box. Q1 is connected to the minimum value and Q3 to the maximum value by straight lines that look like whiskers. Because the plot combines a box-like structure with whiskers, it is called a box and whisker plot.

**Interpretation of Box and Whisker Plot/ How to read Boxplot?**

As discussed earlier, a box and whisker plot shows the five-number summary of a data set. Let’s go through its elements one by one using the attached diagram of a boxplot.

**1^{st} Quartile (25^{th} percentile)**– The bottom edge of the box in the diagram represents the 1^{st} quartile, or 25^{th} percentile.

**3^{rd} Quartile (75^{th} percentile)**– The top edge of the box in the diagram represents the 3^{rd} quartile, or 75^{th} percentile.

**Median**– The white line inside the box in the diagram represents the median, or 2^{nd} quartile.

**Maximum value**– The end of the upper whisker in the diagram marks the maximum value, which helps in understanding the range of the data set.

**Minimum value**– Likewise, the end of the lower whisker marks the minimum value.

Apart from these five points, the upper and lower whiskers connect the maximum and minimum values to the 3^{rd} and 1^{st} quartiles respectively. Outliers appear as dots beyond the whisker ends; in most plotting tools the whiskers then stop at the largest and smallest values that are not counted as outliers. In the attached diagram, one outlier is clearly visible above the upper whisker (the red dot).
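The article does not say how a point is classified as an outlier. A common convention, assumed here, is the 1.5 × IQR rule: any value more than 1.5 times the interquartile range below Q1 or above Q3 is drawn as a separate dot. The sketch below illustrates that rule; the function names tukey_fences and split_outliers and the sample data are hypothetical.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) under the assumed k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate values into (inliers, outliers) relative to the fences."""
    low, high = tukey_fences(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers

# Hypothetical sample data; 40 is deliberately extreme.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 40]
inliers, outliers = split_outliers(sample)
print("fences:", tukey_fences(sample))
print("outliers:", outliers)
print("whiskers end at:", min(inliers), "and", max(inliers))
```

Under this rule the whiskers end at the most extreme values that are still inside the fences, and everything outside is plotted as an individual point.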

**What are the similarities and differences between Boxplot and Bell Curve?**

**Similarities**– Both the bell curve and the boxplot show the spread of the data and give an indication of skewness and outliers.

**Differences**– A boxplot shows a clear five-number summary, whereas in a bell curve these values are not directly visible.

**What are the advantages of Boxplot?**

Boxplots are used in descriptive analytics because of the following advantages.

1. Identify outliers

2. Show the five-number summary of the data: median, 1^{st} quartile (25^{th} percentile), 3^{rd} quartile (75^{th} percentile), minimum value and maximum value

3. Visualize the spread of the data

4. Identify the skewness of the data

5. Side-by-side boxplots help to compare the spread of different variables
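Skewness can be read from a boxplot by checking whether the median sits in the middle of the box. One way to put a number on that visual impression, not mentioned in the article, is Bowley's quartile skewness coefficient, (Q3 + Q1 − 2 × median) / (Q3 − Q1). The sketch below is an illustration with a hypothetical function name and made-up data.

```python
import statistics

def quartile_skewness(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("IQR is zero; quartile skewness is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical data sets, used only for illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 8, 15, 30]

print(quartile_skewness(symmetric))     # median centred in the box
print(quartile_skewness(right_skewed))  # median closer to Q1
```

A value near zero means the median is centred in the box. A positive value means the median sits closer to Q1 (right skew), and a negative value means it sits closer to Q3 (left skew).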

**History of Boxplot**

The boxplot was introduced in **1970** by the statistician **John Tukey**. The methodology was published seven years later, and it started gaining popularity from 1977. Before that, the basic structure of the boxplot had been created in the early 1950s as a way of showing the range of data. When John Tukey first created the boxplot, it was drawn by hand after the various parameters had been calculated manually. At that time it was a challenge for statisticians to show a summary of data in a small space, and the boxplot solved that problem. A boxplot can summarize not only one variable but also several variables side by side for comparison.

**How to create Boxplot in Excel?**

MS Excel (2016 and later) has a built-in boxplot chart option. Follow the steps below to create a boxplot in Excel.

1. **Select the entire data set**. To create a single boxplot, select one column; to create side-by-side boxplots for multiple variables, select multiple columns.

2. Go to **“Insert”** Tab.

3. Go to **“Insert Statistic Chart”** and select **“Box and Whisker”** from the options. The boxplot will appear in the worksheet.

**How to create Boxplot in R?**

A boxplot can be created in R with the built-in “boxplot” function, without loading any library. Alternatively, the “ggplot2” package can be used. The examples here use “airquality”, a data set that ships with R.

*#### Normal Method ####*

boxplot(airquality$Temp, col = "red")

[The Temp column of the airquality data set is used to create the boxplot, and the colour is set to red]

*#### By using ggplot2 library ####*

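library(ggplot2)

d <- ggplot(airquality, aes(x = factor(Month), y = Temp)) + geom_boxplot()

print(d)

[From the airquality data set, Temp (y) is plotted for each Month (x). Month is wrapped in factor() because geom_boxplot() draws one box per discrete x value; with a continuous x such as Temp, ggplot2 issues a warning and collapses the data into a single box. The plot is stored in d, so print(d) is needed to display it.]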

**Notes on Boxplot in Python**

In Python, matplotlib is a widely used library for creating boxplots. Because matplotlib is a general-purpose visualization library, the colour, size and overall look of a boxplot can be customized. Boxplots can also be created with the seaborn library.

Follow the steps below to create a boxplot in Python.

1. The first step is to import the required libraries. In this case, import the **numpy** and **matplotlib** libraries as shown below.

import numpy as np

import matplotlib.pyplot as plt

2. The next step is to load the data set and assign it to a variable (column1 in the steps that follow).

3. To create the boxplot, use the plt.boxplot() function as shown below, passing in the column to plot.

plt.boxplot(column1)

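Additionally, to show a custom name under the box, pass a list containing one string per box, using the code structure below. The parameter is called labels in older Matplotlib releases and was renamed tick_labels in Matplotlib 3.9, where labels is deprecated.

plt.boxplot(column1, labels=["Column 1"])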

4. To display the boxplot, use the plt.show() function.

plt.show()
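The steps above can be put together into one script. The version below is a minimal sketch rather than a definitive recipe: the data is randomly generated (the article does not specify a data set), the names column1 and column2 and the output file boxplot.png are made up, and the plot is saved to a file with a non-interactive backend instead of being shown with plt.show(), so that it also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two columns of 200 normally distributed values each.
rng = np.random.default_rng(seed=0)
column1 = rng.normal(loc=50, scale=10, size=200)
column2 = rng.normal(loc=60, scale=5, size=200)

fig, ax = plt.subplots()

# Passing a list of arrays draws one box per array, side by side.
result = ax.boxplot([column1, column2])

# Boxes are placed at x positions 1, 2, ... by default; label them explicitly.
ax.set_xticks([1, 2])
ax.set_xticklabels(["column1", "column2"])
ax.set_ylabel("Value")

fig.savefig("boxplot.png")
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and fig.savefig() replaced with plt.show(), as in step 4.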