Statistical functions are must for data analyst when using excel as tool and using statistical methods. There are 110 statistical functions in Microsoft Excel. Out of those, there are 25 statistical functions which are useful and must learn for data analyst. In this article, 25 statistical functions of excel will be on descriptive analytics, predictive analytics and inferential statistics. So, let’s start 25 helpful statistical functions on excel.
We have categorized the statistical functions into ten parts.
Measure of Dispersion
Central Tendency
Information
Distribution
Correlation
Regression
Forecasting
Hypothesis Testing
Inferential Statistics
Probability
Table of Contents
Measure of Dispersion:
STDEV.P/ STDEV.S
Standard deviation is the measure of dispersed of the data compared to mean or average. In other words, it is the measure of distribution of a data. If the standard deviation output is more, the data is distributed more. In excel, there is a function called STDEV to get the output of standard deviation.
There are three types of standard deviation functions in excel, STDEV, STDEV.P and STDEV.S.
STDEV is the normal standard deviation function. STDEV.P needs to be used when data is from a population. On the other hand, if the data is from a sample, STDEV.S needs to be used.
If we look into the mathematical formula for both STDEV.S and STDEV.P, n-1 is in the STDEV.S and n is in the STDEV.P. In normal STDEV function, it is only n-1.
Just put the function and select the data (number 1, number 2..), output will be generated.
Syntax: STDEV.S(number1, [number2], …)
VAR.S/ VAR.P
To get variance from standard deviation, need to square the value of standard deviation. As shown in the example of standard deviation output is 31. To get variance, just need to square it, 31^2= 938.
The use of standard deviation and variance are almost same. Only difference is, standard deviation cannot be zero. Where, variance can be positive and negative both based on the data set.
To get direct variance, use the function VAR. There are other functions available of variance, VAR.S and VAR.P. When you are dealing with population data, use VAR.P else use VAR or VAR.S.
Syntax: VAR.S(number1, [number2], …)
Syntax: VAR.P(number1, [number2], …)
Central Tendency:
AVERAGEIF/ AVERAGEIFS
I don’t think average concept needs any introduction. It is sum of the data set divided by count of numbers. Excel has three types of average functions, AVERAGE, AVERAGEIF, AVERAGEIFS. In AVERAGEIF and AVERAGEIFS, logic can be applied to get the average output. In AVERAGEIF, only one condition can be applied where in AVERAGEIFS, multiple logics can be applied.
Median is the measure of central tendency to find out middle number in data set which is sorted. Median should be used if data set has an outlier or data set is skewed.Syntax: MEDIAN(number1, [number2], …)
MODE
Mode is another measure of central tendency. Mode function is to find out the frequently happening number in a data set.
Syntax: MODE(number1, [number2], …)
Information:
MAX/MIN & MAXIFS/ MINIFS
Max and Min are basic statistical function in excel to find maximum and minimum value in a dataset. Additionally, one or more condition can be applied to find the maximum and minimum value. If one condition needs to be applied, MAXIF function should be used. On the other hand, for multiple condition or logic, MAXIFS function should be used.
Count function concept is simple, if want to count the numbers in a dataset, use count function. Now there are options available if you want to count the numerical data based on one or more logics. If the logic is one COUNTIF function is used and if logics are more than one, COUNTIFS function should be used.
As an example, if you want to count the numerical values in a data set if it is more than 100 and less than 150. In this case, COUNTIFS function is useful.
Syntax: COUNT(value1, [value2], …)
Syntax: COUNTIF(range, criteria)
Syntax: COUNTIFS(criteria_range1, criteria1, …)
FREQUENCY
Frequency function is to find out the frequency distribution at defined frequency value.
Syntax: FREQUENCY(data_array, bins_array)
RANK.EQ
Rank EQ function is useful when you want to assign rank based on the numeric value in the data set. If duplicate values are present in the data, same rank will be assigned by the function to the duplicated values. In the below example, 12 and 13 column has same value as 99 where same 7th rank has been assigned by the RANK.EQ function.
Syntax: RANK.EQ(number, ref, [order])
Distribution:
QUARTILE.INC
Quartile function is to identify the 1st/2nd or 3rd quartile value from a data set. Additionally. QUARTILE.INC function has the capability to identify maximum and minimum value. Quartiles are basically the percentile or distributing the data set into three parts. So, 25th (1st quartile), 50th (2nd quartile) and 75th (3rd quartile) percentile of a data set can be got easily by QUARTILE.INC function.
There is another function as QUARTILE.EXC, which can not return to max and min value. So, its better to use QUARTILE.INC function.
Syntax: QUARTILE.INC(array, quart)
Skew function helps to identify the skewness of distribution in a data set. As shown below, data 1 tail off is skewed towards right and data 2 tail off is skewed towards left. By looking into the output of skew function, skewness can be identified, plotting the data is not required.
If output is showing as positive, distribution of the tail off is to right. If output is negative, distribution of the tail off is to left.
There is another version of SKEW function, that is SKEW.P. If you are measuring skewness of a population, SKEW.P function should be used.
Syntax: SKEW(number1, [number2], …)
Syntax: SKEW.P(number1, [number2], …)
KURT
Kurtosis is the measure of distribution in terms of tailedness. In other words, it tells that data is normally distributed or heavy/ light tailed. If the KURT function output is positive, it means the distribution is relatively peaked distribution. If the output is negative, it means the distribution is relatively flat distribution.
Syntax: KURT(number1, [number2], …)
NORM.DIST
NORM.DIST function is stands for normal distribution. This function used to return the values for probability distribution function (PDF) or cumulative distribution function (CDF). As a use case, probability distribution function is needed to draw bell curve in excel.
As an input for the function, mean and standard deviation are required. After that CDF or PDF input needs to be input as cumulative space in the argument. If you want PDF, mention FALSE, if want CDF, mention TRUE.
To calculate binomial distribution probability. Here you can select the output like cumulative distribution function or probability mass function. In the below example, we are considering the number of trial 10 and probability is 0.1657.
Correlation is one of the popular statistical concepts in the field of data analytics. Correlation is denoted by r and result interpreted by the value of it. So, CORREL function directly show the output of r whether it is positive, negative or zero.
Syntax: CORREL(array1, array2)
Regression:
SLOPE
In the regression function, there are two important parameters apart from X and Y, those are intercept and slope. Slope can be directly known by using SLOPE function in excel by using known ys and xs.
Syntax: SLOPE(known_ys, known_xs)
INTERCEPT
As mentioned earlier that slop and intercept are important in a regression line. To find the intercept, directly INTERCEPT function can be used.
Syntax: INTERCEPT((known_ys, known_xs)
LINEST
LINEST is a powerful function which show the output of all the statistical output of a regression line or linear regression. Earlier we discussed separate functions to get R square, slope or intercept. LINEST function is capable to give all these statistical outputs and also much more parameter output.
When using the function, known ys and know xs need to input. Also in the stats option, if you select TRUE, all the output will be showed in output.
LOGEST function is almost similar as LINEST function. Only difference is, LINEST function applicable to straight line and LOGEST function fits for exponential curve.
RSQ is to find out R square which is one accuracy parameter of regression. To find out R square within two variables, need not to run regression model. Only RSQ function will produce the required output.
Select the range of dependent variable first as y and then independent variable as x, RSQ output will be generated.
Syntax: RSQ(known_ys, known_xs)
Forecasting:
FORECAST
Predictive analytics is type of analytics to predict future with the help of past data. Excel has created the simple function to do it in the spread sheet. There are 6 types of Forecast functions to deal with any such work related to forecasting.
Syntax: FORECAST(x, known_ys, known_xs)
Hypothesis Testing:
F.TEST
F test is used to check hypothesis testing. It is to check the similarity between two population group. To use the function, only two array input is required. The function will generate the output of p-value. If the output or p-value <0.05 null hypothesis can be rejected.
Syntax: F.TEST(array1, array2)
T.TEST
T test is used for hypothesis testing in statistics which used two group data to compare means. When we say hypothesis testing it is to reject the null hypothesis.
To work with the function, first need to select two arrays which needs to be compared. Then, next input is required as one tailed distribution or two tailed distribution. Lastly, the type of t test needs to be inserted like 1 for Paired, 2 for Two sample equal variance (homoscedastic) and 3 for Two sample unequal variance (heteroscedastic).
Syntax: T.TEST(array1, array2, tails, type)
Inferential Statistics:
CHISQ.TEST
CHISQ.TEST is stands for chi-square test. Its is a method or tool for inferential statistics to check the statistical significance of an experiment. The output will give the test result for independence. If the output is low, it means that both actual and expected data are independent. If both actual and expected range are similar, output will be 1.
Syntax: CHISQ.TEST(actual_range, expected_range)
Probability:
PROB
PROB function stands for probability. It helps to do the calculation of probability by using the given range of instructions. One example mentioned to understand it better.