Market Basket Analysis with R | Simple Apriori Algorithm in R | 8 Uses of Market Basket Analysis

Retail is one of the world's growing and profitable businesses, with global players such as Walmart, Amazon and Costco. Data analytics plays a major role in the retail industry, from customer segmentation and targeting to sales forecasting. Among these analytics techniques, market basket analysis is one of the key methods big retailers use to find associations between the items customers purchase.

Market basket analysis is an unsupervised machine learning technique for finding the products that customers buy together. In other words, it uncovers the relationships between products that go together. This article walks through market basket analysis in R, including the R code to perform it and how to interpret the output.

Usefulness of Market Basket Analysis

  • Understand customer behaviour and buying patterns
  • Identify fast- and slow-moving products
  • Arrange product shelves or the store layout according to customer choice
  • Maintain product inventory and support forecasting
  • Decide content placement
  • Design marketing messages
  • Cross-sell products
  • Detect fraud

So, market basket analysis is a key methodology for serving customers according to their needs. Because customer behaviour changes and is influenced by many factors, it is best to repeat the analysis at regular intervals and at different times of the year. The exercise adds value to any organization in the retail business. Many such analyses have been carried out in the retail industry to identify and act on customer behaviour, and the technique has been applied in many other fields and industries as well.

Market basket analysis finds the products that customers buy together in a single visit to a store or an e-commerce platform. It also helps marketers design promotional campaigns. The methodology is based on association rules, which are evaluated with a few statistical measures.

Market basket analysis relies on association rules and purchase frequency. In this study, grocery purchase data is used to identify the products that customers buy together. The first step is to identify how frequently each product is purchased; the next is to identify the products associated with the most frequently purchased product or products. The Apriori algorithm is used to find the association rules.

Define Problem Statement

Nowadays retailers carry thousands of products and SKUs in a store, and it is nearly impossible to identify by hand which products customers purchase together. It helps retailers to know which products customers buy in a single visit to the store, and which products are associated with the most frequently purchased ones. With this knowledge, better marketing campaigns or offers can be designed.

Market Basket Analysis with R

As mentioned earlier, without a proper methodology it is nearly impossible to find product associations. Market basket analysis was developed to solve this problem.

In market basket analysis, product associations are found with the help of the Apriori algorithm and three statistical measures: support, confidence and lift.

  • Support measures how frequently a particular product (or set of products) is purchased. It is given by the following formula.

Support = number of transactions containing the product / total number of transactions

  • Confidence measures how reliable a rule is: of the transactions that contain Product 1, the share that also contain Product 2. It is given by the following formula.

Confidence (Product 1 → Product 2) = Support (Product 1, Product 2) / Support (Product 1)

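  • Lift is the ratio by which the confidence of a rule exceeds the expected confidence, that is, the confidence expected if the two products were purchased independently. It is a ratio rather than a probability: a lift above 1 means the products are bought together more often than expected, and a lift below 1 means less often.

Lift (Product 1 → Product 2) = Confidence (Product 1 → Product 2) / Support (Product 2)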
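To make the three measures concrete, here is a minimal sketch in base R that computes them by hand. The five baskets and the helper functions `support()`, `confidence()` and `lift()` are made up for illustration; they are not part of the Groceries data or of the arules package.

```r
# Toy data: five hypothetical baskets (not taken from the Groceries data)
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("bread", "butter"),
  c("milk", "butter"),
  c("milk", "bread", "jam")
)

# Support: share of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of lhs -> rhs: support of both sides over support of lhs
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Lift of lhs -> rhs: confidence over support of rhs
lift <- function(lhs, rhs, baskets) {
  confidence(lhs, rhs, baskets) / support(rhs, baskets)
}

supp_mb <- support(c("milk", "bread"), baskets)  # 3 of the 5 baskets
conf_mb <- confidence("milk", "bread", baskets)  # 3 of the 4 milk baskets
lift_mb <- lift("milk", "bread", baskets)        # confidence / support(bread)

print(c(support = supp_mb, confidence = conf_mb, lift = lift_mb))
```

In this toy example the lift comes out slightly below 1, so milk and bread appear together a little less often than they would if they were purchased independently, even though the confidence looks high.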

Working with data

The Apriori implementation in R is used to generate the association rules. The “arules”, “arulesViz” and “datasets” packages are loaded for this purpose. Insights are drawn from three statistical measures: support, confidence and lift.

Load the data and libraries

# Load the libraries
library(arules)
library(arulesViz)
library(datasets)

# Load the data set (Groceries ships with the arules package)
data(Groceries)

Explore the data

Data visualization gives more clarity about the data set than descriptive statistics alone. Here, it is used to explore how frequently each item is purchased.

# Item frequency for the top 20 items
itemFrequencyPlot(Groceries, topN = 20, type = "absolute")

Comments: The bar graph shows that whole milk is the most frequently purchased item, followed by other vegetables.
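The same ranking can also be read as numbers rather than from the bar heights. The sketch below assumes the arules package is installed and uses its `itemFrequency()` function; the variable names are illustrative.

```r
library(arules)
data(Groceries)

# Absolute purchase counts per item, sorted from most to least frequent
item_counts <- sort(itemFrequency(Groceries, type = "absolute"),
                    decreasing = TRUE)

# Five most frequently purchased items with their counts
top_items <- head(item_counts, 5)
print(top_items)

# The same figures as a share of all transactions (the item support)
print(round(top_items / length(Groceries), 3))
```

Dividing by `length(Groceries)`, the number of transactions, turns each count into the support of that single item.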

Two thresholds are set for the rule search: a minimum support of 0.001 and a minimum confidence of 0.8. The first five rules are then inspected.

Get Rules

# Get the rules
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))

# Show the first 5 rules, printing only 2 significant digits
options(digits = 2)
inspect(rules[1:5])

Comments: The output shows that 92% of the customers who purchase soup and beer also purchase whole milk. The other rules in the output identify further product combinations that customers purchase together. In total, 410 rules are generated, each reported with its support, confidence, coverage, lift and count.
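The rule count and the quality measures can also be checked programmatically instead of being read off the printed output. This is a small sketch assuming the arules package; `length()` and `quality()` are arules methods for rule sets, while the variable names are illustrative.

```r
library(arules)
data(Groceries)

# Same thresholds as above; verbose = FALSE only suppresses progress output
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Number of rules that satisfy both thresholds
n_rules <- length(rules)
print(n_rules)

# Quality measures of the rules as a plain data frame
rule_quality <- quality(rules)
print(colnames(rule_quality))
print(head(rule_quality, 5))
```

Because `quality()` returns an ordinary data frame, the measures can be filtered or plotted with standard R tools.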

Create Summary

# Creating summary

summary(rules)

Comments: The summary report helps to understand the number of rules and transactions, and the range of support and confidence values across the rules. The next step is to find the most relevant rules, which can be achieved by sorting them.

Sorting

# Sort the rules by confidence, highest first
rules <- sort(rules, by = "confidence", decreasing = TRUE)
options(digits = 2)
inspect(rules[1:10])

Comments: The top rules in the output show that rice and sugar are purchased together with whole milk. Rule 4 is the longest, containing 4 products.

Limit the Rules

# Limit each rule to at most 3 items (this also reduces the number of rules)
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8, maxlen = 3))
inspect(rules[1:10])

Targeting

The objective of targeting is to answer the following two questions.

i) Which products in the basket make customers likely to also buy whole milk?

ii) If customers are buying whole milk, what other products are they likely to buy?

The Apriori algorithm is used to answer both questions.

# Targeting items: rules with whole milk on the right-hand side
rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.08),
                 appearance = list(default = "lhs", rhs = "whole milk"),
                 control = list(verbose = FALSE))
rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules[1:5])

The next step is to place whole milk on the left-hand side and the other items on the right-hand side. The minimum confidence is set to 0.15 and the minimum rule length to 2, so that rules with an empty left-hand side are excluded.

rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE))
rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules[1:5])

Data Visualization

Data visualization further helps in understanding the association rules. Here, the “arulesViz” package is used to create an insightful, good-looking visualization.

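# Data Visualization
library(arulesViz)
# Note: recent arulesViz releases may warn that `interactive` is deprecated
# in favour of the `engine` argument
plot(rules, method = "graph", interactive = TRUE, shading = NA)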

Summary of Result and Conclusion

As the statistical results and the visualization show, whole milk is the most frequently purchased product, and other vegetables are associated with it. Two further products, rolls/buns and yogurt, are also associated with whole milk, after other vegetables.

The first objective of the study was to identify the most frequently purchased product, which is whole milk, as shown in the output. The next step was to find the products that customers are likely to buy together with whole milk. According to the association rules found by the Apriori algorithm, that product is other vegetables.

Three measures support this conclusion: support, confidence and lift. As shown in the targeting step, the rule from whole milk to other vegetables has a support of 0.075, the highest among all associated products. Its confidence is 0.29 and its lift is 1.5, with a count of 736.
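These figures can be reproduced directly from transaction counts, which is a useful check on how the three measures relate to each other. The sketch below assumes the arules package and uses its `%in%` and `%ain%` operators (contains any / contains all of the listed items); the variable names are illustrative.

```r
library(arules)
data(Groceries)

n_total <- length(Groceries)                        # all transactions
n_milk  <- sum(Groceries %in% "whole milk")         # baskets with whole milk
n_veg   <- sum(Groceries %in% "other vegetables")   # baskets with other vegetables
# Baskets that contain both items
n_both  <- sum(Groceries %ain% c("whole milk", "other vegetables"))

supp_rule <- n_both / n_total               # support of the rule
conf_rule <- n_both / n_milk                # confidence: whole milk -> other vegetables
lift_rule <- conf_rule / (n_veg / n_total)  # confidence over support(other vegetables)

print(c(count = n_both, support = supp_rule,
        confidence = conf_rule, lift = lift_rule))
```

A lift above 1 indicates that other vegetables appear in whole milk baskets more often than their overall purchase frequency alone would suggest.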

So, it is concluded that, among all the products, other vegetables are the ones customers most often purchase when they purchase whole milk in a single visit to the store.

Recommendation based on output

So, marketers can design a store layout or product shelf in which customers find whole milk and vegetables close to each other. In other words, vegetables should be visible from the whole milk shelf, within the customers' line of sight. To earn more profit, retailers should avoid running separate offers on whole milk and vegetables at the same time; instead, a combo offer can be designed to lift sales of both products. Moreover, marketers should keep rolls/buns and yogurt on their radar when designing any marketing campaign or offer.
