🍎Market Basket Analysis🍞- Association Rule Mining with visualizations

Ben Roshan
Published in Analytics Vidhya
6 min read · Dec 22, 2020


1. Overview

Introduction

The era has come when the computer knows us better than we know ourselves. Our devices are powerful enough to know what we are doing right now and what we are going to do in the future. One such application of AI is Market Basket Analysis, which is widely used in retail stores to predict the closely associated items we are likely to buy along with the product we purchased.

Project Detail:

In this project, we use the Groceries dataset, which contains 38765 rows of purchase orders from grocery stores. The dataset consists of a single csv file.

I have analysed this dataset with visualizations and performed association rule mining with the help of the Apriori algorithm. I had never questioned why certain items are kept close together in the supermarket; I thought it was for the customer’s convenience, but little did I know that it had a business impact.

You can also get this code on my GitHub page.

Goal of this notebook

  • Getting and cleaning the data
  • Performing association rule mining using the Apriori algorithm
  • Visualizing the results of association between items

2. Import libraries

3. Getting the data

groceries=pd.read_csv('../input/groceries-dataset/Groceries_dataset.csv')
print(f'Groceries_dataset.csv : {groceries.shape}')
groceries.head()

According to the dataset information, it has the following features:

  • Member_number: This is like a customer id given to the customer after the purchase transaction
  • Date: This is the date on which the purchase/transaction was made
  • itemDescription: Name of the item which was purchased

📌 It is important to look at the number of non-null records and their data types. Often the date doesn’t have the datetime format.

groceries.info()

From the information we can identify that

  • We don’t have any null records in the dataset. BAM!
  • The Date column has the object data type rather than datetime. Small bam!

4. Pre-processing

Renaming column

#Renaming the columns to simple words
groceries.rename(columns = {'Member_number':'id','itemDescription':'item'}, inplace = True)

Date information
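
Since the Date column is stored as text, a first step could look like the sketch below. It is only an illustration on a few toy rows: the day-month-year format string and the derived `year`, `month` and `weekday` columns are my assumptions, not something confirmed by the dataset description.

```python
import pandas as pd

# Toy rows standing in for the real file; the day-month-year format is an assumption.
groceries = pd.DataFrame({
    "id": [1808, 2552, 2300],
    "Date": ["21-07-2015", "05-01-2015", "19-09-2015"],
    "item": ["tropical fruit", "whole milk", "pip fruit"],
})

# Parse the text dates into real datetimes.
groceries["Date"] = pd.to_datetime(groceries["Date"], format="%d-%m-%Y")

# Derive calendar parts that are handy for grouping later (hypothetical column names).
groceries["year"] = groceries["Date"].dt.year
groceries["month"] = groceries["Date"].dt.month
groceries["weekday"] = groceries["Date"].dt.day_name()

print(groceries.dtypes)
```

Passing an explicit `format` avoids pandas guessing between day-first and month-first dates.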

6. Association Rule Mining with Apriori Algorithm

Source:DataCamp

What is market basket analysis?

Have you ever wandered around a supermarket and wondered why the sections and racks are arranged so that related products sit together? You can find bread and butter in nearby racks, and brush and toothpaste in the same rack. These products are associated: if you buy a brush, the likelihood of you buying toothpaste is high. This is a marketing tactic to make you fill up the basket with products and their associated items, thereby increasing sales revenue. A few businesses discount the associated item, or bundle both products at a lower price, in order to make you buy the item together with its associated item.

What is association rule mining?

Association rule mining is the technique used to unveil the association between items, where a rule about the items we purchase is denoted as X->Y.

Here X is the item we buy and Y is the item we are most likely to buy along with it (more like if->then). They are also called

  • X- Antecedent
  • Y- Consequent

Association rule mining helps in designing the rules for the association of items. These rules are formed with the help of three metrics:

1. Support: It signifies the popularity of an item or itemset, measured as the fraction of transactions that contain it. If an item is bought infrequently, it is ignored in the association.

2. Confidence: It tells the likelihood of purchasing Y when X is bought. Sounds like a conditional probability? In fact, it is! But it does not account for the popularity (frequency) of Y; to overcome that, we have lift.

3. Lift: It combines both confidence and support: it is the confidence of the rule divided by the support of Y. A lift greater than 1 suggests that the presence of the antecedent increases the chances that the consequent will occur in a given transaction. A lift below 1 indicates that purchasing the antecedent reduces the chances of purchasing the consequent in the same transaction.

For example, assume there are 100 customers, of whom 10 bought milk, 8 bought butter and 6 bought both. We need to check the association bought milk => bought butter.

  • support = P(Milk & Butter) = 6/100 = 0.06
  • confidence = support/P(Milk) = 0.06/0.10 = 0.60
  • lift = confidence/P(Butter) = 0.60/0.08 = 7.5
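
To make the arithmetic concrete, here is a small sketch that recomputes the three metrics for the rule milk => butter from the raw counts. It uses the standard definitions, where confidence divides by the antecedent (milk) and lift divides confidence by the support of the consequent (butter). The variable names are mine.

```python
# Counts from the milk and butter example: 100 customers in total.
n_total = 100
n_milk = 10      # customers who bought milk (antecedent)
n_butter = 8     # customers who bought butter (consequent)
n_both = 6       # customers who bought milk and butter

support = n_both / n_total                  # P(Milk & Butter)
confidence = n_both / n_milk                # P(Butter | Milk)
lift = confidence / (n_butter / n_total)    # confidence / P(Butter)

print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
# prints: support=0.06, confidence=0.60, lift=7.50
```

A lift of 7.5 means milk buyers are 7.5 times more likely to buy butter than a randomly chosen customer.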

What is Apriori?

The Apriori algorithm uses frequent itemsets to derive association rules, relying on the properties that

  • All subsets of a frequent itemset must be frequent
  • Similarly, if a subset is infrequent, every parent set containing it is infrequent too

The algorithm works by setting a minimum support value and iterating over frequent itemsets of growing size. Itemsets (and every larger set containing them) are dropped if their support is below the threshold, until nothing more can be removed.

Later, the lift of the rules derived from the selected itemsets is calculated, and rules whose value is below the threshold are eliminated, since the algorithm may take a long time to run if we keep all rules.
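
The level-wise idea can be sketched in plain Python as below. This is a minimal illustration of the join and prune steps on a made-up list of baskets, not the library implementation used in the notebook. The function name `frequent_itemsets` and the toy data are my own.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: single items that meet the threshold.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        result.update(current)
        k += 1
    return result

# Made-up baskets, for illustration only.
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]
found = frequent_itemsets(baskets, min_support=0.6)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.6, the three single items and the three pairs survive, while the triple (support 0.4) is dropped.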

Preparing the data

Before proceeding with Apriori, we have to prepare the data as a sparse matrix where products are the columns and id is the index. Initially we group by id and item to get the quantity purchased, and later we encode it with 0s and 1s.
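
One way this preparation could look is sketched below on a toy purchase log. It assumes the renamed columns `id` and `item`; the names `basket` and `basket_sets` are hypothetical, and counting rows with `size()` is my stand-in for the quantity purchased.

```python
import pandas as pd

# Toy purchase log using the renamed columns `id` and `item`.
groceries = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "item": ["whole milk", "yogurt", "whole milk",
             "rolls/buns", "whole milk", "yogurt"],
})

# Count how many times each customer bought each item (one row per id).
basket = (
    groceries.groupby(["id", "item"])
    .size()
    .unstack(fill_value=0)
)

# Encode: True (1) if the customer ever bought the item, else False (0).
basket_sets = basket > 0

print(basket_sets.astype(int))
```

Each row is now one customer and each column one product, which is the shape frequent itemset functions typically expect.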

Applying Apriori

Here we apply the Apriori algorithm to get all the frequent itemsets (with 70% as the support threshold) and then apply the association rules function to derive rules, using lift as the metric.

Building dynamic function to customize rules

7. Visualizing the results

The results in tabular form will not convey much insight into our algorithm, so let’s visualize the rules.

Relationship between the metrics

Insights

  • Support and confidence have a weak linear relationship, which means that the most frequent items have some other items associated with them
  • When it comes to lift, the relationship with support is squashed once support goes beyond 0.10, and with confidence there is no relationship
  • Between antecedent support and consequent support there is no linear relationship; it is rather inverse: when consequent support increases, antecedent support fades out. Can we read this as: when the quantity of butter purchased increases, the quantity of bread fades?

Network diagram of rules

Here we make a network diagram of a specified number of rules, where we can see the antecedents and consequents connected to the rules.

Insights

  • It is simpler to visualize with the help of a network diagram than to read a tabular format
  • The arrows coming into the rules (yellow circles) are from antecedents, and the arrows going out of the rule circles point towards consequents.

Strength of association using heatmap

We have discovered the association of items, but what good is that if we don’t know the strength of their relationship?
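
Before drawing a heatmap, the rules table has to be reshaped into a matrix. The sketch below shows one possible way with a pandas pivot. The rules and lift values here are made up purely for illustration and are not results from the dataset. The column names `antecedents`, `consequents` and `lift` are assumed.

```python
import pandas as pd

# Made-up rules for illustration only; these are not results from the dataset.
rules = pd.DataFrame({
    "antecedents": ["yogurt", "whole milk", "rolls/buns"],
    "consequents": ["whole milk", "yogurt", "whole milk"],
    "lift": [1.2, 1.2, 1.1],
})

# One row per antecedent, one column per consequent, lift as the cell value.
# Pairs without a rule are left as NaN.
lift_matrix = rules.pivot(index="antecedents", columns="consequents", values="lift")

print(lift_matrix)
```

A matrix in this shape can be passed to a heatmap plotting function such as seaborn's `heatmap`.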

Insights

  • We have a strong relationship between yogurt, milk and veggies
  • Rolls/buns are highly correlated with whole milk.

8. Conclusion

  • The model is built with the Apriori algorithm
  • The Apriori algorithm is considered effective for association rule applications and gives the same results every time it is run on the same data

In this analysis, the model wasn’t evaluated with any test data. The following viewpoints should be added to the evaluation:

  • Detection of signs of anomalies (rare items), which can help in pushing such an item together with its associated item
  • Balance of importance of support and confidence with lift
  • Model interpretability with visualizations

Please visit my Kaggle notebook; it has all the EDA performed on the dataset.
