Recommendation System using Association Rule Mining for Implicit Data

Darshan Majithiya
Published in DataDrivenInvestor
4 min read · Nov 7, 2019

E-commerce platforms generate a large share of their revenue from the personalized recommendations they provide to users. Integrating a recommendation system helps the business grow and helps users find better-suited products.

Implicit Data Vs Explicit Data

Explicit data is data that users provide intentionally, e.g. ratings. Implicit data is data generated from the user’s activities on the site, such as visiting a product listing page, viewing a product, or buying a product.

Let’s face it: relying only on explicit data is not realistic, because generating it requires extra effort on the user’s end. Implicit data is easier to collect. The simple act of viewing an item or adding it to the cart can be considered an endorsement of that item.

Selecting the Algorithm

Alternating Least Squares (ALS) is a popular collaborative filtering algorithm for implicit data. But the problem here is: what if a user’s taste isn’t the same every time they visit the site? (Such behavior is typically observed on gifting sites, where the user only visits for a particular occasion or festival.) In such a scenario, ALS, which learns from a user’s accumulated history, tends to perform poorly. To overcome this, we should focus on session-based recommendation algorithms.

Association Rule Mining (ARM) can be used to provide session-based recommendations, and Apriori is one widely used ARM algorithm.

Dataset

To keep this story simple, let’s assume there are only 4 user activity steps: visit_product_listing_page, view_product, add_to_cart, and buy_product.

We will use the visit_product_listing_page step to generate the association rules and the rest of the steps to calculate the product score. For this, we can assign step_weights to each step as:

view_product: 1
add_to_cart: 2
buy_product: 3

We can calculate the score for each product from the step_weights and the qty/no_of_visits recorded for the product at each step, for example as a weighted sum. With the product pool ready, let’s concentrate on generating the association rules.
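One way to read the scoring step is as a weighted sum of per-step counts. The sketch below assumes that interpretation. The `(product_id, step, count)` event layout and the `product_scores` helper are hypothetical names used only for illustration; only the weights come from the list above.

```python
from collections import defaultdict

# Step weights taken from the list above.
STEP_WEIGHTS = {"view_product": 1, "add_to_cart": 2, "buy_product": 3}


def product_scores(events):
    """Return {product_id: score} as a weighted sum of step counts.

    `events` is an iterable of (product_id, step, count) tuples, where
    `count` stands in for qty/no_of_visits. This layout is an assumption.
    """
    scores = defaultdict(float)
    for product_id, step, count in events:
        weight = STEP_WEIGHTS.get(step)
        if weight is None:
            continue  # steps without a weight (e.g. listing-page visits) are not scored
        scores[product_id] += weight * count
    return dict(scores)


events = [
    ("p1", "view_product", 10),
    ("p1", "add_to_cart", 2),
    ("p1", "buy_product", 1),
    ("p2", "view_product", 4),
    ("p2", "visit_product_listing_page", 7),
]
scores = product_scores(events)
print(scores)
```

Other aggregations (normalizing by visits, decaying old events) would fit the same shape. The weighted sum is just the simplest option.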

The processed dataset for association rules must be in a transaction format: one record per session, containing the product listing pages visited in that session.
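As a rough illustration of that format, the sketch below groups raw listing-page visits into one list per session. The `(session_id, listing_page_id)` input layout and the `build_transactions` helper are assumptions made for this example, not the article’s actual preprocessing.

```python
from collections import defaultdict


def build_transactions(visits):
    """Group listing-page visits into one transaction per session.

    `visits` is an iterable of (session_id, listing_page_id) pairs
    (hypothetical layout). Page IDs are stored as strings, and each
    transaction is de-duplicated and sorted.
    """
    pages_by_session = defaultdict(set)
    for session_id, page_id in visits:
        pages_by_session[session_id].add(str(page_id))
    return [sorted(pages) for pages in pages_by_session.values()]


visits = [("s1", 1), ("s2", 1), ("s2", 2), ("s3", 2), ("s3", 3), ("s3", 2)]
transactions = build_transactions(visits)
print(transactions)
```

The result is a plain list of lists, which keeps memory use modest compared with a wide one-hot table.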

General Idea behind the Apriori Algorithm

Let’s say User A visited the product listing page with ID 1 in a single session, and User B visited the product listing pages with ID 1 and ID 2 in a single session. A rule will then be generated that suggests to User A products from the product listing page with ID 2, along with those from the product listing page with ID 1.

There are 3 main metrics used in the Apriori algorithm:

  1. Support: the probability that a record contains both the product listing page with ID 1 and the one with ID 2.
  2. Confidence: the conditional probability that a record contains the product listing page with ID 2 given that it contains ID 1, i.e. P(ID2|ID1).
  3. Lift: the ratio of the confidence to the support of ID 2 alone, i.e. P(ID2|ID1) / P(ID2). If the lift is < 1, the product listing pages with ID 1 and ID 2 are negatively correlated (they don’t belong together in recommendations); if it is > 1, they are positively correlated; a lift of 1 means they are independent.
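To make the three metrics concrete, here is a minimal sketch that computes them by hand for a single rule over a toy set of sessions. The `rule_metrics` helper and the sample data are made up for illustration. It assumes both items appear at least once, so no division by zero occurs.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    n_both = sum(1 for s in sets if antecedent in s and consequent in s)
    n_antecedent = sum(1 for s in sets if antecedent in s)
    n_consequent = sum(1 for s in sets if consequent in s)

    support = n_both / n                    # P(ID1 and ID2)
    confidence = n_both / n_antecedent      # P(ID2 | ID1)
    lift = confidence / (n_consequent / n)  # P(ID2 | ID1) / P(ID2)
    return support, confidence, lift


# Five toy sessions; page "1" and page "2" co-occur in two of them.
transactions = [["1"], ["1", "2"], ["1", "2"], ["2", "3"], ["3"]]
support, confidence, lift = rule_metrics(transactions, "1", "2")
print(support, confidence, lift)
```

Here the lift comes out slightly above 1, so visiting page 1 makes a visit to page 2 a little more likely than its baseline rate.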

Implementation Issues

There are two popular choices for implementing the Apriori algorithm: mlxtend and apyori.

  • Mlxtend accepts a one-hot encoded DataFrame as input. The disadvantage is that for a large dataset the DataFrame will try to reserve a big block of memory, which is not ideal for a production environment.
  • Apyori accepts a list of lists as input.

For large datasets, it’s best to use the apyori implementation of the Apriori algorithm to avoid MemoryError or int32 overflow errors.

Okay, let’s code it!

The output format of apriori is —

[RelationRecord(items=frozenset({'2', '3'}), support=0.4020899591094957, ordered_statistics=[OrderedStatistic(items_base=frozenset({'2'}), items_add=frozenset({'3'}), confidence=1.0, lift=1.0521032504780115)])
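The nested output above is easier to work with once flattened into one row per rule. The sketch below does that. To stay runnable without apyori installed, it defines namedtuple stand-ins that mirror the field names visible in the output; `flatten_rules` is a hypothetical helper, not part of apyori.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the output above.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def flatten_rules(records):
    """Turn nested records into a flat list of rule dicts."""
    rules = []
    for record in records:
        for stat in record.ordered_statistics:
            if not stat.items_base:
                continue  # skip rules with an empty antecedent
            rules.append({
                "antecedent": sorted(stat.items_base),
                "consequent": sorted(stat.items_add),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return rules


records = [
    RelationRecord(
        items=frozenset({"2", "3"}),
        support=0.4020899591094957,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"2"}),
                items_add=frozenset({"3"}),
                confidence=1.0,
                lift=1.0521032504780115,
            )
        ],
    )
]
rules = flatten_rules(records)
print(rules)
```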

Providing Product Recommendations

Once the product pool and association rules are generated, we have everything we need for providing the recommendations.

To provide recommendations, we find the products that belong to the product listing pages suggested by the association rules and sort them by product score in descending order. A variation applies when multiple rules are generated for a single user: in that case, multiply each product score by the rule’s confidence and then sort by the final score.
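The ranking step can be sketched roughly as follows. The rule dictionaries, the `page_products` mapping, and the `recommend` helper are hypothetical structures chosen for this example. Keeping the highest score when a product is reached through several rules is also an assumption, since the article does not say how to combine them.

```python
def recommend(visited_pages, rules, page_products, product_scores, top_n=5):
    """Rank products from suggested listing pages by score * rule confidence.

    rules: list of dicts with "antecedent", "consequent", "confidence".
    page_products: {listing_page_id: [product_id, ...]}.
    product_scores: {product_id: score}.
    """
    visited = set(visited_pages)
    final_scores = {}
    for rule in rules:
        # A rule fires only if every antecedent page was visited this session.
        if not set(rule["antecedent"]) <= visited:
            continue
        for page in rule["consequent"]:
            for product in page_products.get(page, []):
                score = product_scores.get(product, 0.0) * rule["confidence"]
                # Assumption: keep the best score across rules.
                final_scores[product] = max(final_scores.get(product, 0.0), score)
    ranked = sorted(final_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [product for product, _ in ranked[:top_n]]


rules = [
    {"antecedent": ["1"], "consequent": ["2"], "confidence": 0.8},
    {"antecedent": ["1"], "consequent": ["3"], "confidence": 0.5},
]
page_products = {"2": ["p1", "p2"], "3": ["p3"]}
product_scores = {"p1": 10.0, "p2": 4.0, "p3": 12.0}

recs = recommend(["1"], rules, page_products, product_scores)
print(recs)
```

Note how p3 has the highest raw score but ranks below p1 because its rule has lower confidence.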

Conclusion

Association rule mining is a great way to implement a session-based recommendation system. Of course, the algorithm must be chosen based on the use case and on how users actually behave on the site.

Thank you for reading! If you enjoyed this article, feel free to clap many times (you know you want to!) and share it with a friend.

Data Scientist @ PharmEasy | Google Cloud Certified Professional Data Engineer