How many of you have read The Cuckoo’s Calling by the previously unknown author Robert Galbraith? Not many people had, until it came out that Robert Galbraith was none other than blockbuster best-selling author JK Rowling. Sales then skyrocketed. Rowling recently published some of the rejection letters she received as Robert Galbraith. If algorithms had been applied, could they have done any better at recognizing the potential of the book?
In this post, I want to discuss one set of algorithms called recommendation engines (collaborative filtering is the best-known technique behind them) and their various guises. They’re typically used when you have many prospects, customers and products.
For example, let’s say you want to recommend a video to a prospect from a vast range of potential videos. The recommendation engine looks at videos people have bought in the past to make the best recommendation.
You may come across recommendation engines in a wide range of places:
- Online shopping – most major online retailers use recommendation engines to suggest other products.
- Film and TV – I did a project a few years ago where we were recommending TV programmes based on what people had streamed. Needless to say, we beat the editor’s recommendation by a fair margin.
- News articles – people who read these articles might also want to read these ones.
- Dating sites – more controversially, these sites can use algorithms to suggest potential dates.
- Crime – I’ve used similar techniques to detect crimes committed by serial offenders.
There are many types of models that all fall under the heading "recommendation engines," and they all have their benefits and drawbacks:
- Simple models. We can use simple models that are easy to create, maintain and adjust for recency of purchase, i.e. people who bought X also bought Y. The drawbacks are that the recommendations aren’t individual and don’t take into account whether the person actually liked the product. If they're based on a small amount of data, simple models can produce slightly spurious results, but the advantage is that simple models can be implemented quickly.
- Association analysis. This is a more sophisticated model and involves identifying significant associations across multiple products and sequences of purchase. These models produce recommendations that are a bit more personal. It’s not quite "you," but "people like you" bought these kinds of products, so we thought you might be interested.
- Long tail analysis. These models help identify new opportunities. For example, if I know you like the band Coldplay, and they release a new album, it’s not a great analytical leap to suggest that you might like their new album. However, if I know that people who like Coldplay also like the up-and-coming band "The Data Geeks," then I’ve introduced you to something you hadn’t considered. This is useful and hopefully profitable as well, for both the organisation and the customer. In the field of recommendation engines this is called serendipity. Chris Anderson popularized the notion of the "long tail," explaining how these kinds of techniques can find profitable, niche markets.
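As a rough illustration of the association idea in the list above, here is a minimal Python sketch that scores product pairs by lift, one common measure of whether two products are bought together more often than chance would predict. The function name `pair_lift` and the sample baskets are invented for this example, and a real association analysis would also apply minimum support thresholds and significance checks.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(a, b): lift} for every product pair that co-occurs at least once.

    lift = P(a and b) / (P(a) * P(b)). Values above 1 suggest the pair is
    bought together more often than chance would predict.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["coldplay", "data_geeks"],
    ["coldplay", "data_geeks"],
    ["coldplay", "bread"],
    ["bread", "milk"],
]
lifts = pair_lift(baskets)
```

With so few baskets the numbers are exactly the kind of "slightly spurious" result that small data produces, so treat the output as a demonstration of the calculation rather than a finding.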
Recommendation engine types
- Implicit and explicit recommendation models don’t really care what the product is. These models just use a unique ID and a record of which people bought which products. In this case it’s relatively easy to add in new products, remove old ones and not worry too much (though you may wish to exclude offensive or age-restricted items from recommendations). The model is also quick to adapt when demand suddenly increases.
- Content-based recommendation models may scrape/capture metadata about the product (volume sold, seasonality, price, size, author, description, number of pages, genre, age rating, leading actors, etc.). We could then use the product attributes to suggest recommendations. This is more complicated and requires a combination of manual input, text mining and machine learning to maintain and develop.
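To make the content-based idea concrete, here is a minimal Python sketch that treats each product as a set of attribute tags and ranks other products by cosine similarity. The catalogue, the tag format and the function names are hypothetical. A production system would draw its attributes from the text mining and machine learning steps described above.

```python
import math


def cosine_similarity(attrs_a, attrs_b):
    """Cosine similarity between two sets of attribute tags (binary vectors)."""
    if not attrs_a or not attrs_b:
        return 0.0
    return len(attrs_a & attrs_b) / math.sqrt(len(attrs_a) * len(attrs_b))


def most_similar(target, catalogue, top_n=2):
    """Rank the other products by attribute similarity to `target`."""
    scores = [
        (other, cosine_similarity(catalogue[target], attrs))
        for other, attrs in catalogue.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical catalogue metadata, invented for illustration only.
catalogue = {
    "book_a": {"genre:crime", "format:novel", "age:adult"},
    "book_b": {"genre:crime", "format:novel", "age:teen"},
    "book_c": {"genre:fantasy", "format:novel", "age:teen"},
    "film_d": {"genre:comedy", "format:film", "age:adult"},
}
ranked = most_similar("book_a", catalogue)
```

Note that nothing here depends on sales history, which is why this style of model can suggest a product nobody has bought yet.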
It's difficult to say which model type is better. With implicit/explicit recommendation models you’re relying on your customers to find the new products that you can use to recommend more products. With the content-based models you may find some interesting combinations that should appeal.
The more sophisticated methods combine both explicit ratings (the customer has positively and/or negatively endorsed the product) and implicit ratings (the customer has regularly visited, read or forwarded it), and then apply algorithms such as Slope One, clustering, association analysis, SVD or combinations of these.
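Of the algorithms named above, Slope One is the easiest to show in a few lines. The sketch below is a minimal weighted Slope One predictor in Python, assuming explicit ratings held in a nested dictionary. The function name, the data layout and the sample ratings are assumptions made for this example.

```python
from collections import defaultdict


def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating for `target` with weighted Slope One.

    ratings: {user: {item: rating}}. Returns None if no prediction is possible.
    """
    diff_sum = defaultdict(float)  # summed (target - other item) rating gaps
    diff_count = defaultdict(int)  # number of users who rated both items
    for other_ratings in ratings.values():
        if target not in other_ratings:
            continue
        for item, value in other_ratings.items():
            if item != target:
                diff_sum[item] += other_ratings[target] - value
                diff_count[item] += 1

    numerator = 0.0
    denominator = 0
    for item, value in ratings[user].items():
        count = diff_count.get(item, 0)
        if count:
            # Shift the user's own rating by the average gap, weighted by
            # how many users support that gap.
            numerator += (value + diff_sum[item] / count) * count
            denominator += count
    return numerator / denominator if denominator else None


# Hypothetical ratings on a 1-5 scale, invented for illustration only.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
prediction = slope_one_predict(ratings, "bob", "C")
```

Implicit signals such as visits or forwards would need converting to a numeric score before they could be fed into a predictor like this one.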
Another factor to consider is the weighting of items. You may wish to weight popular items less heavily than rarer items (inverse frequency) to make recommendations more distinctive and reach that niche market and the long tail.
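One possible way to apply inverse frequency weighting is sketched below in Python: each item is weighted by the log of total customers divided by the number of customers who bought it, and two customers are compared by the summed weight of the items they share. The log formula is one common choice rather than the only one, and the names and sample data are invented for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(purchases):
    """Weight each item by log(total customers / customers who bought it)."""
    n_customers = len(purchases)
    item_counts = Counter(
        item for items in purchases.values() for item in set(items)
    )
    return {
        item: math.log(n_customers / count)
        for item, count in item_counts.items()
    }


def weighted_overlap(items_a, items_b, weights):
    """Sum the weights of the items both customers bought."""
    return sum(weights[item] for item in set(items_a) & set(items_b))


# Hypothetical purchase histories, invented for illustration only.
purchases = {
    "u1": ["bestseller", "niche_album"],
    "u2": ["bestseller", "niche_album"],
    "u3": ["bestseller"],
    "u4": ["bestseller", "cookbook"],
}
weights = inverse_frequency_weights(purchases)
similar = weighted_overlap(purchases["u1"], purchases["u2"], weights)
dissimilar = weighted_overlap(purchases["u1"], purchases["u3"], weights)
```

Under this scheme an item that every customer has bought gets a weight of zero, so sharing the bestseller says nothing about two customers, while sharing the niche album does.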
Measurement of performance
Each interaction with the customer should help improve the recommendations, and we should learn each time. You can also use holdout samples to check how models perform on unseen data, comparing the recommendations against what customers actually chose. Models we’ve created at SAS have produced great improvements in performance.
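A holdout check can be as simple as hiding one purchase per customer and counting how often the model recommends it back. The Python sketch below measures that hit rate for any recommender function. The popularity baseline, the function names and the data are assumptions made for this example, not a description of any particular product.

```python
from collections import Counter


def hit_rate(recommend, train, holdout, k=3):
    """Share of customers whose held-out item appears in their top-k list.

    recommend(history, k) is any function returning a list of item IDs.
    """
    if not holdout:
        return 0.0
    hits = sum(
        1
        for user, hidden_item in holdout.items()
        if hidden_item in recommend(train.get(user, []), k)
    )
    return hits / len(holdout)


def make_popularity_recommender(train):
    """Baseline: recommend the most-bought items the customer does not own."""
    popularity = Counter(item for items in train.values() for item in items)

    def recommend(history, k):
        ranked = [item for item, _ in popularity.most_common()]
        return [item for item in ranked if item not in history][:k]

    return recommend


# Hypothetical data: `holdout` holds one hidden purchase per customer.
train = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["b", "c"], "u4": ["a"]}
holdout = {"u1": "c", "u2": "b", "u3": "a", "u4": "z"}
score = hit_rate(make_popularity_recommender(train), train, holdout, k=1)
```

Scoring a naive baseline like this first gives a reference point, so that any lift claimed for a more sophisticated model is measured against something.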
Where to start
A basic model might say, "25% of people that bought/viewed product A also bought/viewed product B." This is a fairly straightforward algorithm to implement on standard SQL databases and should give most organisations a start in developing their recommendation engines. At some point, a more sophisticated approach will be needed, and for this the organisation will need some form of analytical platform.
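As a sketch of that basic model, the Python example below uses the standard library's `sqlite3` module to run a plain SQL query that computes, for each product pair, the percentage of buyers of product A who also bought product B. The table name `purchases`, its two columns and the sample rows are assumptions made for this example, and the query may need adjusting for other SQL dialects.

```python
import sqlite3

# For each ordered pair (A, B): distinct customers who bought both,
# divided by distinct customers who bought A, as a percentage.
ALSO_BOUGHT_SQL = """
SELECT a.product_id AS product_a,
       b.product_id AS product_b,
       100.0 * COUNT(DISTINCT a.customer_id) / t.buyers AS pct
FROM purchases a
JOIN purchases b
  ON a.customer_id = b.customer_id AND a.product_id <> b.product_id
JOIN (SELECT product_id, COUNT(DISTINCT customer_id) AS buyers
      FROM purchases
      GROUP BY product_id) t
  ON t.product_id = a.product_id
GROUP BY a.product_id, b.product_id, t.buyers
ORDER BY a.product_id, pct DESC
"""


def also_bought(rows):
    """Return {(product_a, product_b): pct of A's buyers who also bought B}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_id INTEGER, product_id TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = {(a, b): pct for a, b, pct in conn.execute(ALSO_BOUGHT_SQL)}
    conn.close()
    return result


# Hypothetical purchase rows (customer_id, product_id), for illustration only.
rows = [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (4, "A"), (4, "C")]
result = also_bought(rows)
```

Here four customers bought A and one of them also bought B, so the pair (A, B) comes out at 25%, while (B, A) is 100% because the only buyer of B also bought A. That asymmetry is worth remembering when choosing which direction to recommend in.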
Getting back to The Cuckoo’s Calling and the question: Would recommendation engines have picked it as a best-seller? The implicit/explicit method would cope reasonably well with such a situation, but the content-based method would probably get confused. However, as both are driven by data on sales and preferences, it’s unlikely that either method would perform particularly well. A text mining approach looking at language might yield better results. Visit our text mining blog for more details.
What recommendation approaches has your organisation used? Please comment below. And to learn more, download this white paper: Recommendation Systems – An Overview of System Types and Benefits.
You may also connect with me on Twitter via @sukcag, to discuss this or related topics.