29.4.09

Recommender Systems

Internet users encounter recommender systems on a daily basis. Sites like Amazon, Pandora, Netflix and Last.fm are a few of the pioneers in employing highly functioning, advanced and accurate recommender systems. A recommender system takes information from an individual user, whether it's purchasing history, habits or reported likes and dislikes, and suggests items that the user will likely enjoy. This is achieved through extensive databases that contain information about songs, movies, tastes, genres, artists, possible user traits, etc. Algorithms quickly search through that data and generate the recommendations. There are four major types of recommender systems: collaborative filtering, cluster models, search-based, and item-to-item.


Each of these systems represents a sort of evolution within the field of recommender systems. Traditional collaborative filtering bases recommendations on similar users, which does not do much in the way of refining recommendations and leads to recommendations that are less likely to fit the taste of the customer. Cluster models assign users to groups based on a consistent quality that all of the users possess; however, this can also lead to less specific recommendations because these users aren't necessarily the most alike. Search-based systems look at user history and recommend items with the same or similar traits. The main constraint in search-based systems is that recommendations have the potential to be extremely limited, and there is not much room for discovery. The fourth, and probably the most advanced, is the item-to-item system. It focuses on the traits of items and recommends new things to users based simply on item traits, without the implicit inclusion of other users' traits.

Amazon is the forerunner in e-commerce sites, with its recommendation system to thank for sales and user trust. Amazon operates with an item-to-item recommendation system, which makes the site's vast number of items for sale usable and accessible. Items are classified not only by their basic description, for example a CD, but also by the type of music, the artist and the year of release, to name a few potential traits. This allows users to get in-depth and more tailored recommendations in spite of the gigantic database of items. Recommendations also take into account user-generated item rankings, which can make items more likely to be recommended and entice more purchasing. Additionally, Amazon includes some recommendations based upon what other users have purchased when two or more users have a purchase in common. However, this differs from basing recommendations solely on other users because users aren't bound together simply because of a few similarities.

Internet personalized radio service and music recommendation generator Pandora began as The Music Genome Project, which is essentially a database of songs broken down into descriptions with the smallest of traits, or "genes." There are 2,000 potential traits that can be assigned to songs. Pandora's recommender would also qualify as an item-to-item recommendation system because recommendations come through similar qualities of songs, not similar qualities in users. Interestingly, Pandora has a partnership with Amazon so that users can purchase albums or MP3s of songs they have heard on their personalized stations. Pandora allows users to rank songs through the labeling of "thumbs up" or "thumbs down," in addition to banning tracks. All of these options affect potential recommendations for each user. The more feedback that Pandora receives, as with most sites generating recommendations, the more personalized recommended songs become.
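
Here is a minimal sketch of how trait-based matching with thumbs feedback could work. The song names, the four "genes", their scores and the scoring rule are all invented for illustration; this is not Pandora's actual algorithm, just one plausible way to compare songs by their traits rather than by their listeners.

```python
import math

# Hypothetical "genes" scored 0.0-1.0 per song (all values invented).
SONGS = {
    "song_a": [0.9, 0.1, 0.8, 0.2],
    "song_b": [0.8, 0.2, 0.9, 0.1],
    "song_c": [0.1, 0.9, 0.2, 0.8],
    "song_d": [0.7, 0.3, 0.7, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two gene vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def next_songs(thumbs_up, thumbs_down, banned=()):
    """Rank unrated songs: similarity to liked songs minus similarity to disliked ones."""
    rated = set(thumbs_up) | set(thumbs_down) | set(banned)
    scores = {}
    for name, genes in SONGS.items():
        if name in rated:
            continue  # never replay rated or banned tracks in this sketch
        liked = sum(cosine(genes, SONGS[s]) for s in thumbs_up)
        disliked = sum(cosine(genes, SONGS[s]) for s in thumbs_down)
        scores[name] = liked - disliked
    return sorted(scores, key=scores.get, reverse=True)

print(next_songs(thumbs_up=["song_a"], thumbs_down=["song_c"]))
```

Every extra thumbs up or down adds another term to the score, which is one way to picture why more feedback leads to more personalized stations.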

Last.fm is an online music recommendation service, as well as a social network. It does base a fair amount of its recommendations on similarities between users. The site generates both personalized radio stations and stations based on artists or genres. User likeness is ranked and listeners are grouped together based upon the artists they have in common. Last.fm, much like Amazon, also tracks user activity, in this case listening habits both online and offline, plus searches and rankings. This fusion of clustering and item-to-item recommendations, which is more apparent here than at Amazon, hints that Last.fm, like all the others, is a constant work in progress. In addition to the Last.fm database, a user can tag artists with traits of the user's choice, which can impact whether or not other users might hear those artists. This is dependent on other users' activities on the site, but makes users stakeholders in ensuring that they are tagging responsibly.
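
To make "user likeness" a bit more concrete, here is a rough sketch that ranks listeners by the artists they share. The users and listening histories are made up, and Jaccard similarity is my own choice of measure for illustration; I don't know what formula Last.fm actually uses.

```python
# Hypothetical listening histories (invented users, real artist names).
LISTENS = {
    "ana": {"Radiohead", "Portishead", "Bjork"},
    "ben": {"Radiohead", "Portishead", "Massive Attack"},
    "cy": {"Metallica", "Slayer"},
}

def likeness(a, b):
    """Jaccard similarity: shared artists divided by all artists either user played."""
    union = LISTENS[a] | LISTENS[b]
    return len(LISTENS[a] & LISTENS[b]) / len(union) if union else 0.0

def neighbours(user):
    """Other users ranked from most to least alike."""
    others = [u for u in LISTENS if u != user]
    return sorted(others, key=lambda u: likeness(user, u), reverse=True)

print(neighbours("ana"))
```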

Netflix is an online DVD rental service with one of the most enviable recommendation systems in existence. Its personalized movie recommendations are very similar to Amazon's system of trait-based databases, while simultaneously employing user-generated ratings. The main operator here is an item-to-item system. Netflix's algorithm for generating recommendations is called Cinematch, and the company is holding a contest, with a $1,000,000 prize, for someone to improve the Cinematch algorithm by 10%. The improvement sought is not higher speed or more data storage, but a 10% better likelihood that a user gets a recommendation he will actually enjoy.
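
The contest scores that 10% as a reduction in root-mean-square error (RMSE) between predicted and actual star ratings. Here is a small sketch of the arithmetic. The ratings and both sets of predictions are invented, and `baseline` is only a stand-in for an existing system, not Cinematch's real output.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual star ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement(baseline_rmse, new_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

actual = [4, 3, 5, 2]                # invented ratings from users
baseline = [3.0, 4.0, 4.0, 3.0]      # invented predictions, existing system
challenger = [3.2, 3.8, 4.1, 2.9]    # invented predictions, contest entry

base_err = rmse(baseline, actual)
new_err = rmse(challenger, actual)
print(improvement(base_err, new_err) >= 0.10)  # does the entry clear the 10% bar?
```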

Netflix's realization that its highly successful recommendation system will someday be surpassed points to the future of this field. The brilliance behind holding a contest to improve its own system is that it discourages people from developing algorithms for their own ventures and keeps Netflix on the cutting edge of recommendation system technology. In this online world of recommendation systems, the user is king. Senseless and inaccurate recommendations have the potential to drive users away. This makes incessant innovation an absolute necessity. As evidenced through the merging of item-to-item filtering and recommendations based on similar users, this field is burgeoning and nowhere near the finish line in terms of any company having a flawless recommendation system. Methods will continue to evolve and bleed into one another, and perhaps all become somewhat antiquated eventually. In seeing how these systems have already morphed into stronger and more reliable ones, it's easy to imagine that recommendation systems will be a topic constantly in a state of innovation. That's absolutely where the field is at the moment.

23.4.09

The Netflix Prize

"Winning the Netflix Prize improves our ability to connect people to the movies they love."
The Netflix Prize shows how crucial recommendation systems are to the functioning of businesses relying on e-commerce. The contest invites people to create a system that will improve the likelihood of users loving the movies recommended based on their preferences. Improving Netflix's current system, Cinematch, by 10% will get one dedicated winner $1 million. The desire to improve one of the best recommendation systems in existence is absolutely indicative of how much competition is involved in making sure that users have a reason to return to a site. I'm trying to get an interview with someone from Netflix to further discuss its recommendation system and its continuing improvements.

21.4.09

How Amazon operates: item-to-item

Amazon has created perhaps the most highly functioning and relevant recommendation system developed in-house, called item-to-item collaborative filtering.

"Rather than matching the user to similar customers, item-to-item collaborative filtering matches each of the user's purchased and rated items to similar items, then combines those similar items into a recommendation list," according to an industry report describing Amazon's methods.

The system revolves around a table of similar items, built from the items Amazon users tend to purchase together. Each item a user purchases is looked at individually, and recommendations are generated by comparing each one to the similar-items table.

Item-to-item collaborative filtering generates extremely user-relevant content, and can do so even with little user input because of how it uses similar items instead of similar users.
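
Here is a minimal sketch of the similar-items table idea, under my own assumptions. The customers, items and function names are invented, and scoring item pairs by the cosine similarity of their buyer sets is one reasonable choice, not a claim about Amazon's production code.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical purchase histories (invented customers and items).
PURCHASES = {
    "u1": {"book_a", "book_b", "cd_x"},
    "u2": {"book_a", "book_b"},
    "u3": {"book_b", "cd_x", "dvd_y"},
    "u4": {"book_a", "dvd_y"},
}

def build_similar_items(purchases):
    """Offline step: score each item pair by cosine similarity of their buyer sets."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    table = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:
            sim = common / math.sqrt(len(buyers[a]) * len(buyers[b]))
            table[a][b] = sim
            table[b][a] = sim
    return table

def recommend(user_items, table, top_n=3):
    """Online step: look up each owned item and combine its similar items."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in table.get(item, {}).items():
            if other not in user_items:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

table = build_similar_items(PURCHASES)
print(recommend({"book_b"}, table))
```

Notice that `recommend` only needs the user's own items and the prebuilt table, which is why a single purchase is already enough to produce suggestions.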

20.4.09

Search-Based Methods for Recommendation Systems

The article from Amazon insiders has been so helpful in my search for explanations of how different recommendation systems work. It's chock-full of information, so I've been breaking it down into little bits, in hopes that I fully understand it and that it makes at least some sense to those of you reading.

Search-based methods look at what a user has shown a preference for, whether through purchases, ratings or other habits unique to individual sites, for instance listening on Pandora or Last.fm. They take those preferences, look at their traits and then find other items that have the same or similar traits.

The issue here is that users most likely will not "discover new, relevant and interesting items," because their recommendations are based only on what they already like. The article explains this well: if someone bought The Godfather, then their results could range from best-selling drama DVDs to movies that are all from Francis Ford Coppola. There's either too much of a gap between what's liked and what's recommended, or too little.
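
The Godfather example can be sketched in a few lines. The tiny catalog, the single-word genre labels and the `search_based` function are my own simplifications for illustration, not anything taken from the article.

```python
# Hypothetical mini catalog; genre labels are simplified for illustration.
CATALOG = [
    {"title": "The Godfather", "director": "Francis Ford Coppola", "genre": "drama"},
    {"title": "The Godfather Part II", "director": "Francis Ford Coppola", "genre": "drama"},
    {"title": "Apocalypse Now", "director": "Francis Ford Coppola", "genre": "war"},
    {"title": "Schindler's List", "director": "Steven Spielberg", "genre": "drama"},
    {"title": "Forrest Gump", "director": "Robert Zemeckis", "genre": "drama"},
]

def search_based(history, trait):
    """Recommend unseen items sharing one trait with anything in the user's history."""
    seen = set(history)
    wanted = {item[trait] for item in CATALOG if item["title"] in seen}
    return [item["title"] for item in CATALOG
            if item[trait] in wanted and item["title"] not in seen]

history = ["The Godfather"]
print(search_based(history, "genre"))     # broad: every other drama in the catalog
print(search_based(history, "director"))  # narrow: only more Coppola
```

Searching on genre returns anything labeled drama, while searching on director never leaves Coppola, which is the too-much-or-too-little gap in miniature.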

17.4.09

Cluster Models as Recommendation Systems

Cluster models are another often-used approach to making a functioning recommendation system, according to an article by three Amazon insiders. Cluster models divide users into groups, or clusters, of the most similar users. Each cluster starts out with one randomly assigned user, and then the remaining users are compared and classified into the cluster they most resemble.

The issue with cluster models is that grouping users into these clusters doesn't mean that they are grouped with the most similar other users. This can lead to lower-quality recommendations.
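
Here is a rough sketch of that seed-then-assign idea. The users, their purchases, the Jaccard `overlap` measure and the single assignment pass are all assumptions I made to keep it short; real clustering methods usually repeat the assignment step many times.

```python
import random

# Hypothetical users described by the set of items they bought.
USERS = {
    "u1": {"jazz_cd", "blues_cd"},
    "u2": {"jazz_cd", "blues_cd", "soul_cd"},
    "u3": {"metal_cd", "punk_cd"},
    "u4": {"metal_cd", "punk_cd", "rock_cd"},
    "u5": {"jazz_cd", "rock_cd"},
}

def overlap(a, b):
    """Jaccard similarity between two item sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_users(users, k, seed=0):
    """Seed k clusters with random users, then assign everyone else to the closest seed."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    seeds = rng.sample(sorted(users), k)
    clusters = {s: [s] for s in seeds}
    for name in sorted(users):
        if name in clusters:
            continue  # seed users already belong to their own cluster
        best = max(seeds, key=lambda s: overlap(users[name], users[s]))
        clusters[best].append(name)
    return clusters

clusters = cluster_users(USERS, k=2)
print(clusters)
```

User `u5` shares at most one item with anyone else, yet still has to land in one of the two clusters, which is exactly the weakness described above.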

15.4.09

Traditional Collaborative Filtering

Traditional collaborative filtering is a type of system used to generate user-specific recommendations based on the items that two similar users have in common. The items ranked most highly by similar users become the most recommended items. The major problem with this type of system is that, while recommendations are very targeted, they are also incredibly limited as far as what a user will find in his recommendations. This is due to the very limited number of similar users to which one is compared. When those similar users have only a narrow pool of items to draw from, recommendations that should ideally match a user's liking do not fit that user's taste, because they are based on such a small amount of information. Basing recommendations solely on the taste of a small group of users with a common interest makes it less and less likely that a user will branch out to new things that coincide with his taste.
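
A minimal sketch of this user-to-user approach, with invented users and star ratings. Treating unrated items as zeros in the cosine similarity and weighting each neighbour's stars by similarity are my own simplifying assumptions.

```python
import math

# Hypothetical 1-5 star ratings (invented users and items).
RATINGS = {
    "ann": {"item_a": 5, "item_b": 4, "item_c": 1},
    "bob": {"item_a": 5, "item_b": 5, "item_d": 4},
    "cat": {"item_a": 1, "item_c": 5, "item_e": 5},
}

def similarity(u, v):
    """Cosine similarity between two users' rating vectors (unrated items count as 0)."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in RATINGS[u].values()))
    norm_v = math.sqrt(sum(r * r for r in RATINGS[v].values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, k=1):
    """Score unseen items by the similarity-weighted ratings of the k most similar users."""
    others = sorted((u for u in RATINGS if u != user),
                    key=lambda u: similarity(user, u), reverse=True)[:k]
    scores = {}
    for other in others:
        weight = similarity(user, other)
        for item, stars in RATINGS[other].items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + weight * stars
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("ann", k=1))
print(recommend("ann", k=2))
```

With `k=1` the only thing `ann` can ever be shown is what `bob` happens to have rated, which shows how a small set of similar users limits the recommendations.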

14.4.09

Holovaty on the importance of databases

Adrian Holovaty, of EveryBlock among many other notable endeavors, made an appearance in my Digital Media Entrepreneurship class a week or so back, and we got to discussing databases and recommendation systems. He told me a bit about a site that he co-founded while in college called Lawrence.com, which is incredibly similar to a project that I'm working on called Collegization.

Basically, both projects aim at getting students into nightlife, while heavily relying on databases of information in order to perform that service, and allowing people to find things that they might like based on location and habits. Both databases include restaurants, venues and events. He told me that the key to running this system, and incorporating a recommendation function into it, is having massive amounts of organized data. Holovaty said that the data, while minute in individual form, is incredibly valuable when amassed for the purposes of Lawrence.com and Collegization. Without these tediously formed databases, it would be impossible for either project to function.