Sunday, March 07, 2010

Science Sunday: Algorithms That Know You Better Than You Know Yourself

It is not unreasonable to say that algorithms, in some respects, rule our lives.

At their most basic, algorithms are finite sequences of instructions for solving a problem. They are commonplace in calculating and data processing, but they touch our lives with increasing frequency when we, for example, buy a book from Amazon.com or order a DVD from Netflix.

I do not offer those two examples randomly. I have been buying stuff from Amazon, ranging from used paperbacks to televisions, practically since the online merchant's inception in 1995. The algorithms that Amazon has used to recommend merchandise that might interest me, based on merchandise that I have previously purchased, once seemed pretty whip smart to me, but over the years have seemed increasingly mundane and much too presumptuous for a difficult-to-pigeonhole consumer like me.

Netflix is a different kettle of fish and some, if by no means all, of its recommendations are surprisingly interesting.

Which is not surprising, because Netflix is seen as the industry leader in pushing the boundaries of algorithms. That is to say, just because I liked When Harry Met Sally doesn't necessarily mean that I would like other movies that explore whether two friends can sleep together and still love each other in the morning.

In fact, I found When Harry Met Sally, let alone the entire genre, to be less than memorable, but the latest iteration of Netflix' recommendation algorithm is fascinating.

According to an article by Casey Johnson in Ars Technica, the new algorithm has increased Netflix' recommendation accuracy by 10 percent because of the diversity and not the uniformity of its recommendations. This begins to overcome an inherent handicap in such recommendation systems -- that the field of interest for users is narrowed the more the system is used.

Netflix' improved system is based on research that found the most interesting recommendations originate from "weak ties" in a system; that is, between customers that are somewhat similar but disparate enough that they can introduce novelty to each other.

To widen the potential field of customer interest, researchers developed a hybrid of two algorithms.

Explains Johnson:

"One combined an algorithm that based its recommendations on random walks between highly connected users and material; the other mirrored the process of heat diffusion, spreading ratings at a decreasing level of potency as the recommendation had to travel further. The heat diffusion algorithm can be thought of as a system that has users connected in a network with the objects they have interacted with and evaluated, and values are passed among the items in this network to develop ratings.

"The heat diffusion model uses values of 1 or 0 for the material to be recommended -- either a user liked something or he didn't -- and takes an average of the total resources a user had assigned to an object to give the user a value. For example, if a user liked two things and disliked two others, the value assigned to the user would be one-half.

"The algorithm then averaged these values for any users connected to an object, and this became the object's value in the system (for example, if two users were attached to an object and one had a value of one-half and the other had zero, the new value assigned to the object would be one quarter)."
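For the code-minded, the two averaging steps Johnson describes can be sketched in a few lines of Python. This is only my reading of the quoted passage, not the researchers' code: the function names and the sample users are made up, and I am assuming that a user is "connected" to every item he or she rated, liked or not.

```python
from fractions import Fraction

def user_values(ratings):
    """Step 1: each user's value is the average of his or her 1/0 ratings.

    `ratings` maps user -> {item: 1 (liked) or 0 (disliked)}.
    Users with no ratings are skipped.
    """
    return {
        user: Fraction(sum(items.values()), len(items))
        for user, items in ratings.items()
        if items
    }

def object_values(ratings):
    """Step 2: each item's value is the average of the values of the
    users connected to it (assumption: connected means "rated it")."""
    values = user_values(ratings)
    per_item = {}
    for user, items in ratings.items():
        for item in items:
            per_item.setdefault(item, []).append(values[user])
    return {item: sum(vals) / len(vals) for item, vals in per_item.items()}

# Hypothetical data mirroring the examples in the quote.
ratings = {
    "ann": {"A": 1, "B": 1, "C": 0, "D": 0},  # liked two, disliked two: 1/2
    "bob": {"A": 0, "E": 0},                  # liked nothing: 0
}

ann_value = user_values(ratings)["ann"]   # Fraction(1, 2)
item_a = object_values(ratings)["A"]      # (1/2 + 0) / 2 = Fraction(1, 4)
```

Exact fractions are used so that the one-half and one-quarter from the quote come out exactly rather than as rounded decimals.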

All of this can be done using a small set of data, meaning the heat diffusion algorithm can make diverse yet relevant recommendations based on sparse data in one pass.

The upshot is that by combining the heat diffusion approach with the safer and more accurate random walk, researchers found that they could produce a body of recommendations that mixes novel items with dependable ones.
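To get a feel for what "combining" two recommenders can mean, here is a toy sketch in Python. To be clear, this is not the researchers' hybrid, whose formula the article does not give. It is a plain weighted average of two score tables, and the `weight` knob, the function names and the sample scores are all invented for illustration.

```python
def blend_scores(diffusion, random_walk, weight=0.5):
    """Weighted average of two score tables (a stand-in, not the real hybrid).

    weight=1.0 uses only the diffusion scores (more novelty);
    weight=0.0 uses only the random-walk scores (safer picks).
    Items missing from one table count as 0.0 there.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    items = set(diffusion) | set(random_walk)
    return {
        item: weight * diffusion.get(item, 0.0)
        + (1.0 - weight) * random_walk.get(item, 0.0)
        for item in items
    }

def top_n(scores, n):
    """Return the n highest-scoring items, ties broken alphabetically."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]

# Made-up scores: diffusion favors the obscure title, the walk the popular one.
diffusion = {"cult_film": 0.75, "blockbuster": 0.25}
random_walk = {"cult_film": 0.25, "blockbuster": 0.75, "sequel": 0.5}

novel_pick = top_n(blend_scores(diffusion, random_walk, weight=1.0), 1)
safe_pick = top_n(blend_scores(diffusion, random_walk, weight=0.0), 1)
```

Sliding `weight` from 0 to 1 moves the top pick from the popular title to the obscure one, which is the trade-off between accuracy and novelty in miniature.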

The ultimate recommendation? Maybe something like The Godfather Likes It Hot or It's a Wonderful Psycho. Or how about The Sound of Pulp Fiction?


