## Friday, October 27, 2006

#### Netflix Challenge

Netflix posted a challenge around the start of the month to improve their collaborative filtering algorithm for recommending movies to their customers. Roughly speaking, they're offering a \$50k incentive to the best entry each year, and \$1M to the first to beat a set target. Over 10,000 teams have registered. The target they have set may or may not be attainable--human inconsistency itself creates a theoretical upper bound on prediction accuracy, and their target may be set beyond that. Time will tell.

The most interesting aspect is that they released a data set of over 100 million movie ratings by half a million (anonymous) customers, covering a little over 17,000 movies. In short, it's an awesome database for testing CF algorithms in general, and also just a fun resource to play with.

Singular value decomposition is the obvious baseline approach to this task. I say baseline because it's a linear model which probably doesn't map well in the end to how humans actually choose their ratings, but it's a good first approximation just to get a feel for the data.

The end result of SVD is essentially a list of inferred categories, sorted by relevance. Each category in turn is expressed simply by how strongly each user and each movie belongs (or anti-belongs) to it. So, for instance, a category might represent action movies, with action-heavy movies at the top and slow movies at the bottom, and correspondingly users who like action movies at the top and those who prefer slow movies at the bottom. It's completely symmetric in the sense that you could just as well call it the slow movie category and reverse the lists.
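To make the "belong / anti-belong" idea concrete, here is a minimal Python sketch of a single category. All names and numbers are made up for illustration (they are not values from the actual run), and it assumes the usual SVD picture in which a category's contribution to a predicted rating is the user's value times the movie's value.

```python
# Hypothetical values for ONE inferred category (made up for illustration).
movie_value = {"Action Movie A": 0.9, "Action Movie B": 0.6, "Slow Movie C": -0.7}
user_value = {"user_1": 0.8, "user_2": -0.5}

def contribution(user, movie, uv=user_value, mv=movie_value):
    """How much this one category adds to (or subtracts from) a predicted rating."""
    return uv[user] * mv[movie]

def ranked(values):
    """Names sorted from strongest 'belongs' to strongest 'anti-belongs'."""
    return sorted(values, key=values.get, reverse=True)

def flipped(values):
    """The same category with its sign reversed (e.g. 'slow' instead of 'action')."""
    return {name: -v for name, v in values.items()}

print(ranked(movie_value))           # action-heavy titles first
print(ranked(flipped(movie_value)))  # same list, reversed
# Flipping BOTH sides leaves every contribution unchanged:
print(contribution("user_1", "Slow Movie C"),
      contribution("user_1", "Slow Movie C",
                   uv=flipped(user_value), mv=flipped(movie_value)))
```

Negating both the user side and the movie side changes nothing about the predictions, which is why the label you give either end of a category is arbitrary.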

The actual categories it discovers are... whatever the data implies. The algorithm has no inherent concept of action, and doesn't even get to see the movie titles let alone any details about the movie itself. All it gets is a hundred million examples of: user 17538 gives movie 4819 a rating of 3. So it's interesting to see what it comes up with.
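As a rough sketch of what that input looks like in code, the ratings can be held as plain (user, movie, rating) triples. The first triple below is the example from the text; the others, and the helper names `by_user` and `fill_fraction`, are made up for illustration.

```python
from collections import defaultdict

# Each training example is one (user, movie, rating) triple. The first is the
# example from the post; the other two are invented for illustration.
ratings = [
    (17538, 4819, 3),
    (17538, 12, 5),
    (204, 4819, 1),
]

def by_user(triples):
    """Group ratings per user; movies a user never rated simply do not appear."""
    table = defaultdict(dict)
    for user, movie, rating in triples:
        table[user][movie] = rating
    return dict(table)

def fill_fraction(n_ratings, n_users, n_movies):
    """Fraction of the full user-by-movie grid that actually holds a rating."""
    return n_ratings / (n_users * n_movies)

# Round numbers from the post: ~100 million ratings, ~500k users, ~17k movies.
print(fill_fraction(100_000_000, 500_000, 17_000))
```

With those round numbers the filled fraction comes out a little above 1%, so nearly all of the user/movie grid is empty.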

Normally, taking the SVD of a matrix (ratings for each user/movie) with 8.5 billion entries would be a bit of a headache, but this paper shows how to do it in two lines of C code with no storage beyond the singular vectors themselves. Hurray for simplicity. I had to tweak it a bit to make it treat the missing ratings (99 out of 100) as missing rather than as zeros, but the meat of it is still effectively two short lines of C code.
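For a feel of how this kind of training can work, here is a minimal Python sketch. It is not the paper's code and not the two C lines mentioned above; it is a generic gradient-descent version of the same idea, learning one category at a time and looping only over the ratings that exist, so missing ratings never enter the sums. The learning rate, starting value, epoch count, function names and toy data are all assumptions chosen for illustration.

```python
def predict(user_f, movie_f, user, movie):
    """Predicted rating: sum over categories of user value times movie value."""
    return sum(uf[user] * mf[movie] for uf, mf in zip(user_f, movie_f))

def train(ratings, n_features=2, lrate=0.01, epochs=1000, init=0.1):
    """Fit categories one at a time using only the observed (user, movie, rating) triples."""
    users = {u for u, _, _ in ratings}
    movies = {m for _, m, _ in ratings}
    user_f, movie_f = [], []
    for _feature in range(n_features):
        # Start a new category with small equal values (an assumed initialisation).
        user_f.append({u: init for u in users})
        movie_f.append({m: init for m in movies})
        uf, mf = user_f[-1], movie_f[-1]
        for _epoch in range(epochs):
            for u, m, r in ratings:          # observed ratings only
                err = r - predict(user_f, movie_f, u, m)
                uv, mv = uf[u], mf[m]
                # The core update: nudge both values along the error gradient.
                uf[u] += lrate * err * mv
                mf[m] += lrate * err * uv
    return user_f, movie_f

def sse(ratings, user_f, movie_f):
    """Sum of squared errors over the observed ratings."""
    return sum((r - predict(user_f, movie_f, u, m)) ** 2 for u, m, r in ratings)

# Tiny made-up data set; ("u1", "m3") and two other pairs are deliberately missing.
toy = [("u1", "m1", 5), ("u1", "m2", 1), ("u2", "m1", 4),
       ("u2", "m3", 2), ("u3", "m2", 5), ("u3", "m3", 4)]

user_f, movie_f = train(toy)
print(sse(toy, user_f, movie_f))            # error on the ratings we have
print(predict(user_f, movie_f, "u1", "m3"))  # a guess for a rating we never saw
```

The only storage is the per-user and per-movie values for each category, which matches the "no storage beyond the singular vectors" point; the full matrix is never built.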

So I typed them in and fired up my lap warmer for about 11 hours before I had to swing by the internet cafe to upload my results before they closed, which I just now did and find myself at rank 8 out of 490 teams. :) Who woulda thunk? I'd probably be at 7 if I'd had time to train it a little longer; I doubt it would have reached slot 6. Anyway, I'm done playing with this for now since I've got other fish to fry (and I'm not going to win with this approach--the best few scores are quite a bit better). But if any of you higher rankers on the leaderboard happen to be reading this, I'd love to know how you're doing it--in the interest of furthering my understanding of AI, not winning the prize--so please teach me. :)

Here are the top few movies from the two sides of the first thirteen categories inferred by the algorithm. Comments and observations welcome!

Category 1:

Pearl Harbor (2001)
Coyote Ugly (2000)
The Wedding Planner (2001)
Armageddon (1998)
Maid in Manhattan (2002)
On Deadly Ground (1994)
The Fast and the Furious (2001)
Gone in 60 Seconds (2000)
How to Lose a Guy in 10 Days (2003)
Speed 2: Cruise Control (1997)
Jack Frost (1998)
Sweet Home Alabama (2002)
Miss Congeniality (2000)
S.W.A.T. (2003)
Eddie (1996)

Vs.

Lost in Translation (2003)
The Royal Tenenbaums (2001)
Dogville (2004)
Eternal Sunshine of the Spotless Mind (2004)
Punch-Drunk Love (2002)
Before Sunset (2004)
The Life Aquatic with Steve Zissou (2004)
The Mother (2003)
Primer (2004)
Sideways (2004)
Brothers (2005)
Napoleon Dynamite (2004)
Fahrenheit 9/11 (2004)
Sin City (2005)

Category 2:

Fear and Loathing in Las Vegas (1998)
Wake Up (2004)
Super Troopers (2002)
House of 1 (2003)
Jay and Silent Bob Strike Back (2001)
Sin City (2005)
Jackass: The Movie (2002)
Team America: World Police (2004)
I Heart Huckabees (2004)
Half Baked (1998)
Orgazmo (1998)
Anchorman: The Legend of Ron Burgundy (2004)
Freddy Got Fingered (2001)
Natural Born Killers (1994)
Little Nicky (2000)

Vs.

The Best of Friends: Season 3 (1996)
Friends: Season 5 (1998)
Friends: Season 4 (1997)
The Best of Friends: Season 4 (1997)
Friends: Season 8 (2001)
The Best of Friends: Vol. 2 (1994)
Friends: Season 6 (1999)
The Best of Friends: Season 1 (1994)
The Best of Friends: Season 2 (1994)
Friends: The Series Finale (2004)
Friends: Season 7 (1999)
Friends: Season 1 (1994)
Friends: Season 3 (1996)
Friends: Season 9 (2002)
The Best of Friends: Vol. 1 (1994)

Category 3:

Sex and the City: Season 4 (2001)
Sex and the City: Season 6: Part 1 (2003)
Sex and the City: Season 2 (1999)
Sex and the City: Season 5 (2002)
Sex and the City: Season 6: Part 2 (2004)
Sex and the City: Season 3 (2000)
The Hours (2002)
Sex and the City: Season 1 (1998)
Queer as Folk: Season 2 (2002)
Angels in America (2003)
Fahrenheit 9/11 (2004)
Beaches (1988)
Bowling for Columbine (2002)
Steel Magnolias (1989)
Queer as Folk: Season 1 (2001)

Vs.

Michael Moore Hates America (2004)
FahrenHYPE 9/11 (2004)
Celsius 41.11 (2004)
The Three Stooges: Dizzy Doctors (1937)
The Three Stooges: Spook Louder (1943)
Conan the Barbarian (1981)
Dragon Ball Z: Vol. 17: Super Saiyan (1998)
The Three Stooges Double Feature (1947)
The Three Stooges: Merry Mavericks (1951)
The Adventures of Ford Fairlane (1990)
The Three Stooges: Sing a Song of Six Pants (1947)
Caligula (1980)
The Story of O (1975)
The Three Stooges: Nutty but Nice (1940)
Battlestar Galactica (1978)

Category 4:

Rocky (1976)
National Lampoon's Vacation (1983)
Die Hard (1988)
National Lampoon's Animal House (1978)
Rocky II (1979)
Fletch (1985)
The Terminator (1984)
Braveheart (1995)
Lethal Weapon (1987)
Jaws (1975)
The Green Berets (1968)
Beverly Hills Cop (1984)
Rambo: First Blood: Ultimate Edition (1982)
Indiana Jones and the Last Crusade (1989)

Vs.

D.E.B.S. (2004)
In the Cut (2003)
Alexander: Director's Cut (2004)
Birth (2004)
A Dirty Shame (2004)
Eye of the Beholder (2000)
The Company (2003)
Romance (1999)
The Brown Bunny (2004)
Head in the Clouds (2004)
Tempo (2003)
A Day Without a Mexican (2004)
Alexander: Theatrical Cut (2004)
Spice World (1998)
Dr. T & the Women (2000)

Category 5:

8 1/2 (1963)
Above the Law (1988)
Rashomon (1950)
Marked For Death (1990)
Brazil (1985)
Dr. Strangelove (1964)
Ran (1985)
Annie Hall (1977)
Under Siege 2: Dark Territory (1995)
On Deadly Ground (1994)
The Glimmer Man (1996)
Manhattan (1979)
Star Trek: The Original Series: Vols. 16-28 (1967)
Fire Down Below (1997)
Citizen Kane (1941)

Vs.

Love Actually (2003)
Crash (2005)
Moulin Rouge (2001)
Garden State (2004)
Friends: Season 2 (1995)
Vanilla Sky (2001)
Ocean's Eleven (2001)
Confidence (2003)
American Pie (1999)
Closer (2004)
Meet the Fockers (2004)
Spanglish (2004)
Dodgeball: A True Underdog Story (2004)
Forrest Gump (1994)
Anchorman: The Legend of Ron Burgundy (2004)

Category 6:

Master and Commander: The Far Side of the World (2003)
The Last Samurai (2003)
The Passion of the Christ (2004)
Road to Perdition (2002)
Saving Private Ryan (1998)
The Patriot (2000)
We Were Soldiers (2002)
Open Range (2003)
Gangs of New York (2002)
Kingdom of Heaven (2005)
The Alamo (2004)
Troy* (2004)
Gods and Generals (2003)
King Arthur (2004)

Vs.

Troop Beverly Hills (1989)
She Devil (1989)
Elvira (1988)
Don't Tell Mom the Babysitter's Dead (1991)
Teen Wolf / Teen Wolf Too (Double Feature) (1985)
National Lampoon's Christmas Vacation: Special Edition (1989)
House Party 2 (1993)
Dirty Dancing (1987)
Mannequin (1987)
Pee-Wee's Big Adventure (1985)
House Party (1990)
Coming to America (1988)
The Brady Bunch Movie (1995)
Ernest Goes to Camp (1987)

Category 7:

The Best of Friends: Vol. 3 (1994)
The Best of Friends: Vol. 2 (1994)
Friends: Season 5 (1998)
The Best of Friends: Vol. 4 (1994)
The Best of Friends: Season 2 (1994)
Friends: Season 7 (1999)
Friends: Season 6 (1999)
The Best of Friends: Season 4 (1997)
Friends: Season 8 (2001)
Buffy the Vampire Slayer: Season 3 (1998)
Buffy the Vampire Slayer: Season 1 (1997)
Friends: Season 1 (1994)
Friends: Season 9 (2002)
The Best of Friends: Season 1 (1994)
Friends: Season 4 (1997)

Vs.

Very Bad Things (1998)
Natural Born Killers (1994)
Fatal Attraction (1987)
Leaving Las Vegas (1995)
Monster's Ball (2001)
Indecent Proposal (1993)
Eyes Wide Shut (1999)
Basic Instinct (1992)
Scarface (1983)
Born on the Fourth of July (1989)
Deliverance (1972)
8MM (1999)
9 1/2 Weeks (1986)
Training Day (2001)
Showgirls (1995)

Category 8:

Friends: Season 7 (1999)
Friends: Season 5 (1998)
The Best of Friends: Season 4 (1997)
Friends: Season 3 (1996)
Friends: Season 6 (1999)
The Best of Friends: Season 2 (1994)
Friends: Season 9 (2002)
Friends: The Series Finale (2004)
Curb Your Enthusiasm: Season 1 (2000)
The Best of Friends: Season 3 (1996)
Friends: Season 8 (2001)
The Best of Friends: Season 1 (1994)
Friends: Season 4 (1997)
The Best of Friends: Vol. 4 (1994)
Friends: Season 1 (1994)

Vs.

Men in Black (1997)
Tomb Raider (2001)
The Mummy (1999)
Independence Day (1996)
The Mummy Returns (2001)
Men in Black II (2002)
Sister Act (1992)
Galaxy Quest (1999)
Jurassic Park (1993)
Moulin Rouge (2001)
Lord of the Rings: The Fellowship of the Ring (2001)
The Addams Family (1991)
The Fifth Element (1997)
Mars Attacks! (1996)
Beetlejuice (1988)

Category 9:

Star Trek VI: The Undiscovered Country (1991)
Star Trek: The Next Generation: Season 3 (1989)
Star Trek: Generations (1994)
Star Trek: First Contact (1996)
Star Trek: Insurrection (1998)
Star Trek: The Next Generation: Season 1 (1987)
Star Trek III: The Search for Spock (1984)
Labyrinth (1986)
Star Trek V: The Final Frontier (1989)
Star Trek: The Next Generation: Season 7 (1993)
Star Trek: The Next Generation: Season 5 (1991)
What Dreams May Come (1998)
Star Trek IV: The Voyage Home (1986)
Star Trek: The Next Generation: Season 2 (1988)
Star Trek: The Next Generation: Season 4 (1990)

Vs.

The Passion of the Christ (2004)
The Office: Series 1 (2001)
The Office Special (2001)
The Office: Series 2 (2002)
Diary of a Mad Black Woman (2005)
Curb Your Enthusiasm: Season 1 (2000)
Arrested Development: Season 1 (2003)
Because of Winn-Dixie (2005)
City of God (2002)
Curb Your Enthusiasm: Season 2 (2001)
Madea's Class Reunion (2003)
Barbershop 2: Back in Business (2004)
The Fast and the Furious (2001)
Shark Tale (2004)
The Wire: Season 1 (2003)

Category 10:

Open Water (2004)
Titanic (1997)
Sideways (2004)
Elf (2003)
In the Bedroom (2001)
To Die For (1995)
Jaws (1975)
Freaky Friday (2003)
Spider-Man 2 (2004)
Cabin Fever (2003)
Election (1999)
Monster (2003)
Birth (2004)
Jurassic Park III (2001)

Vs.

The Boondock Saints (1999)
Cowboy Bebop Remix (1999)
Snatch (2000)
Four Rooms (1995)
Fear and Loathing in Las Vegas (1998)
Dogma (1999)
Lock (1998)
Hudson Hawk (1991)
With Honors (1994)
Harlem Nights (1989)
Where the Buffalo Roam (1980)
Clerks (1994)
Pink Floyd: The Wall (1982)
Swing Kids (1993)

Category 11:

Snatch (2000)
Amelie (2001)
Lock (1998)
Crouching Tiger (2000)
The Big Lebowski (1998)
The Boondock Saints (1999)
The Passion of the Christ (2004)
City of God (2002)
Life Is Beautiful (1997)
Fear and Loathing in Las Vegas (1998)
Gone in 60 Seconds (2000)
Kill Bill: Vol. 1 (2003)
Lethal Weapon 4 (1998)
The Motorcycle Diaries (2004)

Vs.

Friends: Season 1 (1994)
Friends: Season 3 (1996)
Friends: Season 4 (1997)
The Best of Friends: Vol. 1 (1994)
Friends: Season 2 (1995)
The Best of Friends: Season 2 (1994)
The Best of Friends: Season 1 (1994)
Friends: Season 5 (1998)
Spider-Man 2 (2004)
The Best of Friends: Season 3 (1996)
Phone Booth (2003)
One Hour Photo (2002)
The Best of Friends: Vol. 2 (1994)
House of Sand and Fog (2003)

Category 12:

Boogeyman (2005)
The Grudge (2004)
Cellular (2004)
Dawn of the Dead (2004)
The Village (2004)
Tremors (1990)
White Noise (2005)
Secondhand Lions (2003)
The Forgotten (2004)
Saw (2004)
Jeepers Creepers (2001)
Wrong Turn (2003)
The Butterfly Effect: Director's Cut (2004)
Because of Winn-Dixie (2005)

Vs.

Fahrenheit 9/11 (2004)
Bowling for Columbine (2002)
Queer as Folk: Season 2 (2002)
Queer as Folk: Season 1 (2001)
Star Wars: Episode I: The Phantom Menace (1999)
The Matrix: Reloaded (2003)
The Matrix: Revolutions (2003)
Star Wars: Episode II: Attack of the Clones (2002)
Madonna: The Video Collection 1993-1999 (1999)
Mission: Impossible II (2000)
Tomb Raider (2001)
Star Trek V: The Final Frontier (1989)
Charlie's Angels: Full Throttle (2003)
Oz: Season 2 (1998)
Kill Bill: Vol. 1 (2003)

Category 13:

Fahrenheit 9/11 (2004)
Tomb Raider (2001)
Bowling for Columbine (2002)
The Fast and the Furious (2001)
The Matrix: Reloaded (2003)
Kill Bill: Vol. 1 (2003)
Gone in 60 Seconds (2000)
Kill Bill: Vol. 2 (2004)
Swordfish (2001)
Lethal Weapon 4 (1998)
The Royal Tenenbaums (2001)
Star Wars: Episode II: Attack of the Clones (2002)
Lara Croft: Tomb Raider: The Cradle of Life (2003)
The Matrix: Revolutions (2003)
Bad Boys II (2003)

Vs.

The Notebook (2004)
Cellular (2004)
National Treasure (2004)
The Terminal (2004)
The Village (2004)
Spanglish (2004)
Saw (2004)
The Butterfly Effect: Director's Cut (2004)
Hitch (2005)
13 Going on 30 (2004)