The ugliest hack I've ever pulled off

jwp · on June 12, 2007

Yeah, yeah, so the engineering part wasn't that great. But were the results good? Which algorithms did you use? What were your features? Seems like an interesting experiment, especially with all the gratuitous "friending" people do. Reminds me of a related paper: http://www.hpl.hp.com/research/idl/papers/facebook/facebook.pdf

adamsmith · on June 12, 2007

Check out the report -- http://www.scribd.com/doc/747/Friendship-Prediction-on-Facebook

CliffsNotes version:

It turns out that how many friends of friends you have in common is the best predictor. After that, it's the number of photos you appear together in, and how many photos your friends of friends appear in. Following that is the number of classes you have in common.

All of the traits (like religious views, what state you're from, guy/girl, etc.) are secondary.

It worked pretty well.

(Coincidentally, the facebook friendship prediction was my answer to the last question on the YC app.)

jwp · on June 12, 2007

The non-friend vs anti-friend distinction didn't even occur to me, but it's clearly an important part of the experiment. I like all the discussion of data. Fun problem.

I did not realize squaring an adj matrix tells you what it does. Thanks for edjumacating me.
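For anyone else following along, here is a minimal sketch of why that works, assuming a tiny made-up 4-person friendship graph (the graph and the `square` helper are illustrative, not from the report). Entry (i, j) of the squared adjacency matrix counts the two-hop paths from i to j, which is the number of friends i and j have in common; the diagonal ends up holding each person's friend count.

```python
def square(adj):
    """Multiply a square 0/1 adjacency matrix by itself (plain Python, no libraries)."""
    n = len(adj)
    return [
        [sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical undirected friendship graph: adj[i][j] == 1 means i and j are friends.
# Persons 0 and 3 are NOT friends, but both are friends with persons 1 and 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]

two_hops = square(adj)

# two_hops[0][3] is the number of friends that persons 0 and 3 share.
common_friends_0_3 = two_hops[0][3]
# two_hops[1][1] is person 1's own friend count (paths out and back).
degree_of_1 = two_hops[1][1]
```

On a real network the matrix would be stored in a sparse format rather than as nested lists.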

Did you go past f2hops? Seems like 3 would be reasonable and predictive. Since 1/2 of your tree is so small, and there were 12k nodes in the tree, that suggests to me a pretty easy task. Do you agree? It would be interesting to see if PCA or LDA pick the same features as the decision trees did. Just a click away in Weka, after all.

(An aside, and neat hack: Buddy of mine just walked in and saw the document on my screen. He saw the decision tree and said, "I remember those. In grad school I printed out decision trees as C if/else statements. Part of running my decision tree was a call to gcc.")
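A rough sketch of that trick, for the curious: walk a decision tree and emit it as nested C if/else statements. Everything here is an assumption for illustration. The tuple layout, the `photos_together` feature name, and the thresholds are made up, not taken from the report or from my buddy's code.

```python
def emit_c(node, indent=1):
    """Return C source for a tree node: an int leaf, or (feature, threshold, left, right)."""
    pad = "    " * indent
    if isinstance(node, int):
        # Leaf: the predicted class label.
        return f"{pad}return {node};\n"
    feature, threshold, left, right = node
    return (
        f"{pad}if ({feature} <= {threshold}) {{\n"
        + emit_c(left, indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(right, indent + 1)
        + f"{pad}}}\n"
    )

# Hypothetical tree: split on f2hops first, then on a made-up photo-count feature.
tree = ("f2hops", 3, 0, ("photos_together", 1, 0, 1))

c_source = (
    "int predict(int f2hops, int photos_together) {\n"
    + emit_c(tree)
    + "}\n"
)
```

Writing `c_source` to a file and compiling it with gcc would give you the tree as native code, which is presumably where the call to gcc came in.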

adamsmith · on June 13, 2007

Hi jwp,

I think I tried or wanted to do f3hops. Either I did it and I was stretching the 2GB memory limit, or I couldn't. As you exponentiate the sparse matrix more and more you get a less and less sparse matrix. I was really hurting for memory.
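To illustrate the fill-in effect, here is a small sketch, assuming a toy 10-person ring graph stored as a dict of dicts (not the representation I actually used, and much tamer than a real social graph). Each extra power reaches one hop further, so the count of nonzero entries keeps growing.

```python
def sparse_matmul(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}."""
    out = {}
    for i, row in a.items():
        acc = {}
        for k, v in row.items():
            for j, w in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + v * w
        if acc:
            out[i] = acc
    return out

def nnz(m):
    """Number of stored (nonzero) entries."""
    return sum(len(row) for row in m.values())

# Hypothetical ring of 10 people: everyone is friends with their two neighbours.
n = 10
adj = {i: {(i - 1) % n: 1, (i + 1) % n: 1} for i in range(n)}

a2 = sparse_matmul(adj, adj)  # two-hop counts (the f2hops idea)
a3 = sparse_matmul(a2, adj)   # three-hop counts

counts = [nnz(adj), nnz(a2), nnz(a3)]
```

On a ring the growth is only linear. On a graph with a few highly connected people, the two- and three-hop neighbourhoods blow up much faster, which is where the memory goes.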

Yeah, I really wish I could have spent more time exploring the algorithms side. If I hadn't started Xobni during summer 2006, the plan was to write a book on machine learning in practice. One of these days...

P.S. Would love to get in touch. Can you post your email or send me a note? My email is adam dot smith foo xobni.com where foo == @.