
In practice we are using an image size of 17x16, which results in a hash size of 256 bits, and so far it seems to work pretty well. I ran the algorithm over the whole dataset (more than 330,000 icons), and I would say that about 1% of all the duplicate matches were false positives.

Also, we will be integrating this into the review process for an iconset, where we also do a manual quality check. It will show possible matches for whatever is currently being uploaded, so skimming over one or two false positives isn't such a big deal. We were more interested in the speed of the algorithm.
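
For anyone wondering how 17x16 ends up as 256 bits: here is a rough sketch, assuming the hash is a difference hash that compares each pixel with its right-hand neighbour (17 columns give 16 comparisons per row, times 16 rows). The names dhash_bits and hamming are made up for illustration, the input is assumed to be already resized and converted to grayscale, and the comparison direction is a guess, so this is not necessarily what the production code does.

```python
def dhash_bits(pixels):
    """Difference hash over a grid of grayscale values.

    pixels: list of rows, each row a list of ints (assumed already
    resized to 17 columns by 16 rows and converted to grayscale).
    Returns an int with one bit per adjacent-pixel comparison.
    """
    bits = 0
    for row in pixels:
        # 17 pixels per row give 16 left/right comparisons.
        for left, right in zip(row, row[1:]):
            # Assumption: bit is 1 when the left pixel is brighter.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two icons would then count as possible duplicates when hamming() of their hashes is below some threshold; the right threshold is something you would have to tune on your own data.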



That's pretty impressive performance given the hash size and speed. Thanks for sharing!



