
Beginner(?) question: why is the model

  map<term_id, 
      list<pair<document_id, positions_idx>>
     > inverted_index;
and not

  map<term_id, 
      map<document_id, list<positions_idx>>
     > inverted_index;
(or using set<> in lieu of list<> as appropriate)?
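The practical difference between the two models is how you find one document inside a term's postings. Below is a minimal sketch of both, with a few assumptions. The IDs are taken to be 32-bit integers; the aliases are hypothetical stand-ins for the thread's term_id, document_id and positions_idx. For the list model, each term's postings are assumed to be kept sorted by document_id, so a binary search gives the same lookup a map would. std::vector stands in for list<> because std::list cannot be binary-searched efficiently.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical ID types standing in for the thread's names.
using term_id = std::uint32_t;
using document_id = std::uint32_t;
using positions_idx = std::uint32_t;

// Model A: term -> sequence of (document, positions) pairs,
// assumed to be kept sorted by document_id.
using ListIndex =
    std::map<term_id, std::vector<std::pair<document_id, positions_idx>>>;

// Model B: term -> map keyed by document_id.
using MapIndex = std::map<term_id, std::map<document_id, positions_idx>>;

// Model A lookup: binary search over the sorted postings of term t.
const positions_idx* find_in_list(const ListIndex& idx, term_id t, document_id d) {
    auto it = idx.find(t);
    if (it == idx.end()) return nullptr;
    const auto& postings = it->second;
    auto p = std::lower_bound(
        postings.begin(), postings.end(), d,
        [](const std::pair<document_id, positions_idx>& e, document_id key) {
            return e.first < key;  // order entries by document_id
        });
    if (p == postings.end() || p->first != d) return nullptr;
    return &p->second;
}

// Model B lookup: two nested map lookups.
const positions_idx* find_in_map(const MapIndex& idx, term_id t, document_id d) {
    auto it = idx.find(t);
    if (it == idx.end()) return nullptr;
    auto p = it->second.find(d);
    if (p == it->second.end()) return nullptr;
    return &p->second;
}
```

Given sorted postings, both answer the same queries. The sorted-sequence form is closer to a contiguous on-disk postings list. The map form states the "one entry per document" invariant directly in the type.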


This is meant as a metaphor that gives a mental model of the actual data structures on disk, so there's some tradeoff in finding the most accurate metaphor for what is happening.

I actually think you're right: list<pair<...>> is a bit of a weird choice that doesn't convey the data structures very well. Map is better.

The most accurate thing would probably be something like map<term_id, map<document_id, pair<document_id, positions_idx>>>, but I corrected it to just a map<document_id, positions_idx> to avoid making things too confusing.
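A minimal sketch of the simplified map<term_id, map<document_id, positions_idx>> model follows. It assumes that positions_idx is an index into a separate store of position lists, in the spirit of the separate `list<positions> positions` the article keeps. The struct InvertedIndex and its add/find members are hypothetical names for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using term_id = std::uint32_t;
using document_id = std::uint32_t;
using position = std::uint32_t;
using positions_idx = std::size_t;  // assumption: index into `positions`

struct InvertedIndex {
    // term -> (document -> index of this term's position list for that document)
    std::map<term_id, std::map<document_id, positions_idx>> postings;
    // Separate store of position lists; positions_idx values point into it.
    std::vector<std::vector<position>> positions;

    // Records that term t occurs in document d at position pos.
    void add(term_id t, document_id d, position pos) {
        auto& docs = postings[t];
        auto it = docs.find(d);
        if (it == docs.end()) {
            // First occurrence of t in d: allocate a new, empty position list.
            it = docs.emplace(d, positions.size()).first;
            positions.emplace_back();
        }
        positions[it->second].push_back(pos);
    }

    // Returns the positions of t in d, or nullptr if t does not occur in d.
    const std::vector<position>* find(term_id t, document_id d) const {
        auto ti = postings.find(t);
        if (ti == postings.end()) return nullptr;
        auto di = ti->second.find(d);
        if (di == ti->second.end()) return nullptr;
        return &positions[di->second];
    }
};
```

Keeping positions out of line means the per-term maps stay small (one integer per document), and position lists are only touched when a query actually needs them, for example for phrase matching.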


Currently it looks like this:

  map<term_id, 
      map<pair<document_id, positions_idx>>
      inverted_index;
  list<positions> positions;

I think you also meant to remove the pair in map<pair<...>>?


Haha, apparently very hard to get this right. Fixed again.



