
> For production, is there a good database system that can index this graph structure?

For a while, we were storing this in a (very large) MySQL database, sharded with Vitess. The sharding worked great (repo ID gives you a nice sharding key), but we found that it wasn't elastic enough for our needs, since we quickly filled up the available capacity of the machines that we had reserved.
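To illustrate why repo ID makes a good sharding key: if every row for a repository hashes to the same shard, a query scoped to one repo only ever touches one shard. The following is a minimal sketch assuming a plain hash-modulo scheme and a hypothetical shard count of 16; Vitess actually routes rows through its own vindex mechanism, which this does not reproduce.

```python
import hashlib

# Hypothetical shard count, chosen for illustration only.
NUM_SHARDS = 16


def shard_for_repo(repo_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a repository ID to a shard index in [0, num_shards).

    A cryptographic digest gives a mapping that is stable across processes
    and machines and spreads IDs evenly, so all data for one repo always
    lands on the same shard.
    """
    digest = hashlib.sha256(str(repo_id).encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for repo_id in (1, 42, 1234567):
        print(repo_id, "->", shard_for_repo(repo_id))
```

One known property of naive modulo placement: changing `num_shards` remaps most repositories to different shards, so resharding means moving a lot of data.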

Since then we've switched over to storing this data in Azure Blob Storage, basically using it as a glorified key/value store. We had to write custom logic for deciding how to structure our data so that we can efficiently write it at index time and read it at query time, but so far it's been working quite nicely!
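Using an object store as a key/value store mostly comes down to picking blob names so that the reads you need at query time are single point lookups. This is a minimal sketch under assumptions: `InMemoryBlobStore` stands in for Azure Blob Storage (it is not the Azure SDK), and the `repos/{id}/commits/{sha}/files/{path}` naming scheme is hypothetical, not GitHub's actual layout.

```python
import json
from typing import Dict, Optional


class InMemoryBlobStore:
    """Stand-in for an object store: opaque bytes stored under string names."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def get(self, name: str) -> Optional[bytes]:
        return self._blobs.get(name)


def blob_name(repo_id: int, commit: str, path: str) -> str:
    # Hypothetical naming scheme: everything a query needs is derivable from
    # (repo, commit, path), so reading a file's data is one lookup, no listing or scan.
    return f"repos/{repo_id}/commits/{commit}/files/{path}"


def write_file_graph(store: InMemoryBlobStore, repo_id: int, commit: str,
                     path: str, graph: dict) -> None:
    # Index time: serialize the per-file graph data and write it under its key.
    store.put(blob_name(repo_id, commit, path), json.dumps(graph).encode("utf-8"))


def read_file_graph(store: InMemoryBlobStore, repo_id: int, commit: str,
                    path: str) -> Optional[dict]:
    # Query time: rebuild the same key and fetch exactly one blob (None if absent).
    data = store.get(blob_name(repo_id, commit, path))
    return None if data is None else json.loads(data)


# Example usage with made-up graph contents.
store = InMemoryBlobStore()
write_file_graph(store, 42, "abc123", "src/main.py",
                 {"definitions": ["main"], "references": ["helper"]})
print(read_file_graph(store, 42, "abc123", "src/main.py"))
```

The design choice being illustrated is that the write path and the read path share one key-construction function, so they can never disagree about where a file's data lives.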

> For incremental updates, how do you prune deprecated parts of the graph?

Short version is that we're storing everything on a per-file basis. So whenever a file is changed, we generate a new stack graph snippet for that file. There might be lots of content in that stack graph that is identical to the stack graph of the previous version of the file, but we don't try to do any structural sharing more fine-grained than the file.
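A sketch of what file-granularity incremental indexing can look like: each commit gets a manifest mapping path to snippet key, and a snippet is only rebuilt when that exact (path, contents) pair hasn't been seen before. Keying snippets by a hash of path plus contents is an assumption made for this example, and `build_snippet` is a hypothetical placeholder for the actual stack graph construction. Nothing is shared at any granularity finer than a whole file.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def snippet_key(path: str, source: bytes) -> str:
    # Hypothetical key: hash of path + contents. The path is included because
    # a file's graph can depend on where it lives (e.g. its module name).
    return hashlib.sha256(path.encode("utf-8") + b"\0" + source).hexdigest()


def index_commit(
    files: Dict[str, bytes],
    snippets: Dict[str, dict],
    build_snippet: Callable[[str, bytes], dict],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (manifest, rebuilt_paths) for one commit.

    Unchanged files reuse the snippet already in `snippets`; changed or new
    files get a freshly built snippet stored under a new key.
    """
    manifest: Dict[str, str] = {}
    rebuilt: List[str] = []
    for path, source in sorted(files.items()):
        key = snippet_key(path, source)
        if key not in snippets:
            snippets[key] = build_snippet(path, source)
            rebuilt.append(path)
        manifest[path] = key
    return manifest, rebuilt


def fake_build(path: str, source: bytes) -> dict:
    # Placeholder for real graph construction; just records some metadata.
    return {"path": path, "bytes": len(source)}


# Two commits where only b.py changes between them.
snippets: Dict[str, dict] = {}
manifest_v1, rebuilt_v1 = index_commit(
    {"a.py": b"x = 1\n", "b.py": b"y = 2\n"}, snippets, fake_build)
manifest_v2, rebuilt_v2 = index_commit(
    {"a.py": b"x = 1\n", "b.py": b"y = 3\n"}, snippets, fake_build)
print(rebuilt_v1, rebuilt_v2)
```

In the second commit only `b.py` is rebuilt, and both manifests point `a.py` at the same stored snippet, even if most of `b.py`'s new graph happens to be identical to its old one.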

Right now we aren't pruning old files that aren't being touched by any active queries, but we could. Or we could move them to a colder storage tier in Blob Storage, something like that. At least for now, the marginal cost of storing the data for longer isn't our cost bottleneck.
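If pruning were added, one straightforward approach is mark-and-sweep over per-commit manifests: mark every snippet referenced by an "active" commit, and treat everything else as a candidate for deletion or for a colder storage tier. This is a sketch under assumptions: the manifest shape (path to snippet key) and the notion of which commits count as active are hypothetical inputs, not something described in the thread.

```python
from typing import Dict, Iterable, Set


def unreachable_snippets(
    snippet_keys: Iterable[str],
    manifests: Dict[str, Dict[str, str]],
    active_commits: Iterable[str],
) -> Set[str]:
    """Return snippet keys not referenced by any active commit's manifest.

    The result is only a candidate list; deciding whether to delete those
    blobs or move them to a cheaper tier is left to the caller.
    """
    live: Set[str] = set()
    for commit in active_commits:
        # Mark phase: anything reachable from an active commit stays.
        live.update(manifests.get(commit, {}).values())
    # Sweep phase: whatever was never marked is unreachable.
    return set(snippet_keys) - live


# Hypothetical example: commit c2 changed b.py, so k2 is only used by c1.
EXAMPLE_MANIFESTS = {
    "c1": {"a.py": "k1", "b.py": "k2"},
    "c2": {"a.py": "k1", "b.py": "k3"},
}

if __name__ == "__main__":
    print(unreachable_snippets(["k1", "k2", "k3"], EXAMPLE_MANIFESTS, ["c2"]))
```

Because snippets are shared across commits at file granularity, a blob can only be pruned once no active commit references it, which is why the check works on reachability rather than on the age of the blob.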



For at least some languages, it might even be important to have access to older versions of a file.

As a concrete example, Go imports (at least for module-enabled code) are version-locked, and the HEAD of the referenced code may no longer be representative of the code that would actually end up being compiled.
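To make the Go point concrete: the version an import resolves to is pinned by `require` directives in `go.mod`, so a cross-repo link would ideally target that version rather than HEAD. A simplified sketch under stated assumptions: it only reads the main module's own `go.mod`, ignores `replace`/`exclude` directives, and does not run minimal version selection over the full module graph, which is what actually decides the built version.

```python
import re
from typing import Dict, Optional

# Matches "module/path vX.Y.Z" with an optional trailing "// comment".
_REQ = re.compile(r"^\s*(\S+)\s+(v\S+)\s*(//.*)?$")


def parse_requires(go_mod: str) -> Dict[str, str]:
    """Collect module -> version from single-line and block-form require directives."""
    requires: Dict[str, str] = {}
    in_block = False
    for raw in go_mod.splitlines():
        line = raw.strip()
        if in_block:
            if line == ")":
                in_block = False
                continue
            if line.startswith("//"):
                continue  # comment line inside the require block
            m = _REQ.match(line)
            if m:
                requires[m.group(1)] = m.group(2)
        elif line.startswith("require ("):
            in_block = True
        elif line.startswith("require "):
            m = _REQ.match(line[len("require "):])
            if m:
                requires[m.group(1)] = m.group(2)
    return requires


def version_for_import(import_path: str, requires: Dict[str, str]) -> Optional[str]:
    """Return the pinned version of the required module that provides import_path.

    Picks the longest module path that equals import_path or is a prefix of it
    ending at a "/" boundary; returns None if no requirement covers it.
    """
    best: Optional[str] = None
    for module in requires:
        if import_path == module or import_path.startswith(module + "/"):
            if best is None or len(module) > len(best):
                best = module
    return None if best is None else requires[best]


# Made-up go.mod used only for this example.
GO_MOD = """\
module example.com/app

go 1.21

require (
    github.com/pkg/errors v0.9.1
    golang.org/x/text v0.14.0 // indirect
)

require example.com/lib v1.2.3
"""

REQUIRES = parse_requires(GO_MOD)
print(version_for_import("golang.org/x/text/language", REQUIRES))
```

The "/" boundary check matters: `example.com/library` must not match the requirement for `example.com/lib`, even though one string is a prefix of the other.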

On the other hand, just having an easy navigational tool that gets you to roughly the right place is already a big help.


Right now code nav on GitHub only works within a repository, and so every link you follow keeps you within the commit that you’re already viewing. As we move to cross-repo code nav, you’re right that it will be difficult to determine the right commit to take you to when following a cross-repo link.



