
I hate hierarchical filesystems.

A lot of my older files, sadly, are stored in "SORT/Sort Me/To be sorted/Old computer/Sort again/Miscellaneous..." and the like. My server has an mlocate index, so I'll use mlocate, and I'll use find sometimes. I make sure to preserve metadata like last-modified/created dates, so I can use that to narrow things down.

Newer stuff, I try to keep a bit more organized, but I still have lots of unmanaged stuff floating around. For big projects, or big files, that's easy enough; my photos are sorted into a Y/M/D hierarchy, my VHS digitization projects are fairly well organized, some other things have their own structure. For my scanned documents, I just dump them all into a mess of folders, but then have a custom Django app with a management command that indexes them and gives me a nice "document management" website, and then I just search based on OCR'd text or title or date.
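For what it's worth, a Y/M/D photo layout like the one mentioned above can be derived mechanically from file metadata. This is only a minimal sketch, not the commenter's actual tooling: the function name `dated_path` is made up for illustration, and it assumes the last-modified time (read as UTC) is a good enough stand-in for when the photo was taken, which won't always be true.

```python
from datetime import datetime, timezone
from pathlib import Path


def dated_path(root: Path, src: Path) -> Path:
    """Return root/YYYY/MM/DD/<filename> based on src's last-modified time.

    Assumption: mtime (interpreted as UTC) approximates the capture date.
    """
    mtime = datetime.fromtimestamp(src.stat().st_mtime, tz=timezone.utc)
    return root / f"{mtime:%Y}" / f"{mtime:%m}" / f"{mtime:%d}" / src.name
```

This only computes the destination; actually moving the file (and preserving its timestamps while doing so) is left to the caller.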

I really hate hierarchical filesystems. After using computers for this long, I'm convinced that hierarchy-optional, metadata-driven stuff is the only future I'll be happy in. I long for the ability to save things without really having to say anything about where they're saved, and still be able to find them... So, sorry, I don't think I have a satisfactory answer for you, as I don't think there's a good solution to this problem as long as we have filesystems where the organizational primitive is a hierarchy. Even with tag-based systems that build on top of that, it's usually clunky and you still fundamentally have to figure out where to save something "first", even if you plan to access it via tag/metadata later. Such a pain.



My own approach, if you're interested, is to treat the filesystem as a repository of bytestreams, loosely organised by YYYY folders and then a single level below that, A-Z. I then read everything into a database, deduplicating by file hash, and have a 3NF-modelled metadata layer (with 6NF history tables based on the anchor modelling concept) in PostgreSQL, also with a Django front-end. Only the file hash is stored in the database, not the binary blob. I keep things in sync using Dropbox's delta API.

Or at least, that's the plan. I've only implemented it as far as photo storage is concerned. Haven't yet figured out if Dropbox can be part of the general solution - security and privacy concerns.
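The "deduplicate by file hash" step described above could look roughly like this. It's a sketch under assumptions, not the actual implementation: SHA-256 is my choice (the comment doesn't say which hash is used), the names `file_sha256` and `index_by_hash` are hypothetical, and a plain dict stands in for the PostgreSQL metadata layer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_hash(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to every path under root that has that content.

    Only hashes and paths are recorded, never the file bytes themselves.
    """
    index: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index[file_sha256(path)].append(path)
    return dict(index)
```

Any hash that maps to more than one path is a duplicate; in a real setup the hash would presumably be the key that the metadata rows hang off.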


Very cool.

I wrote up a spec very similar to this (though I just used the hash itself for the folders, as in HA/HAS/HASH structure [there's probably a name for that scheme]), but haven't gotten around to implementing it. My main problem with actually implementing such a system is that I don't really like depending on Django or web-based interfaces; I'm a huge fan of files, and UNIX-style tools that operate on them; I just don't like the hierarchical filesystem. I've considered that a FUSE frontend to such a system would probably address most of my concerns, but at that point it's still a big, huge abstraction layer that I start to feel uncomfortable about, for nebulous reasons.

But, very nice. It's nice to hear that I'm not the only person driven to such extremes. :)
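For reference, the HA/HAS/HASH folder scheme mentioned above is simple to express. This is a sketch of one reading of it (a two-character prefix directory, then a three-character prefix directory, then the full hash as the file name); `sharded_path` is a hypothetical name, and the spec itself may differ in the details.

```python
from pathlib import Path


def sharded_path(root: Path, digest: str) -> Path:
    """Return root/HA/HAS/HASH for a hex digest string.

    Assumption: prefix lengths of 2 and 3 characters, as in "HA/HAS/HASH".
    """
    if len(digest) < 3:
        raise ValueError("digest too short to shard")
    return root / digest[:2] / digest[:3] / digest
```

So a digest of `abcdef` under `store` would end up at `store/ab/abc/abcdef`, which keeps any single directory from accumulating too many entries.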



