The Internet Archive Wants to Digitize 40,000 VHS and Betamax Tapes (2014) (fastcompany.com)
99 points by edward on Aug 3, 2015 | 20 comments


I worked on the digital archiving project at the Library of Congress. In that case, it was mostly wire spool recordings, wax cylinders, and LPs, but many of the same principles apply:

- This media can be fragile. Less so with VHS or Betamax, but it's something to keep in mind.

- It's usually not as simple as making a single copy and dropping it into an archive. Most of the time, they'll want a super high quality, uncompressed master for long term preservation. This is with the hope that they never have to go back to the physical media.

- Once they have that copy, they create lower quality versions for day to day display, usage, study, etc.

- When you consider the "work" (in the artistic/archival sense, not the effort sense), it's not just the video. Quite often they'll want image scans of the media itself, container, etc. This is because the video itself may have suffered from physical damage to the media.

- Next, you need to document the whole process down to the brand and settings of every single device. If it's known (or discovered later) that a particular piece of equipment distorts media (plays faster/slower, changes color palettes, etc.), you need a record of where it was used so you can potentially correct for it at a later time.

- And finally, don't forget the metadata! A collection isn't useful if someone doesn't capture the who, what, and when. This is often the most painful part: the settings above are usually set once and forgotten, but metadata has to be captured for every item. This requires manually transcribing information written on the media, watching it, or a variety of other methods. Most libraries farm this out to interns. My new company - Clarify.io - hopes to help with this aspect.

If you're interested in more of the mechanics, please feel free to let me know.
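
To make the documentation and metadata points above a bit more concrete, here is a minimal sketch of what a per-item capture record could look like. Every name in it (CaptureRecord, Device, the field names, the sample values) is a hypothetical illustration, not a Library of Congress or Internet Archive schema; a real project would more likely follow an established metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Device:
    """One piece of equipment in the capture chain (hypothetical fields)."""
    role: str                      # e.g. "playback deck", "capture card"
    brand: str
    model: str
    settings: dict = field(default_factory=dict)


@dataclass
class CaptureRecord:
    """Everything recorded about digitizing one physical item (hypothetical)."""
    item_id: str
    media_format: str              # e.g. "VHS", "Betamax", "wax cylinder"
    devices: list = field(default_factory=list)
    label_transcription: str = ""  # text written on the media or container
    container_scans: list = field(default_factory=list)  # image file names
    who: str = ""
    what: str = ""
    when: str = ""

    def to_json(self) -> str:
        # asdict() recurses into the nested Device dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = CaptureRecord(
    item_id="tape-0001",
    media_format="VHS",
    devices=[Device("playback deck", "ExampleBrand", "X100", {"tracking": "auto"})],
    label_transcription="Evening news, channel 5",
    container_scans=["tape-0001-front.tif", "tape-0001-spine.tif"],
    who="unknown",
    what="news broadcast",
    when="unknown",
)
print(record.to_json())
```

Keeping the device list alongside each item is what would let someone find every capture made on a deck that is later discovered to play slightly fast.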


Would you consider doing a reddit AMA? HN doesn't quite seem like the right place/format for it.

Although, I'm gonna cut right in line and ask: what's the most interesting thing you've archived? If you're not sure what we'd find interesting, some suggestions are - content, medium and difficulty.


I've considered it before but wasn't sure anyone would care. Now as I'm seeing more of these archiving efforts discussed, I'll consider it. :)

For the most interesting thing, I'll share two answers:

- First, there were the truly historic pieces of media. Things like Thomas Edison's first motion pictures and reporters' wire spool recordings of D-Day hours after the invasion "ended" were always the most amazing. You read about this stuff, but hearing it is something else entirely.

- On the other end of the spectrum, you have to remember that everything that has a registered copyright has been sent to the Library of Congress. That includes porn. My group didn't archive it but the group down the hall had to catalogue it. No, they didn't have private offices. ;)


Wow what an interesting coincidence...I just moved to Austin and am working on preserving some extremely historic recordings. Emailing you now!


Could y'all try to find a way of getting an AMA link back to HN if you end up doing one? This is fascinating stuff.


I'll definitely be doing an AMA about my specific project. I'm not at all an expert in the field but I'll share the story and my experiences.


If/When it happens, I'll make sure to publicize it via our Twitter account - @ClarifyIO. I can't promise more atm.


> - Once they have that copy, they create lower quality versions for day to day display, usage, study, etc.

I think the Internet Archive's systems do that automatically.


"Most of the time, they'll want a super high quality, uncompressed master for long term preservation."

Is this really practical for video?


If the alternative is going back to a known-to-be degrading physical copy, it doesn't matter how practical it is. They'll work to figure out how to do it.


Hmmm, I thought there might be more information on what programs are included. And was also expecting some sort of probably sad discussion on content ownership.

On a side note, the Internet Archive HQ building is pretty cool: https://www.google.com/maps/place/300+Funston+Ave,+San+Franc...


Why does it cost $12 per tape to digitize it? I recently digitized a giant pile of old home movies, and all it took was an $80 USB video capture device, a $20 thrift store VCR, and a few minutes each. About every dozen tapes I had to take the cover off and clean the heads, but that was no big deal.

Some older tapes will need a time base corrector to rebuild the sync signals, but they aren't that expensive.


If I remember correctly, the IA also likes to encode the video several times in different qualities and formats, and that results in large archives. A 1 hour TV show might end up taking 100 gigs of space. With RAID and replication, an enterprise 4 TB drive that costs $200 might end up holding only 2 TB of data, or 20 TV shows. That ends up being $10 in equipment for storage alone. Add the costs of power, encoding equipment, and labor, and I can see things costing $13 per tape.
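
The storage arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch using the guessed numbers from this comment (100 GB per show, a $200 drive with 2 TB usable after RAID and replication); the function name and the 1 TB = 1000 GB convention are assumptions for illustration.

```python
def storage_cost_per_show(drive_price_usd, usable_tb, show_size_gb):
    """Drive cost attributed to one show, given usable capacity per drive."""
    # Assumes 1 TB = 1000 GB; whole shows only, so use integer division.
    shows_per_drive = (usable_tb * 1000) // show_size_gb
    return drive_price_usd / shows_per_drive


# Numbers from the comment: $200 drive, 2 TB usable, 100 GB per show.
cost = storage_cost_per_show(200, 2, 100)
print(f"${cost:.2f} of drive cost per show")  # 20 shows per drive -> $10.00
```

Power, encoding hardware, and labor are deliberately left out, since the comment gives no figures for them.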


Pay someone $15/hr to babysit the player, type in the episode and guest names.


$12 per tape is cheap to make it available to everyone...


Despite my message, I am very happy that these tapes are being made available.


This seems like a trivially parallelisable problem. You can buy used VCRs by the pallet cheaply[1], although servicing and possibly retrofitting them into good condition is probably the major bottleneck here.

[1] https://govdeals.com/index.cfm?fa=Main.Item&itemid=1045&acct...


Many of the tapes are Betamax.


There are maybe about 100 Betamax VCRs for sale on eBay at the moment.

http://www.ebay.com/sch/VCRs-/15088/i.html?_nkw=betamax


I envisage much grainy audio of intellectual (or pseudo-intellectual?) discussions featuring on SoundCloud sad-core and post-rock accounts.



