The Explosion of Data and Where To Put It

by James Mathers
Cinematographer and Founder of the Digital Cinema Society

(Excerpted from the Digital Cinema Society eNewsletter, March 2016)

This is the time of year when cine equipment manufacturers are preparing to present their new gear at NAB. I don’t think there’s a conspiracy, or any sort of collusion between manufacturers, but there seems to be a definite theme each year at NAB. There are always a couple of new cameras, but recent NABs have been all about 3D, then 4K, and I think this year we’re going to be hearing a lot more about HDR and VR.

My view into the crystal ball is informed and clarified by recent trips to CES and the HPA Tech Retreat. Both are good primers for what to expect at NAB. I have already written extensively on these topics, but as a result of the rise of each of these technologies, there is another subject that I don’t think is getting the attention it deserves. In this essay, I’m going to write about storage. It is perhaps not the sexiest of subject matter, but as we amass ever more data, it is of critical importance. The growth is exponential, especially in the Entertainment Industry with our penchant for backing everything up, the seemingly ad infinitum versioning, and streaming on demand at ever higher bandwidths.

It has been almost a decade since the Academy’s Science and Technology Council published “The Digital Dilemma,” which evaluated the state of digital preservation with the goal of better understanding what problems our industry faces and what can be done to avoid full-fledged data disasters down the road. They followed up with “The Digital Dilemma 2,” which reported on digital preservation issues facing those of us who do not have the resources of large corporations, such as independent filmmakers, documentarians and nonprofit audiovisual archives. These are still relevant and valuable studies, and they are available as free downloads from the Academy’s website.

While the Entertainment business is the center of my universe, let’s also take a more macro view of data in general and try to imagine just how much is being generated around the world. One bit, or binary digit, is the basic unit of computer data and expresses one of two states, either a 1 or 0…on or off. Every time we add a bit we double the number of possible states: 2 bits = 4 possible states, 3 bits = 8 possible states, etc. (It’s easy to see that a Digital Cinema camera outputting 16-bit uncompressed 4K generates one hell of a lot of data.)
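The doubling described above can be checked with a few lines of Python. This is only a minimal sketch; the function name `possible_states` is made up for illustration and does not come from any camera or storage specification.

```python
def possible_states(bits: int) -> int:
    """Return how many distinct values `bits` binary digits can express."""
    # Each added bit doubles the count, so n bits give 2 to the power n.
    return 2 ** bits


if __name__ == "__main__":
    # 1, 2 and 3 bits match the examples in the text; 8 and 16 bits are
    # common bit depths, included only for comparison.
    for bits in (1, 2, 3, 8, 16):
        print(f"{bits:>2} bits = {possible_states(bits):,} possible states")
```

Going from 8 bits to 16 bits per sample doubles the storage for each sample, while the number of values each sample can take rises from 256 to 65,536.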

Bits are generally arranged and stored in groups of eight called bytes. One thousand bytes equals a kilobyte or KB. One million bytes equals a megabyte or MB. One billion bytes equals a gigabyte or GB. One trillion bytes equals a terabyte or TB.

You probably all have a vague idea of how much you can store on a one-terabyte drive, but as another physical example, let’s use the familiar single layer DVD, which has a capacity of 4.7GB. If we wanted to burn one terabyte of data onto DVD, it would take 213 discs. Recording 4K uncompressed RAW, that would only afford us 34 minutes of recording time. We’re obviously going to run through that pretty quickly, so what comes next? A petabyte or one PB is equal to 1,000 terabytes. An exabyte or EB is equal to a million terabytes, and a zettabyte or ZB is equal to one billion terabytes.
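For readers who want to check the DVD arithmetic, here is a rough sketch in Python. It assumes the decimal definitions given above (one terabyte = one trillion bytes, one DVD = 4.7 billion bytes). The helper names are hypothetical, and the data rate is simply back-calculated from the 34-minute figure quoted above rather than taken from any camera specification.

```python
import math

TERABYTE = 10**12            # one trillion bytes, as defined in the text
DVD_BYTES = 4_700_000_000    # single layer DVD capacity, 4.7GB


def discs_needed(total_bytes: int, disc_bytes: int = DVD_BYTES) -> int:
    """Whole discs required to hold total_bytes (a partial disc counts as one)."""
    return math.ceil(total_bytes / disc_bytes)


def implied_rate_mb_per_s(total_bytes: int, minutes: float) -> float:
    """Average data rate in MB/s if total_bytes is recorded in `minutes`."""
    return total_bytes / (minutes * 60) / 10**6


if __name__ == "__main__":
    print(discs_needed(TERABYTE))                         # 213
    # Assumption: 34 minutes per terabyte, the figure quoted in the text.
    print(f"{implied_rate_mb_per_s(TERABYTE, 34):.0f}")   # 490
```

In other words, filling a terabyte in 34 minutes implies a sustained rate of roughly 490 megabytes every second.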

The Internet is, of course, the granddaddy of data hogs. Take Twitter as just one example of social media. Twitter is expected to soon raise its character limit per message to 10,000, but even with the current 140 character limit, a year’s worth of Tweets adds up to 4 petabytes. It is estimated that on an average day, Google processes over 20 petabytes of information as a result of over 4 million search queries per minute from the over two and a half billion global internet users.

Also occurring each minute:
• Facebook users share nearly 2.5 million pieces of content.
• Twitter users tweet nearly 300,000 times.
• Instagram users post nearly 220,000 new photos.
• YouTube users upload 72 hours of new video content.
• Apple users download nearly 50,000 apps.
• Email users send over 200 million messages.

A study published by Newstex estimates that five exabytes of content were created between the birth of the world and 2003. By 2013, five exabytes of content were being created each day. The total internet is expected to generate over 500 zettabytes by 2019, which is 49 times current cloud traffic. But let’s not stop there; 500 zettabytes is equal to half a yottabyte, and a yottabyte or YB is equal to one trillion terabytes, or one septillion bytes, or one thousand million-million-million kilobytes! Forgive the pun, but that’s a “yotta bytes!”
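With this many prefixes in play, it can help to let a short script do the conversions. The sketch below assumes the decimal (powers of 1,000) definitions used throughout this essay; the names `BYTES_PER` and `convert` are illustrative only.

```python
# Decimal storage units in ascending order; each is 1,000 times the previous.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Bytes in one of each unit: KB = 1000**1, MB = 1000**2, ... YB = 1000**8.
BYTES_PER = {unit: 1000 ** (index + 1) for index, unit in enumerate(UNITS)}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity from one decimal storage unit to another."""
    return value * BYTES_PER[from_unit] / BYTES_PER[to_unit]


if __name__ == "__main__":
    print(convert(500, "ZB", "YB"))   # 500 zettabytes expressed in yottabytes
    print(convert(1, "YB", "TB"))     # one yottabyte expressed in terabytes
    print(convert(1, "EB", "TB"))     # one exabyte expressed in terabytes
```

Operating systems and drive utilities sometimes report sizes in powers of 1,024 instead, so figures from this sketch may not match what a computer displays for the same drive.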

So where do we put it all? Flash memory is pretty great for acquisition, but still too expensive for anything but very near-term handling of the data. Spinning discs are currently the norm in the midterm, but as anyone who has ever suffered the heartbreak of a hard drive crash knows, discs have their risks. Of course, these risks can be mitigated by various RAID schemes, backup, and periodic migration. Throughout the post production pipeline, these solutions serve our industry very well; but what about the long term?

Media companies (and even government and financial institutions, for that matter) have long been dependent on tape-based solutions. There are various generations of LTO tape deployed, and the lifespan of data stored on these tapes is usually quoted as 30 years. However, tape is extremely sensitive to storage conditions, and the life expectancy numbers cited by tape manufacturers assume ideal storage conditions: a constant temperature of about 70 degrees Fahrenheit and 40% relative humidity.

What’s old is new again in that celluloid film is being deployed as a good alternative for long-term motion picture archive, even for movies acquired digitally. Black and white separations, which are estimated to have a lifespan of around one hundred years, are created and carefully stored. Even digital information in the form of a bar code has been recorded on celluloid as a best of both worlds answer for long-term storage.

Optical storage is also gaining ground again as an alternative. Going far beyond the previous example of a 4.7GB DVD or even a Blu-ray at 50GB, companies such as Panasonic and Sony are creating optical disc technology that is said to last for 100 years with the capability of storing up to 1TB per disc. Panasonic’s optical disc system, developed in conjunction with Facebook, is already starting to be adopted in Silicon Valley.

Optical discs have the advantage of not needing a special storage environment with constant temperature or humidity, and retrieval can be speedier with random access as opposed to the linear tape formats. Perhaps Sony and Panasonic learned their lesson as rivals in the 1980s Betamax vs. VHS format war. Although they will market the discs separately under their own brands, the two companies are working together to avoid any such future conflicts as the format evolves. They are currently pushing the optical discs for cloud service companies and archival services with no near-term plans to sell to consumers.

We might be able to send our data up to the cloud (which basically just means storing our data on someone else’s computer), but that doesn’t absolve us of our responsibility. Even with the best of solutions, protecting our assets and figuring out how to store them will never be a “set it and forget it” proposition. It will continue to be an ongoing process and a subject we need to give our careful attention to. The subject of storage may not be sexy, but the alternative of losing our individual work or our collective motion picture heritage is simply not acceptable.

A comprehensive combination of solutions with adequate backup on systems in different locations is probably best. If you need help developing a strategy, don’t be shy about reaching out to an expert. Several DCS sponsors such as OWC and Daystrom have short- and long-term solutions available at a variety of levels, from desktop to full cloud integration, and I know they’ll be happy to advise you.

