Sunday 3 February 2013

Amazon must invest in bio-data centers


For all the scramble over cloud computing in the past few years, a quieter evolution could be underway.  Last August, George Church's Regenesis became the first book to be stored in DNA, and it was replicated 40 billion times (I'm sure that was fun).  It cost about $1,000 to print onto the first DNA chip and another $50 to replicate via the polymerase chain reaction (PCR).

How it's done
A book consists of words and images that fit into an HTML file (Church's book has 53,246 words and 11 JPEG images).  In traditional storage terms, the file occupies a few hundred KB.  Broken down, the file is a series of 0s and 1s that can be encoded as A, T, C and G (the four nucleobases found in DNA).  Church's team mapped 0 to A or C and 1 to G or T.  Doing this yields a few million bases, split across tens of thousands of short DNA segments.  Once the ATCG equivalent of the book is available on a computer, it can be taken to a lab and synthesized.  The result, an oligonucleotide chip, is an extremely compact store with enormous capacity.  But a straight binary mapping can still produce long runs of the same base, which DNA readers tend to misread, so newer experiments use a ternary (base-3) code with the digits 0, 1 and 2, arranged so that no base repeats back to back.  Just remember that the book in our hands will now be all sticky and gooey.
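Church's binary mapping and the newer ternary idea are easier to picture in code. Below is a minimal, hypothetical Python sketch, not the published pipelines: the rule for choosing between A and C (or G and T) is an assumption (pick whichever differs from the previous base), and the ternary part simply writes each byte as six base-3 digits, a simplification of the more elaborate codes used in the actual experiments. All function names are illustrative.

```python
# Hypothetical sketch of two ways to turn bits into DNA bases.
# Assumptions: binary rule "pick whichever of the two allowed bases differs
# from the previous one"; ternary step uses a plain 6-trits-per-byte conversion
# (3**6 = 729 >= 256), a simplification of the published codes.

BASES = "ACGT"


def encode_binary(data: bytes) -> str:
    """One bit per base: 0 -> A or C, 1 -> G or T, never repeating a base."""
    allowed = {"0": "AC", "1": "GT"}
    out, prev = [], ""
    for byte in data:
        for bit in format(byte, "08b"):
            first, second = allowed[bit]
            base = first if first != prev else second
            out.append(base)
            prev = base
    return "".join(out)


def decode_binary(dna: str) -> bytes:
    """Map A/C back to 0 and G/T back to 1, then regroup into bytes."""
    bits = "".join("0" if base in "AC" else "1" for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits, value = [], byte
        for _ in range(6):
            value, digit = divmod(value, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def from_trits(trits: list[int]) -> bytes:
    """Inverse of to_trits: fold each group of six digits back into a byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for digit in trits[i:i + 6]:
            value = value * 3 + digit
        out.append(value)
    return bytes(out)


def encode_ternary(data: bytes, start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base."""
    out, prev = [], start
    for digit in to_trits(data):
        base = BASES[(BASES.index(prev) + 1 + digit) % 4]  # never equals prev
        out.append(base)
        prev = base
    return "".join(out)


def decode_ternary(dna: str, start: str = "A") -> bytes:
    """Recover each trit from the step between consecutive bases."""
    trits, prev = [], start
    for base in dna:
        trits.append((BASES.index(base) - BASES.index(prev) - 1) % 4)
        prev = base
    return from_trits(trits)


text = b"Hi"
print(encode_binary(text))  # AGACGACACGTAGACG
assert decode_binary(encode_binary(text)) == text
assert decode_ternary(encode_ternary(text)) == text
```

Both encoders guarantee that no base follows itself, which is the property the ternary approach is after; the binary version gets there by spending its free choice between two bases, while the ternary version builds it into the code itself.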

A DNA Kindle
Just as we need a Kindle for books (basically a hardware device with software that turns 0s and 1s into human-readable form), we will now need rapid DNA readers (sequencers).  The cost of sequencing a human genome is expected to drop below $1,000, but scientists are still working with humans in mind, not with books and other data that we may wish to store and read.

A device that can take the goo, sequence it to get the ATCGs and then present it in human-readable form could soon be a much-wanted Christmas toy.  Without stretching my imagination too far, I can see a future with bio-printers and readers at home.  Our goo would arrive in the mail in a dark box (it has to be protected from sunlight), and we would pour it, sequence it and print it (from tissue to book to whatever!).
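A tube of goo holds many copies of short segments in no particular order, so a home reader would also have to put the pieces back together. One common approach, shown here as a hypothetical Python sketch, is to give every segment an address so the reader can sort the pieces after sequencing. The block size, address width, function names and the assumption that the reader already knows the original length are all illustrative; the sketch works on bits only and leaves out the base mapping, sequencing errors and missing segments.

```python
import random

DATA_BITS = 96     # payload bits per segment (illustrative assumption)
ADDRESS_BITS = 19  # bits used for the segment's address (illustrative assumption)


def to_segments(data: bytes) -> list[str]:
    """Split the bitstream into fixed-size blocks, each prefixed with its address."""
    bits = "".join(format(byte, "08b") for byte in data)
    segments = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        block = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")  # pad the last block
        segments.append(format(index, f"0{ADDRESS_BITS}b") + block)
    return segments


def reassemble(reads: list[str], n_bytes: int) -> bytes:
    """Rebuild the data from reads in any order, tolerating duplicate copies."""
    blocks = {}
    for read in reads:
        # Duplicate copies of the same segment simply overwrite each other.
        blocks[int(read[:ADDRESS_BITS], 2)] = read[ADDRESS_BITS:]
    bits = "".join(blocks[i] for i in sorted(blocks))[: n_bytes * 8]  # drop padding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


book = ("Regenesis " * 50).encode()
pool = to_segments(book) * 3  # several copies of every segment
random.shuffle(pool)          # a tube of DNA has no page order
assert reassemble(pool, len(book)) == book
```

Keying each block by its address means the many copies produced by PCR cost nothing extra to handle; a real reader would still need error correction and a way to notice addresses that never turned up.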

Data centers with a 10,000-year shelf-life
Our current storage systems have a poor shelf-life.  A book suffers wear and tear.  A CD breaks and scratches.  Data servers heat up, die and require expensive backups.  And DNA?  DNA stays.  Consider that the DNA in our cells descends in an unbroken line from the very beginning of life, some 3.9 billion years ago, long before single-celled organisms discovered that sex was a good idea.  It has always been there, not in its current diversity but in its basic forms.  I learnt that we now have the complete DNA code of our Neanderthal brethren, a genome of about 4 billion nucleotides.  Read this fascinating scientific study, which also explores whether we (humans) messed around with them.

It's therefore possible to foresee an entirely new industry that stores humanity's data in biological form.  Instead of large physical data centers housing hundreds of thousands of servers, coolers and other goodies, we will have micro-scale biological storage systems in cold, dark places.  The scale is tiny: a single gram of DNA can hold 2.2 million gigabytes (about 2.2 petabytes) of data.  The Economist says that you can fit the world's data in a lorry (a truck for some).

*
Amazon Web Services started with a small group of people in South Africa who imagined a brave new world running storage and computing off the Internet (AWS today is an est. $3.8B business).  Amazon hopefully knows it's time to let its engineers mess with biology.  Hopefully Facebook doesn't.