Something I’ve been thinking about came up several times at BioIT World last week. It was mentioned during the cloud computing workshop and even in Deepak Singh’s keynote. It’s the notion of service-oriented architectures that offer boutique or non-commodity infrastructure as a service. Clearly, the reason Amazon Web Services was able to shake things up in this space was economies of scale. This is a core tenet of cloud computing. Companies like Google and Amazon can leverage their massive operational scale and purchasing power to democratize access to compute and storage.
The buzz around the cloud and high-performance computing is almost always in reference to scale-out, or horizontal scaling, architectures. While I am known to have drunk the Kool-Aid, I also take issue with the idea that this is the only way to scale applications. Now that the cloud is becoming less hip and anyone with knowledge of a scripting language can crunch terabytes of data on thousands of CPUs, I’m left wondering where the next challenge is.
Big clusters and big storage are essentially solved problems, which is why they’ve reached a level of abstraction that lets me manage it all from my laptop and a web browser. Many of the best practices that arose in the cloud era are finding their way back into the HPC space. The cloud has accelerated efforts in automation, data-intensive compute frameworks, and asynchronous programming, and it has blurred the line between the developer and the sysadmin. That’s what makes the cloud awesome.
For scientific computing in particular, it has finally provided agile, experimental IT to match the experimental essence of science. A bioinformatician or computational biologist too often carries responsibilities in the wet lab as well as the machine room. Research should not have to wait 4 to 6 months to acquire new hardware or spend 5 days configuring a relational database. That stifles scientific progress and takes the researcher away from doing what she does best.
What do we do when the problem is special and the resource requirements aren’t consumer-grade compute and storage? Are we back to the drawing board? How can we take all the awesomeness of the cloud and make it work in those scenarios? I’m talking about IaaS offerings that include ASICs, GPGPUs, InfiniBand, and NUMAlink: the stuff that gives you real performance, not just throughput. It doesn’t have to stop with IT infrastructure. What if cloud sequencing had an API? What if anyone could have easy remote access to a mass spec or a synchrotron?
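To make the idea concrete, here is a minimal sketch of what a "sequencing as a service" API could feel like from the client side. Everything here is hypothetical: no such service, endpoints, or class names exist, and a real provider would hand back FASTQ files rather than toy strings. The point is only the shape of the interaction: submit a sample, poll for completion, fetch the results, the same lifecycle AWS taught us for compute jobs.

```python
# Hypothetical sketch of a cloud sequencing API; the service, class
# names, and job lifecycle are all invented for illustration.
from dataclasses import dataclass
import itertools


@dataclass
class SequencingJob:
    job_id: str
    sample: str
    status: str = "queued"


class SequencingService:
    """In-memory stand-in for a hypothetical cloud sequencing provider."""

    def __init__(self):
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, sample: str) -> str:
        """Submit a sample for sequencing; returns a job handle."""
        job_id = f"run-{next(self._ids)}"
        self._jobs[job_id] = SequencingJob(job_id, sample)
        return job_id

    def poll(self, job_id: str) -> str:
        """Check job status. A real service would report queued/running/
        complete over time; this stub completes immediately so the
        sketch stays self-contained."""
        job = self._jobs[job_id]
        job.status = "complete"
        return job.status

    def fetch_reads(self, job_id: str) -> list:
        """Return placeholder reads; a real API would return FASTQ URLs."""
        sample = self._jobs[job_id].sample
        return [f"{sample}:read{i}" for i in range(3)]


# Usage: the whole experiment becomes a few lines of script.
svc = SequencingService()
jid = svc.submit("sample-A")
if svc.poll(jid) == "complete":
    reads = svc.fetch_reads(jid)
```

The same submit/poll/fetch pattern would apply just as well to remote time on a mass spec or a beamline; the instrument behind the API is an implementation detail.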
This is technology we have in hand today. You can partner with big universities to use their high-end lab instruments. You can even apply for time on specialized supercomputers like Anton from DESRES. I would like to see these kinds of resources open up the same way Amazon enabled anyone with a credit card to spin up a big IT infrastructure. That access is critical for small companies to compete and innovate, and it would create new business models and platforms built on top of these services. The cloud gives us some of this today, but what will it look like in the future?