Just in case you need another -omics in your biotech vocabulary. Dynameomics is an effort by the Daggett group at the University of Washington to characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations.
Three successive articles have been published in Protein Engineering Design & Selection to describe over 3000 long molecular dynamics simulations, the computational workflow, and data mining capabilities of Dynameomics. Dynameomics has applets for visual analysis and even high-quality movies of their MD trajectories!
This afternoon I sat through a presentation from a few guys at Netezza. They were here to discuss their system for high-performance data analytics. What they’ve effectively done is build a large database machine with some special hardware to accelerate database queries via parallel processing nodes. These are some notes I jotted down:
Architecture:
SMP Host
100+ specialized processing units per cabinet (they named them SPUs, for “snippet processing units”)
SPUs have their own PPC CPU, commodity disk, memory, and an FPGA
GigE networks between SPUs
SMP Host partitions queries and brokers activity to the processing nodes
Hardware fault-tolerant (SPUs can be hot-swapped)
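As a rough mental model of the layout above, here is a toy Ruby sketch of a host that spreads rows across processing units, sends each one the same small "snippet", and combines the answers. This is only an illustration of the scatter/gather idea, not Netezza's actual software; the names Spu, Host, and count_where are hypothetical, and plain Ruby threads stand in for the dedicated hardware.

```ruby
# Toy model of a host partitioning a query across processing units.
# Spu, Host, and count_where are hypothetical names for illustration only.

class Spu
  def initialize(rows)
    @rows = rows # the slice of the table held on this unit's own disk
  end

  # Run the "snippet" locally and return only the small result.
  def count_where(&predicate)
    @rows.count(&predicate)
  end
end

class Host
  def initialize(rows, spu_count)
    # Distribute rows round-robin so every unit holds a similar share.
    slices = Array.new(spu_count) { [] }
    rows.each_with_index { |row, i| slices[i % spu_count] << row }
    @spus = slices.map { |slice| Spu.new(slice) }
  end

  # Scatter the same snippet to every unit (one thread each),
  # then gather the partial counts and add them up.
  def count_where(&predicate)
    threads = @spus.map { |spu| Thread.new { spu.count_where(&predicate) } }
    threads.sum(&:value)
  end
end

rows = (1..1000).map { |i| { id: i, score: i % 10 } }
host = Host.new(rows, 8)
puts host.count_where { |row| row[:score] >= 7 }
```

The point of the design is that each unit scans only its own slice and ships back a single number, so the host never has to pull the raw rows over the network.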
I’ll admit my skepticism tends to mount against any speaker who spends a lot of time at the outset on a marketing pitch when the audience is full of scientists. Do scientists need to be reminded that data sizes are growing? Or that enterprise X, Y, and Z are already using your product? Just show me how it works.
I did a quick search across my feeds to see if anyone has written about Netezza and, not surprisingly, there is a post over at Computing at Scale. It appears there are similar efforts from Teradata, Greenplum, and DATAllegro in this space. I can imagine how a system like Netezza’s might complement more traditional supercomputing. There’s certainly a big effort to commercialize the “new era of HPC”, but the technologies that come out of it are business-driven and not science-driven.
Nature archive visualized - a Processing sketch to visualize the keywords from Nature over the last 30 years. Some of the more spurious terms could probably be cleaned up but even as a draft the effect is pretty neat.
Research streaming is born. Mike from Bioinformatics Zen is auto-publishing his svn commit messages and uploading figures he generates to Flickr. This would be well suited to someone like me who has too many projects going on to stop and dedicate time to blog about them here.
The End of the Relational era, is SQL dying? Bill McColl of Computing at Scale says it is. I would argue that relational databases have received the golden hammer treatment over the years. But I totally agree with his prediction that SQL will ultimately be replaced by DSLs having implicit data-parallelism.
The YouTube API has been updated with some significant improvements for developers. Uploads, comments, and video playlists can all be manipulated outside of YouTube. This makes a convincing case to leverage the massive YouTube userbase if your site deals with video content.
Tech
I’ve finally moved most of my projects from SVN to Git. I’m now a ‘branch-a-holic’ and git definitely fits my workflow better than subversion now that I’m used to it.
Capistrano is typically used for Rails deployment, but I’m finding it’s good for just about anything you want to run across multiple remote hosts. This is a great mini-language for cluster admins who don’t want to struggle with something like mpirun.
These things are being developed by the Robotics Institute on our campus. I’m partially amazed and partially terrified. I’ve heard they work wirelessly and they want to have snakes where each module has a camera so they can break apart into independent pieces, spread, and reassemble automatically. Some of the climbing behavior is pretty impressive…
Backward compatibility ≠ Forward Scalability. Intel places some caveats on the free lunch: legacy software won’t take advantage of future architectures unless it’s redesigned. This shouldn’t be a surprise, but it hasn’t always been the case. The trend of adding more cores and changing memory architectures means that some applications may get left in the dust unless they’re optimized for these new paradigms.
Two topics that I have been reading a lot about lately, Domain Specific Languages in Ruby and screencasting, have converged to create a very cool little project called Castanaut. I found Castanaut via Peter Cooper of Ruby Inside. Castanaut is essentially a programming language for screencasts. So in Castanaut you can write things like this:
launch "Safari", at(10, 10, 800, 600)
type "http://www.inventivelabs.com.au"
hit Enter
pause 2
move to(100, 100)
move to(200, 100)
move to(200, 200)
move to(100, 200)
move to(100, 100)
say "I drew a square!"
Thanks to the flexibility of Ruby, you can write your screenplay as a script and run it to automatically create a screencast. How cool is that? While this might take some of the personal touch away from screencasts, it could also be a powerful tool for those who need to create them in a more systematic way.
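To give a feel for how Ruby makes this kind of mini-language possible, here is a minimal sketch of a screenplay-style DSL built on instance_eval. It is not Castanaut's implementation: the Screenplay class, the record method, and the way steps are stored are all hypothetical, and it only records the steps instead of driving the screen.

```ruby
# Toy screenplay DSL. NOT Castanaut's code; Screenplay, record, and the
# recorded-step format are hypothetical and exist only for illustration.

class Screenplay
  attr_reader :steps

  def initialize
    @steps = []
  end

  # Evaluate the block with self set to a fresh Screenplay, so bare words
  # like `pause 2` resolve to the methods defined below.
  def self.record(&script)
    play = new
    play.instance_eval(&script)
    play
  end

  def launch(app)
    @steps << [:launch, app]
  end

  def pause(seconds)
    @steps << [:pause, seconds]
  end

  # `to(x, y)` only packages coordinates so `move to(100, 100)` reads naturally.
  def to(x, y)
    { x: x, y: y }
  end

  def move(point)
    @steps << [:move, point]
  end

  def say(text)
    @steps << [:say, text]
  end
end

# A small script written in the toy DSL.
def line_demo
  Screenplay.record do
    launch "Safari"
    pause 2
    move to(100, 100)
    move to(200, 100)
    say "I drew a line!"
  end
end

line_demo.steps.each { |step| p step }
```

Because each command is just a method call with the parentheses left off, adding a new verb to the language is a matter of adding one more method.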
How a bunch of Sony PS3s have become the largest component of the world’s fastest computer
The challenges of distributed computing, and in particular how data storage and CPU usage can actually complement each other
After the hype in the 80s around computational modeling of protein structure, the computational power available today could finally make that hype a reality
How to take a non-parallel task and transform it into a series of computational chunks (a.k.a. how to make a baby in 1 day with 270 women)
How modeling of protein structure will be able to get more into the dynamics of protein conformational changes
What would you do if you had 250,000 CPUs?
I really like the final point, “What would you do with 250,000 CPUs?”, because it’s an important question. Petascale computing has arrived, but most applications aren’t ready to scale to thousands or millions of cores. Folding@Home is as much a distributed computing project as it is a biomedical one. What they’ve been able to do is treat simulations as data and use Bayesian data mining techniques to put together the whole picture with surprising efficiency. That is a clever workaround for Folding@Home’s “supercomputer”, which is severely limited by network latencies and by individual agents with slow hardware compared to ‘real’ supercomputers. Finally, he reports that PS3s and GPUs are achieving 20-30x acceleration. Exciting stuff!
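The "chunks" idea from the list above can be sketched in a few lines of Ruby. This is a toy, not Folding@Home's method: each work unit is a short random walk with its own seed that needs nothing from any other unit, and the results are only pooled at the end. The names run_work_unit and pooled_mean_square_displacement are hypothetical, and a simple average stands in for the far more sophisticated statistics the project actually uses.

```ruby
# Toy "many short independent runs" sketch; names and model are hypothetical.

# One work unit: a short random walk driven by its own seed, so it can run
# on any machine without talking to the others.
def run_work_unit(seed, steps)
  rng = Random.new(seed)
  position = 0
  steps.times { position += rng.rand(2).zero? ? -1 : 1 }
  position
end

# Hand out the work units (one thread each here), collect the final
# positions, and pool them into a single summary statistic afterwards.
def pooled_mean_square_displacement(unit_count, steps)
  threads = (1..unit_count).map do |seed|
    Thread.new { run_work_unit(seed, steps) }
  end
  results = threads.map(&:value)
  results.sum { |x| x * x } / results.size.to_f
end

puts pooled_mean_square_displacement(200, 100)
```

Since no unit waits on another, slow or flaky workers only delay their own result, which is what makes this style tolerant of high latency and uneven hardware.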
Learning to ignore things is one of the great paths to inner peace.
-Robert J. Sawyer, Calculating God
Over the holidays I used the time off to finally read the excellent book by tech entrepreneur Timothy Ferriss entitled The 4-Hour Workweek. Among his many techniques for increasing effectiveness and lifestyle design, Tim prescribes a “Low-Information Diet”. Being away from the lab was a perfect opportunity to test out an immediate one-week media fast. The rules are pretty simple:
No newspapers, magazines, or nonmusic radio
No news at all
No television
No reading except one hour of fiction
No Web surfing
This really exposed a bad habit of mine: unnecessary reading. My attention is almost constantly consumed by Google Reader as I unenthusiastically scour blogs, news, forums, and journals for several hours per day, rendering me much less effective for the most important tasks. Following the rules above for over a week, I feel rejuvenated. There’s a 9-day information gap in my Google Reader stats that I am quite proud of.
Is aging just a disease? Metabolism and aging appear to be very tightly coupled. Even more interesting is how calorie restriction (starvation) has a direct impact on reproduction and lifespan. Dr. Leonard Guarente identified the SIR2 gene 6 or 7 years ago. Sirtuin enzymes (pictured above) are NAD-dependent and activated under special conditions like starvation. The biochemistry suggests that therapies could be possible. Knock out the SIRT1 gene in mice and the quality of life is greatly decreased. Starve any organism from yeast to chimps and it lives longer by activating specific pathways. There are 7 sirtuin genes in mammals, so there’s obviously much left to be understood. Trying to inhibit specific enzymes is pretty common, but targeting pathways and complex signaling in cellular metabolism can be tricky. The signals exist in nature. The technology developed in systems biology and metabolomics could really help answer some important questions in the next 10 years or so.
Avalon Pharmaceuticals has a technology for generating pathway signatures, so rather than screening compounds against a single target they claim to be able to target entire pathways. They also have an advanced candidate for solid tumor cancers that’s an IMPDH inhibitor. IMPDH is another NAD-dependent enzyme that I have done a small amount of work on.