Archive

Dynameomics: Mass annotation of protein dynamics

Just in case you need another -omics in your biotech vocabulary. Dynameomics is an effort by the Daggett group at the University of Washington to

characterize the native-state dynamics and folding/unfolding pathways of representatives of all known protein folds by way of molecular dynamics simulations

Three successive articles have been published in Protein Engineering Design & Selection to describe over 3000 long molecular dynamics simulations, the computational workflow, and data mining capabilities of Dynameomics. Dynameomics has applets for visual analysis and even high-quality movies of their MD trajectories!

Papers:

Video: 4PGA unfolding movie

High-performance data appliances (Netezza)

This afternoon I sat through a presentation from a few guys at Netezza. They were here to discuss their system for high-performance data analytics. What they’ve effectively done is build a large database machine with some special hardware to accelerate database queries via parallel processing nodes. These are some notes I jotted down:

Architecture:

  • SMP Host
  • 100+ specialized processing units per cabinet (they named them SPUs, for “snippet processing units”)
  • Each SPU has its own PPC CPU, commodity disk, memory, and an FPGA
  • GigE networks between SPUs
  • SMP Host partitions queries and brokers activity to the processing nodes
  • Hardware fault-tolerant (SPUs can be hot-swapped)
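
To make the host/SPU split a bit more concrete, here is a toy Ruby sketch of the scatter-gather pattern the notes above describe. It is not Netezza's software: the `Spu` class, `build_spus`, `host_count`, and the round-robin row placement are all names and choices I made up for illustration, and plain threads stand in for the hardware nodes.

```ruby
# Toy scatter-gather sketch. Everything here (Spu, build_spus, host_count,
# round-robin placement) is hypothetical and only illustrates the pattern;
# it is not Netezza's actual software.

# One "snippet processing unit": owns a slice of the table and filters locally.
class Spu
  def initialize(rows)
    @rows = rows
  end

  # Run the same query snippet against the local slice only.
  def count_where(&predicate)
    @rows.count(&predicate)
  end
end

# Spread rows across n units round-robin, the way a host might distribute
# a table over the disks of its processing nodes.
def build_spus(rows, n)
  slices = Array.new(n) { [] }
  rows.each_with_index { |row, i| slices[i % n] << row }
  slices.map { |slice| Spu.new(slice) }
end

# The "host": send the snippet to every unit in parallel, then merge the
# partial counts into one answer.
def host_count(spus, &predicate)
  workers = spus.map { |spu| Thread.new { spu.count_where(&predicate) } }
  workers.map(&:value).reduce(0, :+)
end

rows = (1..1000).map { |i| { id: i, length: i % 50 } }
spus = build_spus(rows, 8)
total = host_count(spus) { |row| row[:length] > 40 }
puts total
```

Each unit only ever touches its own slice, so adding units shrinks the work per unit while the host's merge step stays trivial. That is the appeal of pushing the filter down to where the data lives.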

I’ll admit my skepticism tends to mount against any speaker who spends a lot of time at the outset on a marketing pitch when the audience is full of scientists. Do scientists need to be reminded that data sizes are growing? Or that enterprise X, Y, and Z are already using your product? Just show me how it works.

I did a quick search across my feeds to see if anyone has written about Netezza and, not surprisingly, there is a post over at Computing at Scale. It appears there are similar efforts from Teradata, Greenplum, and DATAllegro in this space. I can imagine how a system like Netezza’s might complement more traditional supercomputing. There’s certainly a big effort to commercialize the “new era of HPC”, but the technologies that come out of it are business-driven and not science-driven.

Around the web 3/21/08

quarternion_jmol

Around the web, week of March 21, 2008

Journals

Big science from Andrej Sali and David Baker:

  • The molecular architecture of the nuclear pore complex
  • De Novo Computational Design of Retro-Aldol Enzymes

Blogs

  • Nature archive visualized - a Processing sketch to visualize the keywords from Nature over the last 30 years. Some of the more spurious terms could probably be cleaned up but even as a draft the effect is pretty neat.
  • Research streaming is born. Mike from Bioinformatics Zen is auto-publishing his svn commit messages and uploading the figures he generates to Flickr. This would be well suited to someone like me who has too many projects going on to stop and dedicate time to blog about them here.
  • Universal Parallel Computing Research Centers are being heavily funded by Microsoft and Intel. One is at the University of Illinois at Urbana-Champaign, well known for the Charm++ parallel library and the super-scalable NAMD molecular dynamics package built on top of it. The other will be located at UC Berkeley.
  • The End of the Relational era, is SQL dying? Bill McColl of Computing at Scale says it is. I would argue that relational databases have received the golden hammer treatment over the years, but I totally agree with his prediction that SQL will ultimately be replaced by DSLs with implicit data-parallelism.
  • The YouTube API has been updated with some significant improvements for developers. Uploads, comments, and video playlists can all be manipulated outside of YouTube. This makes a convincing case for leveraging the massive YouTube userbase if your site deals with video content.
Tech

  • I’ve finally moved most of my projects from SVN to Git. I’m now a ‘branch-a-holic’, and Git definitely fits my workflow better than Subversion now that I’m used to it.
  • Capistrano is typically used for Rails deployment, but I’m finding it’s good for just about anything you want to run across multiple remote hosts. This is a great mini-language for cluster admins who don’t want to struggle with something like mpirun.

Biorobotics: Snake Robots! [Video]


These things are being developed by the Robotics Institute on our campus. I’m partially amazed and partially terrified. I’ve heard they work wirelessly and they want to have snakes where each module has a camera so they can break apart into independent pieces, spread, and reassemble automatically. Some of the climbing behavior is pretty impressive…

Read more about this technology at the Modsnake website.

Around the web 3/7/08

rb-processing

Around the web, week of March 7, 2008

A domain specific language for screencasting

castanaut

Two topics that I have been reading a lot about lately, Domain Specific Languages in Ruby and screencasting, have converged to create a very cool little project called Castanaut. I found Castanaut via Peter Cooper of Ruby Inside. Castanaut is essentially a programming language for screencasts. In Castanaut you can write things like this:


launch "Safari", at(10, 10, 800, 600)
type "http://www.inventivelabs.com.au"
hit Enter
pause 2
move to(100, 100)
move to(200, 100)
move to(200, 200)
move to(100, 200)
move to(100, 100)
say "I drew a square!"

Thanks to the flexibility of Ruby, you can write your screenplay as a script and run it to automatically create a screencast. How cool is that? While this might take some of the personal touch away from screencasts, it could also be a powerful tool for those who need to create them in a more systematic way.
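
For the curious, here is a minimal sketch of how Ruby makes a screenplay-style DSL like this possible. This is not Castanaut's implementation: the `Screenplay` class and everything in it are hypothetical, and instead of driving the desktop it only records each step. The trick is `instance_eval`, which runs the script block with `self` set to the recorder, so bare words like `pause 2` resolve to ordinary methods.

```ruby
# Minimal sketch of a screenplay-style DSL. Screenplay and its methods are
# hypothetical; this is not Castanaut's code, and it only records steps
# rather than controlling the mouse or keyboard.
class Screenplay
  attr_reader :steps

  def initialize
    @steps = []
  end

  # Evaluate the block with self set to a new recorder, so bare calls in
  # the script resolve to the methods defined below.
  def self.record(&script)
    play = new
    play.instance_eval(&script)
    play
  end

  def launch(app)
    @steps << [:launch, app]
  end

  def type(text)
    @steps << [:type, text]
  end

  def pause(seconds)
    @steps << [:pause, seconds]
  end

  # to(x, y) just packages coordinates so that `move to(x, y)` reads well.
  def to(x, y)
    { x: x, y: y }
  end

  def move(point)
    @steps << [:move, point]
  end

  def say(text)
    @steps << [:say, text]
  end
end

play = Screenplay.record do
  launch "Safari"
  type "http://www.inventivelabs.com.au"
  pause 2
  move to(100, 100)
  move to(200, 100)
  move to(200, 200)
  move to(100, 200)
  move to(100, 100)
  say "I drew a square!"
end

play.steps.each { |step| p step }
```

Because Ruby lets you drop the parentheses on method calls, `move to(100, 100)` is just `move(to(100, 100))`. A real tool would replace the recording with calls that actually move the pointer or speak the text.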

What would you do with a million CPUs?

ps3folding

There’s a new podcast on Futures in Biotech with Dr. Pande from Folding@Home. MacResearch summarized it well:

  • How a bunch of Sony PS3s have become the largest component of the world’s fastest computer
  • The challenges of distributed computing, and in particular how data storage and CPU usage can actually complement each other
  • After the hype in the 80s around computational modeling of protein structure, the computational power available today could finally make that hype a reality
  • How to take a non-parallel task and transform it into a series of computational chunks (a.k.a. how to make a baby in 1 day with 270 women)
  • How modeling of protein structure will be able to get more into the dynamics of protein conformational changes
  • What would you do if you had 250,000 CPUs?

I really like the final point, “What would you do if you had 250,000 CPUs?”, because it’s an important question. Petascale computing has arrived, but most applications aren’t ready to scale to thousands or millions of cores. Folding@Home is as much a distributed computing project as it is a biomedical one. What they’ve been able to do is treat simulations as data and use Bayesian data mining techniques to put together the whole picture with surprising efficiency. It’s a clever workaround for Folding@Home’s “supercomputer”, which is severely limited by network latencies and by individual agents with slow hardware compared to ‘real’ supercomputers. Finally, he reports that PS3s and GPUs are achieving 20-30x acceleration. Exciting stuff!
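
To give a feel for what “treating simulations as data” can mean, here is a small Ruby sketch that pools many short, independent runs by counting state-to-state transitions. To be clear, this is my own illustration and not Folding@Home's actual analysis: the state labels, the trajectories, and the function names are all invented, and simple counting stands in for the much more careful statistics a real project would use.

```ruby
# Illustrative only: pool many short, independent trajectories by counting
# state-to-state transitions. The state labels and trajectories are invented,
# and this is not Folding@Home's actual analysis pipeline.

# Count observed transitions (from -> to) across all trajectories.
def count_transitions(trajectories)
  counts = Hash.new { |hash, from| hash[from] = Hash.new(0) }
  trajectories.each do |traj|
    traj.each_cons(2) { |from, to| counts[from][to] += 1 }
  end
  counts
end

# Normalize each row of counts into estimated transition probabilities.
def transition_probabilities(counts)
  counts.each_with_object({}) do |(from, row), probs|
    total = row.values.sum.to_f
    probs[from] = row.map { |to, n| [to, n / total] }.to_h
  end
end

# Each array stands in for one short run returned by a volunteer machine.
trajectories = [
  [:unfolded, :unfolded, :intermediate],
  [:intermediate, :folded, :folded],
  [:unfolded, :intermediate, :intermediate],
  [:intermediate, :unfolded]
]

probs = transition_probabilities(count_transitions(trajectories))
probs.each { |from, row| puts "#{from}: #{row}" }
```

No single run above goes from unfolded all the way to folded, yet the pooled counts connect the two through the intermediate state. That is why lots of slow, loosely connected machines can still add up to something useful: the runs never need to talk to each other.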

    Image taken from Flickr, CC licence

    The Low-Information Diet

    Learning to ignore things is one of the great paths to inner peace.
    -Robert J. Sawyer, Calculating God

    Over the holidays I used the time off to finally read the excellent book by tech entrepreneur Timothy Ferriss entitled The 4-Hour Workweek. Among his many techniques for increasing effectiveness and designing a lifestyle, Tim prescribes a “Low-Information Diet”. Being away from the lab was a perfect opportunity to test out an immediate one-week media fast. The rules are pretty simple:

    • No newspapers, magazines, or nonmusic radio
    • No news at all
    • No television
    • No reading except one hour of fiction
    • No Web surfing

    This really exposed a bad habit of mine: unnecessary reading. My attention is almost constantly consumed by Google Reader as I unenthusiastically scour blogs, news, forums, and journals for several hours per day, rendering me much less effective at the most important tasks. After following the rules above for over a week, I feel rejuvenated. There’s a 9-day information gap in my Google Reader stats that I am quite proud of.

    google reader fast

    Most Innovative Use of HPC in Life Sciences

    The PSC and NRBSC have made the news again, this time in HPCwire. They’ve posted the Readers’ and Editors’ Choice awards for SC07, and the WiiMD demo earned us “Most Innovative Use of HPC in Life Sciences”.

    wiimd_bowling

    Further Reading:
    WiiMD: Bowling on Big Ben
    Engadget: wiimote used in buckyball bowling and other educational simulations

    Anti-aging drugs, metabolism, and disease

    1s7g

    Since today is my birthday, I find myself thinking about aging. I went back and listened to one of the first FIB podcasts. It features Leonard Guarente, who helped start Elixir Pharmaceuticals, a company specifically targeting aging and metabolic disease. Here’s a link to the mp3.

    Is aging just a disease? Metabolism and aging appear to be very tightly coupled. Even more interesting is how calorie restriction (starvation) has a direct impact on reproduction and lifespan. Dr. Leonard Guarente identified the SIR2 gene 6 or 7 years ago. Sirtuin enzymes (pictured above) are NAD-dependent and activated under special conditions like starvation. The biochemistry suggests that therapies could be possible. Knock out the SIRT1 gene in mice and the quality of life is greatly decreased. Starve any organism from yeast to chimps and it lives longer by activating specific pathways. There are seven sirtuin genes in mammals, so there’s obviously much left to be understood. Trying to inhibit specific enzymes is pretty common, but targeting pathways and complex signaling in cellular metabolism can be tricky. The signals exist in nature. The technology developed in systems biology and metabolomics could really help answer some important questions in the next 10 years or so.

    Avalon Pharmaceuticals has a technology for generating pathway signatures, so rather than screening compounds against a single target they claim to be able to target entire pathways. They also have an advanced candidate for solid tumor cancers that’s an IMPDH inhibitor. IMPDH is another NAD-dependent enzyme that I have done a small amount of work on.