Archive for the 'Bioinformatics' Category

Using Ruby for Bioinformatics Applications

When I started working in a bioinformatics research lab, I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl was designed with text processing and reporting in mind, so it naturally became widely used for handling biological data.

So everything in bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you” moment came when I discovered the way Rails does web programming. Ruby is the magic in Rails, but I soon discovered that Ruby goes well beyond web frameworks. To quote Ezra:

“I came for the Rails, but I stayed for the Ruby”
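As a small taste of the text processing Perl is known for, done in plain Ruby (standard library only, no BioRuby), here is a minimal sketch that parses FASTA-formatted text and reports each record's length and GC content. The method names `parse_fasta` and `gc_content` are hypothetical, written just for this illustration; BioRuby has its own, more complete FASTA support.

```ruby
# Minimal FASTA parser in plain Ruby (standard library only, no BioRuby).
# Returns an array of [header, sequence] pairs; sequence lines are upcased
# and concatenated. Lines before the first ">" header are ignored.
def parse_fasta(text)
  records = []
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?

    if line.start_with?(">")
      records << [line[1..], +""]      # new record: header without ">"
    elsif !records.empty?
      records.last[1] << line.upcase   # append to the current sequence
    end
  end
  records
end

# Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one).
def gc_content(seq)
  return 0.0 if seq.empty?

  seq.count("GC").fdiv(seq.length)
end

fasta = <<~FASTA
  >seq1 example record
  ATGCGC
  atat
  >seq2
  GGGG
FASTA

parse_fasta(fasta).each do |header, seq|
  printf("%-20s length=%2d GC=%.2f\n", header, seq.length, gc_content(seq))
end
```

The same loop in Perl would look much alike; the point is that Ruby's string methods and blocks keep this kind of script just as short.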

I wanted to compile some links showing how an active community is positioning Ruby as a powerful language for bioinformatics programming:

BioRuby – open source bioinformatics library

BioRuby on Github

Web Frameworks

Ruby on Rails – the famous MVC framework that made Ruby popular

Merb – fast, lightweight MVC framework

Camping – 5k microframework

Sinatra – web development DSL

Ramaze – simple, light, and modular web application framework

Rack – web server interface

Distributed/Parallel Computing

DRb – Distributed Ruby

SkyNet – MapReduce in Ruby

Rinda – Linda parallel programming model in Ruby

rxgrid – Xgrid batch language

MPI Ruby – MPI bindings for Ruby

amazon-ec2 – Amazon EC2 API

Testing/Spec

RSpec – BDD framework

Test::Unit – Unit testing in the Ruby standard library

Integration with other programming languages

JRuby – JVM Ruby implementation

SWIG and Ruby – automatically generate Ruby wrappers for C/C++ code

Ruby C extensions

Math/Statistics

Ruby-GSL – wrapper for the GNU Scientific Library

RSRuby – access the R statistics package from Ruby

SciRuby

Ruby NArray – similar to NumPy

Visualization/Graphics

Ruby Gnuplot – Gnuplot bindings

Ruby-Processing – The Processing language in Ruby

ruby-opengl – OpenGL bindings

Gruff – Graph API

Ruby-SVG – SVG Graphics

Machine Learning

AI Related Ruby Extensions

Support Vector Machines in Ruby

Fast Artificial Neural Network library

Blogs about bioinformatics and Ruby

Saaien Tist – Jan Aerts, on bioinformatics and personal productivity

Bioinformatics Zen – Michael Barton

Be sure to visit the Ruby for Bioinformatics room on FriendFeed for even more Ruby goodness.

Can the energetic value of food be personalized?

From the latest FIB podcast:

What is the nutrient and caloric value in food? Is it an absolute term? … a box of Cheerios says 110 calories per serving. Now, does each person who eats a serving of Cheerios extract 110 calories, or are there subtle differences in our caloric harvest?

Dr. Jeffrey Gordon talks about microbial metagenomics. His research could have a huge impact on our understanding of nutrition and dieting. In mice, gut microbial communities have a dynamic influence on the host genome. His group sampled the gut microbial genomes of obese mice and found clear indications that the enormous number of bacteria living inside each of us affect our health in ways that are mutually beneficial.

I see two big opportunities here. First, the treatment of malnutrition and obesity should focus more on the physiological conditions imposed by these microbial communities. Second, engineering new bacterial components with the complexity and efficiency we are seeing in nature is a forward-thinking goal. Mash up detailed human microbial metabolomic data against all the new ocean microbial community genomes and you have quite an exciting study!

Let me define “mash up” further, so as not to sound too journalistic:

  • Rebuild those trees; phylogenetics is getting much more interesting.
  • Metabolic pathway ontologies had better be in order.
  • Networks, networks, networks. Signals from microbial genomes are coordinated with our own microphysiology.

Protein Design is inverse Structure Prediction

If we imagine protein folding as an explorer navigating a rugged landscape in search of its lowest point, then how do we describe protein design? If we are still baffled by structure prediction in the ‘midnight zone’, how can we have a bioengineering strategy worthy of the title ‘protein design’?

When does Bioinformatics become Bioengineering? Is protein design really just inverse structure prediction? Does that work to simplify the problem?

I have always looked at the problem from the perspective of predicting tertiary structure from primary structure. It almost seems more practical to flip the problem around: given a structure and function, can we predict the amino acid sequence?
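To make “flipping the problem around” concrete, here is a deliberately crude Ruby sketch of inverse design: the target structure is reduced to a string of buried (B) and solvent-exposed (E) positions, and the sketch places hydrophobic residues at buried sites and polar residues at exposed ones, in the spirit of hydrophobic/polar binary patterning. This is a toy, not a real design method; the residue pools, the cycling rule, and the `design_sequence` name are all hypothetical choices for illustration.

```ruby
# Toy "inverse" design: map a target burial pattern to a sequence.
# B = buried position, E = solvent-exposed position.
# The residue pools below are illustrative assumptions, not a design rule set.
HYDROPHOBIC = %w[L I V F].freeze # candidates for buried positions
POLAR       = %w[K E Q S].freeze # candidates for exposed positions

def design_sequence(pattern)
  residues = pattern.each_char.with_index.map do |site, i|
    pool =
      case site
      when "B" then HYDROPHOBIC
      when "E" then POLAR
      else raise ArgumentError, "unknown site type #{site.inspect}"
      end
    pool[i % pool.length] # cycle deterministically through the pool
  end
  residues.join
end

puts design_sequence("BEEBBEEBEEB")
```

Real design also has to satisfy packing, backbone geometry, and function, which is exactly why the question above is hard; the sketch only shows the direction of the mapping, from structure to sequence.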

DNA synthesizers are improving rapidly, yet we still do not understand how a change in amino acid sequence alters protein structure. Template-based modeling is progressing: a target sharing more than 30% sequence identity with a known template can be reliably modeled with a good pipeline.
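Sequence identity, the quantity behind that 30% rule of thumb, is simple to compute once two sequences are aligned. Below is a minimal Ruby sketch over a pairwise alignment given as two equal-length gapped strings. Conventions differ (this one ignores gap columns; others divide by the alignment length or by the shorter sequence), and the method names, threshold constant, and toy sequences are hypothetical.

```ruby
# Percent identity between two pre-aligned, equal-length sequences.
# Columns with a gap ("-") in either sequence are excluded from the count.
def percent_identity(aligned_a, aligned_b)
  unless aligned_a.length == aligned_b.length
    raise ArgumentError, "aligned sequences must have equal length"
  end

  pairs = aligned_a.chars.zip(aligned_b.chars).reject { |x, y| x == "-" || y == "-" }
  return 0.0 if pairs.empty?

  identical = pairs.count { |x, y| x == y }
  100.0 * identical / pairs.length
end

# Rule of thumb from the text: above ~30% identity, template-based
# modeling is expected to work well with a good pipeline.
TEMPLATE_THRESHOLD = 30.0

def template_modelable?(aligned_target, aligned_template)
  percent_identity(aligned_target, aligned_template) > TEMPLATE_THRESHOLD
end

target   = "MKT-LLVAG" # toy sequences, not real proteins
template = "MRTALIVSG"
pid = percent_identity(target, template)
printf("identity = %.1f%%, modelable: %s\n", pid, template_modelable?(target, template))
```

In practice the identity depends heavily on the alignment itself, so a real pipeline would compute it from the aligner's output rather than hand-made strings.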

Reliable function prediction is obviously the next milestone, but I think the gap from structure to function is still enormous. How do we standardize function specification? Enzyme functions can be described by the reaction they catalyze or the pathway they are part of. However, not all proteins are enzymes, and not all enzymes are proteins (ribozymes are RNA). Hormones, transcription factors, and membrane receptors all have characteristic structures that support their functions.
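For enzymes, one existing standard for the “reaction they catalyze” is the Enzyme Commission (EC) number: four dot-separated levels running from broad class to specific reaction, with “-” for unspecified levels. As a hedged sketch of how a prediction could be scored against a reference annotation, the Ruby snippet below counts how many leading EC levels two annotations share. The scoring idea and the method names are hypothetical.

```ruby
# Parse an EC number such as "1.1.1.1" or a partial one such as "3.4.-.-".
# Returns four entries; unspecified levels ("-") become nil.
def parse_ec(ec)
  levels = ec.sub(/\AEC\s*/i, "").split(".")
  raise ArgumentError, "expected four levels: #{ec}" unless levels.length == 4

  levels.map { |l| l == "-" ? nil : Integer(l, 10) }
end

# Number of leading EC levels (0..4) on which two annotations agree.
def shared_ec_depth(a, b)
  depth = 0
  parse_ec(a).zip(parse_ec(b)).each do |x, y|
    break if x.nil? || y.nil? || x != y
    depth += 1
  end
  depth
end

# Alcohol dehydrogenase (1.1.1.1) vs L-lactate dehydrogenase (1.1.1.27):
# same class, subclass and sub-subclass, different specific reaction.
puts shared_ec_depth("1.1.1.1", "1.1.1.27") # prints 3
```

A scheme like this only covers enzymes, which is the point made above: receptors, hormones, and transcription factors need some other vocabulary.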

I expect function prediction will need about 10 more years before it can provide the kind of infrastructure required for robust bioengineering and truly rational drug design.

CryoEM and Comparative Modeling

I read a great paper by Šali et al. this week, Refining Protein Structures by Iterative Comparative Modeling and CryoEM Density Fitting.

Image: 2BLD (from viperdb.scripps.edu)

I remember Matthew Baker, one of the authors of this paper, speaking at CASP7 last year. This is an exciting application of modeling, since CryoEM can provide the structures of large virus assemblies and membranes. Comparative modeling becomes important for increasing the resolution to pseudo-atomic structures, so you can really see what’s happening in terms of chemistry. Using simulated electron density maps, Šali et al. benchmarked a comparative modeling pipeline that adds a density fit score to the DOPE statistical potential.
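A common way to measure how well a model fits a cryoEM map is the cross-correlation between the model's simulated density and the experimental density. The Ruby sketch below computes a normalized (mean-subtracted) cross-correlation over two density grids flattened into arrays, then folds it into a combined score with a statistical potential such as DOPE. I don't know the exact fit score or weighting used in the paper; `combined_score`, its weight, its sign convention, and the sample numbers are assumptions for illustration only.

```ruby
# Normalized (mean-subtracted) cross-correlation between two density grids,
# each flattened into an array of voxel values. Returns a value in [-1, 1],
# or 0.0 when either grid is constant.
def cross_correlation(model_density, map_density)
  unless model_density.length == map_density.length
    raise ArgumentError, "grids must have the same size"
  end

  n = model_density.length.to_f
  mean_a = model_density.sum / n
  mean_b = map_density.sum / n
  da = model_density.map { |v| v - mean_a }
  db = map_density.map { |v| v - mean_b }
  denom = Math.sqrt(da.sum { |v| v * v } * db.sum { |v| v * v })
  return 0.0 if denom.zero?

  da.zip(db).sum { |x, y| x * y } / denom
end

# Hypothetical combined score (lower is better, as with DOPE): the
# statistical potential minus a weighted fit term. The weight is arbitrary.
def combined_score(dope, cc, weight: 1000.0)
  dope - weight * cc
end

model = [0.0, 0.2, 0.9, 1.0, 0.3] # made-up voxel values
map   = [0.1, 0.3, 0.8, 0.9, 0.2]
cc = cross_correlation(model, map)
printf("CC = %.3f, combined = %.1f\n", cc, combined_score(-25_000.0, cc))
```

Subtracting the weighted correlation means a better fit lowers the combined score, so a minimizer is pushed toward models that both look protein-like and sit well in the density.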

The genetic algorithm runs for 15 hours on a 50-node dual-PIII cluster.
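For a sense of scale, here is a back-of-the-envelope estimate, assuming “dual-PIII” means two Pentium III processors per node and every processor is busy for the whole run:

$$
15\ \text{h} \times 50\ \text{nodes} \times 2\ \tfrac{\text{CPUs}}{\text{node}} = 1500\ \text{CPU-hours} \approx 62.5\ \text{days on a single CPU}
$$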