Archive for the 'Bioinformatics' Category

Using Ruby for Bioinformatics Applications


When I started working in a bioinformatics research lab I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl itself was designed with text-processing and reporting in mind so naturally it’s become widely used when handling biological data.

So everything in bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you” moment came when I discovered the way that Rails does web programming. Ruby is the magic in Rails, but I soon discovered that Ruby goes well beyond web frameworks. To quote Ezra:

“I came for the Rails, but I stayed for the Ruby”
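To give a flavor of how comfortably Ruby handles the text processing that drew so many of us to Perl, here is a minimal sketch in plain Ruby that reads FASTA-formatted text and reports GC content. The helper names (`parse_fasta`, `gc_content`) and the example sequences are made up for illustration; this is not BioRuby's API, which has its own FASTA classes.

```ruby
# Parse FASTA-formatted text into a hash of { header => sequence }.
# parse_fasta is a hypothetical helper written for this sketch, not a BioRuby call.
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      current = line[1..-1]          # header without the leading '>'
      records[current] = String.new  # mutable, empty sequence buffer
    elsif current
      records[current] << line       # sequence may span several lines
    end
  end
  records
end

# GC content as a fraction between 0 and 1.
def gc_content(seq)
  return 0.0 if seq.empty?
  seq.upcase.count("GC").to_f / seq.length
end

# Made-up sequences, only here to exercise the helpers.
FASTA_EXAMPLE = <<~FASTA
  >seq1 made-up example
  ATGCGC
  GGAT
  >seq2 made-up example
  ATATAT
FASTA

parse_fasta(FASTA_EXAMPLE).each do |header, seq|
  puts "#{header}\t#{seq.length}\t#{format('%.2f', gc_content(seq))}"
end
```

For real work a library parser is the better choice, since it deals with the edge cases (comments, odd line endings, very large files) that this sketch ignores.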

I wanted to compile some links to show how an active community is positioning Ruby to be a powerful language for bioinformatics programming:

BioRuby - open source bioinformatics library

Web Frameworks

Rails

Merb

Camping

Sinatra

Distributed/Parallel Computing

DRb - Distributed Ruby

SkyNet - MapReduce in Ruby

Rinda

rxgrid - Xgrid batch language

MPI Ruby

amazon-ec2

Testing/Spec

RSpec - BDD framework

Test::Unit

Integration with other programming languages

JRuby

SWIG and Ruby

Ruby C extensions

Math/Statistics

RSRuby - R statistics package in Ruby

Ruby NArray - similar to NumPy

Visualization/Graphics

Ruby-Processing

ruby-opengl

Gruff - Graph API

Ruby-SVG

Ruby Gnuplot

Machine Learning

Support Vector Machines in Ruby

Fast Artificial Neural Network library

[To be continued]

Can the energetic value of food be personalized?

I was listening to the latest FIB podcast, which asks:

What is the nutrient and caloric value in food? Is it an absolute term? … a box of Cheerios says 110 calories per serving. Now does each person that eats a serving of Cheerios extract 110 calories or are there subtle differences in our caloric harvest?

Dr. Jeffrey Gordon talks about microbial metagenomics. His research looks set to make a huge impact on our knowledge of nutrition and dieting. Gut microbial communities have a dynamic influence on host genomes in mice. His group sampled obese mouse genomes and found some clear indications that the massive number of bacterial organisms living inside each of us is impacting our health in a way that is mutually beneficial.

I see two big opportunities here. The treatment of malnutrition and obesity should be more focused on the physiological conditions imposed by these micro-communities. Second, the ability to engineer new bacterial components with the complexity and efficiency we are seeing in nature is forward thinking. Mash up detailed human microbe metabolomic data against all the new ocean community microbe genomes and you have quite an exciting study!

Let me define mash up further, so as not to sound too journalistic.

  • Rebuild those trees: phylogenetics is getting much more interesting
  • Metabolic pathway ontologies better be in order
  • Networks, networks, networks. Signals from microbial genomes are coordinated with our own microphysiology.

Protein Design is inverse Structure Prediction

If we imagine protein folding as the exploration by a human explorer navigating a rugged landscape in search of the lowest elevation point, then how do we describe protein design? If we are still baffled by structure prediction in the ‘midnight zone’ how can we have a bioengineering strategy that is worth the title ‘protein design’?

When does Bioinformatics become Bioengineering? Is protein design really just inverse structure prediction? Does that work to simplify the problem?

I have always looked at the problem from the perspective of predicting tertiary structure from primary structure. It almost seems more practical to flip the problem around. Given a structure and function, can we predict the amino acid sequence?

DNA synthesizers are rapidly improving. We still do not understand how a change in the amino acid sequence affects protein structure. Template-based modeling is progressing: a target that shares more than 30% sequence identity with a template can be reliably modeled with a good pipeline.
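To make the 30% figure concrete, here is a rough sketch of how percent identity might be computed from a target-template alignment and compared against that cutoff. It assumes the two sequences are already aligned to equal length with '-' for gaps, and it divides by the number of ungapped columns, which is only one of several conventions in use. `percent_identity` and `template_usable?` are hypothetical names made up for this example.

```ruby
# Percent identity between two already-aligned sequences of equal length.
# Assumption: columns where either sequence has a gap ('-') are skipped,
# so the denominator is the number of ungapped columns.
def percent_identity(aligned_a, aligned_b)
  unless aligned_a.length == aligned_b.length
    raise ArgumentError, "aligned sequences must have equal length"
  end

  matches = 0
  compared = 0
  aligned_a.upcase.chars.zip(aligned_b.upcase.chars).each do |a, b|
    next if a == "-" || b == "-"
    compared += 1
    matches += 1 if a == b
  end
  return 0.0 if compared.zero?

  100.0 * matches / compared
end

# The 30% cutoff comes from the paragraph above; it is a rule of thumb,
# not a hard boundary.
IDENTITY_CUTOFF = 30.0

# True when the template is above the cutoff for the aligned target.
def template_usable?(aligned_target, aligned_template, cutoff = IDENTITY_CUTOFF)
  percent_identity(aligned_target, aligned_template) > cutoff
end
```

In practice the alignment itself matters as much as the number, since a poor alignment at 35% identity can give a worse model than a careful one at 28%.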

Reliable function prediction is obviously the next milestone, but I think the gap from structure to function is still enormous. How do we standardize function specification? Enzyme functions can be described by the reaction they catalyze or the pathway they are part of. However, not all proteins are enzymes, and not all enzymes are proteins. Hormones, transcription factors and membrane receptors all have some characteristic structure that supports their function.

Function prediction needs about 10 years before it can provide the type of infrastructure for robust bioengineering and truly rational drug design.

CryoEM and Comparative Modeling

I read a great paper by Šali et al. this week, Refinement of Protein Structures by Iterative Comparative Modeling and CryoEM Density Fitting.

2BLD (image taken from viperdb.scripps.edu)

I remember Matthew Baker, one of the authors of this paper, speaking at CASP7 last year. This is an exciting application of modeling, since CryoEM can provide the structure of large virus assemblies and membranes. Where comparative modeling becomes important is in pushing the resolution toward pseudo-atomic structures, so you can really see what’s happening in terms of chemistry. Using simulated electron density maps, Šali et al. benchmarked a comparative modeling pipeline that adds a density fit score to the DOPE potential.
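The idea of adding a density fit term to a statistical potential can be sketched in a few lines. This is only an illustration under my own assumptions, not the scoring function from the paper: it measures fit as the normalized cross-correlation between a model's simulated density and the EM map (both flattened onto the same grid), turns that into a penalty, and adds it to a potential value with an arbitrary weight. The names `cross_correlation` and `combined_score`, the `weight` parameter and the "1 minus correlation" form are all hypothetical.

```ruby
# Normalized cross-correlation between two density maps sampled on the same
# grid and flattened to arrays of numbers. Ranges from -1 to 1.
def cross_correlation(map_a, map_b)
  if map_a.empty? || map_a.length != map_b.length
    raise ArgumentError, "maps must be non-empty and the same size"
  end

  mean_a = map_a.sum.to_f / map_a.length
  mean_b = map_b.sum.to_f / map_b.length

  numerator = 0.0
  var_a = 0.0
  var_b = 0.0
  map_a.zip(map_b).each do |a, b|
    da = a - mean_a
    db = b - mean_b
    numerator += da * db
    var_a += da * da
    var_b += db * db
  end

  denominator = Math.sqrt(var_a * var_b)
  denominator.zero? ? 0.0 : numerator / denominator
end

# Combine a statistical potential value (lower is better, as with DOPE)
# with a density fit penalty. The weight and the (1 - correlation) form
# are assumptions made for this sketch.
def combined_score(statistical_potential, model_density, em_density, weight: 1.0)
  fit_penalty = 1.0 - cross_correlation(model_density, em_density)
  statistical_potential + weight * fit_penalty
end
```

With a score like this, a model that fits the map perfectly pays no penalty, while a poorly fitting model is pushed up the ranking even if its statistical potential looks good on its own.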

The genetic algorithm runs for 15 hours on a 50 node dual-PIII cluster.