Using Ruby for Bioinformatics Applications

When I started working in a bioinformatics research lab, I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl itself was designed with text processing and reporting in mind, so naturally it’s become widely used for handling biological data.

So everything bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you.” moment came when I discovered the way that Rails does web programming. Ruby is the magic in Rails, but I soon discovered that Ruby goes well beyond web frameworks. To quote Ezra:

“I came for the Rails, but I stayed for the Ruby”
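As a small taste of what that looks like for biological data, here is a minimal sketch in plain Ruby that reads FASTA-formatted text and reports the GC content of each sequence. The helper names `parse_fasta` and `gc_content` and the sample sequences are made up for illustration; this is not the BioRuby API, just the kind of text processing I used to reach for Perl to do.

```ruby
# Hypothetical helpers in plain Ruby (no gems); NOT the BioRuby API.

# Parse FASTA-formatted text into a hash of { header => sequence }.
# Sequence lines are upcased and joined; lines before the first header are ignored.
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      current = line[1..-1]            # header without the leading ">"
      records[current] = String.new    # mutable, empty sequence buffer
    elsif current
      records[current] << line.upcase
    end
  end
  records
end

# Fraction of G and C bases in a sequence (0.0 for an empty sequence).
def gc_content(seq)
  return 0.0 if seq.empty?
  seq.count("GC").to_f / seq.length
end

# Made-up sample data, only for demonstration.
fasta = <<~FASTA
  >seq1 example
  ATGCGC
  atat
  >seq2
  GGCC
FASTA

parse_fasta(fasta).each do |name, seq|
  puts format("%-15s length=%d GC=%.2f", name, seq.length, gc_content(seq))
end
```

For real work a library like BioRuby is the better choice, since it handles the many file formats and edge cases this sketch ignores.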

I wanted to compile some links to show how an active community is positioning Ruby to be a powerful language for bioinformatics programming:

BioRuby - open source bioinformatics library

Web Frameworks

Rails

Merb

Camping

Sinatra

Distributed/Parallel Computing

DRb - Distributed Ruby

SkyNet - Map Reduce in Ruby

Rinda

rxgrid - Xgrid batch language

MPI Ruby

amazon-ec2

Testing/Spec

RSpec - BDD framework

Test::Unit

Integration with other programming languages

JRuby

SWIG and Ruby

Ruby C extensions

Math/Statistics

RSRuby - R statistics package in Ruby

Ruby NArray - similar to NumPy

Visualization/Graphics

Ruby-Processing

ruby-opengl

Gruff - Graph API

Ruby-SVG

Ruby Gnuplot

Machine Learning

Support Vector Machines in Ruby

Fast Artificial Neural Network library

[To be continued]

O’Reilly Book: Programming Collective Intelligence

I love me some O’Reilly books. After recently reading Beautiful Code (I should write a review soon), I am eagerly awaiting a new book I just bought from Amazon called Programming Collective Intelligence.

It was written by Toby Segaran, a developer at Genstruct.

“This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it.”

Some of the most successful spots on the web are winning on this exact principle: Google, Facebook, Digg, MySpace, Flickr, Twitter, YouTube, and the list goes on and on. As much as I hate the almost clichéd phrase Web 2.0, there is clearly a difference in how successful web applications are designed today as opposed to 5 years ago. We need more community-based efforts in biology: educational efforts like Bioscreencast, or competitions like CASP, which I have participated in. The tools are in place. Given the grand challenges to our health and understanding of life, we can no longer afford to work alone.

Google Summer of Code

Google is running its annual Summer of Code, with plenty of great projects and extra incentives for students to participate, including $5,000 per accepted student developer, of which $4,500 goes to the student and $500 goes to the mentoring organization.

A surprising lack of bioinformatics projects though… I only noticed a few:

Phyloinformatics with ideas for GBrowse, AJAX, and improving BioJava.

Michigan’s CSCS is working on running batch simulations with different parameter settings on a grid architecture.

UCSF’s GenMAPP project wants a structured wiki for gene pathways.

Those are the only three I found at first glance. Other non-bio projects worth noting include FANN (looking to go parallel/multi-threaded), and there are even big projects like Apache, MySQL, and PHP.

Some really good things have come from SoC in the past. It looks like this year will be no exception.