Archive for the 'Programming' Category

Using Ruby for Bioinformatics Applications

When I started working in a bioinformatics research lab, I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl itself was designed with text processing and reporting in mind, so naturally it’s become widely used for handling biological data.

So everything bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you” moment came when I discovered the way that Rails does web programming. Ruby is the magic in Rails, but I soon discovered that Ruby goes well beyond web frameworks. To quote Ezra:

“I came for the Rails, but I stayed for the Ruby”
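To give a feel for the kind of text processing that used to send me straight to Perl, here is a minimal sketch in plain Ruby. It parses a FASTA-formatted string and reports the GC content of each record. The method names `parse_fasta` and `gc_content` and the sample sequences are made up for illustration; this is not BioRuby's API, just the standard library.

```ruby
# Parse FASTA-formatted text into a hash of { identifier => sequence }.
# (Hypothetical helper for illustration; not part of BioRuby.)
def parse_fasta(text)
  records = {}
  current_id = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      # Header line: take the first word after '>' as the identifier.
      current_id = line[1..-1].split.first.to_s
      records[current_id] = String.new
    elsif current_id
      # Sequence line: append it to the current record, upper-cased.
      records[current_id] << line.upcase
    end
  end
  records
end

# Fraction of G and C bases in a sequence (0.0 for an empty sequence).
def gc_content(sequence)
  return 0.0 if sequence.empty?
  sequence.count('GC').to_f / sequence.length
end

# Made-up example records, for illustration only.
fasta = <<~FASTA
  >seq1 hypothetical example
  ATGCGC
  GGCC
  >seq2 another made-up record
  ATATAT
FASTA

parse_fasta(fasta).each do |id, seq|
  puts format('%s length=%d GC=%.2f', id, seq.length, gc_content(seq))
end
# prints:
#   seq1 length=10 GC=0.80
#   seq2 length=6 GC=0.00
```

For real work, a library such as BioRuby is the better choice, since it handles the edge cases of the format that this sketch ignores.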

I wanted to compile some links to show how an active community is positioning Ruby to be a powerful language for bioinformatics programming:

BioRuby – open source bioinformatics library

BioRuby on Github

Web Frameworks

Ruby on Rails – the famous MVC framework that made Ruby popular

Merb – fast, lightweight MVC framework

Camping – 5k microframework

Sinatra – web development DSL

Ramaze – simple, light, and modular web application framework

Rack – Webserver interface

Distributed/Parallel Computing

DRb – Distributed Ruby

SkyNet – MapReduce in Ruby

Rinda – Linda parallel programming model in Ruby

rxgrid – Xgrid batch language

MPI Ruby – MPI bindings for Ruby

amazon-ec2 – Amazon EC2 API

Testing/Spec

RSpec – BDD framework

Test::Unit – Unit testing in the Ruby standard library

Integration with other programming languages

JRuby – Ruby implementation on the JVM

SWIG and Ruby – automatically generate C interfaces

Ruby C extensions

Math/Statistics

Ruby-GSL – wrapper for the GNU Scientific Library

RSRuby – R statistics package in Ruby

SciRuby

Ruby NArray – similar to NumPy

Visualization/Graphics

Ruby Gnuplot – Gnuplot bindings

Ruby-Processing – The Processing language in Ruby

ruby-opengl – OpenGL bindings

Gruff – Graph API

Ruby-SVG – SVG Graphics

Machine Learning

AI Related Ruby Extensions

Support Vector Machines in Ruby

Fast Artificial Neural Network library

Blogs about bioinformatics and Ruby

Saaien Tist – Jan Aerts, on bioinformatics and personal productivity

Bioinformatics Zen – Michael Barton

Be sure to visit the Ruby for Bioinformatics room on FriendFeed for even more Ruby goodness.

O’Reilly Book: Programming Collective Intelligence

I love me some O’Reilly books. After recently reading Beautiful Code (I should write a review soon), I am eagerly awaiting a new book I just bought from Amazon called Programming Collective Intelligence.

It was written by Toby Segaran, a developer at Genstruct:

“This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it.”

Some of the most successful spots on the web are winning on this exact principle: Google, Facebook, Digg, MySpace, Flickr, Twitter, YouTube, and the list goes on and on. As much as I hate the almost clichéd phrase Web 2.0, there is clearly a difference in how successful web applications are designed today as opposed to 5 years ago. We need more community-based efforts in biology: educational efforts like Bioscreencast, or competitions like CASP, which I have participated in. The tools are in place. Given the grand challenges to our health and understanding of life, we can no longer afford to work alone.

Google Summer of Code

Google is holding its annual Summer of Code, with plenty of great projects and extra incentives for students to participate, including $5,000 per accepted student developer, of which $4,500 goes to the student and $500 goes to the mentoring organization.

A surprising lack of bioinformatics projects though… I only noticed a few:

Phyloinformatics with ideas for GBrowse, AJAX, and improving BioJava.

Michigan’s CSCS is working on running batch simulations with different parameter settings on a grid architecture.

UCSF’s GenMAPP project wants a structured wiki for gene pathways.

Those are the only three I found at first glance. Other non-bio projects worth noting include FANN (looking to go parallel/multi-threaded), and there are even big projects like Apache, MySQL, and PHP.

Some really good things have come from SoC in the past. It looks like this year will be no exception.