
When I started working in a bioinformatics research lab, I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl itself was designed with text processing and reporting in mind, so naturally it has become widely used for handling biological data.
So everything in bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you” moment came when I discovered how Rails does web programming. Ruby is the magic behind Rails, but I soon found that Ruby goes well beyond web frameworks. To quote Ezra:
“I came for the Rails, but I stayed for the Ruby”
I wanted to compile some links to show how an active community is positioning Ruby as a powerful language for bioinformatics programming:
BioRuby - open source bioinformatics library

Web Frameworks:
Rails
Merb
Camping
Sinatra

Distributed/Parallel Computing:
DRb - Distributed Ruby
SkyNet - MapReduce in Ruby
Rinda
rxgrid - Xgrid batch language
MPI Ruby
amazon-ec2

Testing/Spec:
RSpec - BDD framework
Test::Unit

Integration with other programming languages:
JRuby
SWIG and Ruby
Ruby C extensions

Math/Statistics:
RSRuby - access to the R statistics package from Ruby
Ruby NArray - similar to NumPy

Visualization/Graphics:
Ruby-Processing
ruby-opengl
Gruff - Graph API
Ruby-SVG
Ruby Gnuplot

Machine Learning:
Support Vector Machines in Ruby
Fast Artificial Neural Network library
[To be continued]
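To give a flavor of the text-processing style that makes Ruby a good fit for this kind of work, here is a minimal sketch in plain Ruby (standard library only, no BioRuby) that parses FASTA-formatted text and reports the GC content of each record. The method names `parse_fasta` and `gc_content` are my own hypothetical helpers, not part of any library, and the sketch assumes well-formed input in which header lines start with `>`.

```ruby
# Minimal FASTA parsing and GC-content sketch in plain Ruby (no BioRuby).
# Assumption: well-formed FASTA text where header lines start with '>'.
# parse_fasta and gc_content are hypothetical helper names for illustration.

# Parse FASTA text into a Hash of { header => sequence }.
# Sequence lines belonging to one record are joined and upcased.
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?(">")
      current = line[1..-1]   # header text without the leading '>'
      records[current] = +""  # unfrozen string we can append to
    elsif current
      records[current] << line.upcase
    end
  end
  records
end

# Fraction of G and C bases in a sequence (0.0 for an empty sequence).
def gc_content(seq)
  return 0.0 if seq.empty?
  seq.count("GCgc").fdiv(seq.length)
end

fasta = <<~FASTA
  >seq1 example
  ATGCGC
  TTAA
  >seq2
  GGGG
FASTA

# Print each record's header and its GC content to two decimal places.
parse_fasta(fasta).each do |name, seq|
  printf("%-12s %5.2f\n", name, gc_content(seq))
end
```

For real work BioRuby has its own FASTA classes; the point here is only how little core Ruby needs for the basics.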
I love me some O’Reilly books. I recently finished Beautiful Code (I should write a review soon), and now I am eagerly awaiting a new book I just bought from Amazon: Programming Collective Intelligence.

It was written by Toby Segaran, a developer at Genstruct.
“This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it.”
Some of the most successful sites on the web are winning on exactly this principle: Google, Facebook, Digg, MySpace, Flickr, Twitter, YouTube, and the list goes on. As much as I hate the almost clichéd phrase “Web 2.0”, there is clearly a difference between how successful web applications are designed today and how they were designed five years ago. We need more community-based efforts in biology: educational efforts like Bioscreencast, or competitions like CASP, in which I have participated. The tools are in place. Given the grand challenges to our health and to our understanding of life, we can no longer afford to work alone.

Google runs its annual Summer of Code, with plenty of great projects and extra incentives for students to participate, including $5,000 per accepted student developer, of which $4,500 goes to the student and $500 goes to the mentoring organization.
There is a surprising lack of bioinformatics projects, though. I only noticed a few:
Phyloinformatics with ideas for GBrowse, AJAX, and improving BioJava.
Michigan’s CSCS is working on running batch simulations with different parameter settings on a grid architecture.
UCSF’s GenMAPP project wants a structured wiki for gene pathways.
Those are the only three I found at first glance. Other non-bio projects worth noting include FANN (looking to go parallel/multi-threaded), and there are even big projects like Apache, MySQL, and PHP.
Some really good things have come from SoC in the past. It looks like this year will be no exception.