Archive for the 'Ruby' Category

Using Ruby for Bioinformatics Applications

bioruby

When I started working in a bioinformatics research lab I quickly discovered the wonderful dynamic language that is Perl. I’ve spent a couple of years with Mastering Perl for Bioinformatics somewhere on or around my desk. Perl itself was designed with text-processing and reporting in mind so naturally it’s become widely used when handling biological data.

So everything bioinformatics should be coded in Perl, right? A couple of years ago I might have agreed, but now I feel differently. My first “Perl, I’m leaving you.” moment came when I discovered the way that Rails does web programming. Ruby is the magic in Rails, but I soon discovered Ruby goes much beyond web frameworks. To quote Ezra:

“I came for the Rails, but I stayed for the Ruby”

I wanted to compile some links to show how an active community is positioning Ruby to be a powerful language for bioinformatics programming:

BioRuby - open source bioinformatics library

Web Frameworks

Rails

Merb

Camping

Sinatra

Distributed/Parallel Computing

DRb - Distributed Ruby

SkyNet - MapReduce in Ruby

Rinda

rxgrid - Xgrid batch language

MPI Ruby

amazon-ec2

Testing/Spec

RSpec - BDD framework

Test::Unit

Integration with other programming languages

JRuby

SWIG and Ruby

Ruby C extensions

Math/Statistics

RSRuby - R statistics package in Ruby

Ruby NArray - similar to NumPy

Visualization/Graphics

Ruby-Processing

ruby-opengl

Gruff - Graph API

Ruby-SVG

Ruby Gnuplot

Machine Learning

Support Vector Machines in Ruby

Fast Artificial Neural Network library

[To be continued]

A Pipeline is a Rakefile

Update: Mike over at Bioinformatics Zen has written a more thorough post about organised bioinformatics experiments with examples using Rake and DataMapper. Definitely check that out.


image credit: railsenvy.com

Make and its descendants tackle the challenging problem of dependency management: working out what has to be rebuilt, and in what order, when something changes. Make is a tried and true Unix utility that does the heavy lifting each time you type “./configure; make && make install” inside a large chunk of open source goodness. Make became such a popular tool because it drastically reduced compilation times for large programs. In compiled languages such as C, each time a source file is changed it needs to be recompiled. Rather than rebuild the entire project every time the source code is changed, an expert (a C programmer in this case) can specify dependencies so that make will rebuild only the targets whose dependencies have changed. In that sense, it’s easy to take for granted how powerful a Makefile actually is. Make is an expert system that’s ubiquitous in the Unix world.
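To make the "rebuild only what changed" idea concrete, here is a minimal Ruby sketch of the timestamp test that make applies before running a rule. The helper name `stale?` and the file names are made up for illustration. `FileUtils.uptodate?` is part of Ruby's standard library, and the modification times are set by hand so the outcome does not depend on how fast the script runs.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper (not part of make or Rake): a target is stale when it
# is missing or is not newer than every one of its sources.
def stale?(target, sources)
  !FileUtils.uptodate?(target, sources)
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "main.c")
  object = File.join(dir, "main.o")
  now = Time.now

  File.write(source, "int main(void) { return 0; }\n")
  puts "no object file yet:   stale? #{stale?(object, [source])}"

  File.write(object, "pretend this is compiled output")
  File.utime(now, now - 60, source)  # source last edited a minute ago
  File.utime(now, now, object)       # object built just now
  puts "object newer:         stale? #{stale?(object, [source])}"

  File.utime(now, now + 60, source)  # source edited after the build
  puts "source edited later:  stale? #{stale?(object, [source])}"
end
```

Real make also follows chains of dependencies, which this sketch leaves out.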

A makefile has the basic structure:


	target: dependencies
		command 1
		command 2
		...
		command n

Which brings us to the actual point of this post: how to use Makefiles in bioinformatics. There’s a discussion on nodalpoint from 2007 that calls for the use of `make` more often when programming pipelines. This made perfect sense. In bioinformatics we do pipelines all the time.

Sequence analysis

Blast search -> Multiple sequence alignment -> Phylogenetic analysis

Homology Modeling

Find Template -> Align target-template -> Build model

Molecular Dynamics

Solvate -> Equilibrate -> Simulate -> Analyze

Those aren’t the most detailed examples but hopefully you get the idea. Each step is dependent on the previous step. If a single step takes a lot of computation time, it would be nice to skip that step if it’s already been done. There’s also a benefit to encoding expert knowledge. For example, how do you convert a .fasta sequence file to a .pir sequence file? By specifying a rule, a build system will know what to do every time it sees a ‘*.fasta’ file in your project.


	%.pir: %.fasta
		./fasta2pir $< $@

But Makefile syntax can be tricky (is that a tab or a space?), and it’s not a full-blown programming language by itself. Which is why I fell in love with Rake.

Anyone who has tried out Ruby on Rails has probably typed something like “rake db:migrate” without realizing what Rake is all about. Rake is Ruby Make. Rake was designed to be just like make, but with all the power and flexibility of the Ruby programming language. A Rakefile is simply a set of tasks, each of which can have one or more dependencies. Unlike make, Rake is an internal DSL: it morphs Ruby into a build language without losing its utility as a general-purpose language.

A simple Rakefile in your bioinformatics project could do something like this:


	task :queryDatabase do
	  puts "Fetched Records"
	end

	task :formatData => :queryDatabase do
	  puts "Converted to XXX format"
	end

	task :createPlot => :formatData do
	  puts "Generated a Figure"
	end

This says “before I formatData I must queryDatabase”, and “before I createPlot I must formatData”. So as you might expect, when you type:


	$ rake queryDatabase
	Fetched Records

	$ rake formatData
	Fetched Records
	Converted to XXX format

	$ rake createPlot
	Fetched Records
	Converted to XXX format
	Generated a Figure
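If you are curious how that ordering falls out, the idea can be sketched in a few lines of plain Ruby. This is a toy illustration, not Rake's actual implementation. `MiniTask` and its methods are names invented here. Each task invokes its prerequisites first, then runs its own action, and remembers that it has already run.

```ruby
# Toy model of Rake-style task resolution (hypothetical, for illustration only).
class MiniTask
  attr_reader :name

  def initialize(name, prerequisites = [], &action)
    @name = name
    @prerequisites = prerequisites
    @action = action
    @invoked = false
  end

  # Run prerequisites first, then this task's action, at most once.
  # Returns the log of task names in the order they ran.
  def invoke(log = [])
    return log if @invoked
    @invoked = true
    @prerequisites.each { |task| task.invoke(log) }
    @action&.call
    log << name
  end
end

query  = MiniTask.new(:queryDatabase)           { puts "Fetched Records" }
format = MiniTask.new(:formatData, [query])     { puts "Converted to XXX format" }
plot   = MiniTask.new(:createPlot, [format])    { puts "Generated a Figure" }

ORDER = plot.invoke
puts ORDER.join(" -> ")
```

Because each task remembers that it was invoked, a prerequisite shared by several tasks still runs only once per session.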

And our Fasta rule in Rake would look like:


	rule '.pir' => ['.fasta'] do |t|
	  sh "./fasta2pir #{t.source} #{t.name}"
	end

Pretty cool? Obviously these tasks don’t actually do much other than show how rake resolves dependencies for you, which can be a pretty powerful thing for hacking together a pipeline.
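Plain `task` blocks run every time they are asked for, so they do not give you the "skip the step if it's already been done" behaviour mentioned earlier. Rake's `file` tasks do: the action runs only when the target file is missing or older than a prerequisite. Here is a minimal sketch, written as a standalone script so it can be run with `ruby` directly. The file names and the fake "search" and "alignment" steps are placeholders, and in a real Rakefile you would drop the `require` lines, the `chdir`, and the explicit `invoke` calls.

```ruby
require "rake"    # a Rakefile gets this for free; needed only in a plain script
require "tmpdir"

Dir.chdir(Dir.mktmpdir)  # work in a scratch directory for the demo

STEPS_RUN = []  # records which actions actually executed

# Placeholder for an expensive search step.
file "hits.txt" do |t|
  STEPS_RUN << t.name
  File.write(t.name, "hit1\nhit2\n")
end

# Runs only when alignment.txt is missing or older than hits.txt.
file "alignment.txt" => "hits.txt" do |t|
  STEPS_RUN << t.name
  File.write(t.name, File.read(t.source).upcase)
end

Rake::Task["alignment.txt"].invoke  # first run: both steps execute
Rake::Task.tasks.each(&:reenable)   # clear "already invoked" flags, as a fresh `rake` run would
Rake::Task["alignment.txt"].invoke  # second run: outputs are up to date, nothing executes

puts STEPS_RUN.inspect
```

Each step shows up in `STEPS_RUN` only once even though the pipeline was requested twice, which is what you want when one stage takes hours.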

Rake resources:

Around the web 3/7/08

rb-processing

Around the web, week of March 7, 2008

A domain specific language for screencasting

castanaut

Two topics that I have been reading a lot about lately, domain-specific languages in Ruby and screencasting, have converged to create a very cool little project called Castanaut. I found Castanaut via Peter Cooper of Ruby Inside. Castanaut is essentially a programming language for screencasts. So in Castanaut you can write things like this:


launch "Safari", at(10, 10, 800, 600)
type "http://www.inventivelabs.com.au"
hit Enter
pause 2
move to(100, 100)
move to(200, 100)
move to(200, 200)
move to(100, 200)
move to(100, 100)
say "I drew a square!"

Thanks to the flexibility of Ruby, you can write your screenplay as a script and run it to automatically create a screencast. How cool is that? While this might take some of the personal touch away from screencasts, it could also be a powerful tool for those who need to create them in a more systematic way.

Ruby on Grids

Eric Rollins did some interesting tests with Ruby. It should be possible to run parallel tasks effectively with DRb. I am going to do some more playing around with this.
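For anyone who has not seen DRb before, the basic building block looks roughly like this: one process publishes an ordinary Ruby object at a `druby://` URI, and clients call its methods as if it were local. This is only a sketch. The `SequenceWorker` class and its `gc_content` method are invented for illustration, and the client and server share a single process here purely to keep the example self-contained.

```ruby
require "drb/drb"

# Hypothetical worker object; the class and method names are illustrative.
class SequenceWorker
  # Fraction of G and C bases in a nucleotide string.
  def gc_content(sequence)
    return 0.0 if sequence.empty?
    sequence.upcase.count("GC").fdiv(sequence.length)
  end
end

# Server side: publish the worker. Port 0 asks the OS for any free port.
DRb.start_service("druby://localhost:0", SequenceWorker.new)
WORKER_URI = DRb.uri

# Client side: normally a separate process or machine that knows the URI.
worker = DRbObject.new_with_uri(WORKER_URI)
puts "GC content via #{WORKER_URI}: #{worker.gc_content('ATGC')}"

DRb.stop_service
```

To spread work out, you would start one such service per machine or core and hand each client a different chunk of sequences.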

The Ruby buzz on the net seems to be turning toward the “multi-core” crisis, and several bloggers are talking about Ruby as a language for parallel environments.

Multicore Hardware and the Future of Ruby
Multi-core hysteria and the thread confusion
News for Week 25/2007
Distributed Ruby Workers on EC2

Scraping Podcast RSS with Ruby

I was using iTunes to listen to podcasts but I hardly ever put them on my iPod. Recently, iTunes wouldn’t let me download every podcast for a channel (ironically, the Rubyology channel). So I thought a Ruby command-line utility might be nice.

I also learned some YAML thanks to Rubyology, so why not have a subscriptions.yaml