Archive
Tim O’Reilly was commenting on some Google business when he pointed out that data ownership will be crucial to future technology giants.
“we see here another demonstration of my Web 2.0 principle that ‘data is the Intel Inside’, and that many of the future battles between industry giants will be around who owns data, rather than who controls software APIs. In that battle, we’ll see deployed all kinds of techniques to ‘harness collective intelligence’ to build added value databases of various kinds.”
Google now has a free 411 phone service which he suspects is a vehicle for harvesting voice data to train speech recognition and translation software. It makes perfect sense.
This is why I think data mining is an integral part of drug discovery. Can you sell a specific decision support system for therapeutic targets? How much is a proprietary data warehouse of experimental data worth?
The explosive data acquisition rate of modern life sciences is overwhelming our ability to interpret it. Perhaps Google had a good strategy for data integration prior to opening the floodgates of this 411 service. What’s clear is that Google knows information is worth money and they are more than willing to roll out great free services in order to get it from you.
An Ars Technica review got me to try Papers.app for OS X today. Clearly designed with the life sciences in mind, Papers definitely has an iTunes feel. It looks great and has easy, intelligent metadata management. It has Connotea, built-in PubMed interaction, and ‘Smart Groups’, making reference management a lot easier.
I haven’t figured out how to use HubMed feeds (since HubMed is apparently down as I write this).
Also, to acquire the actual PDF I use the university VPN, which may be as simple as prepending the URL, but I fear the authentication is not going to be supported by Papers directly. Hmm… it will still complement my previous methods of organizing literature. I would rather see some of the stability issues fixed before any new features are added, as I’ve had it crash a handful of times. It’s still in the ‘public preview’ phase, but this is an app I will continue to use whenever I’m on a Mac.
If we imagine protein folding as an explorer navigating a rugged landscape in search of its lowest point, then how do we describe protein design? If we are still baffled by structure prediction in the ‘midnight zone’, how can we have a bioengineering strategy worthy of the title ‘protein design’?
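To make the landscape picture concrete, here is a minimal sketch in Python. It is only a toy: the landscape is a made-up list of elevations, and `greedy_descent` is a hypothetical helper, not a folding algorithm. It shows why ruggedness matters: an explorer who only ever steps downhill can get stuck in a local dip that is not the lowest point.

```python
def greedy_descent(landscape, start):
    """Walk downhill on a 1D landscape until no neighbor is lower.

    `landscape` is a list of elevations (toy data, not real energies).
    Returns the index where the walk stops.
    """
    position = start
    while True:
        # Neighbors are the positions immediately left and right, if they exist.
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(landscape)]
        if not neighbors:
            return position
        best = min(neighbors, key=lambda i: landscape[i])
        # Stop when neither neighbor is strictly lower than where we stand.
        if landscape[best] >= landscape[position]:
            return position
        position = best


def global_minimum(landscape):
    """Index of the lowest elevation, found by exhaustive search."""
    return min(range(len(landscape)), key=lambda i: landscape[i])


# A small, rugged toy landscape with several dips.
toy_landscape = [5, 3, 4, 2, 6, 1, 7]
stuck_at = greedy_descent(toy_landscape, start=0)
lowest_at = global_minimum(toy_landscape)
```

Starting from the left edge, the walker settles in the first dip it meets, while the true lowest point sits elsewhere. Exhaustive search only works here because the toy landscape is tiny.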
When does Bioinformatics become Bioengineering? Is protein design really just inverse structure prediction? Does that work to simplify the problem?
I have always looked at the problem from the perspective of predicting tertiary structure from primary structure. It almost seems more practical to flip the problem around. Given a structure and function, can we predict the amino acid sequence?
DNA synthesizers are rapidly improving. We still do not understand how a change in the amino acid sequence changes protein structure. Template-based modeling is progressing: a target with more than 30% sequence identity to a known structure can be reliably modeled with a good pipeline.
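For readers unfamiliar with the 30% figure, here is a rough sketch of how sequence identity can be computed from an existing pairwise alignment. The function names are hypothetical, and the choice of denominator (columns where neither sequence has a gap) is an assumption; other conventions exist and give slightly different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (same length, gaps marked with `gap`).
    Assumption: gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def above_modeling_threshold(aligned_a, aligned_b, threshold=30.0):
    """True if identity exceeds the threshold (30% as quoted in the text)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, `percent_identity("ACDE", "ACDF")` compares four columns and finds three matches, so the pair sits well above the threshold.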
Reliable function prediction is obviously the next milestone, but I think the gap from structure to function is still enormous. How do we standardize function specification? Enzyme functions can be described by the reaction they catalyze or the pathway they are part of. However, not all proteins are enzymes, and not all enzymes are proteins. Hormones, transcription factors, and membrane receptors all have some characteristic structure that supports their function.
Function prediction needs about 10 years before it can provide the type of infrastructure for robust bioengineering and truly rational drug design.
Ironically printed on a colorful paper flyer….
Describe The Impact of Cyberinfrastructure on Your World
Cyberinfrastructure: new research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet. In scientific usage, cyberinfrastructure is a technological solution to the problem of efficiently connecting data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.
I read a great paper by Sali et al. this week, Refining Protein Structures by Iterative Comparative Modeling and CryoEM Density Fitting.
-image taken from viperdb.scripps.edu
I remember Matthew Baker, one of the authors of this paper, speaking at CASP7 last year. This is an exciting application of modeling, since CryoEM can provide the structure of large virus assemblies and membranes. Comparative modeling becomes important for increasing the resolution to pseudo-atomic structures, so you can really see what’s happening in terms of chemistry. Using simulated electron density maps, Sali et al. benchmarked a comparative modeling pipeline that adds a density fit score to the DOPE potential.
The genetic algorithm runs for 15 hours on a 50 node dual-PIII cluster.
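As a rough illustration of what adding a density fit score to a statistical potential could look like, here is a toy sketch in Python. It is not the paper’s method: the cross-correlation measure, the sign convention, and the `weight` parameter are all assumptions made for illustration, and the density maps are flattened lists of made-up voxel values.

```python
import math


def cross_correlation(map_a, map_b):
    """Pearson-style cross-correlation between two flattened density maps.

    Returns a value in [-1, 1]; 1 means the maps agree perfectly in shape.
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and the same size")
    mean_a = sum(map_a) / len(map_a)
    mean_b = sum(map_b) / len(map_b)
    numerator = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    denominator = math.sqrt(
        sum((a - mean_a) ** 2 for a in map_a)
        * sum((b - mean_b) ** 2 for b in map_b)
    )
    if denominator == 0:
        return 0.0  # a flat map carries no shape information
    return numerator / denominator


def combined_score(dope_score, density_cc, weight=1.0):
    """Hypothetical combined fitness: lower is better.

    Assumes a lower DOPE score means a more native-like model, so a better
    density fit (higher correlation) is subtracted to reward it.
    `weight` is an illustrative knob, not a value from the paper.
    """
    return dope_score - weight * density_cc
```

Under these assumptions, two candidate models with the same DOPE score are ranked by how well their simulated density matches the map, which is the kind of fitness a genetic algorithm could then optimize.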

Google is running its annual Summer of Code. There are plenty of great projects and extra incentives for students to participate, including $5,000 per accepted student developer, of which $4,500 goes to the student and $500 goes to the mentoring organization.
A surprising lack of bioinformatics projects though… I only noticed a few:
Phyloinformatics with ideas for GBrowse, AJAX, and improving BioJava.
Michigan’s CSCS is working on running batch simulations with different parameter settings on a grid architecture.
UCSF’s GenMAPP project wants a structured wiki for gene pathways.
Those are the only three I found at first glance. Other non-bio projects worth noting are FANN (looking to go parallel/multi-threaded), and there are even big projects like Apache, MySQL, and PHP.
Some really good things have come from SoC in the past. It looks like this year will be no exception.
I’ve finally started a WordPress blog to ramble about my research, readings, excitations, and frustrations. I intend to discuss science and technology, with a focus on high performance computing in bioinformatics. Expect posts to be sparse at first as I get used to the whole blogging mentality, but things should pick up over the next few months. In the meantime I am assembling my favorite bloggers and news feeds in the sidebar, so feel free to check them out!