Wednesday, February 27, 2008

PSB proposal up on Nature Precedings

Our proposal for an Open Science session at PSB is now available on Nature Precedings!

Tuesday, February 26, 2008

Tools for analyzing "lists" in biology

My latest research is focused on cluster/list annotation in biology. Given a cluster or list of genes or proteins that were grouped together using some metric (expression profile, sequence or structure similarity, interactions, etc), how can you discover descriptive terms or labels for that cluster? This seems to be a common question, and yet I've had trouble finding tools that help you do what I am specifically trying to do (investigation of a list of biological entities). I've found many that can give you tons of information for single genes or proteins, which I don't consider that helpful, and a few that can give you information for a group, but these are either organism specific or limited to one or two types of data (e.g. GO terms).
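To make the question concrete, here is a minimal sketch of one common baseline for list annotation: a one-sided hypergeometric (over-representation) test that asks whether a label appears in the list more often than chance would predict. It is an illustration only, not the text-based method mentioned in this post. The function names, the `annotations` mapping, and the toy gene IDs are all assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k labeled genes in a list of n,
    when K of the N background genes carry the label (hypergeometric tail)."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enriched_terms(gene_list, annotations, background, alpha=0.05):
    """Return (term, p_value) pairs with p_value <= alpha, smallest first.

    annotations: hypothetical mapping of label -> set of gene IDs. The labels
    could be GO terms, pathway names, or words mined from abstracts.
    """
    genes = set(gene_list) & set(background)
    results = []
    for term, members in annotations.items():
        members = set(members) & set(background)
        k = len(genes & members)
        if k == 0:
            continue
        p = enrichment_p_value(k, len(genes), len(members), len(background))
        if p <= alpha:
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


# Toy data (made up for illustration): ten background genes, two labels.
background = {f"g{i}" for i in range(10)}
annotations = {
    "term_A": {"g0", "g1", "g2"},
    "term_B": {"g0", "g5", "g6", "g7", "g8", "g9"},
}
hits = enriched_terms(["g0", "g1", "g2"], annotations, background)
```

Because the labels are just strings mapped to sets of IDs, the same test works for any organism or data type, which is the flexibility the single-source tools described above seem to lack.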

Since I am developing a method to do this based on text, I'd like to be able to compare my method to existing ones that solve the same problem. What I am looking for is two or three available methods that give you information relevant to a list of biological entities from multiple species, at least one of which uses literature or text-mining. Does anyone know of such methods, or have ideas of where to look? Various PubMed and Google searches have failed me!

Unrelated, but also done today: Submitted the PSB proposal to Nature Precedings as per several of your requests. Will update once word is back from their review process.

Monday, February 25, 2008


Sean Mooney and his colleagues at the Indiana University Center for Computational Biology and Bioinformatics Core are developing a new professional networking site called Laboratree. It looks and feels like a more subdued version of Facebook, which is a good thing in my opinion - Facebook is all about Too Much Information and too many annoying alerts and requests, whereas I think (and hope) Laboratree will stay trim and relevant. It also integrates a blog into your account, which is nice for those who don't already keep one. While Laboratree may compete with sites like OpenWetWare in some of its features, it is less of a content management system and more of a networking site. It is still being developed and so has some bugs and a limited user base, but you can sign up for an account.


Homework is such a staple of education that most of us take it for granted (or wish it didn't exist). But done the right way, it can make a big difference in how much a student gets out of a class. Multiple choice questions are easy to come up with, answer, and grade, but are they effective at teaching or testing knowledge? I'm sure there have been all sorts of studies on testing in schools, but it'll be interesting to see the results of Rosie Redfield's experiment with her undergrad biology class.

Friday, February 22, 2008

Science journal feedmixes

The topic of literature review came up at a recent group meeting. Our advisor receives a number of print subscriptions to journals, but these often languish in some forgotten corner. Even when they are brought out of the depths, it seems a daunting task to leaf through them to find articles of interest to each student. Since everyone is on the interwebs, it is much easier (and more complete) to get updates on relevant articles through a website or email, peruse the titles and blurbs online, and then decide what to actually sit down and read. There are a few problems with this, however.

  1. Getting alerts from journals, search engines, or aggregators like Faculty of 1000 still usually produces too many articles to sift through.
  2. To limit the amount of junk you get, you provide keywords - but, if you're like me, you will browse through unlikely articles in Science or Nature or PLoS ONE on a regular basis because they look interesting, so keywords will filter these out.

I've set up a feedmixer for science journals and related information using Feed Digest. It's little more than an aggregator right now so it doesn't really address those problems. If anyone knows of any cool tricks to help sift through the ridiculous amounts of information we're supposed to keep up with, without losing the unexpected gems, I'd love to hear it!

Update: A new tool called Persai claims to learn your preferences through what you accept and what you reject (review on Slate), and filters your feeds accordingly. I'm not sure it will help with issue #2, but there is probably just an irreconcilable trade-off between #1 and #2. Perhaps the solution is to have a couple of different pages set up with Persai - narrow ones for specific fields or interests, and broader ones for the science "pleasure reading"!
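The narrow-page/broad-page idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Persai or Feed Digest actually work: feed items are assumed to be plain dicts with `title` and `source` keys, and `split_feed`, `broad_sources`, and `broad_limit` are made-up names.

```python
def split_feed(items, keywords, broad_sources, broad_limit=5):
    """Split feed items into a keyword-matched 'narrow' page and a capped
    'broad' page drawn from general-interest journals."""
    lowered = [k.lower() for k in keywords]
    narrow, broad = [], []
    for item in items:
        title = item["title"].lower()
        if any(k in title for k in lowered):
            # Matches a field-specific keyword: goes on the narrow page.
            narrow.append(item)
        elif item["source"] in broad_sources and len(broad) < broad_limit:
            # No keyword match, but from a journal worth browsing anyway.
            broad.append(item)
    return narrow, broad


# Made-up items for illustration only.
items = [
    {"title": "Text mining for gene cluster annotation", "source": "Bioinformatics"},
    {"title": "How birds navigate at night", "source": "Nature"},
    {"title": "A new alloy for turbine blades", "source": "Acta Materialia"},
    {"title": "Sleep and memory in fruit flies", "source": "Science"},
]
narrow, broad = split_feed(
    items, ["text mining", "annotation"], {"Nature", "Science"}, broad_limit=1
)
```

The cap on the broad page is what keeps problem #1 in check, while the source whitelist keeps some of the unexpected articles that keywords alone would throw away.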

Review on open source CMS for bioinformatics

An alum of my lab recently published a review on open source content management systems and their uses in bioinformatics.

Tuesday, February 19, 2008

Mostly Open collaboration on Warfarin pharmacogenetics

Warfarin is one of the best-known case studies for pharmacogenetics - where variations in an individual's genome affect his or her response to drugs or other substances. Warfarin is an anti-coagulant often prescribed to prevent and treat blood clots that can result in embolism, stroke, or heart attack. It is very effective at the right dosage; however, it is extremely difficult to dose, and the wrong dosage can have very dire consequences. After decades of research, it is clear that personal genetics are at the root of the response variability, and warfarin is poised to be the first widely used drug to carry an FDA recommendation for dosing based on pharmacogenetic testing.

Despite widespread acknowledgment of warfarin's pharmacogenetic factors, the road to clinical acceptance has been long. Two of the main challenges are devising an effective dosing scheme given pharmacogenetic data and demonstrating improved clinical outcomes as a result of using that dosing scheme. It now appears that we are nearing the final push, with the creation of the International Warfarin Pharmacogenetics Consortium (IWPC). Hosted by the PharmGKB, it is a global collaboration between warfarin scientists sharing data and research to study the relationship between individual genetic variability and dosing response. Anyone with paired genotype and clinical data can join the consortium on the condition that their data is made available to other members of the consortium. When the results of the collaboration are published, they will be authored by the IWPC, and the pooled data will be made available to all those with accounts on PharmGKB.

The IWPC is an interesting and important development. Although it is not completely open, it is most certainly open to those with something to contribute, and will be made open to pharmacogenomics researchers on PharmGKB (detailed data requires an account, due to privacy issues surrounding clinical data). It is also an example of a group of researchers working on the same problem (relating warfarin dosing and response to genotypes) coming together in what has the potential to change current medical practice. Pharmacogenomics has been on the brink for so long, and the IWPC may very well provide the breakthrough that it needs to deliver on the promise of translational research.

Disclaimer: The information presented here is my own recollection of a talk given by Russ Altman at the annual Stanford Biomedical Informatics retreat. For detailed information, you are encouraged to email PharmGKB.

Saturday, February 9, 2008

PSB Open Science session proposal submitted!

Thanks, everyone, for the encouraging and helpful comments, for spreading the word, or for writing in with a letter of support. You all helped us craft a very strong proposal, which was submitted yesterday! I'll post an update on that once I hear back, sometime around March 5th.

For those interested, the final version of the document is updated on GoogleDocs here, and a PDF can be downloaded here.

Tuesday, February 5, 2008

Collaboration for change: an example from the entertainment industry

Disclaimer: This post has some political content.

The music industry has a long history of coming together for big projects meant to raise awareness and funds for good causes. Charity concerts, albums, and tours feature artists from all over the musical spectrum (and sometimes related industries, such as movies) to combat AIDS, fight poverty, help disaster victims, or call an end to violence. It may be useful to reflect on some of these as collaborations outside of science.

In addition to collaborative performing, we might stand to learn from the music industry's experiences with sharing. There is still a battle going on over the issue, but I think it's accepted that being more open with your music is the way to survive. Indie bands offer their music for free on their websites, artists make a name for themselves online, and, as Radiohead demonstrated, you can release your album as a user-priced download and benefit. With the explosion of video content that is YouTube, practically everything you'd ever want to see or hear is now free and shared.

Today is Super Tuesday in the US - the day when most state primaries happen, determining the presidential candidates for the election in November. A few days ago, a group of artists released a music video which has since spread to other websites and YouTube. The video endorses Barack Obama in a creative, artistic, and very moving way, bringing a diverse group of musicians and performers together to send a message. They recorded the video in 2 days, and reached millions in less than that.

Even though this example is politically charged, I think it encapsulates the power of openness and sharing as a mechanism for accomplishing something much bigger than one individual could achieve.

Monday, February 4, 2008

Open Students

A brand new blogsite devoted to Open Access sprang up about a week ago. This one is geared towards students and is sponsored by the Scholarly Publishing and Academic Resources Coalition (SPARC) as part of its student outreach campaign. It's called, naturally enough, Open Students.

Thanks to A Blog Around the Clock for the tip.

Sunday, February 3, 2008

Sharing in the news

A news feature in the latest issue of Nature ("Genetics by Numbers") discusses the recent proliferation of genome-wide association studies. The story brings up a few interesting points relevant to Open Science.

  1. Data sharing is essential. Genome-wide association studies rely on having a lot of data to work with, and collecting the data (through SNP-chips, for example) is still too expensive for most researchers. Even one database of samples may not be enough - only by pooling many such independent sample collections together will conclusive evidence be gathered for variants with modest effects.
  2. Data sharing has its share of problems. The article cited a study that found that researchers sometimes abuse shared data, either by going outside the bounds of the original agreements, or by not accounting for certain aspects of the data relevant to their research question. Data shared does not necessarily mean better or faster science if it is analyzed poorly. This highlights the importance of thinking through your analysis, finding out how the data was collected, and ensuring that the data is appropriately "cleaned up" for your purposes before you analyze it. While there is increased benefit from sharing data, there is also increased responsibility to use the data properly.
    Another potential problem mentioned was that scientists would have fewer incentives to collect new data - and thus science could stagnate a little. I'm not sure how big of a concern this should be - after all, citation is a big incentive, and publishing (and sharing) new data would lead to more citations - but it is a concern I hadn't heard much about before.
  3. Data sharing made "Soviet"? Some US researchers may resent the mandatory policies set by the NIH, and think collaboration and sharing were progressing fine without them. Said one, "I don't want to share my data with anyone because the NIH decides I should, I want to do it because I decide to do it." Perhaps there has been some narcissistic pleasure associated with data sharing (not that there's anything wrong with that), and the NIH mandate is now raining on the parade. It's like doing chores - it can be fun if you don't know you're supposed to do it or take to it naturally (maybe you even get an altruistic kick out of it), but it's a lot less fun when you're told you have to do it or else.
The article as a whole was very interesting to read, but I thought these little tidbits were especially provocative. There was also a mention of how sharing data allowed two groups to make further discoveries, which led to additional, separate publications, which is always nice to see.

Friday, February 1, 2008

Ignorance of the masses - should we worry?

This is the second somewhat negative post I've written on Open Science. You might say I've moved from the Honeymoon phase to the phase where all you can do is judge and criticize and focus on the bad. Let's hope I move on to the mature, balanced, productive phase soon! In any case, I still highly support the concept of Open Science and want to see it grow, but right now I am using this blog to explore both sides.

There are many issues and questions surrounding Open Science which I have been slowly familiarizing myself with over the last few weeks. Some things, like intellectual property rights, privacy, and scooping, are obvious and comprise the bulk of the debate. I started thinking about a different issue related to Open Science recently, mostly inspired by the escalating battle between evolution and Creationism/ID and the comments of former presidential hopeful Mike Huckabee. The following may be more politically charged than appropriate for a blog like this, so consider yourself warned.

It boils down to this: the public is essentially ignorant. What I mean is that most people know a lot about very few topics, and very little about everything else. Most of what they learn about everything else comes from the media. I won't even go into the problems with our education system or the fact that most Americans have a very strange idea of what science is. The problem is that it doesn't take much for a study to be misinterpreted, or science to be misrepresented. Mainstream media will go for the most sensational spin. Think about all those "health" and "wellness" magazines that immediately latch on to and exaggerate the latest studies on coffee, supplements, and compounds in food, regardless of where they were published.

If Open Science is fully realized, bleeding-edge scientific research will be at everyone's fingertips. Preliminary results, perhaps before appropriate controls are performed, will be available to people who don't have the training (or desire) to distinguish between rigorously obtained findings and works in progress. Until now, the only science accessible to the outside world has gone through the filter of a peer-reviewed journal (and has presumably already been summarized and interpreted in the way that best describes all the data and findings in the entire study). Without a filter, is there more risk of misinterpretation and misrepresentation by those outside the scientific sphere? If so, what precautions can we take to mitigate it?