A news feature in the latest issue of Nature ("Genetics by Numbers") discusses the recent proliferation of genome-wide association studies. The story brings up a few interesting points relevant to Open Science.

  1. Data sharing is essential. Genome-wide association studies rely on having a lot of data to work with, and collecting the data (through SNP-chips, for example) is still too expensive for most researchers. Even one database of samples may not be enough - only by pooling many such independent sample collections together will conclusive evidence be gathered for variants with modest effects.
  2. Data sharing has its share of problems. The article cited a study that found that researchers sometimes abuse shared data, either by going outside the bounds of the original agreements, or by not accounting for certain aspects of the data relevant to their research question. Shared data does not necessarily mean better or faster science if it is analyzed poorly. This highlights the importance of thinking through your analysis, finding out how the data was collected, and ensuring that the data is appropriately "cleaned up" for your purposes before you analyze it. While there is increased benefit from sharing data, there is also increased responsibility to use the data properly.
    Another potential problem mentioned was that scientists would have fewer incentives to collect new data, and thus science could stagnate a little. I'm not sure how big a concern this should be (after all, citation is a big incentive, and publishing and sharing new data would lead to more citations), but it is a concern I hadn't heard much about before.
  3. Data sharing made "Soviet"? Some US researchers may resent the mandatory policies set by the NIH, and think collaboration and sharing were progressing fine without them. Said one, "I don't want to share my data with anyone because the NIH decides I should, I want to do it because I decide to do it." Perhaps there has been some narcissistic pleasure associated with data sharing (not that there's anything wrong with that), and the NIH mandate is now raining on the parade. It's like doing chores: it can be fun if you take to it naturally or don't know you're supposed to do it (maybe you even get an altruistic kick out of it), but it's a lot less fun when you're told you have to do it or else.
The article as a whole was very interesting to read, but I thought these little tidbits were especially provocative. There was also a mention of how sharing data allowed two groups to make further discoveries that led to additional, separate publications, which is always nice to see.

