Friday, August 22, 2008

One Big Lab has moved!

I will now be posting at my all-purpose blog, "I was lost but now I live here".

Hope you don't mind a few posts here and there on non-sciencey things, but this move was really for the best (for me, that is).

Please re-subscribe, change your bookmarks, etc etc!

Thursday, August 21, 2008

Got platelets?

If I may digress from our usual programming for a post, this is a call for participation on behalf of a friend of a friend who is in dire need of platelet donors in the Houston area. The blood type doesn't matter, but the number of donors does; the more people who donate on her behalf, the higher on the priority list she goes. If you're in the Houston area, please consider donating! See below for more information.
Please donate! Kathryn Meacham, Patient ID: 754592

Friends & Family,

We are writing because we hope you can help secure or donate blood platelets in the Houston area for our sister/cousin Kathryn (Katie) Meacham. Katie is presently undergoing treatment at MD Anderson for a very aggressive strain of Hodgkin's Lymphoma!!!

Kathryn (Katie) Meacham is 25 years old and was diagnosed with Hodgkin's Lymphoma in April 2008. Katie underwent 3 months of unsuccessful chemo in New York. At that point, Katie and her mom made the difficult decision to move to Houston to undergo treatment at MD Anderson, which is known to have the best treatment available. Her current treatment plan includes a very aggressive chemo followed by a stem cell transplant.

When Katie has her stem cell transplant (in the next 2-3 weeks), she will be in great need of frequent single-donor platelet transfusions. Due to past negative reactions to multi-donor transfusions, single-donor platelets are particularly important to Katie, and they are often unavailable at the moment patients need them. We desperately need to find people in the Houston area to give platelet donations for Katie. The more people who donate on her behalf, the higher on the priority list Katie gets. Blood type does NOT matter; the number of people donating does. We cannot overstate the importance of platelet transfusions to her treatment.

If you know anyone in the Houston area, please forward this message on to them and ask them to forward to everyone they know. We need platelet donors and words cannot sufficiently express our gratitude for your assistance and donations!

If you are interested in donating please call/email Lori or Wendy. We are trying to create a list of potential donors so we can contact people once the need arises. Unfortunately platelets have a short shelf life. With this in mind, please do not donate until we coordinate the donation with you to ensure it best helps Katie in her treatment. When you call or email us, please let us know your blood type (if you know it) and the best way to reach you.

Lori Rosen (Katie's sister)
Cell: 773-220-0418
Work: 312-277-1655

Wendy Clarfeld (Katie's cousin)
Cell: 206-375-2655


Call or email us with any questions and thank you for your support!!

Much thanks and love,

Lori Rosen and Wendy Clarfeld

Wendy, Alice & Katie pointing towards Paraguay, Argentina & Brazil

Friday, August 8, 2008

The future of science, gradical change, and tools for the people

Maybe you've felt it - the buzz in a room, the tension in the air, the accelerating pace at which people are connecting and the realization that we're all in this together, even if we don't quite know what "this" is. At least in my small pocket of the world (wide web), something is brewing.

That something is The Future of Science. Michael Nielsen has written about this at length in preparation for his forthcoming book of the same name, with a lively discussion in the comments following. At BioBarCamp this past weekend (many thanks to John Cumbers and Attila Csordas for organizing!), the future of science became a recurring theme, with an impromptu discussion on open science the first day and spirited sessions on open science, web 2.0, the data commons, change in science, science "worship", and redefining "impact" and "failure" the second. Each of these topics could be its own blog series, and, in fact, many of them are. Even if people didn't always agree on the details, it was clear that everyone there (a biased group, inarguably) agreed that change is necessary, and inevitable. The question is, what will that change look like, and how will we get there?

The creators of put forth the following thesis:
Science relies on trust. Trust only remains intact when change occurs through consensus. Change through consensus is inherently gradual. (Therefore change in science must be gradual to succeed.)
Whether or not you agree with each statement, there are two points I'd like to discuss in particular. One is the issue of trust. Science relies on trust, right? I would say instead that science could be built on trust, if people weren't so worried about it! The most popular argument made against radical openness in science is based on the fear that other people will not act in good faith, i.e. if you make your lab notebook public, you could get scooped. And yet it is exactly this current climate of secrecy and cutthroat competition that encourages scooping and offers little recourse when it happens. If all research were open, digital, and timestamped, there would be an indisputable record of work and ideas that could be used to argue precedence.

Of course, this all starts to sound a little chicken and egg after a while. How do we assuage the fear of scooping enough for things to get sufficiently open so that scooping really isn't a problem? This brings us to the next point - that change must be gradual. Let me add the session leaders' conclusion to this: "the first step is to create incentives for scientists to voluntarily start doing the same everyday things on the same web platform." I think this is a valuable statement to keep in mind as more and more web 2.0 tools and platforms crop up - that in some sense, the best way to enact sustained change is to make it beneficial to the individual researcher, and allow them to discover this on their own terms. Scientists are a skeptical lot by training; the fact that they are also generally time-strapped and resource-starved makes them, ironically, reluctant to experiment, at least with the way they do their work. They neither need, nor want, another social networking tool.

The key that some groups have discovered (Labmeeting, Epernicus, and OpenWetWare among them) is to find out what people need, and then build something they will want. For Labmeeting, it is online paper management; for Epernicus, it is effective question answering and resource finding (no more wild goose chases looking for someone who can help you with a specific problem); and for OWW, it is tools for managing group websites and sharing protocols. Although Epernicus does rely on there being a social/professional network in place, the other two provide services that are useful even if you're the only one using them, so the online community can build itself without pressure. And Epernicus, along with the others, recognizes that in order to be successful among scientists, you need to provide them with something useful. In other words, you need to make tools for the people, rather than tools that need people.

So what about change? How will it happen and when? Well, I'm hoping Michael's book will tell us. ;) But I have a feeling it will be "gradical" - gradual at first, and then...

Wednesday, July 16, 2008

Off to ISMB 2008

Tomorrow I'll be heading off to ISMB in Toronto. I haven't attended the big show before, other than a one-day SIG three years ago in Detroit, so I'm sure it will be enlightening, perhaps for the science, but also for the sheer surreality of packing thousands of normally bunkered-down and repressed scientists into a small and contrived space.

I hope I can smuggle my poster tube onto the airplane without anyone noticing that I already have a carry-on and a personal item. But it's kind of hard to hide something that's almost 4 feet long... Speaking of transporting posters, how cool would it be to have electronic poster boards at conferences? No more poster tubes, little sheets of paper, curled edges, or push pins - just upload your file before you get on the plane, bring a thumb drive just in case, and you're good to go. Sure, they'd be expensive, delicate, heavy, and perhaps prone to glitches, but nothing a few years of tech investment can't fix.

"Make science easier"

Stanford students are nothing if not entrepreneurial. Only a couple of months after reporting on Ologeez (out of the Genetics department), I received an email about Labmeeting, created by three folks who knew each other from their undergrad days at Harvard, one of whom is now a biophysics PhD student at Stanford.

The website grew out of their desire to "solve some of the organizational problems we've encountered while doing our Ph.Ds", and includes a PDF organizer, a space for labs to share protocols and files, and mechanisms for discovering and recommending papers.

After registering and taking a brief tour, I found it to be a well thought-out and well-executed entrant into what is becoming a crowded market for tools that help you organize, search for, and share papers, or help your lab share data and files. Because of the many offerings, however, some with much earlier and more widespread adoption, I'd be surprised if Labmeeting gains much of a foothold. Then again, it's not claiming to be another OpenWetWare, so if it does what it does well, perhaps those looking for a smaller feature set (and a bit of relief from social networking) will find it just right.

Tuesday, July 1, 2008

Survey of bioinformatics

Michael Barton over at Bioinformatics Zen is collecting responses from those working in bioinformatics to survey the current climate (and projected future) of the field, with the data to be made public and further analysis encouraged. The (fully functional) survey is replicated below; the original can be found here.

Thursday, June 19, 2008

Is there a one stop shop for (good) science videos?

My advisor has recently become enamored with the idea behind SciVee, which is essentially a place where you can view video blurbs of people's research, but is convinced that nothing short of YouTube will catch on. His plot for world science domination is for all of us graduate students to tape ourselves talking about our papers for a few minutes and upload the videos to YouTube, with dramatic increases in paper readership sure to follow.

While it's undeniable that YouTube is by far the most popular video server and thus wins by breadth and pure viewership, one could argue that SciVee and JoVE provide a service by being specific - one features research blurbs, the other features video explanations of protocols and experiments. You can be assured that the videos will be high-quality and reputable, at the very least. But there are also many videos that may not fall under those two categories but are still interesting to scientists, or those interested in science.

The Inner Life of a Cell, various TED talks, and this demonstration of cornstarch physics come to mind as some science-related videos that I've enjoyed recently. Maybe even science humor. What YouTube has going for it is precedence and near monopoly, but quality control is dismal and it can be impossible to find good, engrossing science videos (their "Science and Technology" category is mostly dominated by technology - software, hardware, gaming, etc - and the science offerings are hardly scientific).

What I'd like to see is a way to aggregate high-quality science-related videos, categorized by type (protocol, experiment, research/paper promotion, cinematic, humor, etc - how would one categorize the TED talks?). Because sometimes a video is worth a hundred readings of a paper or protocol, because we're all just curious about so many things, and because we all need a bit of reinspiration every now and again.

Is there anything like this that exists now?

Probably old news by now, but JoVE's got this going for it - published videos will be indexed on PubMed! Will the day come when we have citation rates for videos, and can list number of hits as a bullet point in our CVs?

Wednesday, June 11, 2008

Would open science profit from a non-profit?

In my first foray into organizing a formal meeting (the PSB workshop), I've learned that pretty much everything comes down to money. Having a successful meeting requires getting people to attend, and getting people to attend often involves money. Getting money to allow people to attend can even require money (for example, publishing costs for conference proceedings).

One idea Cameron's mentioned a few times during our fundraising discussions is the Open Science Collective, and though he hasn't fully described what this is to me, I get the impression that it would be some abstract entity comprising individuals, organizations, resources, and activities - some larger body that would provide support for open science-related endeavors. At the very least, I think it would provide a means to fundraise separately from any particular event, so that we (in the collective sense) could act independently. By this I mean that the OSC could support individuals to go to meetings, sponsor meetings like the PSB workshop, or even organize its own meetings.

The idea hasn't really been taken further yet, with whimsical t-shirt ideas pretty much our only tangible revenue strategy so far. But to take a step towards making the OSC a reality, perhaps it's time to start talking about building a non-profit. From a quick glance through one website's guide to starting a non-profit, it appears that we need to create a mission statement, obtain a board of directors, and eventually file for incorporation (plus maybe other things, like tax-exempt or tax-deductible status). This is about where I start getting fuzzy on the details, so if anyone has experience with non-profits or foundations (I don't even know what the difference is), please feel free to enlighten me!

At the most basic level, it would be nice to have some external entity with a bank account into which we could funnel funds that we (again, the collective we) could apply towards open science activities. If that t-shirt shop ever sells anything, the proceeds should go into that account. What the best way to accomplish that is, I'm not quite sure. If it is to start a non-profit, then perhaps the first step is to see who else is interested and start drafting a mission statement together?

According to one source, a mission statement should cover the purpose, the business, the values, and the beneficiaries of the organization; i.e. the what, how, why, and who/where. This can be accomplished in one line but can also be expanded into paragraphs. But let's start small - here's a stab at a one-liner mission statement:

"The Open Science Collective is an international and interdisciplinary [non-profit] organization that promotes open exchange and collaboration in science, and provides resources and support for the advancement of open science."

So some concluding questions: Does it make sense to have something like the Open Science Collective? If so, what should it be and how do we get there? What would be the mission statement of the OSC? If not the OSC, what do we need?

Monday, June 9, 2008

Epernicus - 2 steps closer to a science exchange?

This morning I received an email from Mikhail Shapiro inviting me to join Epernicus, a science networking site which he helped launch. He'd visited the blog and (quite rightly) deduced that I would be interested in trying out a web community that aims to do some of the very things discussed here regarding a science exchange. So after the slightest hesitation (so many networking sites, so little time...), I decided to give it a go. I populated my profile (inspiring me to update my CV), uploaded a picture and a shameless plug for the Open Science Collective shop, checked out other people's profiles (has anyone invented the term "profile envy" yet?), clicked on a little link next to the "Launch BenchQ" icon for "What is BenchQ?", and generally muddled around until I ran out of narcissistic things to do.

So far, I'm rather pleased with Epernicus. It has a nice, aesthetically pleasing interface (as do SciLink and Laboratree, which I've mentioned before) and I didn't run into any bugs like others have experienced with other networking sites. When you first sign up and fill out your affiliations, it automatically populates your network with members of Epernicus with the same affiliations - in my case, 2 students in my PhD program and 60 or so people from my university. Once you've signed up, you can create groups, send people messages, and invite others to join - fairly standard social networking fare. The two distinguishing features of Epernicus, however, are Assets and BenchQ.

As part of your Epernicus profile, you can list your "assets" - topics (Phosphoprotein signaling, gene expression analysis, etc.), materials (chemicals, cell lines, equipment, kits), and methods (protocols, algorithms, techniques) - in which you can claim knowledge or experience ranging from novice to expert. Through the search bar, you can identify people who know something about your topic of interest. Taking this one step further is BenchQ, their version of a discussion board, which lets you post a question to a person or network. The questions can be geared towards methods ("How do I stain live cells?", "What is the best software for hierarchically clustering microarray data?") but can also be requests like "I need to do assay Y on protein X. Does anyone have purified protein X available?" or "My [equipment] is broken, can I use someone else's this weekend?"

I posed to Mikhail two of the major issues that came up during the science exchange discussion. Issue 1 is motivation - what is the incentive to help in science? Issue 2 is making connections rapidly and effectively - given the passive nature of BenchQ, were people getting answers to their questions? Issue 1 can be mitigated by having networks of people who are more likely to help each other because they have an actual professional relationship. Regarding issue 2, Mikhail said that he'd seen a number of successful interactions within the MIT and Harvard networks, and that almost every BenchQ question gets answered, which sounds like a great start. He also mentioned that an algorithmic notification targeting system is being developed, sort of like what human supernodes might do, which I'm interested in seeing when it comes out.

One thing Epernicus does not seem to support yet is some kind of "Projects" feature.* This is kind of a loose concept but generally it could be described as a place where you could list projects you work on and include pages with more information or interactive areas for each project. Project management, sort of. A mini-blog type thing could also be a useful feature, though at some point you need to decide what features contribute to the overall scientific mission and which ones distract. If Epernicus is more about science networking and less about the actual collaboration and project management aspects, maybe it's fine for it not to go down that road.

For a site that does go down that road, Laboratree bills itself as a research management system, so in addition to the shallow networking aspects, it also allows you to upload documents, write blog entries, and create projects with group members, messages, and project-specific blogs.

In my limited experience, I've been most impressed with Laboratree and Epernicus out of the various science networking sites I've seen (though Laboratree does have some bugs of its own - notably the search bar and an empty Help page). They independently provide different aspects of the science exchange idea and do so with style without being over the top. One question now is whether we want everything but the kitchen sink in one website, or whether multiple networking sites (like Laboratree for research management and Epernicus for question-answering) can each serve their own function. Is it simply an issue of critical mass?

Check out my public profile at Epernicus.

* Update: After posting this, Mikhail pointed out that the "Group" feature does accommodate some project management through document uploads and a "whiteboard". Thanks for the clarification and apologies for the lack of research on my part!

Sunday, May 25, 2008

Science protocols - Recipe for success?

I enjoy cooking and baking, and while I own my share of well-thumbed cookbooks, on a day to day basis I am likely to find my recipes on my favorite cooking websites. There are a number of good ones out there, including Epicurious and Food Network, but the one I go to 99% of the time is AllRecipes.

Why do I prefer this website? For one, the look and feel is inviting, intuitive, and informative. There is no barrier to entry, and novice and expert cooks alike will find what they need easily without intimidation or pandering. A nice perk is the ability to search by ingredients, helping you find recipes that use what you have on hand. But the most important feature is content - the community at AllRecipes is substantial and helpful, providing not only the recipes themselves but also feedback on the recipes that is often corroborated multiple times. Tips like decreasing the number of eggs or doubling the sauce, roasting at a lower temperature for longer, cutting out the salt, or adding more lime juice can truly be the difference between a successful dish and a failed one.

There has been a lot of discussion on science social networking sites and on whether the promise of "web 2.0" is being delivered in science yet (see David Crotty's post at CSH, Bora's question, and musings at The Scientist over on NN). Some reasons why so-called Science 2.0 hasn't been catching on include the fact that scientists are extremely busy and don't have time to invest in familiarizing themselves with new online networking sites or web tools that have no immediately obvious benefit to them (though some disagree with the claim that scientists are busier than those in other fields). As David points out, much of it boils down to inertia: if we already have a way of doing things, the only way we'll change is if the new way is obviously advantageous and it doesn't take too much effort to adopt it.

It was while reading these related discussions that I started thinking about scientific protocols and how much added benefit could be derived from community content. There are protocol websites out there (OpenWetWare, CSH Protocols, etc.) which are a great start, but for the most part these are put up by the original user or published by a journal and rarely generate feedback that could be useful to those looking for a particular protocol - such as slight temperature changes, buffer modifications, or other tweaks that either led to better results or fixed problems. Although it's been a while since I've worked in a wet lab, it seems that a lot of fine optimization goes into a protocol before it produces what is eventually published, and that refinement can often take months.

Given how similar protocols are to recipes, is it that much of a stretch to imagine a protocol version of AllRecipes giving similar benefits? Just as you save time, money, and ingredients by learning from other cooks, you would save time, money, and resources in the lab by learning from other researchers. Granted, this assumes that scientists aren't the type who would say, "What - give other labs a head start by learning from my mistakes? Are you crazy!?" but instead would say, "Think of how much this could help science in general if we all helped each other do experiments more efficiently!" Imagine going to a protocol website, searching by your requirements (protein name, species, type of assay, perhaps), going to the highest rated protocol, and reading a number of reviews that unanimously suggest tweaking one particular step. Or imagine finding the quickest (30-min meals)/most efficient (10 dinners for under $10!)/most popular (95% of people choose this recipe)/best (rated 5 stars by 500 users) protocols for doing X, Y, or Z as reviewed by scientists like you.
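To make the "search by requirements, then go to the highest rated protocol" idea a little more concrete, here is a minimal Python sketch of how such a site might filter and rank entries. It is only an illustration under stated assumptions: the `Protocol` record, its fields, the `search` function, and the sample entries are all hypothetical, made up for this example, and do not describe any existing protocol website.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Protocol:
    """Hypothetical record for one protocol on an imagined AllRecipes-style site."""
    name: str
    assay: str
    species: str
    ratings: list = field(default_factory=list)  # 1-5 stars from users
    reviews: list = field(default_factory=list)  # free-text tips and tweaks

    def average_rating(self) -> float:
        # Unrated protocols score 0.0 so they sort after rated ones.
        return mean(self.ratings) if self.ratings else 0.0


def search(protocols, assay=None, species=None):
    """Keep protocols matching every given requirement, best-rated first.

    Ties on average rating are broken by the number of ratings, so a
    5-star protocol rated by 500 users outranks one rated by a single user.
    """
    hits = [
        p for p in protocols
        if (assay is None or p.assay == assay)
        and (species is None or p.species == species)
    ]
    return sorted(hits, key=lambda p: (p.average_rating(), len(p.ratings)), reverse=True)


# Made-up example entries, for illustration only.
library = [
    Protocol("Western blot A", "western blot", "mouse", [4, 5, 5], ["example review text"]),
    Protocol("Western blot B", "western blot", "mouse", [3, 4]),
    Protocol("Live-cell stain", "staining", "human", [5]),
]

top = search(library, assay="western blot", species="mouse")[0]
print(top.name, round(top.average_rating(), 2), top.reviews)
# prints: Western blot A 4.67 ['example review text']
```

Ranking by a plain average is the crudest possible choice; the tie-break on rating count is a small nod to the "rated 5 stars by 500 users" intuition, and a real site would likely want something more careful about protocols with only a handful of ratings.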

Some might find this kind of crowdsourcing off-putting in the scientific domain, others might say it's about time. I know the picture is not so simple, but it just seems silly that we're not benefiting from what other fields (like cooking) have already embraced. Funding is scarce and time is a precious enough resource as it is - why waste both by banging our heads against the same wall others have banged on when we can move forward by finding the door?

I'd be interested to learn if there are actually any protocol websites out there that more fully resemble the types of recipe websites I mention. The solution isn't to create an AllRecipes for protocols (as David mentions in his post) but to provide a service that is useful to scientists and that encourages them to participate. Since the application area is more focused than general science collaboration/networking sites, are the benefits more obvious and will it gain traction more easily?

And for something that is neither here nor there, what is it about science that keeps it from exploiting and embracing the web the way practically everything else has done? (I have inklings but would enjoy hearing others' opinions.)

Thursday, May 22, 2008

Mac hacks for research

As much as I sometimes want to think that Apple is the new Microsoft, I can't deny that they've got something that the evil empire never had - fanatical users who are loyal not because they have to be, but because they truly love Macs. In fact, they love Macs so much that they often devote their free time to developing stunning software applications that range from the quirky and fun (think Delicious Library) to the "how did I ever live without it?" (think Papers). The enormous array of applications available for Macs, unrivaled in their aesthetics, ease of use, and depth of features, serves to reinforce the Mac's reputation as the platform of choice for trendsetting computer users.

It turns out that this is true in the scientific domain as well. Joel Dudley, founder of MacResearch, gave a guest talk for my lab today on a dizzying array of Mac tips, tricks, and software meant to optimize the Mac experience, especially in a scientific research environment. Some of the applications he mentioned looked truly extraordinary, and I thought I'd describe some of the more notable ones here for those interested in getting more out of their Macs.

For the cell and molecular biologists out there, here's a solution for your image processing needs. Macnification is like an extended iPhoto for microscopy. The full feature set looks impressive - you can track experiments, manage metadata, make measurements, create movies, and generate virtual z-slices through multiple images, all in one sleek application. I don't work with microscopy images, but now I wish I did!

For Python programmers wanting to flex their artistic side, NodeBox allows you to create amazingly complex graphics and animations with just a few lines of Python code. NodeBox is free and open source, with plenty of example scripts to get you started. Just looking through their online gallery is enough to get the "what-if" juices flowing.

Graph Sketcher and DataGraph
If you hate pretty much everything about Excel graphs, you might like everything about these two graph programs. Graph Sketcher is for quick, brainstorming-type graph drawing - use their simple tools to draw pretty much any abstract relationship in 2D, with or without data. DataGraph is more powerful and meant to plot large volumes of data. The defaults start out fairly aesthetically pleasing, but there are many, many ways to tweak the look of graphs, add or switch data, add additional axes, and plot multiple dimensions simultaneously. Both applications export to PDF for high-resolution figures, with DataGraph allowing export to other vector-based formats as well for use in publications.

In addition to these, the Omni group has a suite of applications for boosting productivity, managing information, and drawing high-quality graphics (much more easily than the impossibly hard-to-use Adobe Illustrator); Journler is a great Mail-like program for organizing notes such as your lab notebook; and, of course, Papers is a must for anyone who reads scientific papers on a regular basis.

Be sure to check out MacResearch for more innovative applications geared especially towards science and research.

Wednesday, May 21, 2008

Open Science at PSB - deadline approaching!

The initial deadline for proposals for the first Open Science workshop at PSB is coming up on June 1. We welcome submissions on almost anything related to Open Science - tools, platforms, and resources; applications, first-hand experiences, or case studies; cultural, social, and historical perspectives or studies; Open Access and open source; pretty much anything that will help us get a better picture of how Open Science has developed, where it is now, what's brewing on the horizon, and what's needed going forward. The call for participation has more detailed information on the workshop and submission instructions.

Note that the proposal need not be a fully mature or completely fleshed out abstract - a rough outline of the content of the proposed talk is sufficient. The early deadline is simply for us to get a better idea of what the workshop will look like, and there is ample time to continue refining abstracts thereafter.

There is no selection process for posters; anyone interested in presenting a poster may present. The deadline for submitting a poster abstract is Sept 12; however, early submissions are encouraged so that we may better organize the workshop!

Fellow bloggers and readers - please take a moment to post a short note about our workshop on your own blogs, or send notice of our call for participation to potentially interested friends and colleagues! Thanks in advance. :)

Sunday, May 11, 2008

Echidnas have genomes, too

More t-shirt ideas for fundraising for the "Open Science Collective"...

T-shirt idea #2: If the platypus now has its genome sequenced, shouldn't the echidna, too?

"Fair go, mate" is Aussie slang that apparently translates to "a plea for fair or equitable treatment," according to one source. A twist on "animal rights"?

See it on a shirt on Spreadshirt! It's on sale there, too, but we haven't really launched the official shop yet - it's mostly just to get started and see how things look.

Friday, May 9, 2008

"Worst Result Ever" t-shirts coming soon

You've been there, done that. Spent hours, days, weeks... months?... just to discover that your hypothesis (or "hope-othesis") is completely wrong. Finished a data analysis only to see that what you've just produced can only be described as the Worst. Result. Ever.

But graduate students have better things to do than mope over spilt data - like blog about their bad results, or go on to the next thing and hope history doesn't repeat itself. Inspired by Magda's great idea, I've decided to start a line of t-shirts that will hopefully allow those of us who have ever felt the pain of bad results to laugh a little at our plight - and raise a little money in the process. Yes, that's right. Proudly wear your results on your sleeve - er, chest - and support Open Science at the same time!

We're still in the early stages of brainstorming designs and have yet to put up a shop (most likely on CafePress, though other suggestions are welcome), but Cameron and I are actively fundraising for the PSB workshop on Open Science and thought t-shirts would be a fun angle.

So here are some initial designs to get the series started. Each one is named after the hapless student who had the pleasure of seeing something very much like it in their own research.

"The Magda" - No correlation

"The Shirley" - No separation

"The Bernie" - No improvement

The back of the t-shirt would be something simple, possibly one of the following:

If you have your own worst result that you'd like to contribute to our cause, feel free to send it to me: shwu19 at stanford dot edu. We are also planning to launch a series of designs reflecting the frustration that is thesis writing. Suggestions and comments are, of course, welcome!

Obviously, we don't expect to raise a significant amount of funding through t-shirts, so if you're interested in contributing more directly to the Open Science workshop, please do contact Cameron or me. We also encourage everyone interested in Open Science to make it out to Hawaii to participate. :)

Friday, April 18, 2008

Call for collaboration: calcium site predictions in need of validation

Time to walk the walk? ;)

I work in a bioinformatics lab, and one of our major projects is protein function modeling and prediction from structure. This means that we often come up with predictions, but have little in the way of experimental validation. A small project done by a post-doc in the lab is looking like it could turn into a paper, and what could really give it the juice it (and many bioinformatics papers) needs to target a top-tier journal would be validation in a living system.

In brief, we have a list of predictions for potentially novel calcium-binding sites in known calcium-binding proteins (i.e. new sites in addition to the ones already known) that we would like to validate. Probably only 2 or 3 validations would be sufficient. Since these proteins already bind calcium, some kind of quantitative assay on mutant versions of the proteins may be necessary (e.g. protein X normally binds this much calcium, mutate the loop predicted to bind and show that it now binds less).

If you or anyone you know is interested in collaborating with the post-doc to validate some of her predictions experimentally, please shoot me an email or respond here. Suggestions welcome, too!

Monday, April 14, 2008

Envisioning the scientific community as One Big Lab

The blogosphere has been abuzz recently, or at least it seems that way if you've only been checking up on it sporadically the last few weeks. Jennifer Rohn's post about lab notebooks has spurred over 100 lively comments spanning electronic lab notebooks, peer review, openness in science, and the reward system in science, making for an engrossing peek at the social science of science. Cameron has posted his own musings on that discussion. Pawel Szczesny writes about what it means to be a freelancing scientist. All of this is fascinating, and it is exciting to contemplate both what the future of science holds and the obstacles we will need to overcome; the fact that there are indeed stubborn obstacles (technological as well as cultural) and potentially tremendous rewards makes the anticipation of that future all the more heightened.

Emboldened by the collective fervor, I would like to propose an idea - an idea with the same name as this blog. But first, the back story.

About 8 months ago, one of my lab mates was writing up a short paper for submission to a translational bioinformatics conference. The work she was submitting revolved around a powerful literature-search tool tailored for pharmacogenomics called Pharmspresso. Pharmspresso had features lacking in existing search methods, but because it was meant to recognize genes, drugs, and polymorphisms in free text, she needed a way to evaluate how well it did so. The evaluation task would be straightforward: given a set of pharmacogenomics papers, what percentage of the mentions of genes, drugs, and polymorphisms does Pharmspresso capture? Getting the list of recognized entities from Pharmspresso would be easy: just give it the documents and set it running. But what would be the gold standard?

Typically, gold standards are created by humans. In this case, it would be the list of entities recognized by human readers with the appropriate knowledge to make the distinctions, in the same set of papers. To get her gold standard then, she essentially asked favors of her colleagues in the lab and the department, which translated to a number of them reading papers and doing data entry during free time (or during faculty talks) at a departmental retreat in early fall - not exactly fun, but done out of a sense of duty to science and the goodness of their hearts.
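Once a gold standard exists, "what percentage of the mentions does the tool capture?" is a recall calculation, and it is natural to report precision alongside it. Here is a minimal Python sketch, assuming each mention is represented as a (document, entity type, text) tuple; the mentions below are made up for illustration, and this is not the code actually used to evaluate Pharmspresso.

```python
def evaluate_entity_recognition(predicted, gold):
    """Compare a tool's recognized entities against a human gold standard.

    Both arguments are sets of (document_id, entity_type, text) tuples.
    Returns (recall, precision).
    """
    true_positives = len(predicted & gold)
    # Recall: fraction of gold-standard mentions that the tool captured.
    recall = true_positives / len(gold) if gold else 0.0
    # Precision: fraction of the tool's hits that the human readers agreed with.
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Toy data, invented for illustration only.
gold = {
    ("doc1", "gene", "CYP2D6"),
    ("doc1", "drug", "tamoxifen"),
    ("doc2", "polymorphism", "rs1065852"),
    ("doc2", "drug", "codeine"),
}
predicted = {
    ("doc1", "gene", "CYP2D6"),
    ("doc1", "drug", "tamoxifen"),
    ("doc2", "drug", "codeine"),
    ("doc2", "gene", "ABCB1"),  # a hit the human readers did not mark
}
recall, precision = evaluate_entity_recognition(predicted, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.75 precision=0.75
```

The hard (and human-intensive) part is building `gold`; the arithmetic afterwards is trivial.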

Afterwards, while socializing during one of the poster sessions, this task came up, and the discussion (in which Samuel Flores, Magda Jonikas, Yael Garten, Alain Laederach, and Bernie Daigle all participated) quickly turned to alternative solutions for tackling this and similar problems in science - those requiring knowledge and resources external to your own. As another example, many bioinformaticians work on problems that produce predictions of functions which would benefit from experimental tests of their validity. Conversely, a wet lab may benefit greatly from someone with computational expertise guiding or leading the data analysis, or even providing the hypotheses for experimental studies (in the form of predictions). This is the stuff from which many collaborations are born, but it may be difficult to find the right people in the first place, or the task at hand might seem not quite collaboration-worthy.

In essence, the problem boils down to this: you or your lab possesses a certain collection of skills, knowledge, and resources (hereafter referred to as simply resources), but your needs may not be fully addressed by what you possess. The solution lies in this simple proposition: some other person or lab has what you're looking for.

While it makes sense for a lab or individual to grow their resources and be mostly self-sufficient, at some point it becomes more economical to outsource certain tasks - to companies for antibody development, software for data analysis, supercomputers for high-throughput computing, etc. In some cases, the exchange takes place directly at the academic level, for example, with some labs maintaining and sharing specific cell lines or mouse strains for use by other researchers, or less directly through the use of published and available tools for all sorts of tasks in bioinformatics. So it would seem that outsourcing is common and accepted. But aside from these sorts of established avenues, what other needs do scientists have in conducting their research that are not easily solved? How often is a line of inquiry abandoned or slowed because of a lack of necessary skills, knowledge, or material resources?

The idea behind One Big Lab is that the scientific community should act as, well, one big lab, sharing resources when it makes sense, and everyone, especially the community as a whole, benefits.

During that discussion at the departmental retreat, the solution boiled down to some form of online transaction service built around a credit system. Scientist X would like 5 gold standard outputs for a certain task, so she posts a description of the task along with some credit attached. Other users can then sign up to complete the task, after which they receive the stated number of credits. Of course, in order to post tasks, you need to have a balance of credits you can draw from - which you earn by doing other people's tasks. Getting credits into the system to start needs to be figured out (give everyone N credits? Money for credits?), but assuming there's some baseline of credit floating around amongst the various users, an equilibrium should eventually be reached (at least, that's the hope).
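To make the credit mechanics concrete, here is a minimal, hypothetical sketch in Python. The class and method names are invented, and it assumes the "give everyone N credits" option for seeding the system. A task's reward is held in escrow when the task is posted and released to whoever completes it, so the total number of credits in the system stays constant.

```python
class CreditExchange:
    """Hypothetical credit ledger: post tasks with a reward, earn credits by doing others' tasks."""

    def __init__(self, starting_credits=10):
        self.starting_credits = starting_credits  # assumed seed: "give everyone N credits"
        self.balances = {}
        self.tasks = {}
        self._next_id = 1

    def join(self, user):
        self.balances.setdefault(user, self.starting_credits)

    def post_task(self, poster, description, reward):
        if self.balances.get(poster, 0) < reward:
            raise ValueError(f"{poster} cannot afford a reward of {reward} credits")
        self.balances[poster] -= reward  # hold the reward in escrow
        task_id = self._next_id
        self._next_id += 1
        self.tasks[task_id] = {"poster": poster, "description": description,
                               "reward": reward, "done_by": None}
        return task_id

    def complete_task(self, task_id, worker):
        task = self.tasks[task_id]
        if worker not in self.balances:
            raise ValueError(f"{worker} has not joined the exchange")
        if worker == task["poster"]:
            raise ValueError("you cannot complete your own task")
        if task["done_by"] is not None:
            raise ValueError("task already completed")
        task["done_by"] = worker
        self.balances[worker] += task["reward"]  # release the escrowed reward


exchange = CreditExchange(starting_credits=10)
for user in ("scientist_x", "scientist_y"):
    exchange.join(user)
task = exchange.post_task("scientist_x", "Annotate 5 papers for a gold standard", reward=5)
exchange.complete_task(task, "scientist_y")
print(exchange.balances)  # {'scientist_x': 5, 'scientist_y': 15}
```

Nothing here addresses cash-for-credits or disputes over the quality of completed work; those remain open design questions.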

Variations on this theme are natural - have a peer rating system, have the final credit payment be subject to a bidding system (based somehow on user ratings, e.g. highly rated users can ask for more credits to complete a task and the task-poster may select which user to "hire" based on the user ratings as well as how much each user is asking), have some kind of mechanism for taking transactions "offline" into serious collaborations, etc. Tasks may run the gamut from routine and rote to intellectually stimulating and scientifically rewarding. Obviously, guidelines will have to be set for what transactions may be appropriate for this forum and which ones might be more suited for formal, collaborative relationships - but even here, a forum such as this could be very useful for finding collaborators.
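One hypothetical way a task-poster could weigh ratings against asking prices is sketched below in Python: discard bids over the poster's budget or below a minimum rating, then rank the rest best-rated first, breaking ties by the lower price. The rating scale, field names, and ranking rule are all assumptions for illustration; a real system might trade price against rating quite differently.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    user: str
    rating: float  # hypothetical peer rating, e.g. 0 to 5 stars
    asking: int    # credits requested to complete the task


def rank_bids(bids, budget, min_rating=0.0):
    """Affordable, sufficiently rated bids: highest rating first, cheaper first on ties."""
    eligible = [b for b in bids if b.asking <= budget and b.rating >= min_rating]
    return sorted(eligible, key=lambda b: (-b.rating, b.asking))


bids = [Bid("ana", 4.8, 9), Bid("ben", 4.8, 7), Bid("cy", 3.9, 4), Bid("dee", 5.0, 15)]
ranked = rank_bids(bids, budget=10, min_rating=4.0)
print([b.user for b in ranked])  # ['ben', 'ana']
```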

In addition to the scientific transaction system, there could be other features that build on the community aspect, such as journal clubs, informal manuscript review, resources for students, and discussion forums. There could be repositories for knowledge or links to existing ones, informal or formal consulting, and casual exchange of ideas which could stimulate research or professional development. All of this should reinforce the idea that science is strengthened by community and the scientific community should not be held back by insufficient allocation of resources.

Although there are a number of websites out there that tackle some of these aspects, especially the community-building ones, I haven't really seen much resembling the transaction system, which is really the core of the idea. Pawel's freelance science comes close, and what I'd like to see is a formalized, community-wide online service for essentially that. Maybe this is infeasible right now given the way grants work (it may be difficult to justify spending time or resources on other people's research) or the way scientists work, but I would like to think that the basic premise - bringing together people with complementary skills and resources - makes sense and balances out in everyone's favor. (Whether this premise actually pans out in practice is up for debate: if we offered credits for cash, would anyone ever do someone else's tasks, or would demand outpace supply? By the same token, there could be "freelance" scientists like Pawel who primarily complete tasks and could then have the option of "cashing out".) I'm sure there are a ton of tricky legal, IP, financial, and organizational issues, not to mention social and cultural ones (would you trust someone you don't know to do work for you?), but I think the idea of having One Big Lab is worth exploring.

If I had the time, skills, and business acumen I would throw together a prototype and work out a business plan, but at the moment the most I can do is outsource it to the closest thing we have to One Big Lab - the blogosphere. ;)

Incidentally, Alain Laederach had come up with a similar idea about a year earlier and we thought about naming it "Experitrade" - an online system for trading experiments, essentially, but the name sounded too corporate and the grant he wrote never got off the ground. But the idea has persisted and inspired One Big Lab.

So, I'd welcome any thoughts, logical extensions, deal-makers or deal-breakers, important issues to consider, "prior art"... does anyone think this idea has legs? Will it work if it is completely altruistic? Does adding money into the equation detract from its mission or the science? What sorts of technical and organizational roadblocks are there? Clearly it makes the most sense, if any prototype is developed, to start small - with a couple participating labs or within a school or university, which helps with the trust issue as well. But I'd like to make sure I'm not completely missing the picture!

Friday, April 11, 2008

New paper-protocol-lab-knowledge sharing website out of Stanford

Stanford PhD student Jason Hoyt in the Department of Genetics was fed up with the inadequate presence of literature resources on the web, specifically good discussion surrounding papers, so he's set out to build his own website that would allow users to post, rate, and discuss papers, in addition to other features. Jason says,

Hey fellow colleagues and grad students. So, about a year and a half ago I got tired of the lack of good discussion around research literature online. For instance, what was the best review paper in the field of a new research project I was about to start? So, I started building a website.

What I ended up with was:
-A citation manager called 'My Libraries' (easily download papers to EndNote)
-A lab database called 'WikiGroups' for any lab in the world
-A protocols database
-A paper search that gives better results than PubMed (this depends on you adding more
-Import papers from PubMed
-Contact or colleague manager called 'Notes'
-A 'My World' page that gathers all the latest from your colleagues, lab group activities and school seminars.

It's in beta, so please report any bugs or feature requests (form available on all pages).

It's called Ologeez. From the plural of the suffix "-Ology," it refers to every branch of learning. If you find it useful, let other departments or schools know.

After very briefly exploring Ologeez, it seems like a competent addition to the handful of other science-oriented resource and knowledge sharing websites currently available. OpenWetWare offers lab websites and shared protocols, but doesn't have literature-oriented resources. PLoS ONE has a journal club feature, but only for PLoS ONE articles, and PLoS doesn't host lab websites or protocols. Laboratree and SciLink offer nice networking and some content management features, but don't support lab websites, and literature discussion is indirect at best. Although Ologeez has very few users and entries right now, people may find it useful to be able to set up a lab presence with shared protocols and papers, post and discuss interesting papers, and keep up to date with what their colleagues are doing, all in one website. It includes categories for all branches of science and research, including business/econ, law, and math.

Given its inclusiveness, it has the potential to spread school-wide, though it'll be interesting to see if it catches on enough for the discussion and search features to be useful.

Wednesday, March 26, 2008

March 26 is Document Freedom Day!

Today marks the first observation of Document Freedom Day, from here on out an annual celebration held on the last Wednesday of March.
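The "last Wednesday of March" rule is easy to compute. A minimal Python sketch using only the standard library; the function name is mine, and it simply encodes the rule as stated in this post.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday == 0 ... Sunday == 6


def last_wednesday_of_march(year):
    """Date of the last Wednesday in March for the given year."""
    march_31 = date(year, 3, 31)
    # Step back from March 31 to the most recent Wednesday (0 to 6 days).
    days_back = (march_31.weekday() - WEDNESDAY) % 7
    return march_31 - timedelta(days=days_back)


print(last_wednesday_of_march(2008))  # 2008-03-26
```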

From the official website:

Document Freedom Day (DFD) is a global day for document liberation. It will be a day of grassroots effort to educate the public about the importance of Free Document Formats and Open Standards in general.

Complementary to Software Freedom Day, we aim to have local teams all over the world organise events on the last Wednesday of March. 2008 is the first year that Document Freedom Day is being called for, and we are looking for people around the world who are willing to join the effort.

DFD's main goals are:

  • promotion and adoption of free document formats
  • forming a global network
  • coordination of activities that happen on 26th of March, Document Freedom Day

Once a year, we will celebrate Document Freedom Day as a global community. Between those days, DFD will be focused on facilitating community action and building awareness for issues of Document Freedom and Open Standards.

Given the work on open data standards, structured data, and open repositories being done by Cameron, Peter MR, and others in the open science community, this is definitely cause for celebration! Unfortunately, the United States seems to be lagging behind other countries in its observance of this holiday (but maybe we'll give it another year).

Thanks to Alain Laederach for the tip!

Monday, March 24, 2008

PSB Open Science workshop - call for participation

The call for participation for the Open Science workshop at PSB 2009 is now up! We welcome anyone with an interest in open science to submit proposals for talks. Note that although space is limited for talks and demos, anyone who registers for the conference can present a poster, so we also encourage poster submissions!

Tuesday, March 18, 2008

Gregory Petsko on "the right to be wrong"

Gregory Petsko expounds eloquently on the "climate of fear" in science in a recent commentary in Genome Biology, titled "The right to be wrong." Drawing a provocative parallel to US politics, he describes how honest, intelligent people willing to admit their (almost always) understandable mistakes are turned on and burned at the stake by their opponents, accused of lacking integrity and of being "flip-floppers." In science, the attacks are much less direct, but the attitude is still entrenched, and the vast majority of scientists are opting for "safe," incremental, "data gathering/discovery"-based research as opposed to bold, hypothesis-driven science. The sentiment is echoed by funding agencies that do not want to risk funding anything that might "fail."

The commentary, although ominous at first, should inspire us all to behave as true scientists should - boldly but carefully, objectively and rationally.

Saturday, March 15, 2008

AMIA Summit on Translational Bioinformatics

Hundreds of clinical scientists, biologists, bioinformaticians, and policy gurus descended on the swanky Intercontinental Mark Hopkins hotel for the first AMIA-sponsored Summit on Translational Bioinformatics last week. Stanford's Atul Butte rallied impressive troops for this inaugural meeting, including the leaders of all of the National Centers for Biomedical Computing (NCBCs, 7 or so total). Since translational bioinformatics is not simply about research, but about translating research into tangible benefits (clinical diagnostics, therapeutics, and standard of care), this meant a many-faceted conversation involving basic researchers, large-scale integrative projects (e.g. caBIG, the NCBCs), clinical scientists, informaticians, and government agencies. This was reflected by the structure of the meeting, which consisted of tutorials; policy, technology, and organization panels; primary paper sessions; and posters covering topics ranging from how to establish collaborative projects to ontologies and phenomics.

Given the breadth of the audience, I'm sure the highlights of the conference vary from person to person. Below are some of mine:

Eitan Rubin from Ben Gurion University, Israel (Talk highlight). "Reverse translational bioinformatics: a bioinformatics assay of age, gender and clinical biomarker." A self-proclaimed biologist, Eitan presented some intriguing work in what he called "reverse translational bioinformatics" - using clinical/medical data to make useful discoveries about biology. As an additional aim, he strove to show that existing bioinformatics tools could be applied to clinical data with little modification. To do this, he took an immense data set - thousands of variables collected for tens of thousands of individuals (part of a nutrition and lifestyle survey that was epidemiological in nature), including laboratory tests, questionnaire answers, and medication data - and essentially turned it into a microarray after binning by age. Note that this was a proxy for clinical data since no such data is currently publicly available. He then subjected this array to the same kinds of analyses one would perform on an array of molecular biological data: normalization, calculation of median values, clustering by age and variable. The results encompassed both the expected and the surprising. For example, when he clustered by age, he found distinct boundaries between somewhat intuitive ages - at 12 yrs and 16 yrs for both sexes, at 40 yrs for women and again around 49, and around 45 for men; these could point to interesting biological changes going on at these age boundaries. He also plotted the median values for variables like serum lead level vs age and found distinct patterns. At this point, he has only begun to analyze the enormous amounts of data, and more interesting patterns are sure to emerge. In the meantime, it helps drive home the potential behind open data and data (and methods!) re-use.
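The "turn the survey into a microarray" trick can be sketched in a few lines. The Python below is a toy under stated assumptions, not Rubin's pipeline: each record is a dict with an integer age and numeric variables with no missing values, and the variable names and values are invented. It takes the median of each variable per age bin, z-scores each variable across bins (a simple normalization), and reports the distance between consecutive bins, where a large jump hints at an age boundary. He clustered the matrix; the adjacent-bin distance here is a simpler stand-in.

```python
import statistics
from collections import defaultdict


def age_by_variable_matrix(records, variables, bin_width=1):
    """Median of each variable per age bin (rows = age bins, columns = variables)."""
    bins = defaultdict(list)
    for record in records:
        bins[record["age"] // bin_width * bin_width].append(record)
    return {age: [statistics.median([r[v] for r in bins[age]]) for v in variables]
            for age in sorted(bins)}


def zscore_columns(matrix):
    """Normalize each variable (column) across age bins, as one might a microarray."""
    ages = sorted(matrix)
    n_vars = len(matrix[ages[0]])
    out = {age: [0.0] * n_vars for age in ages}
    for j in range(n_vars):
        column = [matrix[age][j] for age in ages]
        mu = statistics.mean(column)
        sd = statistics.pstdev(column) or 1.0  # avoid dividing by zero for flat variables
        for age in ages:
            out[age][j] = (matrix[age][j] - mu) / sd
    return out


def adjacent_bin_distances(norm):
    """Euclidean distance between consecutive age bins; peaks suggest boundaries."""
    ages = sorted(norm)
    return {(a, b): sum((x - y) ** 2 for x, y in zip(norm[a], norm[b])) ** 0.5
            for a, b in zip(ages, ages[1:])}


# Invented toy records with a deliberate shift between ages 11 and 12.
records = [
    {"age": 10, "height_cm": 140, "serum_lead": 2.0},
    {"age": 10, "height_cm": 142, "serum_lead": 2.2},
    {"age": 11, "height_cm": 141, "serum_lead": 2.1},
    {"age": 11, "height_cm": 143, "serum_lead": 2.1},
    {"age": 12, "height_cm": 155, "serum_lead": 1.5},
    {"age": 12, "height_cm": 157, "serum_lead": 1.5},
    {"age": 13, "height_cm": 156, "serum_lead": 1.4},
    {"age": 13, "height_cm": 158, "serum_lead": 1.6},
]
matrix = age_by_variable_matrix(records, ["height_cm", "serum_lead"])
distances = adjacent_bin_distances(zscore_columns(matrix))
print(max(distances, key=distances.get))  # (11, 12)
```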

Yael Garten from Stanford University (talk highlight). "Pharmspresso: a text analysis tool for linking pharmacogenomic concepts." [Disclaimer: Yael and I are colleagues in the same lab and I helped to critique her presentation.] Yael's work on a semantic, scoped search engine for pharmacogenomics is worth mentioning because of its immediate and potential utility. Pharmspresso allows a user to query a corpus of documents (currently about a thousand pharmacogenomics-related articles previously curated by the PharmGKB team) for keywords, genes, drugs, and/or polymorphisms occurring in the same sentences. Based on the Textpresso ontology created for mining the C. elegans literature, Pharmspresso includes semantic support for human genes, drugs, and genetic polymorphisms and additionally improves upon more general search engines such as Google and PubMed by limiting the scope of the hits to the sentence level and returning hits color-coded within each sentence for easy evaluation of search results. Pharmspresso has already helped the PharmGKB curators and in the future will be incorporated into an automatic curation pipeline.
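The core idea of sentence-scoped search with typed highlighting can be illustrated with a toy. This Python sketch is not Pharmspresso's implementation: the tiny lexicon stands in for its gene/drug/polymorphism categories, the sentence splitter is deliberately naive, and `[TYPE:term]` markup stands in for color coding.

```python
import re

# Toy lexicon standing in for semantic categories (assumed, not Pharmspresso's).
LEXICON = {
    "gene": {"CYP2D6", "CYP2C9"},
    "drug": {"codeine", "warfarin"},
    "polymorphism": {"rs1065852"},
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(sentence):
    """Mark known terms as [TYPE:term], a text stand-in for color coding."""
    for category, terms in LEXICON.items():
        for term in terms:
            sentence = re.sub(rf"\b{re.escape(term)}\b",
                              lambda m, c=category: f"[{c.upper()}:{m.group(0)}]",
                              sentence, flags=re.IGNORECASE)
    return sentence


def search(text, *query_terms):
    """Return tagged sentences that mention every query term (sentence-level scope)."""
    hits = []
    for sentence in split_sentences(text):
        if all(re.search(rf"\b{re.escape(q)}\b", sentence, re.IGNORECASE)
               for q in query_terms):
            hits.append(tag(sentence))
    return hits


doc = ("The CYP2D6 variant rs1065852 was genotyped. "
       "Codeine is converted to morphine by CYP2D6. "
       "Warfarin dosing depends on CYP2C9.")
print(search(doc, "codeine", "CYP2D6"))
# ['[DRUG:Codeine] is converted to morphine by [GENE:CYP2D6].']
```

Restricting matches to a single sentence is what makes co-occurrence meaningful here: "warfarin" and "CYP2D6" both appear in the toy document, but never in the same sentence, so that query returns nothing.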

Selected papers to be published in BMC Bioinformatics. At the close of the conference, the surprise announcement was made that 15 of the 27 presented papers had been selected to be published in a summer issue of BMC Bioinformatics as a joint agreement between the Open Access journal and AMIA, who would foot the bill. The papers would need to be expanded and updated for submission, but the peer review process had happened for the conference, so they were already considered accepted for the journal. A couple of big conferences already do something similar - ISMB/ECCB and RECOMB - but it would be great if every major conference had some kind of arrangement like this with a journal. It seems like it would be a win-win for everyone - peer review already taken care of, an increased audience for that issue of the journal, and a nice CV boost for the authors (and no more hard decisions between presenting at a conference vs publishing in a journal). Given that this was the conference's very first meeting, it was a very nice surprise indeed.

Thoughtful A/V setup. This is simply a logistical highlight. We've all sat through our share of technical difficulties, but this conference (at least in the main room) was astonishingly free of them. A large part of this was due to the presence of dedicated A/V staff who knew just when to dim and raise the lights, cue mood music, and put up the "transition screen" - a screen blank except for the AMIA logo. This screen went up whenever a presenter's slides were NOT up, and prevented those awkward moments when the audience could see the desktop of the presenter's laptop or the view of the PowerPoint application. It was also nice not to have to see the blue or black screens when video input was changed. All in all, it imparted a much-appreciated professional touch to the conference which other meetings would do well to emulate.

In summary, there were some informative panels on various policies and the NCBCs, interesting research, and nice extras that made this first Summit on Translational Bioinformatics a big success!

Thursday, March 13, 2008

Help for protein misfolding in foreign vectors?

A friend of mine is getting ready to do some experiments involving purified human proteins expressed in E. coli, and she asked me if I knew anything about protein misfolding - apparently, proteins sometimes misfold when expressed in foreign hosts such as E. coli. Unfortunately, I didn't, but a Google search turned up an explanation that's really not that surprising when you think about it: many proteins fold correctly only with the help of chaperone proteins or cofactors, which a foreign host may not provide. Obviously, this can be a big problem for an experimentalist who wants to get usable amounts of a specific, correctly folded protein.

Does anyone know where to find good information about this problem or have suggestions for how to get around it (with or without changing expression hosts - I'm not sure if E. coli is a crucial part of the study or not)? The document I linked has some solutions, but I'm wondering if there are any resources or "easy" tips out there I can forward along.

Online collaborative manuscript annotation

While at the inaugural AMIA Summit on Translational Bioinformatics the first half of this week (stay tuned for another post summarizing that), I started thinking about some ideas for tools that could help make discussion of papers easier and more productive.

Currently, it seems that there are a few avenues for discussing a paper: 1) have an informal conversation in person, 2) hold a journal club where one person presents the paper and discussion ensues, or 3) blog about it and hope others comment. (You could argue that another avenue exists through some journals - especially open access ones - allowing comments on published articles, but this hasn't caught on as far as I can tell.) There are several disadvantages of the current systems. In-person conversations or journal clubs can be stimulating as they happen, but are transient and usually go unrecorded, resulting in little tangible benefit to others (or often even the participants); they also usually preclude remote participation without some sort of audio-visual setup. Going the blog route allows anyone to participate, but it's difficult to connect the comments back to the paper and the discussion may be less productive than hoped.

A group of students in my human-computer interaction class a few years ago developed an idea called Collaboread for their final project. In essence, it allowed multiple online users to mark up a document, enabling collaborative annotation. I'm sure there are several products out there that allow either online markup of documents (Adobe, for one) or collaborative editing (Google Docs), but I haven't seen anything that resembles exactly what I envision.

Suppose you are viewing a document on the screen - maybe a full-text article at BioMed Central, or a PDF. Clearly, things like web URLs and references should be hyperlinked already. But suppose you could create additional hyperlinks, such as to Wikipedia pages, other papers that were not referenced but are relevant, blog posts, etc. You could also start individual discussion threads attached to particular results, claims, or points made in the paper, or to tables or figures. Mousing over or clicking on the icon indicating such a thread would bring up a summarized view of the thread overlaid on the screen, which you could browse more deeply or hide if you decide you're not interested. The idea is to make a richly annotated document that is easy to read, while also making it easy to see what other people thought or were confused by, and to respond if so inclined, without too much disruption. When I envision this, I see a Google Maps-like navigation and manipulation style with lots of linked text, little colored balloons marking the points of interest (the discussion threads), and liberal use of tags to help with filtering and searching the document and annotations.
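As a rough sketch of the data model such a tool might need, here is some hypothetical Python: threads pinned to an anchor in the paper (a figure, a table, or a span of text), each carrying tags and replies, plus reader-added links. The class names, fields, and the placeholder DOI are all invented for illustration; a real tool would also need anchors that survive re-rendering, user accounts, and permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """A discussion 'balloon' pinned to part of a paper."""
    anchor: str                                   # e.g. "Figure 2" or "para 14, chars 120-245"
    topic: str
    tags: set = field(default_factory=set)
    comments: list = field(default_factory=list)  # (author, text) pairs

    def reply(self, author, text):
        self.comments.append((author, text))

    def summary(self, max_comments=2):
        """Short overlay text shown on mouse-over: topic, reply count, first replies."""
        lines = [f"{self.anchor}: {self.topic} [{len(self.comments)}]"]
        lines += [f"  {author}: {text}" for author, text in self.comments[:max_comments]]
        return "\n".join(lines)


@dataclass
class AnnotatedPaper:
    doi: str
    links: dict = field(default_factory=dict)     # phrase in text -> reader-added URL
    threads: list = field(default_factory=list)

    def open_thread(self, anchor, topic, tags=()):
        thread = Thread(anchor, topic, set(tags))
        self.threads.append(thread)
        return thread

    def filter_by_tag(self, tag):
        return [t for t in self.threads if tag in t.tags]


paper = AnnotatedPaper(doi="10.1000/placeholder")  # placeholder DOI
paper.links["heat map"] = "https://en.wikipedia.org/wiki/Heat_map"
t = paper.open_thread("Figure 2", "Is the clustering robust?", tags={"methods", "statistics"})
t.reply("reader1", "Would like to see bootstrap support.")
print(paper.filter_by_tag("methods")[0].summary())
```

Keeping tags on threads rather than on the paper is what makes the "filter the balloons" interaction cheap.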

A tool like this would be useful not just for journal club-style discussion of papers, but also as a teaching and editing tool. Authors could collaboratively comment on a paper, or learn from others' comments after it is published and made the focus of such a discussion. Readers and students would benefit from the additional linked resources and learn from the discussions how to critique a paper. And the annotated document would be available to everyone long after discussion has tapered off.

Of course, there are potentially many technical, legal, social, *al issues surrounding this, but I think some kind of tool along this vein would be useful and interesting. Does anyone know of any tools that do these things already? If not, I am already looking into what it would take to develop it, and would appreciate tips, suggestions, warnings...

Tuesday, March 11, 2008

PSB proposal accepted for a workshop

The proposal we submitted for a session on Open Science at the Pacific Symposium on Biocomputing was accepted! They notified us yesterday that it would be included as a 3-hour workshop on the first day of the conference. Many thanks to all those who sent encouragement or letters of support - I'm sure it went a long way towards convincing the conference organizers that we are serious!

A call for participation will come out shortly, but just wanted to get the good news out!

Monday, March 3, 2008

Anatomy of a Ph.D. thesis

Let's face it: life is complicated. But thanks to the ever-flourishing DIY industry (for example, WikiHow), a lot of endeavors that used to seem complicated are made much less so through step-by-step instructions. In science, experimental protocols already do this, at least in theory, but what about other aspects of science, like writing papers, keeping up with literature, making presentations, or networking at conferences? My advisor has given informal talks for his students on a number of these topics, the latest of which was a set of general guidelines for writing a Ph.D. thesis.

In order of their appearance in the final document...
  • Chapter 1 - Introduction. This is essentially an executive summary. You should briefly describe all contributions your thesis makes to your field, provide at least one "gee-whiz" result, and lay out a roadmap for the rest of the thesis ("In Chapter 2, I present the background... In Chapter 3, I discuss my work on X...."). It is acceptable to make claims without proof, since you will be defending these later on.
  • Chapter 2 - Background. This is essentially a literature review, and demonstrates your understanding of the field and the context surrounding your work. For bioinformatics theses, this covers both the biomedical domain and the area of informatics or computation your work involves. You should present an intellectual framework in which your work fits - what has been done, the advantages and limitations of this previous work, the potential avenues for improvement, and where you come in. Ideally, this chapter could be published as a review article with very little modification.
The next couple of chapters are the meat of the thesis and can take at least two forms, depending on what kind of work you did during your Ph.D. If you worked on several somewhat disjoint projects and published 2 or 3 papers on them, you can write one chapter for each paper (but no more than 3). If you worked on just one problem, you are probably better off writing a chapter for the methods and a chapter for the results and discussion (if you developed two approaches for the same problem, you can repeat this for the second approach). So:
  • GENERAL THEME: Several projects
    Chapter 3 - Methods, Results, and Discussion from paper 1
    Chapter 4 - Methods, Results, and Discussion from paper 2
    (Chapter 5 - Methods, Results, and Discussion from paper 3, if applicable)
  • FOCUSED THEME: Single project
    Chapter 3 - Methods
    Chapter 4 - Results/Discussion
    (Chapters 5 and 6 - Methods and Results/Discussion for approach 2, if applicable)

    In general, you do not want to reuse text from your published papers verbatim, however tempting this may be. Papers impose strict limits on length and scope, so you should see your thesis as an opportunity to pontificate and give voice to your ideas. You should also make your thesis a detailed guide to everything you tried, including some of the things that didn't work, so that it can serve as a reference for future generations of grad students who may pursue extensions of your research.
  • Chapter 5, 6, or 7, depending on the type of thesis - Summary chapter. Describe the overall contributions to the relevant domains. (For biomedical informatics theses, describe the overall contributions to biology or medicine, and the overall contributions to informatics or engineering. If applicable, you may also describe core contributions to computer science.) This is also where you discuss the limitations of the work, the unsolved problems, and your best ideas for how to solve them.

  • Appendices - supplementary material. Almost anything goes, but you should definitely include all key data and datasets (the information needed to recreate the major results in your thesis). Ideally, all data relevant to your thesis (and other related work, if possible) will be stored and/or made available either on the web or as a physical copy, though this is mostly for the advisor as a reference for future students. Any proofs or other supplementary material should also go in an appendix. You can also include additional work, or papers you published that are unrelated to your thesis.
So that explained what each chapter of the thesis should be about; what about actually writing the thesis? My advisor's recommendation is to start with the meat chapters (Ch. 3 - 6/7) since you should have pretty much all the necessary material to begin with, then write Chapter 2, then write the first and last chapters.

More specific advice on how to actually write each chapter was not covered and probably warrants its own post. Note that this is my advisor's take on the Ph.D. thesis; I'm sure there are some other interpretations, which would be interesting to hear! How much does the thesis vary by field?

Wednesday, February 27, 2008

PSB proposal up on Nature Precedings

Our proposal for an Open Science session at PSB is now available on Nature Precedings!

Tuesday, February 26, 2008

Tools for analyzing "lists" in biology

My latest research is focused on cluster/list annotation in biology. Given a cluster or list of genes or proteins that were grouped together using some metric (expression profile, sequence or structure similarity, interactions, etc.), how can you discover descriptive terms or labels for that cluster? This seems to be a common question, and yet I've had trouble finding tools that do what I am specifically trying to do (investigate a list of biological entities). I've found many that can give you tons of information about single genes or proteins, which I don't consider that helpful, and a few that can give you information about a group, but these are either organism-specific or limited to one or two types of data (e.g. GO terms).
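For readers unfamiliar with how the group-level tools mentioned above usually work, here is a minimal sketch of term over-representation analysis with a one-sided hypergeometric test, the kind of calculation commonly used for GO-term enrichment. It is not the text-based method I'm developing. The input format (a dict mapping each term to a set of gene IDs, plus a background gene set) and the function names are hypothetical, chosen only for illustration.

```python
from math import comb  # Python 3.8+


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def enrich_terms(gene_list, annotations, background):
    """Rank terms by over-representation in gene_list relative to background.

    annotations: hypothetical format, dict of term -> set of gene IDs.
    Returns (term, overlap, p_value) tuples sorted by ascending p-value;
    terms with no overlap with the list are skipped.
    """
    genes = set(gene_list) & background
    N, n = len(background), len(genes)
    results = []
    for term, term_genes in annotations.items():
        term_genes = term_genes & background  # ignore genes outside background
        k = len(genes & term_genes)
        if k == 0:
            continue
        p = hypergeom_upper_tail(k, N, len(term_genes), n)
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data: 100 background genes, two made-up terms.
    background = {f"g{i}" for i in range(100)}
    annotations = {
        "termA": {"g0", "g1", "g2", "g3", "g4"},
        "termB": {f"g{i}" for i in range(50, 80)},
    }
    for term, overlap, p in enrich_terms(["g0", "g1", "g2", "g3", "g10"],
                                         annotations, background):
        print(term, overlap, p)
```

In the toy example, four of the five listed genes carry "termA", so it gets a very small p-value, while "termB" never overlaps the list and is skipped. A real analysis would also correct for testing many terms at once (e.g. Bonferroni or Benjamini-Hochberg).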

Since I am developing a method to do this based on text, I'd like to be able to compare my method to existing ones that solve the same problem. What I am looking for is two or three available methods that give you information relevant to a list of biological entities from multiple species, at least one of which uses literature or text-mining. Does anyone know of such methods, or have ideas of where to look? Various PubMed and Google searches have failed me!

Unrelated, but also done today: Submitted the PSB proposal to Nature Precedings as per several of your requests. Will update once word is back from their review process.

Monday, February 25, 2008


Sean Mooney and his colleagues at the Indiana University Center for Computational Biology and Bioinformatics Core are developing a new professional networking site called Laboratree. It looks and feels like a more subdued version of Facebook, which is a good thing in my opinion - Facebook is all about Too Much Information and too many annoying alerts and requests, whereas I think (and hope) Laboratree will stay trim and relevant. It also integrates a blog into your account, which is nice for those who don't already keep one. While Laboratree may compete with sites like OpenWetWare in some of its features, it is less of a content management system and more of a networking site. It is still being developed and so has some bugs and a limited user base, but you can sign up for an account.


Homework is such a staple of education that most of us take it for granted (or wish it didn't exist). But done the right way, it can make a big difference in how much a student gets out of a class. Multiple choice questions are easy to come up with, answer, and grade, but are they effective at teaching or testing knowledge? I'm sure there have been all sorts of studies on testing in schools, but it'll be interesting to see the results of Rosie Redfield's experiment with her undergrad biology class.

Friday, February 22, 2008

Science journal feedmixes

The topic of literature review came up at a recent group meeting. Our advisor receives a number of print subscriptions to journals, but these often languish in some forgotten corner. Even when they are brought out of the depths, it seems a daunting task to leaf through them to find articles of interest to each student. Since everyone is on the interwebs, it is much easier (and more complete) to get updates on relevant articles through a website or email, peruse the titles and blurbs online, and then decide what to actually sit down and read. There are a few problems with this, however.

  1. Getting alerts from journals, search engines, or aggregators like Faculty of 1000 still usually produces too many articles to sift through.
  2. To limit the amount of junk you get, you provide keywords - but if you're like me, you regularly browse through unlikely articles in Science or Nature or PLoS ONE because they look interesting, and keyword filters will screen these out.

I've set up a feedmixer for science journals and related information using Feed Digest. It's little more than an aggregator right now so it doesn't really address those problems. If anyone knows of any cool tricks to help sift through the ridiculous amounts of information we're supposed to keep up with, without losing the unexpected gems, I'd love to hear it!

Update: A new tool called Persai claims to learn your preferences from what you accept and what you reject (review on Slate), and filters your feeds accordingly. I'm not sure it will help with issue #2, but it's probably just an irreconcilable trade-off between #1 and #2. Perhaps the solution is to set up a couple of different pages with Persai - narrow ones for specific fields or interests, and broader ones for science "pleasure reading"!

Review on open source CMS for bioinformatics

An alum of my lab recently published a review on open source content management systems and their uses in bioinformatics.

Tuesday, February 19, 2008

Mostly Open collaboration on Warfarin pharmacogenetics

Warfarin is one of the best-known case studies for pharmacogenetics - where variations in an individual's genome affect his or her response to drugs or other substances. Warfarin is an anticoagulant often prescribed to prevent the blood clots that can cause embolism, stroke, or heart attacks. It is very effective at the right dosage; however, it is extremely difficult to dose, and the wrong dosage can have very dire consequences. After decades of research, it is clear that personal genetics are at the root of the response variability, and warfarin is poised to be the first widely used drug to carry an FDA recommendation for dosing based on pharmacogenetic testing.

Despite widespread acknowledgment of warfarin's pharmacogenetic factors, the road to clinical acceptance has been long. Some of the main challenges are devising an effective dosing scheme given pharmacogenetic data and demonstrating improved clinical outcomes as a result of using that dosing scheme. It now appears that we are nearing the final push, with the creation of the International Warfarin Pharmacogenetics Consortium (IWPC). Hosted by PharmGKB, it is a global collaboration among warfarin scientists who share data and research to study the relationship between individual genetic variability and dosing response. Anyone with paired genotype and clinical data can join the consortium, on the condition that their data is made available to other members of the consortium. When the results of the collaboration are published, they will be authored by the IWPC, and the pooled data will be made available to all those with accounts on PharmGKB.

The IWPC is an interesting and important development. Although it is not completely open, it is most certainly open to those with something to contribute, and will be made open to pharmacogenomics researchers on PharmGKB (detailed data requires an account, due to privacy issues surrounding clinical data). It is also an example of a group of researchers working on the same problem (relating warfarin dosing and response to genotypes) coming together in what has the potential to change current medical practice. Pharmacogenomics has been on the brink for so long, and the IWPC may very well provide the breakthrough that it needs to deliver on the promise of translational research.

Disclaimer: The information presented here is my own recollection of a talk given by Russ Altman at the annual Stanford Biomedical Informatics retreat. For detailed information, you are encouraged to email PharmGKB.

Saturday, February 9, 2008

PSB Open Science session proposal submitted!

Thanks, everyone, for the encouraging and helpful comments, for spreading the word, or for writing in with a letter of support. You all helped us craft a very strong proposal, which was submitted yesterday! I'll post an update on that once I hear back, sometime around March 5th.

For those interested, the final version of the document is updated on GoogleDocs here, and a PDF can be downloaded here.

Tuesday, February 5, 2008

Collaboration for change: an example from the entertainment industry

Disclaimer: This post has some political content.

The music industry has a long history of coming together for big projects meant to raise awareness and funds for good causes. Charity concerts, albums, and tours feature artists from all over the musical spectrum (and sometimes related industries, such as movies) to combat AIDS, fight poverty, help disaster victims, or call for an end to violence. It may be useful to reflect on some of these as collaborations outside of science.

In addition to collaborative performing, we might stand to learn from the music industry's experiences with sharing. There is still a battle going on over the issue, but I think it's accepted that being more open with your music is the way to survive. Indie bands offer their music for free on their websites, artists make a name for themselves online, and, as Radiohead demonstrated, you can release your album as a pay-what-you-want download and benefit. With the explosion of video content on YouTube, practically everything you'd ever want to see or hear is now free and shared.

Today is Super Tuesday in the US - the day when most state primaries happen, determining the presidential candidates for the election in November. A few days ago, a group of artists released a music video that has since spread across YouTube and other websites. The video endorses Barack Obama in a creative, artistic, and very moving way, bringing a diverse group of musicians and performers together to send a message. They recorded the video in 2 days, and it reached millions in less time than that.

Even though this example is politically charged, I think it encapsulates the power of openness and sharing as a mechanism for accomplishing something much bigger than one individual could achieve.

Monday, February 4, 2008

Open Students

A brand new blogsite devoted to Open Access sprang up about a week ago. This one is geared towards students and is sponsored by the Scholarly Publishing and Academic Resources Coalition (SPARC) as part of its student outreach campaign. It's called, naturally enough, Open Students.

Thanks to A Blog Around the Clock for the tip.

Sunday, February 3, 2008

Sharing in the news

A news feature in the latest issue of Nature ("Genetics by Numbers") discusses the recent proliferation of genome-wide association studies. The story brings up a couple of interesting points relevant to Open Science.

  1. Data sharing is essential. Genome-wide association studies rely on having a lot of data to work with, and collecting the data (through SNP-chips, for example) is still too expensive for most researchers. Even one database of samples may not be enough - only by pooling many such independent sample collections together will conclusive evidence be gathered for variants with modest effects.
  2. Data sharing has its share of problems. The article cited a study that found that researchers sometimes abuse shared data, either by going outside the bounds of the original agreements, or by not accounting for certain aspects of the data relevant to their research question. Shared data does not necessarily mean better or faster science if it is analyzed poorly. This highlights the importance of thinking through your analysis, finding out how the data was collected, and ensuring that the data is appropriately "cleaned up" for your purposes before you analyze it. While there is increased benefit from sharing data, there is also increased responsibility to use the data properly.
    Another potential problem mentioned was that scientists would have fewer incentives to collect new data - and thus science could stagnate a little. I'm not sure how big of a concern this should be - after all, citation is a big incentive, and publishing (and sharing) new data would lead to more citations - but it is a concern I hadn't heard much about before.
  3. Data sharing made "Soviet"? Some US researchers may resent the mandatory policies set by the NIH, and think collaboration and sharing were progressing fine without them. Said one, "I don't want to share my data with anyone because the NIH decides I should, I want to do it because I decide to do it." Perhaps there has been some narcissistic pleasure associated with data sharing (not that there's anything wrong with that), and the NIH mandate is now raining on the parade. It's like doing chores - it can be fun if you don't know you're supposed to do it or if you take to it naturally (maybe you even get an altruistic kick out of it), but it's a lot less fun when you're told you have to do it or else.
The article as a whole was very interesting to read, but I thought these little tidbits were especially provocative. There was also a mention of how sharing data allowed two groups to make further discoveries, which led to additional, separate publications, which is always nice to see.
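To make point 1 a bit more concrete, here is a minimal sketch of fixed-effect inverse-variance meta-analysis, one standard way of pooling per-study effect estimates (for example, log odds ratios for a single SNP). The article did not specify a pooling method; the function name and the numbers in the example are hypothetical. The idea it illustrates is that the pooled standard error shrinks as independent collections are added, so an effect too modest to stand out in any one study can become detectable.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (beta, se) pairs, e.g. per-study log odds ratios
    and their standard errors (hypothetical input format).
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


if __name__ == "__main__":
    # Three hypothetical studies with the same modest effect and precision.
    studies = [(0.15, 0.10)] * 3
    print("one study:", fixed_effect_meta(studies[:1]))
    print("pooled:   ", fixed_effect_meta(studies))
```

Each hypothetical study alone gives z = 1.5, while the three pooled together give roughly z = 2.6. That is still far from the conventional genome-wide significance threshold (p < 5e-8), which is why so many sample collections have to be combined.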

Friday, February 1, 2008

Ignorance of the masses - should we worry?

This is the second somewhat negative post I've written on Open Science. You might say I've moved from the Honeymoon phase to the phase where all you can do is judge and criticize and focus on the bad. Let's hope I move on to the mature, balanced, productive phase soon! In any case, I still highly support the concept of Open Science and want to see it grow, but right now I am using this blog to explore both sides.

There are many issues and questions surrounding Open Science which I have been slowly familiarizing myself with over the last few weeks. Some things, like intellectual property rights, privacy, and scooping, are obvious and comprise the bulk of the debate. I started thinking about a different issue related to Open Science recently, mostly inspired by the escalating battle between evolution and Creationism/ID and the comments of former presidential hopeful Mike Huckabee. The following may be more politically charged than appropriate for a blog like this, so consider yourself warned.

It boils down to this: the public is essentially ignorant. What I mean is that most people know a lot about very few topics, and very little about everything else. Most of what they learn about everything else comes from the media. I won't even go into the problems with our education system or the fact that most Americans have a very strange idea of what science is. The problem is that it doesn't take much for a study to be misinterpreted, or science to be misrepresented. Mainstream media will go for the most sensational spin. Think about all those "health" and "wellness" magazines that immediately latch on to and exaggerate the latest studies on coffee, supplements, and compounds in food, regardless of where they were published.

If Open Science is fully realized, bleeding-edge scientific research will be at everyone's fingertips. Preliminary results, perhaps obtained before appropriate controls are performed, will be available to people who don't have the training (or desire) to distinguish between rigorously obtained findings and works in progress. Until now, the only science accessible to the outside world went through the filter of a peer-reviewed journal (and was presumably already summarized and interpreted in the way that best describes all the data and findings in the entire study). Without a filter, is there more risk of misinterpretation and misrepresentation by those outside the scientific sphere? If so, what precautions can we take to mitigate it?