Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Tuesday, June 30, 2009

An interesting conversation on bioinformatics business models

Every once in a while, I suddenly remember SeqAnswers.com, and rush over there to see what I've been missing. (My occasional lapses generally coincide with my bi-weekly meetings with my supervisor, an upcoming talk or something of that sort...) SeqAnswers is easily the best resource on Next-Gen sequencing, and I truly enjoy the people that hang out on that forum.

Anyhow, I've been participating in an interesting conversation on the business of bioinformatics and next-gen sequencing. It started off with a question on market research, and then blossomed into a much wider ranging conversation. One recurring thread in the discussion is whether there are valid bioinformatics business models in which the bioinformatics application is the commodity. I maintain that there aren't, but clearly other people disagree.

In the name of encouraging a wider audience to contribute, I thought I'd ask anyone who's reading my blog what they think. Join in here or on the forums.

Cheers!


Sunday, June 28, 2009

A sunny day in Vancouver

It's a weekend, so I'm going to stay off topic for a little while longer. People often believe in the stereotype that it rains all the time in Vancouver. Well, that might be true from November to February, but the summers here are fantastic. Here's just a little proof of that. (=



Now, you'll have to excuse me - I have some sunshine to enjoy.

Thursday, June 25, 2009

m-based hierarchy

An IRC friend of mine proposed the following hierarchy of terms for reactiveness and I liked it so much, I figured I'd have to post it here so that I wouldn't forget it.
minimal < minor < mild < moderate < marked < major < maximal

It's not newsworthy, but I really liked it and figured other people might get some use out of it. Thanks Jasabella!
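
For anyone who wants to use the scale in code, here is a minimal Python sketch of it as an ordered type. The `Reactivity` class name, the integer values and the `at_least` helper are assumptions made for illustration; only the seven terms and their order come from the hierarchy above.

```python
from enum import IntEnum


class Reactivity(IntEnum):
    """Hypothetical ordered scale; the member order follows the m-based hierarchy."""
    MINIMAL = 1
    MINOR = 2
    MILD = 3
    MODERATE = 4
    MARKED = 5
    MAJOR = 6
    MAXIMAL = 7


def at_least(level: Reactivity, threshold: Reactivity) -> bool:
    """Return True when `level` is at or above `threshold` on the scale."""
    return level >= threshold


# Rebuild the chain of terms from the enum, lowest to highest.
ordered = [r.name.lower() for r in sorted(Reactivity)]
print(" < ".join(ordered))
```

Using an integer-backed enum means the terms can be compared directly, so something like `at_least(Reactivity.MARKED, Reactivity.MODERATE)` reads much like the sentence it stands for.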

And, in case you're wondering, you can find me on Efnet (#chemistry) and Freenode (#bioinformatics). I don't watch the window all the time, but if you say my name, you'll get my attention.


Wednesday, June 24, 2009

CSHL: Personal Genomes

For all that I've been ranting about how much I dislike Cold Spring Harbor's policies on blogging (or at least the rumours about how they'll be changing them in the future), I have to admit that they do have the coolest topics for conferences.

I just received an advertisement in the mail for their upcoming "Personal Genomes" conference in September. I'd like to reprint their ad's description for anyone who's interested (I'm citing fair use here, just in case anyone wonders why I feel free to reproduce it):
"This second meeting builds on last year's presentations showing a significant milestone in human genetics - the first production of "personal genomes." Ultra high throughput sequencing strategies have now been used to study more individual genomes - and yet few scientists and even fewer clinical geneticists, are familiar with the implications of this new data. This meeting will address the issues of individual genomes being part of research and routine clinical medicine within the new years."
Far too cool. Here's a link to the web page.

They have applied for funding to partially support postdocs and graduate students, so you'd better start working on that abstract if you're interested: abstracts are due July 1st.

By the way, the conference runs from Sept 14-17, 2009.


Tuesday, June 23, 2009

FindPeaks 4.0

Well, I've finally gotten to it: the tag for FindPeaks 4.0. At this point, I'm more or less satisfied with what made it into this release: Saturation, Controls, Compares and a whole lot of changes to the underlying machinery. The documentation is still going through some changes (I have another two flags to add in, and a lot more clarification to do on what some of the parameters actually accomplish), but it's now in a reasonable state.

Despite the milestone, this project is really a constant evolution. I'm already thinking about what should be in the next version (4.1?): Support for SAM/BAM, "peakless peak calling" for regions instead of peaks, a vastly upgraded FindFeatures code and a host of small changes that I had thought weren't worth the effort for this particular release. I'm even considering a GUI, if I can squeeze it in. (If anyone would like to help out on that project, I'd be thrilled to add them to the project!)

At this point, I'm happy to say I'm not aware of any outstanding coding bugs - although I do take it seriously that there is an open bug remarking that the documentation is insufficient. I've been working on improving it, and reorganizing the manual, which should be done in the next couple of days. Once that's done, I'll jump back into using my code to do some analysis of my own. There are a few really neat things, based on the work on my poster, that I'd like to play with. I guess that's what they say about coders: when you write software for yourself, you never lack the motivation to add in one more feature. (=


Monday, June 22, 2009

4 Freedoms of Research

I'm going to venture off the beaten track for a few minutes. Ever since the discussion about conference blogging started to take off, I've been thinking about what the rights of scientists really are - and then came to the conclusion that there really aren't any. There is no scientist's manifesto or equivalent oath that scientists take upon receiving their degree. We don't wear an iron ring, as engineers do, to signify our commitment to integrity...

So, I figured I should do my little part to fix that. I'd like to propose the following 4 basic freedoms of research, without which science cannot flourish.
  1. Freedom to explore new areas
  2. Freedom to share your results
  3. Freedom to access findings from other scientists
  4. Freedom to verify findings from other scientists
Broadly, these rights should be self-evident. They are tightly intermingled, and cannot be separated from each other:
  • The right to explore new ideas depends on us being able to trust and verify the results of experiments upon which our exploration is based.
  • The right to share information is contingent upon other groups being able to access those results.
  • The purpose of exploring new research opportunities is to share those results with people who can build upon them.
  • Being able to verify findings from other groups requires that we have access to their results.
In fact, they are so tightly intermingled that they are a direct consequence of the scientific method itself:
  1. Ask a question that explores a new area
  2. Use your prior knowledge, or access the literature to make a best guess as to what the answer is
  3. Test your result and confirm/verify if your guess matches the outcome
  4. Share your results with the community.
(I liked the phrasing on this site) Of course if your question in step 1 is not new, you're performing the verification step.

There are constraints on what we are allowed to do as scientists as well: we have to respect the ethics of the field in which we do our exploring, and we have to respect the fact that ultimately we are responsible to report to the people who fund the work.

However, that's where we start to see problems. To the best of my knowledge, funding sources define the directions science is able to explore. We saw the U.S. restrict funding to science in order to throttle research in various fields (violating Research Freedom #1) for the past 8 years, which was effectively able to completely halt stem cell research, and suppress alternative fuel sources, etc. In the long term, this technique won't work, because the scientists migrate to where the funding is. As the U.S. restores funding to these areas, the science is returning. Unfortunately, it's Canada's turn, with the conservative government (featuring a science minister who doesn't believe in evolution) removing all funding from genomics research. The cycle of ignorance continues.

Moving along, and clearly in a related vein, Freedom #4 is also a problem of funding. Researchers who would like to verify other groups' findings (a key responsibility of the basic peer-review process) aren't funded to do this type of work. While admitting my lack of exposure to granting committees, I've never heard of a grant being given to verify someone else's findings. However, this is the basic way by which scientists are held accountable. If no one can repeat your work, you will have many questions to answer - and yet the funding for ensuring accountability is rarely present.

The real threat to an open scientific community occurs with the remaining two Freedoms: sharing and access. If we're unable to discuss the developments in our field, or are not even able to gain information on the latest work done, then science will come grinding to a halt. We'll waste all of our time and money exploring areas that have been exhaustively covered or, worse yet, come to the wrong conclusions about what areas are worth exploring in our ignorance of what's really going on.

Ironically, Freedoms 2 and 3 are the most eroded in the scientific community today. Even considering only the academic world, where freedoms are taken for granted, our interactions with the forums for sharing (and accessing) information are horribly stunted:
  • We do not routinely share negative results (causing unnecessary duplication and wasting resources)
  • We must pay to have our results shared in journals (limiting what can be shared)
  • We must pay to access other scientists' results in journals (limiting what can be accessed)
It's trivial to think of other examples of how these two freedoms are being eroded. Unfortunately, it's not so easy to think of how to restore these basic rights to science, although there are a few things we can all do to encourage collaboration and sharing of information:
  • Build open source scientific software and collaborate to improve it - reducing duplication of effort
  • Publish in open access journals to help disseminate knowledge and bring down the barriers to access
  • Maintain blogs to help disseminate knowledge that is not publishable
If all scientists took advantage of these tools and opportunities to further collaborative research, I think we'd find a shift away from conferences towards online collaboration and the development of tools favoring faster and more efficient communication. This, in turn, would provide a significant speed-up in the generation of ideas and technologies, leading to more efficient and productive research - something I believe all scientists would like to achieve.

To close, I'd like to propose a hypothesis of my own:
By guaranteeing the four freedoms of research, we will be able to accomplish higher quality research, more efficient use of resources and more frequent breakthroughs in science.
Now, all I need to do is to get someone to fund the research to prove this, but first, I'll have to see what I can find in the literature...


Wednesday, June 17, 2009

More on conference blogging...

If you've been following along with the debate on conference blogging, you've surely been reading Daniel MacArthur's blog, Genetic Future. His latest post on the subject provides a nifty idea: presenters who are ok with their talks being discussed should have an icon in the conference proceedings beside the announcement of their talks so that members of the audience know it's safe to discuss their work. He even goes so far as to present a few icons that could be used.

On the whole, I'm not opposed to such a scheme - particularly at a conference like Cold Spring, where unpublished information is commonly presented and even encouraged by the organizers. However, Cold Spring is one of the rare venues where the attendance is "open", but the policy on disclosing the information is restricted. It's entirely regulated for journalists, but in the past has not been an issue for scientists. However, if a conference begins to restrict what the scientists are allowed to disclose outside of the meetings, the organizers are really removing themselves from the free and open scientific debate. A conference that does that isn't technically a conference - at best it's a closed door meeting - and the material should explicitly be labeled as confidential.

Assuming that the vast majority of presentations can't be discussed without explicit permission is anathema to science. If you look at the way technology is handled in western society, you'll see a general trend: The patent system is based around the idea of disclosure, copyright is based on the idea of retaining rights after disclosure, and even our publication/peer review system demands full disclosure as the minimum standard. (Well, that plus a wad of cash for most journals...) For most conferences, then, I suggest we use a more fitting model than opting in to allow disclosure, as proposed by Daniel. Rather, we should provide the opportunity to opt out.

All presenters should have the option of choosing "I do not want my presentation disclosed." We can even label their presentation with a nice little doohickey that indicates that the material is not for public discussion.

Audience members who attend the talk then agree that they are not allowed to discuss this information after leaving the room. Why operate in half measures? It's either confidential or it's not. Why should we forbid people from discussing it online, and then turn a blind eye to someone reading their notes in front of the non-attending members of their institution?

Hyperbole aside, what we're all after here is a common middle-ground. Science Bloggers don't want to bite the hands of the conference organizers, and I can't really imagine conference organizers not being interested in fostering a healthy discussion. After all, conferences like AGBT have done well because of the buzz that surrounds their organization.

As I said in my last post on the topic, Science does well when the free and open exchange of ideas is allowed to take place, and people presenting at conferences should be aware of why they're presenting. (I leave figuring out those reasons as an exercise for the student.)

Let's not throw the blogger out with the bathwater in our haste to find a solution. Conferences are about disclosure and blogs are about communication: aren't we all working towards the same goal?


Monday, June 15, 2009

Another day, another result...

I had the urge to just sit down and type out a long rant, but then common sense kicked in and I realized that no one is really interested in yet another graduate student's rant about their project not working. However, it only took a few minutes for me to figure out why it's relevant to the general world - something that's (unfortunately) missing from most grad student projects.

If you follow along with Daniel MacArthur's blog, Genetic Future, you may have caught the announcement that Illumina is getting into the personal genome sequencing game. While I can't say that I was surprised by the news, I will have to admit that I am somewhat skeptical about how it's going to play out.

If your business is using arrays, then you'll have an easy time sorting through the relevance of the known "useful" changes to the genome - there are only a couple hundred or thousand that are relevant at the moment, and several hundred thousand more that might be relevant in the near future. However, when you're sequencing a whole genome, interpretation becomes a lot more difficult.

Since my graduate project is really the analysis of transcriptome sequencing (a subset of genome sequencing), I know firsthand the frustration involved. Indeed, my project was originally focused on identifying changes to the genome common to several cancer cell lines. Unfortunately, this is what brought on my need to rant: there is vastly more going on in the genome than small sequence changes.

We tend to believe blindly what we were taught as the "central dogma of molecular biology". Genes are transcribed to mRNA, mRNA is translated to proteins, and the protein goes off to do its work. However, cells are infinitely more complex than that. Genes can be inactivated by small changes, can be chopped up and spliced together to become inactivated or even deregulated, interference can be run by distally modified sequences, gene splicing can be completely co-opted by inactivating genes we barely even understand yet, and desperately over-expressed proteins can be marked for deletion by over-activating garbage collection systems so that they don't have a chance to get where they were needed in the first place. And here we are, looking for single nucleotide variations, which make up a VERY small portion of the information in a cell.

I don't have the solution yet, but whatever we do in the future, it's not going to involve $48,000 genome re-sequencing. That information on its own is pretty useless - we'll have to study expression (WTSS or RNA-Seq, so figure another $30,000), changes to epigenetics (of which there are many histone marks, so figure 30 x $10,000) and even DNA methylation (I don't begin to know what this process costs).
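
To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It only adds up the numbers quoted above; the constant and function names are made up for illustration, and DNA methylation is left out because no cost is given for it, so the total is a lower bound at best.

```python
# Figures quoted in the post (2009 prices, US dollars).
GENOME_RESEQUENCING = 48_000
RNA_SEQ = 30_000
HISTONE_MARKS = 30              # rough number of histone marks to profile
COST_PER_HISTONE_MARK = 10_000


def total_known_cost() -> int:
    """Sum the costs that have a number attached; DNA methylation is
    excluded because no figure is given for it."""
    return GENOME_RESEQUENCING + RNA_SEQ + HISTONE_MARKS * COST_PER_HISTONE_MARK


total = total_known_cost()
print(f"Lower bound: ${total:,}")  # prints "Lower bound: $378,000"
print(f"Genome share of total: {GENOME_RESEQUENCING / total:.1%}")
```

On these numbers the genome sequence itself is only about an eighth of the bill, which is really the point of the paragraph above.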

So, yes, while I'm happy to see genome re-sequencing move beyond the confines of array based SNP testing, I'm pretty confident that this isn't the big step forward it might seem. The early adopters might enjoy having a pretty piece of paper that tells them something unique about their DNA, and I don't begrudge it. (In fact, I'd love to have my DNA sequenced, just for the sheer entertainment value.) Still, I don't think we're seeing a revolution in personal genomics - not quite yet. Various experiments have shown we're on the cusp of a major change, but this isn't the tipping point: we're still going to have to wait for real insight into the use of this information.

When Illumina offers a nice toolkit that allows you to get all of the SNVs, changes in expression and full ChIP-Seq analysis - and maybe even a few mutant transcription factor ChIP-Seq experiments thrown in - and all for $48,000, then we'll have a truly revolutionary system.

In the meantime, I think I'll hold out on buying my genome sequence. $48,000 would buy me a couple more weeks in Tahiti, which would currently offer me a LOT more peace of mind. (=

And on that note, I'd better get back to doing the things I do.... new FindPeaks tag, anyone?


Monday, June 8, 2009

Poster - reprise

In an earlier post, I said I'd eventually get around to putting up a thumbnail of the poster that I presented at the Canadian Institutes of Health Research National Research Poster Competition. (Yes, the word "research" appears twice in that sentence.) After a couple days of being busy with other stuff, I've finally gotten around to it.

I'm also happy to say that the poster was well received, despite the unconventional appearance. It was awarded an Award of Excellence (Silver category) from the judges.

thumbnail of poster


Saturday, June 6, 2009

Once more into the breach...

I haven't been able to follow the whole conversation going on with respect to conference blogging, since I'm still away at a conference for another day. Technically, the conference ended on Thursday, but I'm still here visiting with some of the more important people in my life - so that is my excuse.

At any rate, I received an interesting comment from someone posting as "such.ire", to which I wrote a reply. In the name of keeping the argument going (since it is such a fascinating topic), I thought I'd post my reply to the front page. For context, I suggest reading such.ire's comment first:

click here for his comment.

My reply is below:

-------

Hi Such.ire,

I really appreciate your comment - it's a great counterpoint to what I said, and really emphasizes the fact that this debate will have plenty of nuances, which will undoubtedly carry this conversation on long after the blogosphere has finished with it.

To rebut a few of your points, however, I should point out that your examples aren't all correct.

Yes, conferences are well within their rights to ask you to sign NDAs as an attendee - or to require that confidentiality is a part of the conference - there is no debate on that point. However, if you attend a conference that is open and does not have an explicit policy, then it really is an open forum, and they do not have the right to retroactively dictate what you can (or can't) do with the information you gathered at the conference.

I think all of us would agree that the boundaries for a conference should be clearly specified at the time of registration.

As for lab talks for your lab members - those are not "public disclosures" in the eyes of the law. All of your lab colleagues are bound by the rules that govern your institution, and I would be surprised if your institution hadn't asked you to sign various confidentiality agreements or policies about disclosure at the time you joined them.

Department seminars are somewhat different - if they are advertised outside the department to individuals that are not members of the institution, then again, I would suggest they are fair game.

I don't blog departmental talks or RIP talks for that reason. They are not public disclosures of information.

Finally, my last point was not that journalists and bloggers do anything different up front, but that the method of their publishing should have a major impact on how they are treated. Bloggers can make corrections that reach all of their audience members and can update their stories, while journalists cannot.

If a conference demands to see the material a journalist publishes up front, it makes sense. If they demand to do the same thing for a blogger, it completely ignores the context of the media in which the communication occurs.


Thursday, June 4, 2009

The Rights of Science Blogging

An article recently appeared on scienceweb, in relation to Daniel MacArthur's blogging coverage of a conference he attended at Cold Spring Harbor, which has raised a few eyebrows (the related article is here). Cold Spring Harbor has a relatively strict policy for journalists, but it appears that Daniel wasn't constrained by it, since he's not a "journalist" by the narrow definition of the word. More than half of the advice I've ever received on blogging science conferences comes from Daniel, and I would consider him one of the more experienced and professional of the science bloggers - which makes this whole affair just that much more interesting. If anyone is taking exception to blogging, Daniel's coverage of an event is guaranteed to be the least offensive, best researched and most professional of the blogs, and hence the least likely to be the one that causes the outcry.

As far as I can tell from the articles, Cold Spring is relatively upset about this whole affair, and is going down the path that many other institutions have chosen: Trying to suppress blogging, instead of embracing it.
Unfortunately, there are really very few reasons for this to be an issue - and I thought I'd put forward a few counter-points to those who think science blogging should be restrained.

1.  Public disclosure

Unless the conference organizers have explicitly asked each participant to sign a non-disclosure agreement, the conference contents are considered to be a form of public disclosure. This is relevant not because the potential for people to talk about it is important, but because, legally, this is when the clock starts ticking if you intend to profit from your discovery. In most countries, the first time an invention is disclosed is when you begin to lose rights to an invention - broadly speaking, it often means that you have one year to officially file the patent, or the patent rights to it become void. Public disclosure can be as simple as emailing your invention in an un-encrypted file, or leaving a copy of a document in a public place... the bar for public disclosure is really quite low. More crucially, you can lose your rights to patenting things at all if they're disclosed publicly before the patent is filed.

Closer to home, you might have to worry about academic competition. If you stand up in front of a room and tell everyone what you've just discovered (before you've submitted it), anyone can then replicate that experiment and scoop you... The academic world works on who has published what first - so we already have the built-in instinct to keep our work quiet until we're ready to release it. (There's another essay in that on open source science, but I'll get to it another day.) So, when academics stand up in front of an audience, it's always something that's ready to be broadcast to the world. The fact that it's then being blogged to a larger audience is generally irrelevant at that point.

2.  Content quality

An argument raised by Cold Spring suggests that they are afraid that the material being blogged may not be an accurate reflection of the content of the presentation.  I'm entirely prepared to call B*llsh!t on this point.

Given, on one hand, a journalist with a bachelor's degree in general science, possibly a year or two of journalism school and maybe a couple of years of experience writing articles and, on the other, a graduate student with several years of experience tightly focussed on the subject of the conference, who is going to write the more accurate article?

I can't seriously believe that Cold Spring or anyone else would have a quality problem with science blogging - when it's done by scientists with an interest in the field.  More on this in the conclusion.

3. Journalistic control

This one is more iffy to begin with. Presumably, the conference would like to have tighter control over the journalists who write articles in order to make sure that the content is presented in a manner befitting the institution at which the conference took place. Frankly, I have a hard time separating this from the last point: If the quality of the article is good, what right does the institution have to dictate the way it's presented by anyone who attended? If I sit down over beers with my colleagues and discuss what I saw at the conference, we'd all laugh if a conference organizer tried to censor my conversation. It's both impossible and violates a right to free speech. (Of course, if you're in Russia or China, that argument might have a completely different meaning, but in North America or Europe, this shouldn't be an issue.) The fact that I record that conversation and allow free access to it in print or otherwise should not change my right to freely convey my opinions to my colleagues.

Thus, I would argue you can either have a closed conference, or an open conference - you have to pick one or the other, and not hold different attendees to different standards depending on the mode by which they converse with their colleagues.

4. Bloggers are journalists

This is a fine line. Daniel and I have very different takes on how we interact with the blogosphere. I tend to publish notes and essays, where Daniel focusses more on news, views and well-researched topic reviews. (Sorry about the alliteration.) There is no one format for bloggers, just as there isn't one for journalists. Rather, it's a continuous spectrum of how information is distributed, and for journalists to get upset about bloggers in general makes very little sense. Most bloggers work in the niches where journalists are sparse. In fact, for most people, the niches are what make blogs interesting. (I'm certainly not aware of any journalists who work on ChIP-Seq full time, and that is, I suspect, the main reason why people read my feeds.)

Despite anything I might have to say on the subject, the final answer will be decided by the courts, who have been working on this particular thorny issue for years. (Try plugging "are bloggers journalists" into Google, and you'll find more nuances to the issue than you might expect.)

What it comes down to is that bloggers are generally protected by the same laws that protect journalists, such as the right to keep their sources confidential, and bound by the same limits, such as the ability to be sued for spreading false information.  Responsibility goes hand in hand with accountability.

And, of course, that should be how institutions like Cold Spring Harbor have to address the issue.

Conclusion:

Treating science bloggers the way Cold Spring Harbor treats journalists doesn't make sense. Specialists talking about a field in public is something that the community has been trying to encourage for years: greater disclosure, more open dialog and sharing of ideas are the fundamental pillars of western science. To force the bloggers into the category of the journalists in the world of print magazines is utterly ridiculous: bloggers' articles can be updated to fix typos, to adjust the content and to ensure clarity. Journalists work in a world in which a typo becomes part of the permanent record and misunderstandings can remain in the public mind for decades. The power to reach a large audience exists - but only bloggers have the ability to go back and make corrections. Working with bloggers is a far better strategy than working against them.

No matter how you slice it, institutions with a vested interest in a single business model always resist change - and so do those who have not yet come to terms with the advances of technology. Unfortunately, it sounds like Cold Spring Harbor hasn't yet adapted to the internet age and is trying to fit a square peg into a round hole.

I'd like to go on the record in support of Daniel MacArthur - blogging a conference is an important method of creating dialog in the science community. We can't all attend each conference, but we shouldn't all be left out of the discussion - and blogs are one important way that that can be achieved.

If Cold Spring Harbor has a problem with Daniel's blog, let them come forward and identify the problem. Sure, they can ask bloggers to announce their blog URLs before the conference, allowing the organizers to follow along and be aware of the reporting; I wouldn't argue against that. It provides accountability for those blogging the conference - which serious bloggers won't object to - and it allows the bloggers to go forth and engage the community.

To strangle the communication between conference attendees and their colleagues, however, is to throttle the scientific community itself. Let's all challenge Cold Spring to do the right thing and adapt with the times, rather than ask scientists to drop a useful tool just because it's inconvenient and doesn't fit in with the way the conference organizers currently interact with their audience.


Monday, June 1, 2009

New Poster on FindPeaks 4.0

FindPeaks 4.0 will be tagged shortly, and it was the focus of my latest poster.... the one that included all of the comics.

Since I'll be presenting it on Wednesday, I figured I may as well post it up here, just in case. You can now find it on the right hand side of the page - I'll probably update this post when I've got a few more minutes, to include a thumbnail.


Science Cartoons - 5 (RNA-Seq)

This is the last science cartoon I did for my poster. I was pretty happy with the pictures, although if I were to do it over again, I've learned a few more tricks that I'd have used instead.

Anyhow, my favorite effect on this picture is the "text to path", where you can make any string follow any line - who knew graphic design could be so much fun. It definitely makes for some interesting graphics. I'd definitely use this effect in an RNA folding paper, if I ever got the chance to do another one. (-;


Science cartoons - 4 (ChIP-Seq)

I suppose this cartoon doesn't need an introduction...
