Thanks for visiting my blog - I have now moved to a new location at Nature Networks. Url: http://blogs.nature.com/fejes - Please come visit my blog there.

Friday, November 28, 2008

It never rains, but it pours...

Today is a stressful day. Not only do I need to finish my thesis proposal revisions (which are not insignificant, because my committee wants me to focus more on the biology of cancer), but we're also in the middle of real estate negotiations. Somehow, this is more than my brain can handle on the same day... At least we should know by 2pm if our counter-offer was accepted on the sales portion of the transaction, which would officially trigger the countdown on the purchase portion of the transaction. (Of course, if it's not accepted, then more rounds of offers and counter-offers will probably take place this afternoon. WHEE!)

I'm just dreading the idea of doing my comps the same week as trying to arrange moving companies and insurance - and the million other things that need to be done if the real estate deal happens.

If anyone was wondering why my blog posts have dwindled down this past couple of weeks, well, now you know! If the deal does go through, you probably won't hear much from me for the rest of this year. Some of the key dates this month:
  • Dec 1st: hand in completed and reviewed Thesis Proposal
  • Dec 5th: Sales portion of real estate deal completes.
  • Dec 6th: remove subjects on the purchase, and begin the process of arranging the move
  • Dec 7th: Significant Other goes to Hong Kong for ~2 weeks!
  • Dec 12th: Comprehensive exam (9am sharp!)
  • Dec 13th: Start packing 2 houses like a madman!
  • Dec 22nd: Hanukkah
  • Dec 24th: Christmas
  • Dec 29th: Completion date on the new house
  • Dec 30th: Moving day
  • Dec 31st: New Years!
And now that I've procrastinated by writing this, it's time to get down to work. I seem to have stuff to do today.


Thursday, November 27, 2008

Should Bioinformatics be on your degree?

An interesting topic came up the other day - Should your specialization be on your graduate degree? Apparently, it's under consideration at my university and the faculty is consulting with staff and students to decide.

Unlike most consultation processes, this one got a lot of "reply all" comments, which revealed two distinct camps: one for, and one against. (That there are two sides to this story shouldn't be a surprise, I hope!)

Those in favour claimed that having a generic M.Sc. or Ph.D. really doesn't reflect the value of the work you've done in achieving the degree, and that it should reflect the subject area you've contributed to. Since nearly all of the replies I saw were from people in the bioinformatics program, having an M.Sc. (Bioinformatics) or Ph.D. (Bioinformatics) just seems way cooler than a generic degree from the faculty of science. Future employers will look at your publication record anyhow, not at what's on your degree.

On the other hand, those against proposed that Bioinformatics is too new a field and is likely to be swallowed up by other fields in the future - making a Ph.D. (Bioinformatics) more of a liability than an advantage. Equally important, many researchers switch fields several times as they follow their research throughout the course of their career, meaning that the bioinformatics specialization could constrain you as you apply for jobs in the future.

So, what's the answer? I haven't the faintest idea. My masters is pretty darn plain, and no one would have the faintest idea that I did it in microbiology and immunology. I have to admit I was underwhelmed when I saw it for the first time... but when it comes time to apply for jobs, I might be very glad to avoid giving away any pretense that I might know some immunology!


Saturday, November 22, 2008

Is medicine ready for 2nd Generation Genomics?

Yesterday was the second day of the 10th Annual B.C. Cancer Conference, which draws in researchers, practitioners and cancer survivors from around BC and the world. It also draws in a lot of drug companies, but that's somewhat beside the point.

One particular part of the conference caught my eye as a must see: the Cancer Genetics Laboratory Open House. This event was a set of posters and hands-on demonstrations for how genetics can be used to help cure cancer. Since that's essentially my thesis project, I figured I absolutely had to attend it.

Unfortunately, I was rather disappointed in the scope of the work they do, though not because they do poor work, but rather that my expectations were too high. For breast cancer, they only screen people who have a family history of breast cancer, and even then they only look for two markers in BRCA1 and BRCA2. That's not a bad thing, though - those two genes make up a significant portion of the hereditary breast cancer risk for women. What surprised me was how reactive the technology was - the lab only screens patients with a high likelihood of carrying the mutant genes, and only patients referred to them by physicians who suspect the familial disease association. Again, this is pretty standard, so there's no criticism meant.

However, what concerns me is whether these people will be ready for the onslaught of information that's about to hit them. In a couple of years, genetics researchers will be handing off the testing of tens of thousands of genes simultaneously, gene splicing defects, and complete analyses of transcriptomes and genomes - all of which are currently being done in the lab, as we speak. This is a far cry from doing PCR on two genes to look for SNPs, with a massive technology gap in between.

The Cancer Genetics Lab appears to have 15 doctors and technologists, which is a small staff to support a whole city of several million people, let alone the whole province. I have to wonder if any of them have any experience with 2nd-generation sequencing, seriously high-throughput genetic screens, or even the concept of how to do genetic counselling about risk factors in the "whole genome diagnostics" age. Of course, I didn't spend too long at the session, so I don't know if anyone there is an expert; however, the few people I spoke to weren't really talking about the upcoming changes to their discipline, so I'm a little skeptical.

In any case, I do have to wonder how this Pandora's box of information we're unleashing about the makeup of patients' cells and heredity will affect the downstream medical practitioners, and how well they are prepared to deal with it. Are there seminars to bring these people up to speed on what's coming at them? Are the agencies ready to shell out the money for the infrastructure they'll need? Are the people writing the textbooks that educate these people including chapters on the subject?

It's all well and good for me to talk about trying to understand how a cancer works in the lab through 2nd-generation sequencing, but I have to wonder what we should be doing in the meantime to prepare them for the firehose of information that we're going to point at them and let loose. The personal genomics revolution is poised to land on these people like a ton of bricks, and with about as much mercy.

Then again, lest we be smug about it, how many of us are writing aligners for SMRT sequencing, which is already on the horizon in our own field? Preparing for the future is definitely hard when we're still coping with the present. I'm sure it's no different for the hospital genetics labs, even if they're 15 years behind the cutting edge.


Wednesday, November 19, 2008

World's Easiest Home-Made Pizza Dough

People at work keep asking me for this recipe, so I thought I'd share it. Trust me, this makes the best pizza crust ever, and requires surprisingly little work, compared to most other pizza doughs, which require several hours of rising time.

Ingredients:
1 envelope fast rising active dry yeast
1/4 cup H2O
1+1/4 tsp sugar
1 cup milk
1 tsp salt
1/8 tsp baking soda
3 cups all purpose flour
Directions:
  1. dissolve sugar in warm water
  2. add yeast, stir and let stand 10 minutes (till bubbly)
  3. combine milk and salt in saucepan
  4. heat over low-heat until lukewarm (40-46C)
  5. pour milk, baking soda & yeast mixture into large bowl
  6. add 1+1/2 cup flour and mix until smooth (dough will be sticky)
  7. stir in enough flour to make batter stiff
  8. press dough into oiled pizza pan (coat hands with flour so you don't wear the dough!)
  9. brush dough with oil
  10. let rise 10 minutes
  11. top pizza
  12. bake at 400F/200C for 20-25 minutes

I rarely follow the recipe exactly. Here are a few shortcuts and tips:
  • You can just microwave the milk, to speed up step 3.
  • If you use a seasoned pizza stone, you do not need to use any oil in step 8.
  • You can skip oiling the dough in step 9. It's not strictly necessary, but it is nice on the crusts, if small amounts are used.
  • If you're slow about topping the pizza, it'll rise while you top it. Alternately, if you skip this step, you'll get a nice thin crust pizza, which is also decent.
  • You can use brown sugar instead of white sugar, for slightly different flavour.
  • By varying the amount of milk and flour you use, you can make the dough more or less bread-like. This gives you lots of options for deep dish pizza, rolled crusts or thin pizzas (just like Nat's Pizza on Broadway and Stevens - Best pizza in Vancouver.)
  • Rolling out the dough instead of pressing it down will give you a thinner, more even pizza dough. (I find this is much easier, with a pizza stone.)
I've made hundreds of pizzas from this recipe, and have tried to change pretty much all of the ingredients at one point or another, and yet none of the pizzas have turned out badly. You really can't make a bad pizza crust with this recipe. Bon appetit!


Has Apple gone too far?

As a bioinformatician, I enjoy a good-looking piece of computer hardware and, for the last few years, the best-looking hardware around has been the Apple Macs. I've even thought about buying one of their new MacBooks, although for the same specs, you can pick up a Dell on sale at 1/3rd of the price, so it's hardly a good deal. I really can't see myself running anything other than Linux on it, though, so despite the beautiful engineering, I can't see myself paying ~$300 for an OS I'd just remove. (I was even upset at paying ~$50 for a copy of Windows XP with my current laptop. Drop me a line if you want to buy the license - it doesn't even have a valid EULA... but that's another story.)

Anyhow, I've got to admit, Apple has finally managed to turn me off completely. Check out this article. To paraphrase, Apple has decided to follow suit with Microsoft and Intel in order to prevent you from enjoying the content you own in the way you'd like to use it. In other words, Mac OSX is now claiming control over your media files. (And, I might add, this is not about copyright, because the article shows uses that are clearly restricting "fair use" as well.) DRM is now built right into your hardware, and if your hardware isn't DRM enabled, you can't use it. Ouch.

I feel sorry for those people who have jumped the Microsoft ship just to end up in the Apple camp and are about to discover that Apple doesn't have their best interests at heart either. Why shouldn't you be able show a movie on an external monitor or projector?

In the long run, this is probably good advertising for GNU/Linux, which doesn't enforce media company greed on its users. So, if anyone wants a free Ubuntu disk to make their Apple hardware work for them instead of against them, here you go.


Tuesday, November 18, 2008

Dancing your research.

I heard a new expression the other day, apparently credited to Steve Martin,
"Talking about music is like dancing about architecture."
I laughed for a few minutes, and then realized it wasn't so silly, after all. Dance is a pretty powerful form of communication, even if I personally couldn't communicate anything other than a broken toe through that medium.

Still, not only can you dance about architecture, you can also dance about science. I received an email advertising the second year of the AAAS Science Dance Contest, which has just closed. I'll admit I read all about last year's competition, but I didn't personally watch the tapes. This year, with YouTube playing an important role, we'll all be able to judge for ourselves just how effectively dance can be used to describe the universe.

It's just too bad that my research isn't on Honey Bees, or other odd bee shaped things. How do you express a SNP through dance?


Bioinformatics Companies

I was working on my poster this afternoon, when I got an email asking me to provide my opinions on certain bioinformatics areas I've blogged on before, in return for an Apple iPod Touch in a survey that would take about half an hour to complete. Considering that ratio of value to time (roughly 44x what I get paid as a graduate student), I took the time to take the survey.

Unfortunately, at the very end of the survey, it told me I wasn't eligible to receive the iPod. Go figure. Had they told me that first, I probably would have (wisely) spent that half hour on my poster or studying. (Then they told me they'd ship it in 4-6 weeks.... ok, then.)

In any case, the survey asked very targeted questions with multiple-choice answers which really didn't encompass the full answers to the questions, and questions which were so leading that there was really no way to give a complete answer. (I like boxes to give my opinions in... which kind of describes my blog, I suppose - a box into which I write my opinion. Anyhow...) In some ways, I have to wonder if the people who wrote the survey were trying to sell their product, or get feedback on it. Still, it led me to think about bioinformatics applications companies. (Don't worry, this will make sense in the end.)

The first thing you have to notice as a bioinformatics software company is that you have a small audience. A VERY small audience. If Microsoft could only sell its OS to a couple hundred or a thousand labs, how much would it have had to charge to make several billion dollars? (Answer: too much.)

And that's the key issue - bioinformatics applications don't come cheap. To make a profit on a bioinformatics application, you can only do one of four things:
  1. Sell at a high volume
  2. Sell at a high price
  3. Find a way to tie it to something high-priced, like a custom machine.
  4. Sell a service using the application.
The first is hard to do - there aren't enough bioinformatics labs for that. The second is common, but really alienates the audience. (Too many bioinformaticians believe that a grad student can just build their own tools from scratch cheaper than buying a pre-made and expensive tool, but that's another rant for another day. I'll just say I'm glad it's not a problem in my lab!) The third is good, but buying a custom machine has hidden support costs and, in a world where applications get faster all the time, runs the risk of the device becoming obsolete all too fast. The last one is somewhat of a non-starter. Who wants to send their results to a third party for processing? Data ownership issues aside, even when the bandwidth isn't prohibitively expensive, the network transfer time usually negates the advantages of doing that.

So that leaves anyone who wants to make a profit in bioinformatics in a tight spot - and I haven't even mentioned the worst part of it yet:

If you are writing proprietary bioinformatics software, odds are, someone's writing a free version of it out there somewhere too. How do you compete against free software, which is often riding on the cutting edge? Software patents are also going to be hard to enforce in the post-Bilski legal world, and even if a company managed to sue a piece of software out of existence (e.g. through injunctions), someone else will just come along and write their own version. After all, bioinformaticians are generally able to program their own tools, if they need to.

Anyhow, all this was sparked by the survey today, making me want to give the authors of the survey some feedback.
  1. Your audience knows things - give them boxes to fill in to give their opinions. (Even if they don't know things, I'm sure it's entertaining.)
  2. Don't try to lead the respondents to the answers you want - let them give you their opinions. (That can also be paraphrased as "less promotional material, and more opinion asking." Isn't that the point of asking their opinions in the first place?)
  3. Make sure your survey works! (The one I did today asked a few questions to test if I was paying attention to what I was reading, and then told me I got the answers wrong, despite confirming that the answer I checked was correct. Oops.)
So how does all of that tie together?

If you ask questions with the only possible answers being the ones you've provided, you're going to convince yourself that the audience and pricing for your product are something that it may not be. Bioinformatics software is a hard field to be successful in - and asking the wrong questions will only make it harder to understand the pitfalls ahead. With pressure on both the business side and the software side, this is not a field in which you can afford to ask the wrong questions.


Monday, November 17, 2008

Synergy!

Studying for my comprehensive exam is moving along slowly, rather disrupted by the poster I'm creating for the annual Cancer Conference taking place this week. I'm a little behind, but I'm getting there. Anyhow, I thought I'd take a minute to mention something that's come up several times in conversation this week: Synergy.

This is one of those applications that is an absolute must for bioinformatics students and researchers, or anyone who uses more than one computer. (Don't we all, these days?) I've been using it, myself, for about a year now, and it's one of the most useful applications on my computers.

Synergy is an open source software implementation of a KVM switch. Like a KVM switch, it can be used across operating systems - anything from win95 to XP to OSX to Linux/*nix. It's not even hard to install. The beauty of it is really in its simplicity. Not only can your mouse and keyboard move across your computers, but it also carries a clipboard with it. Cutting and pasting between computers, on its own, is worth its weight in gold. (Though that probably depends on how much you have on your clipboard...)
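To give a flavour of how simple the setup is, here's a minimal Synergy 1.x server configuration for two machines sitting side by side. The hostnames are placeholders - substitute your own:

```
# synergy.conf - two screens, desktop on the left, laptop on the right
section: screens
    desktop:
    laptop:
end

section: links
    desktop:
        right = laptop
    laptop:
        left = desktop
end
```

Run the server on the machine with the keyboard and mouse, point the clients at it, and the cursor slides between screens at the edges you've linked.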

Anyhow, just because not everyone is aware of this nifty little tool, I figured I'd mention it. Hopefully it's useful to a few people out there!


Thursday, November 13, 2008

AGBT 2009

A couple of quick points:
1. I'll be going to AGBT 2009, and I hope to be blogging it live. Mark it on your calendars: Feb 4-7, 2009, Marco Island. (I'm dying to hear if Pacific Biosciences has more to add to what they presented last year...)

2. Studying for Comprehensive exams is going slowly. Canada Post delivered the textbook on Sunday (!!), but I've only read 2 chapters.

3. Findpeaks 3.2.2 will be tagged tomorrow. It won't have Controls built in, which will slip to 3.2.3, I suppose. If anyone is interested in testing it out, please let me know!

4. I've heard rumours of a company that's supporting FindPeaks 3.1 for use in academic/industrial labs. If anyone comes across this company, and knows who it is, please let me know! I'd really like to find out. (=

Monday, November 10, 2008

Bowtie and Single End Mapped Paired End Data

Strange title for a posting, but it actually makes sense, if you think about it.

One of the researchers here has been working on an assembly of a reasonably small fungal genome, using short read technology with which I've been somewhat involved. It's been an educational project for me, so I'm glad I had the opportunity to contribute. One of the key elements of the paper, however, was the use of Paired End Tags (PET) from an early Illumina machine to assess the quality of the assembly.

Unfortunately, an early run of the software I'd written to analyze the Maq alignments of the paired ends to the assemblies had a bug, which made it look like the data supported the theory quite nicely - but alas, it was just a bug. The bug was fixed a few weeks ago, and a re-run of the data turned up something interesting:

If you use Maq alignments, you're biasing your alignments towards the Smith-Waterman aligned sequences, which are not an independent measure of the quality of the assembly. Not shocking? I know - it's pretty obvious.

Unfortunately, we hadn't put it in context beforehand, so we had to find another way of doing this. We wanted to get a fairly exhaustive set of alignments for each short read - and we wanted to get it quickly, so we turned to Bowtie. While I wasn't the one running the alignments, I have to admit, I'm impressed with the quick performance of the aligner. Multi-matches work well, the file format is intuitive - similar to an Eland file - and the quality of the information seems to be good. (The lack of a good scoring mechanism is a problem, but wasn't vital for this project.)

Anyhow, by performing a Single End Tag style alignment on PET data, while retaining multimatches, we were able to remove the biases of the aligners, and perform the pairings ourselves - learning more about the underlying assembly to which we were aligning. This isn't something you'd want to do often, unless you're assembling genomes fairly regularly, but it's quite cool.
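The pairing step described above can be sketched roughly like this. To be clear, this is a minimal illustration of the idea, not the actual code from the Vancouver package: it assumes a simplified tab-separated alignment record (read name with a /1 or /2 mate suffix, strand, contig, offset) and a hypothetical maximum insert size, and it considers every combination of multi-matched mates.

```python
from collections import defaultdict

def load_alignments(lines):
    """Group single-end alignments by read-pair name.

    Assumes a simplified, Bowtie-like tab-separated format:
        read_name<TAB>strand<TAB>contig<TAB>offset
    where the two ends of a pair share a name ending in /1 or /2.
    """
    hits = defaultdict(lambda: ([], []))
    for line in lines:
        name, strand, contig, offset = line.rstrip("\n").split("\t")[:4]
        base, _, mate = name.rpartition("/")  # "r1/2" -> ("r1", "/", "2")
        hits[base][int(mate) - 1].append((contig, strand, int(offset)))
    return hits

def pair_reads(hits, max_insert=500):
    """Yield (pair_name, contig, left, right) for every combination of
    multi-matched mates that lands on the same contig, in opposite
    orientation, within max_insert bases (an assumed cutoff)."""
    for name, (ends1, ends2) in hits.items():
        for c1, s1, o1 in ends1:
            for c2, s2, o2 in ends2:
                if c1 == c2 and s1 != s2 and abs(o1 - o2) <= max_insert:
                    yield name, c1, min(o1, o2), max(o1, o2)
```

Because the mates are aligned independently and multi-matches are retained, the pairing itself becomes a check on the assembly: pairs that can't be reconciled on any contig point at possible mis-joins.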

Alas, for the paper, I don't think this data was good enough quality - there may not be a good story in the results that will replace the one we thought we had when the data was first run... but there were other good results in the data, and the journey was worth the trip, even if the destination wasn't all we'd hoped it would be.

As a sidenote, my code was run on an otherwise unused 16-way box, and managed to get 1592% CPU usage, at one point, while taking up 46Gb of RAM. (I should have processed each lane of reads separately, but didn't.) That's the first time I've seen CPU usage that high on my code. My previous record was 1499%. I should note that I only observe these scalings during the input processing - so the overall speed-up was only 4x.
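Those two numbers are roughly consistent, by the way. As a back-of-the-envelope check (my own estimate, not a profiler measurement), Amdahl's law says that if only the input-processing phase parallelizes across the 16 cores and it accounts for about 80% of the serial runtime, the overall speed-up tops out around 4x:

```python
def amdahl_speedup(p, n):
    """Overall speedup when a fraction p of the work runs on n cores
    and the remaining (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# An assumed ~80% parallel fraction on 16 cores gives ~4x overall,
# matching the observed speed-up despite near-full CPU usage in bursts.
speedup = amdahl_speedup(0.8, 16)
```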

The code is part of the Vancouver Short Read Analysis Package, called BowtieToBedFormat.java, if anyone needs a process similar to this for Bowtie reads. (There's no manual for it at the moment, but I can put one together, upon request.)


Monday, November 3, 2008

Updates

It has been a VERY busy week since I last wrote. Mainly, that was due to my committee meeting on Friday, where I had to present my thesis proposal. I admit, there were a few things left hanging going into the presentation, but none of them will be hard to correct. As far as topics go for my comprehensive exams, it sounds like the majority of the work I need to do is to shore up my understanding of cancer. With a field that big, though, I have a lot of work to do.

Still, it was encouraging. There's a very good chance I could be wrapping up my PhD in 18-24 months. (=

Things have also been busy at home - we're still working on selling a condo in Vancouver, and had two showings and two open houses over the weekend. Considering the open houses were well attended, that is an encouraging sign.

FindPeaks has also had a busy weekend, even though I wasn't doing any coding, myself. A system upgrade took FindPeaks 3.1.9.2 off the web for a while and required a manual intervention to bring it back up. (Yay for the Systems Dept!) A bug was also found in all versions of 3.1 and 3.2, which could be fairly significant - and I'm still investigating. At this point, I've confirmed the bug, but haven't had a chance to identify if it's just for this one file, or for all files...

Several other FindPeaks projects are also coming to the forefront this week: controls and automated file builds. Despite the bug I mentioned, FindPeaks would do well with an automated trunk build. More users would help me get more feedback, which would help me figure out what things people are using, so I can focus more on them. At least that's the idea. It might also recruit more developers, which would be a good thing.

And, of course, several new things have appeared that I would like to get going on. Bowtie is the first one: if it does multiple alignments (as it claims to), I'll be giving it a whirl as the new basis of some of my work on transcriptomes. At a rough glance, the predicted 35x speedup compared to Maq is a nifty enticement for me. Then there's the opportunity to do some clean-up code on the whole Vancouver package for command line parameter processing. A little work there could unify and clean up several thousand lines of code, and make new development much easier.

First things first, though, I need to figure out the source and the effects of that bug in FindPeaks!
