MAQ mapview format. - Updated
Well, I promised a quick update on the MAQ mapview format once I'd written the interpreter for it, but there isn't much to say.
Key bits of information:
- The file is simply a gzipped text file. (Update: this file is not normally gzipped. While the .map file is compressed, the .mapview file is not normally found in gzipped form.) If yours is gzipped, you can decompress it with 'gunzip' on a Linux system.
- Reads are pre-sorted by chromosome, then position.
- The format is 1-based, so if you're using a zero-based format, you'll need to convert.
- Starting points are for the "left end"; that is, regardless of which strand the sequence aligned to, the lowest aligned reference coordinate is reported.
- Sequences are not contained in this file, but you can retrieve them from the original fastq file. If you do so, you will need to take the reverse complement of any read that maps to the reverse strand for it to match your fasta sequence. Forward-strand sequences will match the fasta sequence as-is.
- Most of the fields are not useful for any form of analysis, and what's given is mostly incomprehensible.
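The practical points above (possible gzip compression, 1-based left-end coordinates, and reverse-complementing reverse-strand reads) can be sketched in Python. This is a minimal illustration rather than anything from MAQ itself: the function names are hypothetical, the gzip check relies on the standard two-byte gzip magic number, the strand is assumed to be written as '+' or '-', and `right_end` assumes an ungapped alignment whose length equals the read length.

```python
import gzip

# Translation table for DNA complements (N stays N); hypothetical helper data.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def open_mapview(path):
    """Open a .mapview file as text, whether or not it has been gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # standard gzip magic number
        return gzip.open(path, "rt")
    return open(path, "r")


def to_zero_based(pos_1based):
    """Convert MAQ's 1-based left-end position to a 0-based coordinate."""
    return pos_1based - 1


def right_end(pos_1based, read_length):
    """1-based, inclusive coordinate of the rightmost aligned base.

    Assumes an ungapped alignment, so the read covers read_length bases.
    """
    return pos_1based + read_length - 1


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def oriented_sequence(fastq_seq, strand):
    """Return a fastq read as it would appear on the forward reference strand."""
    return reverse_complement(fastq_seq) if strand == "-" else fastq_seq
```

Both branches of `open_mapview` return an ordinary file object, so it works in a `with` block regardless of whether the file on disk is compressed.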
The best information I had was from the MAQ manpage. In a slightly more readable format:
- read name
- chromosome
- position
- strand
- insert size from the outer coordinates of a pair
- paired flag
- mapping quality
- single-end mapping quality
- alternative mapping quality
- number of mismatches of the best hit
- sum of qualities of mismatched bases of the best hit
- number of 0-mismatch hits of the first 24bp
- number of 1-mismatch hits of the first 24bp on the reference
- length of the read
- read sequence
- quality
Most strikingly, you'll notice that 16 fields are listed above, while the file appears to have only 14. It seems not all files have the last two fields. I don't know if it's just the file I have, or if it's usually that way. (Update: Actually, there are normally 16 fields. The files I was given were generated using the mapview -B flag, which strips out some of the information; I believe the stripped fields are the final two. Thus, the comments above reflect the -B flag output only! Thanks to Ryan for catching that!)
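A parser that copes with both the normal 16-field output and the 14-field -B output might look like the sketch below. It assumes the columns are tab-separated, that every numeric column parses as an integer, and (following my guess above) that the two columns missing from the -B output are the read sequence and quality. The record and field names are my own labels for the manpage descriptions, not names used by MAQ. It also includes a hypothetical check for the sort order mentioned earlier; since chromosomes aren't necessarily in alphabetical order, it only verifies that each chromosome forms one contiguous block with non-decreasing positions.

```python
from typing import NamedTuple, Optional


class MapviewRecord(NamedTuple):
    read_name: str
    chromosome: str
    position: int               # 1-based left end
    strand: str                 # assumed '+' or '-'
    insert_size: int            # outer coordinates of a pair
    paired_flag: int
    map_qual: int
    se_map_qual: int            # single-end mapping quality
    alt_map_qual: int           # alternative mapping quality
    mismatches: int             # mismatches of the best hit
    mismatch_qual_sum: int      # sum of qualities of mismatched bases
    hits_0mm_24bp: int          # 0-mismatch hits of the first 24bp
    hits_1mm_24bp: int          # 1-mismatch hits of the first 24bp
    read_length: int
    sequence: Optional[str]     # None in 14-field (-B) output
    quality: Optional[str]      # None in 14-field (-B) output


def parse_mapview_line(line):
    """Parse one mapview line with either 16 or 14 tab-separated fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (14, 16):
        raise ValueError(f"expected 14 or 16 fields, got {len(fields)}")
    name, chrom, pos, strand = fields[:4]
    numbers = [int(x) for x in fields[4:14]]  # the ten integer columns
    if len(fields) == 16:
        seq, qual = fields[14], fields[15]
    else:
        seq, qual = None, None
    return MapviewRecord(name, chrom, int(pos), strand, *numbers, seq, qual)


def check_sorted(records):
    """True if each chromosome is one contiguous block with non-decreasing positions."""
    seen = set()
    prev_chrom, prev_pos = None, 0
    for rec in records:
        if rec.chromosome != prev_chrom:
            if rec.chromosome in seen:  # chromosome reappeared later in the file
                return False
            seen.add(rec.chromosome)
            prev_chrom, prev_pos = rec.chromosome, 0
        if rec.position < prev_pos:
            return False
        prev_pos = rec.position
    return True
```

Returning `None` for the missing columns, rather than picking a separate record type for -B files, keeps downstream code the same for both variants; it only has to check `record.sequence is None` before using the sequence.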
If I come across anything else that needs to be added, I'll update this entry.
1 Comments:
Thank you Fejes. I was looking for .map documentation. I did not realize it was in MAQ page.
hi1