Posts Be careful about -p in bwa mem
Post
Cancel

Be careful about -p in bwa mem

If align two PE fatsq, it is simple to do that bwa mem ref.fa r1.fq r2.fq > mem-pe.sam. But in GATK Best Practice pipeline, there is -p parameter in bwa command. According to official description:

bwa mem [-aCHMpP] [-t nThreads] [-k minSeedLen] [-w bandWidth] [-d zDropoff] [-r seedSplitRatio] [-c maxOcc] [-A matchScore] [-B mmPenalty] [-O gapOpenPen] [-E gapExtPen] [-L clipPen] [-U unpairPen] [-R RGline] [-v verboseLevel] db.prefix reads.fq [mates.fq]

If mates.fq file is absent and option -p is not set, this command regards input reads are single-end. If mates.fq is present, this command assumes the i-th read in reads.fq and the i-th read in mates.fq constitute a read pair. If -p is used, the command assumes the 2i-th and the (2i+1)-th read in reads.fq constitute a read pair (such input file is said to be interleaved). In this case, mates.fq is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.

It means that when -p is absent, we can use bwa mem ref.fa r1.fq for SE alignment, and bwa mem ref.fa r1.fq r2.fq for PE alignment, it is easy to understand.

But when -p is presented, only one fastq file will be used, others will be discarded. The 2i-th and (2i+1)-th read should be a read pair. It indicates input fastq file should include information about PE, because GATK Best Practice use one ubam for downstream, so if you use fastq files for alignment, you MUST check the bwa command carefully.

Let’s make this a liitle bit more clear with an example.

1: @A00183:510:HMYH3DSXX:2:1101:26187:26584/1
2: ATGGGTACTGCCAGAAACTGGCATCAGGTGCATGGGAACCAAGCATTCAGGGGAAAGCTTCCTCTAGCTGGCCTTACCAAGGAGTTGAATGACCGCTGCTTGATCCCACAGGTCTCTGTCATTAAGGTGAAGATTCATGAAGCCACAGGC
3: +
4: FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
5: @A00183:510:HMYH3DSXX:2:1101:26187:26584/2
6: GAGGTGTCAGGACCCAAGGGACAGGTCTGTGGACAAACTGACTCACCTCATACTGTAGCTTCTGTTTCCCTGCAGGCATGCCTGTGGCTTCATGAATCTTCACCTTAATGACAGAGACCTGTGGGATCAAGCAGCGGTCATTCAACTCCT
7: +
8: FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF,FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:FFFFFFFFFFFFFFFFFFFFFF
...
...
...
...

In this fastq, we can clearly see that line 1 and 5 is 26584/1 26584/2, it is read pair. So, the read sequence: line 2(2*0-th, 0th) and line 6(2*0+1-th, 1st, i=0, read sequence number, not line number, so line 1,3,4,5,7,8 is not read sequence, first 2 read sequence is read pair)

So, when converting two PE fastq into one ubam, then mapping reads in one ubam file using mem -p is OK, and you have to pay attention in other cases.

OLDER POSTS NEWER POSTS

Comments powered by Disqus.

Contents

Search Results