Posts bcftools usage (Updating)
Post
Cancel

bcftools usage (Updating)

Select genotypes from multi samples

We got a vcf file named as trio.vcf containing 3 samples for a trio family for genetic analysis:

\# ...
\# ...
\# ...
\#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3
1       12783   .       G       A       607.89  PASS    AC=3;AF=0.750;AN=4;BaseQRankSum=0.00;DP=27;ExcessHet=3.0103;FS=0.000;MLEAC=3;MLEAF=0.750;MQ=26.35;MQRankSum=0.621;QD=22.51;ReadPosRankSum=1.73;SOR=2.753        GT:AD:AF:DP:GQ:JL:JP:PL:PP      0/1:5,17:0.773:22:88:-1:-1:463,0,88:478,0,88    1/1:0,5:1.00:5:13:-1:-1:157,15,0:157,13,0       ./.:0,0:.:0:.:.:.:0,0,0
1       13178   .       G       A       56.48   PASS    AC=1;AF=0.167;AN=6;BaseQRankSum=1.48;DP=254;ExcessHet=3.0103;FS=9.066;MLEAC=1;MLEAF=0.167;MQ=25.58;MQRankSum=-9.600e-02;QD=2.09;ReadPosRankSum=-6.200e-02;SOR=3.287     GT:AD:AF:DP:GQ:JL:JP:PL:PP      0/0:218,0:.:218:99:3:3:0,120,1800:0,120,1803    0/1:22,5:0.185:27:64:3:3:64,0,543:64,0,603 0/0:9,0:.:9:0:3:3:0,0,194:0,0,254
...
...

The header for #… means headers of vcf, just for simplicity. And how to select all genotypes identical for all 3 samples?

Before going on next, how to select specified genotype for ONE sample? bcftools view -i 'GT[0]~"^1.1"' trio.vcf, GT[0]~”^1.1” means first sample’s GT is match pattern ‘^1.1’, which can be 1/1, 1|1.

So, we can use bcftools view -i 'GT[0]==GT[1]' trio.vcf to select subvcf that genotype of sample1 and sample2 is identical. But! Bcftools will interrupt with “Comparison between samples is not supported, sorry!” Unfortunately, we can’t do comparions between samples using bcftools.

awk for substitution

awk '{if($0 !~ /^#/){split($(NF-2),arr1,":");split($(NF-1),arr2,":");split($NF,arr3,":");if (arr1[1]==arr2[1] && arr2[1]==arr3[1]){print $0}}else if($0 ~ /^#/)print $0 }' trio.vcf This awk select identical genotypes in all trio samples. Take note that keep origin headers of vcf for downstream usage with bcftools.

OLDER POSTS NEWER POSTS

Comments powered by Disqus.

Contents

Search Results