Align reads to the reference genome and generate a SAM file
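# Build the BWA index. If the *.fa file is larger than 2 GB, use -a bwtsw; if it is smaller than 2 GB, use -a is (the default).
bwa index -a bwtsw hg19.fa

time bwa mem -M -t 10 -R '@RG\tID:HKV2KALXX.7\tSM:sample1\tPL:illumina\tLB:sample1' \
    $DB/hg19.fa sample1_1.fq sample1_2.fq > aligned_sample1.sam

# -M: mark shorter split hits as secondary (essential for Picard compatibility)
# -t: number of threads
# -R: defines the read group header line. If it is not defined at this step, it has to be added later before the GATK analysis. The values can be taken from the sample's fq file.
# @RG: read group. It is required, otherwise GATK cannot perform variant calling; setting it during alignment is also quicker than adding it afterwards.
#   ID: read group identifier. For Illumina data this is the flowcell name plus the lane number, i.e. ID:FLOWCELL1.LANE1 (each lane of each flowcell is unique), e.g. HKV2KALXX.7
#   PL: platform, here ILLUMINA
#   SM: sample, here sample1
#   LB: DNA preparation library identifier. MarkDuplicates uses the LB field.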
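The note above says the read group values can be taken from the sample's fq file. A minimal sketch of how that might be scripted follows. It assumes an uncompressed FASTQ whose headers use the Illumina Casava 1.8+ layout (`@instrument:run:flowcell:lane:tile:x:y ...`), and it reuses the sample name as the library name, as the command above does. `build_rg` is a hypothetical helper name, not part of BWA or GATK.

```shell
#!/usr/bin/env bash
# build_rg (hypothetical helper): print a read group string for `bwa mem -R`,
# taking the flowcell and lane from the first header line of a FASTQ file.
# Assumed header layout: @instrument:run:flowcell:lane:tile:x:y ...
build_rg() {
    local fastq="$1" sample="$2"
    local header flowcell lane

    # The first line of a FASTQ file is the header of the first read.
    header=$(head -n 1 "$fastq")

    # Colon-separated fields 3 and 4 hold the flowcell ID and the lane number.
    flowcell=$(printf '%s\n' "$header" | cut -d: -f3)
    lane=$(printf '%s\n' "$header" | cut -d: -f4)

    # Emit literal "\t" sequences; bwa mem turns them into tabs itself.
    # LB is set to the sample name here, which is an assumption.
    printf '@RG\\tID:%s.%s\\tSM:%s\\tPL:illumina\\tLB:%s\n' \
        "$flowcell" "$lane" "$sample" "$sample"
}

# Example use (not run here):
#   RG=$(build_rg sample1_1.fq sample1)
#   bwa mem -M -t 10 -R "$RG" $DB/hg19.fa sample1_1.fq sample1_2.fq > aligned_sample1.sam
```

If the headers follow a different layout, or the files are gzip-compressed, the field positions and the `head` call would need adjusting; checking the first header by eye before relying on the script is a reasonable precaution.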

