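Today we recommend two small tools for working with SAM files: samblaster and sambamba. Beyond everyday functions such as sorting and inspecting alignment information, their best feature is that they can replace Picard for removing duplicate reads. With the selection criteria unchanged, the speed-up can exceed 30-fold. Give them a try!

These two tools are 30 times faster than Picard

SAMBAMBA

Features

sambamba offers seven functions for processing SAM/BAM files, including filter, merge, slice, and duplicate handling.

I. Installation (supports 64-bit macOS/Linux)

git clone --recursive https://github.com/lomereiter/sambamba.git
cd sambamba
make

Alternatively, download a release manually from the download page, upload it to your system, and unpack it to install, for example sambamba_v0.6.7_linux.tar.bz2.

II. Usage

1. Sorting (sambamba sort)

2. Building an index

$ sambamba index example.bam
# show progress while indexing
$ sambamba index --show-progress example.bam /tmp/example.bam.bai

3. Extracting information from a file

# show basic information about the reference sequences
$ sambamba view --reference-info ex1_header.bam
[{"name":"chr1","length":1575},{"name":"chr2","length":1584}]

# count reads on the reference with ref_id 3 that have a mapping quality of at least 50 and a sequence length of at least 80 bp
$ sambamba view -c -F "ref_id == 3 and mapping_quality >= 50 and sequence_length >= 80" ex1_header.bam
3124

4. Merging multiple BAM files (sambamba merge)

5. Viewing alignment statistics by read flag (sambamba flagstat)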
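As a minimal sketch of this step, the statistics can be produced with sambamba's flagstat subcommand. The file name example.bam, the thread count, and the output file name are assumptions for illustration. The command is only executed when sambamba and the input file are actually present.

```shell
# Hypothetical input file and thread count; adjust to your data.
in_bam="example.bam"
threads=4

# sambamba flagstat prints per-flag read counts to standard output;
# -t sets the number of threads.
flagstat_cmd="sambamba flagstat -t ${threads} ${in_bam}"
echo "${flagstat_cmd}"

# Run the command only if sambamba is installed and the input exists.
if command -v sambamba >/dev/null 2>&1 && [ -f "${in_bam}" ]; then
    ${flagstat_cmd} > "${in_bam%.bam}.flagstat.txt"
fi
```

Redirecting the report to a text file keeps it next to the BAM file for later comparison, for example before and after duplicate removal.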
The first line of the report gives the numbers of QC-passed and QC-failed reads. The following lines give the same pair of numbers for:

· duplicates
· mapped reads (plus percentage relative to the numbers from the first line)
· reads with 'is_paired' flag set
· paired reads which are first mates
· paired reads which are second mates
· paired reads with 'proper_pair' flag set (plus percentage relative to the numbers of QC-passed/failed reads with 'is_paired' flag set)
· paired reads where both mates are mapped
· paired reads where read itself is unmapped but mate is mapped
· paired reads where mate is mapped to a different chromosome
· the same as previous but mapping quality is not less than 5

6. Finding duplicate reads (sambamba markdup)

In addition, you can extract a portion of a SAM/BAM file:

sambamba slice OPTIONS <input.bam> region

SAMBLASTER

https://github.com/GregoryFaust/samblaster

I. Installation (supports Linux and Mac OS X 10.7 or later)

git clone https://github.com/GregoryFaust/samblaster.git
cd samblaster
make
cp samblaster /usr/local/bin/.

II. Usage

Main options:

Other options:

Examples:

# automatically output discordant read pairs and split read alignments
bwa mem <idxbase> samp.r1.fq samp.r2.fq | samblaster -e -d samp.disc.sam -s samp.split.sam | samtools view -Sb - > samp.out.bam

# extract split reads and discordant read pairs from a BAM file
samtools view -h samp.bam | samblaster -a -e -d samp.disc.sam -s samp.split.sam -o /dev/null

Reposted from the 生信草堂 WeChat public account, with permission.
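Since replacing Picard for duplicate removal is the headline use case for both tools, here is a minimal sketch of how that step might look. It assumes sambamba markdup with -r (remove duplicates instead of only marking them) on a coordinate-sorted BAM, and samblaster with -r on read-name-grouped SAM streamed through samtools. All file names, the thread count, and the helper function dedup_with_samblaster are hypothetical. Nothing runs unless the tools and the input file are present.

```shell
# Hypothetical file names and thread count; adjust to your data.
# in_bam is assumed to be unsorted aligner output (reads grouped by name).
in_bam="example.bam"
sorted_bam="example.sorted.bam"
dedup_bam="example.dedup.bam"
threads=4

# sambamba: coordinate-sort first, then remove duplicates
# (-r removes them instead of only setting the duplicate flag).
sort_cmd="sambamba sort -t ${threads} -o ${sorted_bam} ${in_bam}"
markdup_cmd="sambamba markdup -r -t ${threads} ${sorted_bam} ${dedup_bam}"
echo "${sort_cmd}"
echo "${markdup_cmd}"

# samblaster: reads SAM with a header on stdin, grouped by read name,
# and writes SAM to stdout; -r drops duplicates from the output.
# dedup_with_samblaster is a hypothetical helper, not part of either tool.
dedup_with_samblaster() {
    samtools view -h "$1" | samblaster -r | samtools view -Sb - > "$2"
}

# Run only if the tools and the input are actually available.
if command -v sambamba >/dev/null 2>&1 && [ -f "${in_bam}" ]; then
    ${sort_cmd} && ${markdup_cmd}
fi
if command -v samblaster >/dev/null 2>&1 \
    && command -v samtools >/dev/null 2>&1 \
    && [ -f "${in_bam}" ]; then
    dedup_with_samblaster "${in_bam}" "example.samblaster.bam"
fi
```

The two routes differ in where they sit in a pipeline: samblaster works on the aligner's stream before sorting, while sambamba markdup works on a sorted BAM file, which is also where Picard would normally be used.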