Example (the full path at the start of the command was needed to run the script):

/usr/lib/qiime/bin/split_libraries.py -m map.txt -f mtt.fna -q mtt.qual -o lib1 -r -l 150 -b variable_length

Options of split_libraries.py:
-m -f -q -r -l -L -t -s -k -B -b -e -c -a -H -M -o -n
--retain_unassigned_reads

-w: Enable sliding window test of quality scores. If the average score of a continuous set of w nucleotides falls below the threshold (see -s for default), the sequence is discarded. A good value would be 50. 0 (zero) means no filtering. Must pass a .qual file (see -q parameter) if this functionality is enabled. Default behavior for this function is to truncate the sequence at the beginning of the poor quality window, and test for minimal length (-l parameter) of the resulting sequence. [default: 0]

-g, --discard_bad_windows: If the qual_score_window option (-w) is enabled, this will override the default truncation behavior and discard any sequences where a bad window is found. [default: False]

-p, --disable_primers: Disable primer usage when demultiplexing. Should be enabled for unusual circumstances, such as analyzing Sanger sequence data generated with different primers. [default: False]

-z, --reverse_primers: Enable removal of the reverse primer and any subsequent sequence from the end of each read. To enable this, there has to be a “ReversePrimer” column in the mapping file. Primers are required to be in IUPAC format and written in the 5’ to 3’ direction. Valid options are ‘disable’, ‘truncate_only’, and ‘truncate_remove’. ‘truncate_only’ will remove the primer and subsequent sequence data from the output read and will not alter output of sequences where the primer cannot be found. ‘truncate_remove’ will flag sequences where the primer cannot be found to not be written and will record the quantity of such failed sequences in the log file. [default: disable]

--reverse_primer_mismatches: Set number of allowed mismatches for reverse primers (option -z). [default: 0]

-d, --record_qual_scores: Enables recording of quality scores for all sequences that are recorded.
If this option is enabled, a file named seqs_filtered.qual will be created in the output directory, and will contain the same sequence IDs as the seqs.fna file and sequence quality scores matching the bases present in the seqs.fna file. [default: False]

-i, --median_length_filtering: Disables minimum and maximum sequence length filtering, and instead calculates the median sequence length and filters the sequences based upon the number of median absolute deviations specified by this parameter. Any sequences with lengths outside the number of deviations will be removed. [default: None]

-j, --added_demultiplex_field: Use -j to add a field to use in the mapping file as an additional demultiplexing option to the barcode. All combinations of barcodes and the values in these fields must be unique. The fields must contain values that can be parsed from the fasta labels such as “plate=R_2008_12_09”. In this case, “plate” would be the column header and “R_2008_12_09” would be the field data (minus quotes) in the mapping file. To use the run prefix from the fasta label, such as “>FLP3FBN01ELBSX”, where “FLP3FBN01” is generated from the run ID, use “-j run_prefix” and set the run prefix to be used as the data under the column header “run_prefix”. [default: None]

-x, --truncate_ambi_bases: Enable to truncate at the first “N” character encountered in the sequences. This will disable testing for ambiguous bases (-a option). [default: False]

Output files:
.fna
histograms.txt: contains the number of sequences of each particular length
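split_library_log.txt

1. If there are several samples, as long as their barcodes in the mapping file differ, you can run them together like this:

split_libraries.py -m Mapping_File.txt -f 1.TCA.454Reads.fna,2.TCA.454Reads.fna -q 1.TCA.454Reads.qual,2.TCA.454Reads.qual -o Split_Library_Output_comma_separated/

Alternatively, merge all the sequences first and then process them.

2. If the data come from two sequencing runs (for example paired-end sequencing), the sequence numbering can collide. For example, several sequencing results that share a barcode are numbered identically; if they are all processed with the same barcode, fragments from different runs are assigned the same number. To avoid this, process each run separately and offset the numbering of the second run with -n:

split_libraries.py -m Mapping_File.txt -f 1.TCA.454Reads.fna -q 1.TCA.454Reads.qual -o Split_Library_Run1_Output/
split_libraries.py -m Mapping_File.txt -f 2.TCA.454Reads.fna -q 2.TCA.454Reads.qual -o Split_Library_Run2_Output/ -n 2000000
cat Split_Library_Run1_Output/seqs.fna Split_Library_Run2_Output/seqs.fna > Combined_seqs.fna

The value after -n is the starting sequence number; it should be greater than the total number of sequences produced by the first command.

References:
http:///scripts/split_libraries.html
铁汉1990's blog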
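The rule for -n above (start the second run at a number larger than the sequence count of the first run) can be checked with a few lines of Python. This is only a rough sketch: the function names count_fasta_records and suggest_start_number are made up for illustration and are not part of QIIME, and the toy records stand in for a real seqs.fna file.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def suggest_start_number(n_previous: int, margin: int = 1) -> int:
    """Return a start number strictly greater than the previous sequence count.

    Hypothetical helper: 'margin' is an illustrative safety gap, not a QIIME setting.
    """
    if margin < 1:
        raise ValueError("margin must be at least 1")
    return n_previous + margin


# Toy stand-in for Split_Library_Run1_Output/seqs.fna (contents are made up).
run1 = [
    ">SampleA_0 read1",
    "ACGT",
    ">SampleA_1 read2",
    "GGCC",
    ">SampleB_2 read3",
    "TTAA",
]

n_run1 = count_fasta_records(run1)
start = suggest_start_number(n_run1)
print(n_run1, start)
```

For a real run, the same count could be taken from an open file handle, e.g. by passing open("Split_Library_Run1_Output/seqs.fna") to count_fasta_records, and the result used as a lower bound for the value given to -n (a round number well above it, such as the 2000000 used above, also works).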
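To make the -w / -g description more concrete, here is a small Python sketch of a sliding window quality test: find the first window of w scores whose mean falls below a threshold, then either truncate the read there and test the minimum length, or discard it outright. It is an illustration of the idea as described in the option text, not QIIME's actual code; the function names, the threshold of 25 and the toy read are all assumptions chosen for the example.

```python
from typing import List, Optional


def first_bad_window(quals: List[int], window: int, threshold: float) -> Optional[int]:
    """Return the start index of the first window whose mean quality is below
    threshold, or None if no such window exists (window <= 0 disables the test)."""
    if window <= 0 or len(quals) < window:
        return None
    window_sum = sum(quals[:window])
    if window_sum / window < threshold:
        return 0
    for start in range(1, len(quals) - window + 1):
        # Slide by one position: add the entering score, drop the leaving one.
        window_sum += quals[start + window - 1] - quals[start - 1]
        if window_sum / window < threshold:
            return start
    return None


def apply_window_filter(
    seq: str,
    quals: List[int],
    window: int,
    threshold: float,
    min_len: int,
    discard_bad_windows: bool = False,
) -> Optional[str]:
    """Return the kept (possibly truncated) sequence, or None if it is rejected."""
    bad = first_bad_window(quals, window, threshold)
    if bad is not None and discard_bad_windows:
        return None  # behaviour described for -g: drop the whole read
    kept = seq if bad is None else seq[:bad]  # truncate at the start of the bad window
    return kept if len(kept) >= min_len else None  # minimum length test, as with -l


# Toy read: quality drops at positions 6-8 (values are made up).
seq = "ACGTACGTAC"
quals = [30, 30, 30, 30, 30, 30, 10, 10, 10, 30]

print(first_bad_window(quals, window=3, threshold=25))
print(apply_window_filter(seq, quals, window=3, threshold=25, min_len=4))
print(apply_window_filter(seq, quals, window=3, threshold=25, min_len=4,
                          discard_bad_windows=True))
```

With this toy read the first bad window starts at index 4, so the default behaviour keeps only the first four bases, while the -g style behaviour rejects the read entirely.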