NAME

rmDup.pl -- Removes duplicates from a SAM file of a single chromosome (Dr. JH Lee's format), where the extra 12th attribute (mapped read blocks) is used to identify distinct reads (read pairs). Reads (read pairs) are considered duplicates only if all of their mapped read blocks have the same coordinates. Among duplicates, only the read (pair) with the highest read quality is kept. The output SAM file can be used as the input file for merging multiple independent replicates (mergeSam), or for read processing to generate SNV and bedgraph files (procReads).

SYNOPSIS

This script is part of the full pre-processing pipeline.

USAGE:

 perl rmDup.pl input_sam_file output_sam_file is_paired_end

NOTE:

The duplicate removal script is for standard SAM and Dr. Jae-Hyung Lee's 20-attribute SAM file output formats, used in RNA-editing or allele-specific expression (ASE) studies.

ARGUMENTS:

 is_paired_end		0: single-end; 1: paired-end
			For paired-end reads, all reads should be paired up,
			where pair-1 is always followed by pair-2 on the next line.
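
The pairing requirement above can be checked before running the script. The following is a minimal Python sketch, not part of rmDup.pl. It assumes standard SAM conventions: mates share the same QNAME (column 1), and the FLAG (column 2) has bit 0x40 set for pair-1 and bit 0x80 set for pair-2. It also assumes header lines (starting with "@") have already been removed. Files in Dr. Jae-Hyung Lee's format may mark mates differently, so treat the function name find_unpaired and its checks as hypothetical.

```python
def find_unpaired(lines):
    """Return the 0-based indices of lines that start a broken pair.

    Assumes standard SAM: column 1 is QNAME, column 2 is FLAG,
    0x40 marks the first mate and 0x80 marks the second mate.
    """
    bad = []
    for i in range(0, len(lines) - 1, 2):
        first = lines[i].rstrip("\n").split("\t")
        second = lines[i + 1].rstrip("\n").split("\t")
        same_name = first[0] == second[0]
        first_is_mate1 = bool(int(first[1]) & 0x40)
        second_is_mate2 = bool(int(second[1]) & 0x80)
        if not (same_name and first_is_mate1 and second_is_mate2):
            bad.append(i)
    # An odd number of lines means the last read has no mate.
    if len(lines) % 2 == 1:
        bad.append(len(lines) - 1)
    return bad
```

An empty result suggests the file is laid out as is_paired_end = 1 expects; any returned index points at the first line of a pair that should be inspected.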

DESCRIPTION

input_sam_file should contain reads from only one chromosome, and it should be in standard SAM format or Dr. Jae-Hyung Lee's SAM format (check out www.ncbi.nlm.nih.gov for more details).
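
The duplicate rule stated under NAME (same coordinates for all mapped read blocks, keep the highest-quality read or pair) can be illustrated with a short Python sketch. This is not the code of rmDup.pl. It assumes tab-delimited lines whose 12th column holds the mapped read blocks as an opaque string, and it assumes "read quality" means the sum of Phred+33 base qualities in the QUAL column (column 11); the script itself may score reads differently. The names read_quality and remove_duplicates are hypothetical.

```python
def read_quality(fields):
    # Assumption: quality is the sum of Phred+33 scores in QUAL (column 11).
    return sum(ord(c) - 33 for c in fields[10])


def remove_duplicates(lines, paired_end=False):
    """Keep one read (or read pair) per distinct set of mapped read blocks.

    lines: SAM records of one chromosome, without header lines.
    For paired-end input, pair-1 must be directly followed by pair-2;
    a trailing line without a mate is ignored.
    """
    step = 2 if paired_end else 1
    best = {}  # blocks key -> (score, original lines of the read or pair)
    for i in range(0, len(lines) - step + 1, step):
        group = lines[i:i + step]
        fields = [ln.rstrip("\n").split("\t") for ln in group]
        # Assumption: column 12 (index 11) holds the mapped read blocks.
        key = tuple(f[11] for f in fields)
        score = sum(read_quality(f) for f in fields)
        # Ties keep the read (pair) seen first.
        if key not in best or score > best[key][0]:
            best[key] = (score, group)
    # Dicts preserve insertion order, so output follows first occurrence.
    return [ln for _, group in best.values() for ln in group]
```

Keying on the exact block string means two reads count as duplicates only when every block matches, which mirrors the rule above; reads that share a start position but differ in any block are kept as distinct.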

SEE ALSO

mergeSam, procReads, asarp

COPYRIGHT

This pipeline is free software; you can redistribute it and/or modify it given that the related works and authors are cited and acknowledged.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR

Cyrus Tak-Ming CHAN

Xiao Lab, Department of Integrative Biology & Physiology, UCLA