NAME

snpParser.pl -- All the sub-routines for SNV (sometimes denoted interchangeably as SNP) handling in the ASARP pipeline.

SYNOPSIS

	use Statistics::R; #interact with R
	require "fileParser.pl"; #sub's for input annotation files
	require "snpParser.pl"; #sub's for snps
	...

	# ... get all configs, input files (see fileParser)

	# read and parse SNVs
	my ($snpRef, $pRef) = initSnp($snpF, $POWCUTOFF);
	# suggested: get the Chi-Squared Test p-value cutoff from FDR ($FDRCUTOFF)
	($SNVPCUTOFF) = fdrControl($pRef, $FDRCUTOFF, 0); # set the last argument to 1 for verbose
	# match SNVs with gene transcript annotations
	my $geneSnpRef = setGeneSnps($snpRef, $transRef);
	# match gene SNVs with AI/AT and alternative splicing (AS) events
	my ($snpEventsRef) = 
	setSnpEvents($geneSnpRef, $altRef, $splicingRef);

	# calculate NEV and filter the matched gene SNVs with AI/AT/AS events
	my ($snpsNevRef) = 
	filterSnpEventsWithNev($snpRef, $geneSnpRef, $snpEventsRef, $bedF, 
	$allEventsListRef, $NEVCUTOFFLOWER, $NEVCUTOFFUPPER); 

	# process ASE and ASARP
	my ($allAsarpsRef) = 
	processASEWithNev($snpRef, $geneSnpRef, $snpsNevRef, $SNVPCUTOFF, 
	$ASARPPCUTOFF, $ALRATIOCUTOFF);

	# format results to output
	my $outputGene = $outputFile.'.gene.prediction';
	outputRawASARP($allAsarpsRef, 'ASARPgene', $outputGene);
	my $allNarOutput = formatOutputVerNAR($allAsarpsRef);
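
The fdrControl call above turns an FDR target into a p-value cutoff. How it does so is not described in this section. As a rough illustration only, the sketch below uses the Benjamini-Hochberg procedure, one common way to derive such a cutoff; it is an assumption, not necessarily what fdrControl implements. The name bh_cutoff and the p-values are hypothetical.

```perl
use strict;
use warnings;

# Benjamini-Hochberg sketch (assumed technique, not taken from snpParser.pl):
# return the largest sorted p-value p_(k) satisfying p_(k) <= (k/m) * fdr,
# or undef if no p-value passes.
sub bh_cutoff {
    my ($pvals, $fdr) = @_;
    my @sorted = sort { $a <=> $b } @$pvals;
    my $m = scalar @sorted;
    my $cutoff;
    for my $k (1 .. $m) {
        my $p = $sorted[$k - 1];
        # keep overwriting so the last (largest) passing p-value wins
        $cutoff = $p if $p <= ($k / $m) * $fdr;
    }
    return $cutoff;
}

my @p = (0.001, 0.008, 0.039, 0.041, 0.2);   # made-up p-values
my $cut = bh_cutoff(\@p, 0.05);
print defined $cut ? "cutoff: $cut\n" : "no p-value passes\n";
```

SNVs whose p-values are at or below the returned cutoff would then be treated as significant at the chosen FDR level.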

REQUIREMENT

Statistics::R must be installed. See search.cpan.org.

DESCRIPTION

This Perl file contains all the sub-routines for SNV handling and ASARP processing, as well as result formatting. They are quite procedural, so the input files such as annotations and events should first be obtained using the sub-routines in fileParser.

Basically there are 3 steps:

1. read and parse the individual SNVs

2. match the SNVs to transcripts, then to events, and then filter them based on the PSI-like Normalized Expression Value (NEV) calculation

3. process the SNVs with ASE patterns and SNV pairs with other ASARP patterns: AI/AT/AS, and output the formatted results

AI/AT/AS categories are briefly illustrated below (where the red dots represent SNVs with ASARP patterns):

SNV List Format

The SNV list input file contains the list of all SNVs covered by RNA-Seq in the genes of interest, with the read counts of the reference (Ref) and alternative (Alt) alleles.

This file is space-delimited, with the following fields for each SNV:

	chromosome
	coordinate
	alleles (reference allele>alternative allele)
	dbsnpID
	RNA-Seq counts (# reads for reference allele:alternative allele:wrong nucleotide)

Example file: data/snp.list.fig3
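
To make the field layout concrete, here is a minimal sketch of parsing one line in this format. It is not the code used by initSnp; the sub-routine name parse_snv_line, the hash keys, and the sample line are all hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Parse one space-delimited SNV line into a hash reference.
# Returns nothing (undef in scalar context) if the line is incomplete.
sub parse_snv_line {
    my ($line) = @_;
    chomp $line;
    # fields: chromosome, coordinate, alleles, dbsnpID, RNA-Seq counts
    my ($chr, $pos, $alleles, $snpId, $counts) = split ' ', $line;
    return unless defined $counts;
    # alleles look like "A>G"; counts look like "ref:alt:wrong"
    my ($ref, $alt) = split />/, $alleles;
    my ($refCnt, $altCnt, $wrongCnt) = split /:/, $counts;
    return unless defined $alt && defined $wrongCnt;
    return {
        chr      => $chr,    pos        => $pos,
        ref      => $ref,    alt        => $alt,
        id       => $snpId,  refCount   => $refCnt,
        altCount => $altCnt, wrongCount => $wrongCnt,
    };
}

# Made-up input line, not taken from data/snp.list.fig3.
my $snv = parse_snv_line("chr5 1000 A>G rs0000 12:8:0");
printf "%s:%d %s>%s ref=%d alt=%d\n",
    @{$snv}{qw(chr pos ref alt refCount altCount)};
# prints: chr5:1000 A>G ref=12 alt=8
```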

Sub-routines (major)