% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nullp.R
\name{nullp}
\alias{nullp}
\title{Probability Weighting Function}
\usage{
nullp(DEgenes, genome, id, bias.data = NULL, plot.fit = TRUE)
}
\arguments{
\item{DEgenes}{A named binary vector where 1 represents DE, 0 not DE and the
names are gene IDs.}

\item{genome}{A string identifying the genome that the gene IDs in
\code{DEgenes} refer to.  For a list of supported organisms run
\code{\link{supportedGenomes}}.}

\item{id}{A string identifying the gene identifier format used by the names
of \code{DEgenes}.  For a list of supported gene IDs run
\code{\link{supportedGeneIDs}}.}

\item{bias.data}{A numeric vector containing the data on which the DE may
depend.  Usually this is the median transcript length of each gene in bp.
If set to \code{NULL}, \code{nullp} will attempt to fetch gene lengths using
\code{\link{getlength}}.}

\item{plot.fit}{Whether to plot the PWF.  If \code{TRUE},
\code{\link{plotPWF}} is called with default values.}
}
\value{
A data frame with 3 columns, named "DEgenes", "bias.data" and "pwf",
with the rownames set to the gene names.  Each row corresponds to a gene.
The DEgenes column specifies whether the gene is DE (1 for DE, 0 for not
DE), the bias.data column gives the numeric value of the DE bias being
accounted for (usually the gene length or number of counts) and the pwf
column gives the gene's value on the probability weighting function.  This
object is usually passed to \code{goseq} to calculate enriched categories or
to \code{plotPWF} for further plotting.
}
\description{
Calculates a Probability Weighting Function for a set of genes based on a
given set of bias data (usually gene length) and each gene's status as
differentially expressed or not.
}
\details{
It is essential that the entire analysis pipeline, from summarizing raw
reads through to using \code{goseq}, be done in just one gene identifier
format.  If your data is in a format that is not natively supported, you
will need to obtain the gene lengths yourself and supply them to the
\code{nullp} function using the \code{bias.data} argument.  Converting to a
supported format from another format should be avoided whenever possible, as
this will almost always result in data loss.

\code{NA}s are allowed in the \code{bias.data} vector if you do not have
information about a certain gene.  Setting a gene to \code{NA} is preferable
to removing it from the analysis.

If \code{bias.data} is left as \code{NULL}, \code{nullp} attempts to use
\code{\link{getlength}} to fetch the length of each gene.

It is recommended that you review the fit produced by the \code{nullp}
function before proceeding, by leaving \code{plot.fit} as \code{TRUE}.
}
\examples{
data(genes)
pwf <- nullp(genes, 'hg19', 'ensGene')
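
# A minimal sketch of supplying bias.data directly rather than relying on
# getlength, as described in Details.  The lengths below are simulated
# purely for illustration (an assumption, not real transcript lengths);
# the names sim.length and pwf.sim are hypothetical.  A few entries are
# set to NA to represent genes with no length information.
set.seed(1)
sim.length <- round(runif(length(genes), min = 200, max = 10000))
sim.length[1:10] <- NA

# genome and id are omitted here because bias.data is supplied;
# plot.fit = FALSE suppresses the automatic plot.
pwf.sim <- nullp(genes, bias.data = sim.length, plot.fit = FALSE)
head(pwf.sim)

# The fit can still be reviewed afterwards.
plotPWF(pwf.sim)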

}
\references{
Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. (2010)
\emph{Gene ontology analysis for RNA-seq: accounting for selection bias}.
Genome Biology, 11(2), R14.
}
\seealso{
\code{\link{supportedGenomes}}, \code{\link{supportedGeneIDs}},
\code{\link{goseq}}, \code{\link{getlength}}
}
\author{
Matthew D. Young \email{myoung@wehi.edu.au}
}
