Package 'sgof' reference manual

Title:	Multiple Hypothesis Testing
Description:	Seven different methods for multiple testing problems. The SGoF-type methods (see for example, Carvajal Rodríguez et al., 2009 <doi:10.1186/1471-2105-10-209>; de Uña Álvarez, 2012 <doi:10.1515/1544-6115.1812>; Castro Conde et al., 2015 <doi:10.1177/0962280215597580>) and the BH and BY false discovery rate controlling procedures.
Authors:	Irene Castro Conde and Jacobo de Uña Álvarez
Maintainer:	Irene Castro Conde <[email protected]>
License:	GPL-2
Version:	2.3.5
Built:	2025-03-16 05:35:09 UTC
Source:	https://github.com/cran/sgof

Multiple hypothesis testing

Description

This package implements seven methods for multiple testing problems: the Benjamini and Hochberg (1995) false discovery rate controlling procedure and its modification for dependent tests (Benjamini and Yekutieli, 2001); the Binomial SGoF method proposed by Carvajal Rodríguez et al. (2009), together with its conservative and Bayesian versions, Conservative SGoF (de Uña Álvarez, 2011) and Bayesian SGoF (Castro Conde and de Uña Álvarez, 2013, Report 13/06); and the BB-SGoF (Beta-Binomial SGoF; de Uña Álvarez, 2012) and Discrete SGoF (Castro Conde et al., 2015) procedures, which adapt the SGoF method to possibly correlated tests and to discrete tests, respectively. The number of rejections, the estimated FDR and the adjusted p-values are computed, among other quantities.

Details

This package provides the functions BH, BY, SGoF, Binomial.SGoF, Bayesian.SGoF, Discrete.SGoF and BBSGoF, which implement the aforementioned methods. For a complete list of functions, use library(help="sgof").

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

Maintainer: Irene Castro Conde [email protected]

References

Benjamini Y and Hochberg Y (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57, 289–300.

Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165–1188.

Carvajal Rodríguez A, de Uña Álvarez J and Rolán Álvarez E (2009). A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests. BMC Bioinformatics 10:209.

Castro Conde I and de Uña Álvarez J (2015). Power, FDR and conservativeness of BB-SGoF method. Computational Statistics; Volume 30, Issue 4, pp 1143-1161. DOI: 10.1007/s00180-015-0553-2.

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal; 57(1): 108-122. DOI: 10.1002/bimj.201300238

Castro Conde I, Döhler S and de Uña Álvarez J (2015). An extended sequential goodness-of-fit multiple testing method for discrete data. Statistical Methods in Medical Research. doi: 10.1177/0962280215597580.

Castro Conde I and de Uña Álvarez J (2014). sgof: An R package for multiple testing problems. The R Journal; Vol. 6/2 December: 96-113.

Castro Conde I and de Uña Álvarez J (2013). SGoF multitesting method under the Bayesian paradigm. Discussion Papers in Statistics and Operation Research. Report 13/06. Statistics and OR Department. University of Vigo.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

de Uña Álvarez J (2012). The Beta-Binomial SGoF method for multiple dependent tests. Statistical Applications in Genetics and Molecular Biology, Vol. 11, Iss. 3, Article 14.

Kihn C, Döhler S, Junge F (2024). DiscreteDatasets: Example Data Sets for Use with Discrete Statistical Tests. R package version 0.1.1

Hong Y. (2013). On computing the distribution functions for the Poisson binomial distribution. Computational Statistics and Data Analysis 59, 41-51.

Hong Y. (2019). poibin: The Poisson Binomial Distribution. R package version 1.4

Pounds, S. and C. Cheng (2006). Robust estimation of the false discovery rate. Bioinformatics 22 (16), 1979-1987.

Bayesian SGoF multiple testing procedure

Description

Performs the Bayesian SGoF method (Castro Conde and de Uña Álvarez, 2013, Report 13/06) for multiple hypothesis testing.

Usage

Bayesian.SGoF(u, alpha = 0.05, gamma = 0.05, P0 = 0.5, a0 = 1, b0 = 1)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the metatest.
`gamma`	Numeric value. The p-value threshold: Bayesian SGoF looks for significance in the number of p-values below gamma.
`P0`	The a priori probability of the null hypothesis.
`a0`	The first parameter of the a priori beta distribution.
`b0`	The second parameter of the a priori beta distribution.

Details

Bayesian SGoF (Castro Conde and de Uña Álvarez, 2013, Report 13/06) is an adaptation of the SGoF method to the Bayesian paradigm, in which the proportion of p-values falling below gamma is random. The method has two main steps. First, Bayesian SGoF performs a pre-test at level alpha which decides whether the complete null hypothesis should be rejected. This Bayesian pre-test is based on lower bounds of the a posteriori probability of H0 (computed using the default a priori probability P0=0.5, unless otherwise indicated, and a family of a priori beta distributions located at the null and indexed by a correlation factor). Second, the number of rejections is computed by constructing an interval for the 'excess of significant cases', analogously to the SGoF procedure. For this, the posterior distribution of the proportion of p-values falling below gamma is used; this posterior distribution is calculated on the basis of the default priors a0=b0=1, unless otherwise indicated. In addition, the posterior probability that the complete null hypothesis is true is computed using P0, a0 and b0.

One important difference between frequentist and Bayesian SGoF is that the Bayesian setting induces (and hence allows for) a dependence structure among the p-values; this is useful for real problems where correlation is present. In practice, Bayesian SGoF may be more conservative than frequentist SGoF, particularly when the number of tests is small; this is because Bayesian testing of point nulls is much more conservative than its frequentist counterpart and, therefore, the pre-test part of Bayesian SGoF may play a very important role.

Typically the choice alpha=gamma will be used for Bayesian.SGoF, with the common value set to one of the usual significance levels (0.001, 0.01, 0.05, 0.1). Note, however, that alpha and gamma have different roles.

The false discovery rate is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula.
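
The second step of the method can be illustrated with a minimal base-R sketch. It assumes the standard conjugate update, in which a Beta(a0, b0) prior and s p-values not exceeding gamma out of n give a Beta(a0 + s, b0 + n - s) posterior, and it reads the 'excess of significant cases' as n times the amount by which the lower alpha-quantile of that posterior exceeds gamma. The helper name posterior_excess is hypothetical, the pre-test is left out, and the exact rule used inside Bayesian.SGoF may differ.

```r
# Hypothetical helper, not part of sgof: posterior-based excess of significant cases.
posterior_excess <- function(u, alpha = 0.05, gamma = 0.05, a0 = 1, b0 = 1) {
  n <- length(u)
  s <- sum(u <= gamma)                       # p-values not exceeding gamma
  # Beta(a0, b0) prior + binomial count -> Beta(a0 + s, b0 + n - s) posterior
  lower <- qbeta(alpha, a0 + s, b0 + n - s)  # lower alpha-quantile of the posterior
  # assumed reading of the excess: n * (lower bound - gamma), never negative
  excess <- max(floor(n * (lower - gamma)), 0)
  list(s = s, lower = lower, excess = excess)
}

set.seed(1)
p <- runif(387)^2   # same kind of toy p-values as in the package examples
posterior_excess(p)
```

With uniform-looking p-values the lower posterior bound stays below gamma and the sketch reports no excess, which mirrors the conservative behaviour described above.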

Value

A list containing the following values:

`Rejections`	The number of effects declared by Bayesian SGoF.
`FDR`	The estimated false discovery rate.
`Posterior`	The posterior probability that the complete null hypothesis is true depending on a0, b0 and P0.
`s`	The number of p-values falling below gamma.
`s.alpha`	Critical point at level alpha of the Bayesian pre-test for the complete null depending on P0.
`data`	The original p-values.
`alpha`	The specified significance level for the metatest.
`gamma`	The specified p-value threshold.
`P0`	The specified a priori probability of the null hypothesis.
`a0`	The first specified parameter of the a priori beta distribution.
`b0`	The second specified parameter of the a priori beta distribution.
`call`	The matched call.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

Examples



res<-Bayesian.SGoF(Hedenfalk$x)
summary(res)   

BBSGoF multiple testing procedure

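Description

Performs the BB-SGoF method (de Uña Álvarez, 2012) for multiple hypothesis testing with possibly correlated tests.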

Usage

BBSGoF(u, alpha = 0.05, gamma = 0.05, kmin = 2, kmax = min(length(u)%/%10, 100),
 tol = 10, adjusted.pvalues = FALSE, blocks = NA)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the metatest.
`gamma`	Numeric value. The p-value threshold: SGoF looks for significance in the number of p-values below gamma.
`kmin`	Numeric value. The smallest allowed number of blocks of correlated tests.
`kmax`	Numeric value. The largest allowed number of blocks of correlated tests.
`tol`	Numeric value. The tolerance in model fitting (see Details).
`adjusted.pvalues`	Logical. Default is FALSE. A variable indicating whether to compute the adjusted p-values.
`blocks`	Numeric value. The number of existing blocks (see Details).

Details

BB-SGoF (de Uña Álvarez, 2012; Castro Conde and de Uña Álvarez, in press) is an adaptation of the SGoF method for possibly dependent tests. It is initially assumed that the p-values in the provided vector are correlated in k blocks of equal size (following the given sequence), where k is unknown. Inference on the number of existing effects follows SGoF principles, but replaces the binomial distribution with a beta-binomial distribution in the metatest. The beta-binomial distribution is approximated by the normal distribution; therefore, some caution is needed when the number of tests is small.

It is implicitly assumed that the probability p for a p-value to fall below gamma is random, following a beta distribution, Beta(a,b). As a consequence, the numbers of p-values below gamma in the blocks form a random sample from a Betabinomial(p,rho) model, where p=E(p)=a/(a+b) and rho=Var(p)/(p(1-p))=1/(a+b+1) are, respectively, the mean of p and the within-block correlation between two indicators of type I(pi<gamma), I(pj<gamma). The parameters are estimated by maximum likelihood, and the asymptotic normal distribution of the estimated parameters is used to perform the inferences (so caution is needed when the number of p-values is small).

Since k is unknown, the method is fitted for each integer from k=kmin to k=kmax, and the results for each k are saved. An automatic (conservative) choice of k is also made: the automatic k is the value of k leading to the smallest number of declared effects (by effects it is meant null hypotheses to be rejected). The excess of observed significant cases in the beta-binomial metatest is reported as the number of existing effects N. Finally, the effects are identified by considering the smallest N p-values.

The BB-SGoF procedure weakly controls the family-wise error rate (FWER) and the false discovery rate (FDR) at level alpha. That is, the probability of committing one or more type I errors along the multiple tests is bounded by alpha when all the null hypotheses are true. SGoF does not control the FWER or the FDR in the presence of effects. It has been reported that BB-SGoF provides a good balance between FDR and power, particularly when the number of tests is large and the effect level is weak to moderate. It is also known that the number of effects declared by BB-SGoF is a 100(1-alpha)% lower bound for the true number of existing effects with p-value below the initial threshold gamma; consequently, with probability 1-alpha, the number of false discoveries of BB-SGoF does not exceed the number of false non-discoveries (de Uña Álvarez, 2012).

As for the SGoF method, typically the choice alpha=gamma will be used for BB-SGoF, with the common value set to one of the usual significance levels (0.001, 0.01, 0.05, 0.1). Note, however, that alpha and gamma have different roles.

When adjusted.pvalues=TRUE, adjusted p-values are calculated. These are defined in the same spirit as for the SGoF method, but a guessed value for k must be supplied in the argument blocks. Once k is supplied, the adjusted p-value of a given p-value pi is defined as the smallest alpha0 for which the null hypothesis attached to pi is rejected by BB-SGoF (based on the given k) with alpha=gamma=alpha0. In practice, the BBSGoF function provides an approximation of these adjusted p-values by restricting alpha0 to the set of original p-values.

The argument tol allows for a stronger (small tol) or weaker (large tol) criterion when removing poor fits of the beta-binomial model. When the variance of the estimated beta-binomial parameters for a given k is larger than tol times the median variance along k=kmin,...,kmax, that value of k is discarded.

The false discovery rate is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula.
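
The blocking of the p-values and the link between the Beta(a,b) parameters and (p, rho) can be sketched in base R as follows. The helper names block_counts, beta_to_bb and bb_to_beta are hypothetical, and the way blocks are cut here (consecutive blocks of equal size up to rounding) is an assumption that need not match what BBSGoF does internally; no model fitting is attempted.

```r
# Hypothetical helpers, not part of sgof.

# Number of p-values not exceeding gamma in each of k consecutive blocks.
block_counts <- function(u, k, gamma = 0.05) {
  n <- length(u)
  block <- ceiling(seq_len(n) * k / n)   # block label 1..k, following the given order
  as.vector(tapply(u <= gamma, block, sum))
}

# Beta(a, b) parameters -> mean p and within-block correlation rho.
beta_to_bb <- function(a, b) c(p = a / (a + b), rho = 1 / (a + b + 1))

# Inverse map: (p, rho) -> Beta(a, b) parameters.
bb_to_beta <- function(p, rho) {
  size <- 1 / rho - 1                    # equals a + b
  c(a = p * size, b = (1 - p) * size)
}

set.seed(1)
p <- runif(387)^2
block_counts(p, k = 9)      # one count per block; these are the beta-binomial observations
beta_to_bb(a = 2, b = 18)   # p = 0.1, rho = 1/21
```

The block counts are the data to which a beta-binomial model would be fitted for each candidate k; a rho close to zero corresponds to large a+b, that is, to nearly independent tests.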

Value

A list containing the following values:

`Rejections`	The number of effects declared by BB-SGoF with automatic k.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	The adjusted p-values.
`effects`	A vector with the number of effects declared by BB-SGoF for each value of k.
`SGoF`	The number of effects declared by Conservative SGoF.
`automatic.blocks`	The automatic number of blocks.
`deleted.blocks`	A vector with the values of k for which the model gave a poor fit.
`n.blocks`	A vector with the values of k for which the model fitted well.
`p`	The average ratio of p-values below gamma.
`cor`	A vector with the estimated within-block correlation.
`Tarone.pvalues`	A vector with the p-values of Tarone’s test for no correlation.
`Tarone.pvalue.auto`	The p-value of Tarone’s test for the automatic k.
`beta.parameters`	The estimated parameters of the Beta(a,b) model for the automatic k.
`betabinomial.parameters`	The estimated parameters of the Betabinomial(p,rho) model for the automatic k.
`sd.betabinomial.parameters`	The standard deviation of the estimated parameters of the Betabinomial(p,rho) model for the automatic k.
`data`	The original p-values.
`adjusted.pvalues`	A logical value indicating whether the adjusted p-values have been ordered.
`blocks`	Guessed value of k.
`n`	The length of u.
`alpha`	The specified significance level for the metatest.
`gamma`	The specified p-value threshold.
`kmin`	The smallest allowed number of blocks of correlated tests.
`kmax`	The largest allowed number of blocks of correlated tests.
`tol`	Tolerance in model fitting (see Details).
`call`	The matched call.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Castro Conde I and de Uña Álvarez J (2015). Power, FDR and conservativeness of BB-SGoF method. Computational Statistics; Volume 30, Issue 4, pp 1143-1161. DOI: 10.1007/s00180-015-0553-2.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

de Uña Álvarez J (2012). The Beta-Binomial SGoF method for multiple dependent tests. Statistical Applications in Genetics and Molecular Biology, Vol. 11, Iss. 3, Article 14.

Examples


p<-runif(387)^2  #387 independent, non-uniform p-values (intersection null violated)

res<-BBSGoF(p)
summary(res)    #automatic number of blocks, number of rejected nulls, 
		#estimated FDR, beta and beta-binomial parameters,
		#Tarone test of no correlation 

par(mfrow=c(2,2))
plot(res)   #Tarone test, within-block correlation, beta density (for automatic k),
	    #and decision plot (number of rejected nulls)


Benjamini-Hochberg (BH) multiple testing procedure

Description

Performs the Benjamini-Hochberg FDR-controlling method for multiple hypothesis testing.

Usage

BH(u, alpha = 0.05)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the test.

Details

The function BH applies the Benjamini and Hochberg (1995) false discovery rate controlling procedure. The false discovery rate is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula.
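
For orientation, the step-up rule and the FDR estimate can be written out by hand in base R. This is a sketch under stated assumptions, not the code of BH: the helper name bh_sketch is hypothetical, the n=1 case of Dalmasso et al. (2005) is read as estimating the proportion of true nulls by the mean of -log(1 - p) (capped at 1), and the FDR is then estimated at the largest rejected p-value as that proportion times the threshold divided by the fraction of p-values at or below it.

```r
# Hypothetical helper, not part of sgof: BH step-up rule written out by hand.
bh_sketch <- function(u, alpha = 0.05) {
  n <- length(u)
  su <- sort(u)
  ok <- which(su <= seq_len(n) * alpha / n)  # sorted p-values under the BH line
  r <- if (length(ok) > 0) max(ok) else 0    # number of rejections
  # assumed reading of Dalmasso et al. (2005) with n = 1:
  # proportion of true nulls estimated by mean(-log(1 - p)), capped at 1
  pi0 <- min(mean(-log(1 - u)), 1)
  # estimated FDR at the largest rejected p-value
  fdr <- if (r > 0) min(pi0 * su[r] / mean(u <= su[r]), 1) else 0
  list(Rejections = r, FDR = fdr, pi0 = pi0)
}

u <- c(0.001, 0.01, 0.02, 0.5, 0.9)
bh_sketch(u)$Rejections
sum(p.adjust(u, method = "BH") <= 0.05)  # same count via base R's adjusted p-values
```

Counting the BH-adjusted p-values from p.adjust that do not exceed alpha gives the same number of rejections, which is a convenient cross-check.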

Value

A list containing the following values:

`Rejections`	The number of effects declared by the BH procedure.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	The adjusted p-values.
`data`	The original p-values.
`alpha`	The specified significance level for the test.
`call`	The matched call.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

Examples




res<-BH(Hedenfalk$x)
summary(res)   #number of rejected nulls, estimated FDR
plot(res)   #adjusted p-values

Binomial SGoF multiple testing procedure

Description

Performs the Binomial SGoF method (Carvajal Rodríguez et al., 2009) for multiple hypothesis testing.

Usage

Binomial.SGoF(u, alpha = 0.05, gamma = 0.05)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the metatest.
`gamma`	Numeric value. The p-value threshold: Binomial SGoF looks for significance in the number of p-values below gamma.

Details

Binomial SGoF starts by counting the number of p-values below gamma. This number is compared with the one expected under the intersection or complete null hypothesis (all the nulls are true) in a metatest performed at level alpha. Under the intersection null, the p-values are uniformly distributed on the (0,1) interval, so one expects gamma times the length of u p-values to fall below gamma. If the intersection null is accepted, Binomial SGoF reports no effects. If the intersection null is rejected, the excess of observed significant cases is reported as the number of existing effects N (by effects it is meant null hypotheses to be rejected). Finally, the effects are identified by considering the smallest N p-values. The only input needed is the set of p-values.

The Binomial SGoF procedure weakly controls the family-wise error rate (FWER) and the false discovery rate (FDR) at level alpha. That is, the probability of committing one or more type I errors along the multiple tests is bounded by alpha when all the null hypotheses are true. SGoF does not control the FWER or the FDR in the presence of effects. It has been reported that Binomial SGoF provides a good balance between FDR and power, particularly when the number of tests is large and the effect level is weak to moderate. It is also known that the number of effects declared by Binomial SGoF is a 100(1-alpha)% lower bound for the true number of existing effects with p-value below the initial threshold gamma; consequently, with probability 1-alpha, the number of false discoveries of SGoF does not exceed the number of false non-discoveries (de Uña Álvarez, 2012).

Typically, the choice alpha=gamma will be used, with the common value set to one of the usual significance levels (0.001, 0.01, 0.05, 0.1). Note, however, that alpha and gamma have different roles.

The FDR is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula. The adjusted p-value of a given p-value pi is defined as the smallest alpha0 for which the null hypothesis attached to pi is rejected by SGoF with alpha=gamma=alpha0. In practice, the Binomial.SGoF function computes these adjusted p-values by restricting alpha0 to the set of original p-values. Castro Conde and de Uña Álvarez (2015) proved that this restriction does not change the adjusted p-values, while reducing the computational time.
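
The counting and metatest steps can be sketched in a few lines of base R. The helper name binomial_sgof_sketch is hypothetical; the sketch assumes that the metatest rejects when the count reaches the smallest value whose upper binomial tail is at most alpha, and that the excess N is the count minus that critical value plus one. Binomial.SGoF itself may handle ties and edge cases differently.

```r
# Hypothetical helper, not part of sgof: the binomial metatest behind SGoF.
binomial_sgof_sketch <- function(u, alpha = 0.05, gamma = 0.05) {
  n <- length(u)
  s <- sum(u <= gamma)                  # observed significant cases
  # smallest count b whose upper binomial tail P(X >= b) is at most alpha,
  # where X ~ Binomial(n, gamma) under the intersection null
  b <- 0:(n + 1)
  crit <- b[which(pbinom(b - 1, n, gamma, lower.tail = FALSE) <= alpha)[1]]
  N <- max(s - crit + 1, 0)             # excess of significant cases (0 if not rejected)
  list(s = s, expected = n * gamma, critical = crit, Rejections = N,
       rejected = sort(u)[seq_len(N)])  # the N smallest p-values
}

set.seed(1)
p <- runif(387)^2
res <- binomial_sgof_sketch(p)
res[c("s", "expected", "critical", "Rejections")]
```

Because the critical value grows more slowly than the expected count as the number of tests increases, a fixed proportion of true effects yields a growing excess, which is the intuition behind the power property mentioned above.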

Value

A list containing the following values:

`Rejections`	The number of effects declared by Binomial SGoF.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	The adjusted p-values.
`data`	The original p-values.
`alpha`	The specified significance level for the metatest.
`gamma`	The specified p-value threshold.
`call`	The matched call.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal, 57(1): 108-122. DOI: 10.1002/bimj.201300238

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

Examples


p<-runif(387)^2  #387 independent, non-uniform p-values (intersection null violated)

res<-Binomial.SGoF(p)
summary(res)   #number of rejected nulls, estimated FDR
plot(res)   #adjusted p-values

Benjamini-Yekutieli (BY) multiple testing procedure

Description

Performs the Benjamini-Yekutieli FDR-controlling method for multiple hypothesis testing.

Usage

BY(u, alpha = 0.05)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the test.

Details

The function BY applies the Benjamini and Yekutieli (2001) false discovery rate controlling procedure, which remains valid under dependence. The false discovery rate is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula.
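
The BY procedure can be viewed as the BH step-up rule run at the reduced level alpha divided by the harmonic sum 1 + 1/2 + ... + 1/n. A minimal base-R sketch of that view follows; the helper name by_sketch is hypothetical and this is not the code of BY.

```r
# Hypothetical helper, not part of sgof: BY as BH at level alpha / c(n).
by_sketch <- function(u, alpha = 0.05) {
  n <- length(u)
  c_n <- sum(1 / seq_len(n))                 # harmonic correction factor
  su <- sort(u)
  ok <- which(su <= seq_len(n) * alpha / (n * c_n))
  r <- if (length(ok) > 0) max(ok) else 0    # number of rejections
  list(Rejections = r, c_n = c_n)
}

u <- c(0.001, 0.01, 0.02, 0.5, 0.9)
by_sketch(u)$Rejections
sum(p.adjust(u, method = "BY") <= 0.05)  # same count via base R's adjusted p-values
```

Since the correction factor is larger than 1 whenever there are at least two tests, BY never rejects more hypotheses than BH on the same p-values.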

Value

A list containing the following values:

`Rejections`	The number of effects declared by the BY procedure.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	The adjusted p-values.
`data`	The original p-values.
`alpha`	The specified significance level for the test.
`call`	The matched call.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165–1188.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668

Examples




res<-BY(Hedenfalk$x)
summary(res)   #number of rejected nulls, estimated FDR
plot(res)   #adjusted p-values

Discrete SGoF multiple testing procedure

Description

Performs the Discrete SGoF method (Castro Conde et al., 2015) for multiple hypothesis testing.

Usage

Discrete.SGoF(u,pCDFlist=NA, K=NA, alpha = 0.05, gamma = 0.05, method=NA, 
              Discrete=TRUE, Sides=1,...)

Arguments

`u`	A (non-empty) numeric vector of p-values.
`pCDFlist`	A (non-empty) list with the empirical cumulative function of each discrete p-value.
`K`	Numeric value. The number of continuous tests.
`alpha`	Numeric value. The significance level of the metatest.
`gamma`	Numeric value. The p-value threshold: Discrete SGoF looks for significance in the number of p-values below gamma.
`method`	Method used in the computation of the Poisson binomial quantile. "DFT-CF" for the exact method and "RNA" for the refined normal approximation.
`Discrete`	Logical. Default is TRUE. A variable indicating if the tests are discrete or continuous in order to estimate the FDR.
`Sides`	Numeric value indicating if the tests are one-sided (default), `Sides=1`, or two-sided, `Sides=2` in order to estimate the FDR.
`...`	Other parameters to be passed through to `robust.fdr` function.

Details

Discrete SGoF is an extension of Binomial SGoF, based on the generalized or Poisson binomial distribution (Hong, 2013), which takes into account the discreteness of the p-values. If all the tests are continuous, Discrete SGoF reduces to the Binomial SGoF method; in particular, the number of rejections given by Discrete.SGoF is then the number of effects declared by Binomial SGoF. The Poisson binomial quantile is computed with the poibin package (Hong, 2019). By default, the exact method ("DFT-CF") is used when the number of tests is smaller than 2000 and the refined normal approximation ("RNA") otherwise (see Hong, 2013, for more information); the user can, however, specify which of the two methods to use. Discrete SGoF works like Binomial SGoF, except that it uses the quantiles of the generalized binomial distribution instead of the binomial quantiles. Discrete SGoF maintains the theoretical properties of Binomial SGoF, e.g. weak control of the FDR (FWER) and increasing power as the number of tests increases (de Uña Álvarez, 2011). The FDR is estimated by the method of Pounds and Cheng (2006), using the robust.fdr function provided by those authors.
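
To make the role of the Poisson binomial distribution concrete, the following base-R sketch computes, for each discrete test, the null probability of falling at or below gamma and then the exact distribution of the resulting count. All helper names are hypothetical; the sketch assumes that each p-value's null distribution function equals the identity on its attainable values (as for exact tests), and it uses a plain convolution rather than the "DFT-CF" or "RNA" methods of poibin.

```r
# Hypothetical helpers, not part of sgof or poibin.

# Null probability that a discrete p-value is at most gamma: the largest
# attainable p-value not exceeding gamma (0 if there is none).
null_prob <- function(support, gamma = 0.05) {
  below <- support[support <= gamma]
  if (length(below) > 0) max(below) else 0
}

# Exact Poisson binomial pmf on 0..n by successive convolution.
poisson_binomial_pmf <- function(probs) {
  pmf <- 1
  for (q in probs) pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  pmf
}

# Smallest count b with P(X >= b) <= alpha under the Poisson binomial null.
critical_count <- function(probs, alpha = 0.05) {
  upper_tail <- rev(cumsum(rev(poisson_binomial_pmf(probs))))  # P(X >= 0), ..., P(X >= n)
  b <- which(upper_tail <= alpha)[1] - 1
  if (is.na(b)) length(probs) + 1 else b
}

# Toy supports of three discrete p-values (attainable values in increasing order)
supports <- list(c(0.01, 0.04, 0.30, 1), c(0.20, 1), c(0.03, 0.50, 1))
probs <- sapply(supports, null_prob)   # 0.04, 0.00, 0.03
critical_count(probs)
```

When every success probability equals gamma, the Poisson binomial distribution is the ordinary binomial one, which is consistent with the reduction to Binomial SGoF for continuous tests stated above.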

Value

A list containing the following values:

`Rejections`	The number of effects declared by Discrete SGoF.
`FDR`	The estimated false discovery rate.
`pvalues`	The original p-values.
`alpha`	The specified significance level for the metatest.
`gamma`	The specified p-value threshold.
`K`	The specified number of continuous tests.
`Method`	The specified method used in the computation of the Poisson binomial quantile.
`Discrete`	The specified type of tests.
`Sides`	Numeric value indicating if the tests are one-sided (default), `Sides=1`, or two-sided, `Sides=2`.
`call`	The matched call.

Author(s)

Irene Castro Conde and Sebastian Döhler

References

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

Kihn C, Döhler S, Junge F (2024). DiscreteDatasets: Example Data Sets for Use with Discrete Statistical Tests. R package version 0.1.1

Hong Y. (2013). On computing the distribution functions for the Poisson binomial distribution. Computational Statistics and Data Analysis 59, 41-51.

Hong Y. (2019). poibin: The Poisson Binomial Distribution. R package version 1.4

Pounds, S. and C. Cheng (2006). Robust estimation of the false discovery rate. Bioinformatics 22 (16), 1979-1987.

Examples



require(DiscreteDatasets)

data(amnesia) #discrete data

AllAdverseCases<-amnesia$OtherAdverseCases + amnesia$AmnesiaCases
A11 <- amnesia$AmnesiaCases
A21 <- sum(AllAdverseCases) - A11
A12 <- AllAdverseCases - A11
A22 <- sum(AllAdverseCases) - sum(amnesia$AmnesiaCases) - A12

A1. <- sum(amnesia$AmnesiaCases)
A2. <- sum(AllAdverseCases) - A1.
  
n <- A11 + A12
k <- pmin(n,A1.)

pCDFlist <- list()
pvec <- numeric(nrow(amnesia))

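## Calculation of the p-values and of the attainable p-values of each test
## (upper tail of the hypergeometric null distribution):

for (i in seq_len(nrow(amnesia)))
{
  x <- 0:k[i]  # attainable numbers of amnesia cases for drug i
  # P(X >= x) for every attainable x: the possible p-values of test i
  pCDFlist[[i]] <- dhyper(x, A1., A2., n[i]) + phyper(x, A1., A2., n[i], lower.tail = FALSE)
  pCDFlist[[i]] <- rev(pCDFlist[[i]])  # store them in increasing order
  # observed p-value: P(X >= A11[i])
  pvec[i] <- dhyper(A11[i], A1., A2., n[i]) + phyper(A11[i], A1., A2., n[i], lower.tail = FALSE)
}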

res<-Discrete.SGoF(u=pvec,pCDFlist=pCDFlist,alpha=0.05,gamma=0.05,Discrete=TRUE,Sides=1)
res


#continuous p-values

res2<-Discrete.SGoF(u=Hedenfalk$x,K=3170,Discrete=FALSE, method="DFT-CF",Sides=2)
res2

Hedenfalk data

Description

The data include information from the microarray study of hereditary breast cancer by Hedenfalk et al. (2001). Many cases of hereditary breast cancer are due to mutations in either the BRCA1 or the BRCA2 gene. The histopathological changes in these cancers are often characteristic of the mutant gene. Hedenfalk et al. hypothesized that the genes expressed by these two types of tumors are also distinctive, perhaps making it possible to identify cases of hereditary breast cancer on the basis of gene-expression profiles.

The patients consisted of 23 with BRCA1 mutations, 17 with BRCA2 mutations, 20 with familial breast cancer, 19 with possibly familial breast cancer and 34 with sporadic breast cancer; the study sought to determine whether there are distinctive patterns of global gene expression in these three kinds of tumors.

One of the goals of this study was to find genes differentially expressed between BRCA1- and BRCA2-mutation positive tumors. Thus, the data included here are p-values obtained from a two-sample t-test analysis on a subset of 3170 genes, as described in Storey and Tibshirani (2003).

Usage

Hedenfalk

Format

x: A numeric vector of 3170 p-values of tests comparing BRCA1 to BRCA2.

References

Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M et al. (2001). Gene-Expression Profiles in Hereditary Breast Cancer. New England Journal of Medicine 344, 539–548.

Storey JD and Tibshirani R (2003). Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100: 9440–9445.

Examples


hist(Hedenfalk$x)

Plot of a BBSGoF object

Description

Plot of a BBSGoF object

Usage

## S3 method for class 'BBSGoF'
plot(x, ...)

Arguments

`x`	A BBSGoF object.
`...`	Other parameters to be passed through to plotting functions.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

de Uña Álvarez J (2012). The Beta-Binomial SGoF method for multiple dependent tests. Statistical Applications in Genetics and Molecular Biology, Vol. 11, Iss. 3, Article 14.

Examples


p<-runif(387)^2  #387 independent, non-uniform p-values (intersection null violated)

res<-BBSGoF(p)
	
par(mfrow=c(2,2))
plot(res)   #Tarone test, within-block correlation, beta density (for automatic k),
	        #and decision plot (number of rejected nulls)

Plot of a BH object

Description

Plot of the Adjusted p-values given by the BH method.

Usage

## S3 method for class 'BH'
plot(x, ...)

Arguments

`x`	A BH object.
`...`	Other parameters to be passed through to plotting functions.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

Examples




res<-BH(Hedenfalk$x)
plot(res)  

Plot of a Binomial.SGoF object

Description

Plot the Adjusted p-values given by the Binomial SGoF method.

Usage

## S3 method for class 'Binomial.SGoF'
plot(x, ...)

Arguments

`x`	A Binomial.SGoF object.
`...`	Other parameters to be passed through to plotting functions.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Carvajal Rodríguez A, de Uña Álvarez J and Rolán Álvarez E (2009). A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests. BMC Bioinformatics 10:209.

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal 57(1): 108-122. DOI: 10.1002/bimj.201300238.

Examples


p <- runif(387)^2  # 387 independent, non-uniform p-values (intersection null violated)

res <- Binomial.SGoF(p)
plot(res)

Plot of a BY object

Description

Plot of the adjusted p-values given by the BY method.

Usage

## S3 method for class 'BY'
plot(x, ...)

Arguments

`x`	A BY object.
`...`	Other parameters to be passed through to plotting functions.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165–1188.

Examples




res<-BY(Hedenfalk$x)
plot(res)  

Plot of a SGoF object

Description

Plot of the adjusted p-values given by the Conservative SGoF method.

Usage

## S3 method for class 'SGoF'
plot(x, ...)

Arguments

`x`	A SGoF object.
`...`	Other parameters to be passed through to plotting functions.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal 57(1): 108-122. DOI: 10.1002/bimj.201300238.

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

Examples


p <- runif(387)^2  # 387 independent, non-uniform p-values (intersection null violated)

res <- SGoF(p)
plot(res)

Conservative SGoF multiple testing procedure

Description

Performs Conservative SGoF method (de Uña Álvarez, 2011) for multiple hypothesis testing.

Usage

SGoF(u, alpha = 0.05, gamma = 0.05)


Arguments

`u`	A (non-empty) numeric vector of p-values.
`alpha`	Numeric value. The significance level of the metatest.
`gamma`	Numeric value. The p-value threshold: Conservative SGoF looks for significance in the number of p-values falling below gamma.

Details

Conservative SGoF is an asymptotic version (large number of tests) of the Binomial SGoF procedure, in which the binomial quantiles are approximated by normal ones. In addition, the variance of the number of p-values below gamma is estimated without assuming that all the null hypotheses are true, which typically results in a more conservative decision (hence the method's name). When the number of tests is large, Conservative SGoF and Binomial SGoF report approximately the same result. This method should not be used when the number of tests is small, because the binomial-normal approximation will perform poorly.

Conservative SGoF shares the main properties of Binomial SGoF: weak control of the family-wise error rate (FWER) and of the false discovery rate (FDR) at level alpha, and a good balance between FDR and power, particularly when the number of tests is large and the effect level is weak to moderate. See Binomial.SGoF for more details.

Typically, the choice alpha=gamma will be used, with the common value set to one of the usual significance levels (0.001, 0.01, 0.05, 0.1). Note, however, that alpha and gamma have different roles.

The FDR is estimated by the simple method proposed by Dalmasso, Broet and Moreau (2005), taking n=1 in their formula. The adjusted p-value of a given p-value p_i is defined as the smallest alpha0 for which the null hypothesis attached to p_i is rejected by Conservative SGoF with alpha=gamma=alpha0. In practice, the SGoF function computes these adjusted p-values by restricting alpha0 to the set of original p-values (Castro Conde and de Uña Álvarez, 2015).
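
The metatest described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's source code: the function name `sgof_rejections_sketch`, the rounding with `floor`, and the cap at the number of p-values not exceeding gamma are choices made for this example, so the count returned by `SGoF` may differ slightly.

```r
# Simplified sketch of a Conservative-SGoF-style metatest (hypothetical helper,
# not the sgof implementation).
sgof_rejections_sketch <- function(u, alpha = 0.05, gamma = 0.05) {
  n <- length(u)
  prop_below <- mean(u <= gamma)   # observed proportion of p-values <= gamma
  # Standard error estimated from the observed proportion, i.e. without
  # assuming that every null hypothesis is true.
  se <- sqrt(prop_below * (1 - prop_below) / n)
  # Excess of small p-values over what is expected under the complete null,
  # after subtracting the normal-approximation margin at level alpha.
  excess <- n * (prop_below - gamma - qnorm(1 - alpha) * se)
  # Assumption: round down and never exceed the number of p-values <= gamma.
  as.integer(max(0, min(floor(excess), sum(u <= gamma))))
}

# Toy input: 50 very small p-values and 50 clearly non-significant ones.
toy <- c(rep(0.001, 50), rep(0.5, 50))
n_rejected <- sgof_rejections_sketch(toy)
```

With alpha = gamma, as recommended above, the same value plays both roles; keeping them as separate arguments makes the distinction between the metatest level and the p-value threshold explicit.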

Value

A list containing the following values:

`Rejections`	The number of effects declared by SGoF.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	The adjusted p-values.
`data`	The original p-values.
`alpha`	The specified significance level for the metatest.
`gamma`	The specified p-value threshold.
`call`	The matched call.
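
As an illustration of how an estimated FDR of this kind can be obtained, the following base R sketch applies the Dalmasso, Broet and Moreau (2005) idea with n=1: the proportion of true nulls is estimated by the mean of -log(1 - p), truncated at 1, and plugged into the usual FDR ratio at the largest rejected p-value. The helper name `fdr_estimate_sketch`, the choice of threshold, and the convention of returning 0 when nothing is rejected are assumptions for this example rather than a description of the package internals.

```r
# Hypothetical helper: FDR estimate at the largest rejected p-value.
fdr_estimate_sketch <- function(u, rejections) {
  if (rejections < 1) return(0)        # assumption: no rejections, no false discoveries
  # Estimated proportion of true nulls (n = 1 in the Dalmasso et al. formula).
  pi0 <- min(1, mean(-log(1 - u)))
  threshold <- sort(u)[rejections]     # largest p-value among the rejected ones
  # Expected false discoveries divided by the proportion of p-values <= threshold.
  min(1, pi0 * threshold / mean(u <= threshold))
}

toy <- c(0.01, 0.02, 0.5, 0.9)
fdr_two <- fdr_estimate_sketch(toy, rejections = 2)
```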

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal 57(1): 108-122. DOI: 10.1002/bimj.201300238.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

Examples


p <- runif(387)^2  # 387 independent, non-uniform p-values (intersection null violated)

res <- SGoF(p)
summary(res)   # number of rejected nulls, estimated FDR
plot(res)      # adjusted p-values

Summary of a Bayesian.SGoF object

Description

Summary of the most important results given by the Bayesian SGoF procedure.

Usage

## S3 method for class 'Bayesian.SGoF'
summary(object, ...)

Arguments

`object`	A Bayesian.SGoF object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by the Bayesian SGoF procedure.
`FDR`	The estimated false discovery rate.
`Posterior`	The posterior probability that the complete null hypothesis is true under the non-informative prior choice.
`s`	The proportion of p-values falling below gamma.
`s.alpha`	The first integer from 0 to n such that the Bayesian pre-test rejects the complete null hypothesis.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

Examples




res <- Bayesian.SGoF(Hedenfalk$x)
summary(res)

Summary of a BBSGoF object

Description

Summary of the most important results given by the BBSGoF procedure.

Usage

## S3 method for class 'BBSGoF'
summary(object, ...)

Arguments

`object`	A BBSGoF object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by BB-SGoF with automatic k.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	Table of adjusted p-values falling under gamma.
`Tarone.pvalue.auto`	The p-values of Tarone’s test for the automatic k.
`beta.parameters`	The estimated parameters of the Beta(a, b) model for the automatic k.
`betabinomial.parameters`	The estimated parameters of the Beta-binomial(p, rho) model for the automatic k.
`sd.betabinomial.parameters`	The standard deviations of the estimated parameters of the Beta-binomial(p, rho) model for the automatic k.
`automatic.blocks`	The automatic number of blocks.
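
The two parameter sets reported above describe the same model in different coordinates. The sketch below converts between them in base R, assuming the standard beta-binomial parametrization in which p = a/(a+b) is the mean and rho = 1/(a+b+1) is the within-block correlation. The helper names are hypothetical, and whether the package uses exactly this convention should be checked against de Uña Álvarez (2012).

```r
# Hypothetical helpers, assuming p = a/(a+b) and rho = 1/(a+b+1).
beta_to_betabinomial <- function(a, b) {
  c(p = a / (a + b), rho = 1 / (a + b + 1))
}

betabinomial_to_beta <- function(p, rho) {
  total <- (1 - rho) / rho           # equals a + b under the assumed convention
  c(a = p * total, b = (1 - p) * total)
}

bb <- beta_to_betabinomial(a = 2, b = 6)            # p = 0.25, rho = 1/9
back <- betabinomial_to_beta(bb[["p"]], bb[["rho"]])  # recovers a = 2, b = 6
```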

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

de Uña Álvarez J (2012). The Beta-Binomial SGoF method for multiple dependent tests. Statistical Applications in Genetics and Molecular Biology, Vol. 11, Iss. 3, Article 14.

Examples


p <- runif(387)^2  # 387 p-values, intersection null violated

res <- BBSGoF(p)
summary(res)    # automatic number of blocks, number of rejected nulls,
                # estimated FDR, beta and beta-binomial parameters,
                # Tarone test of no correlation

Summary of a BH object

Description

Summary of the most important results given by the BH procedure.

Usage

## S3 method for class 'BH'
summary(object, ...)

Arguments

`object`	A BH object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by the BH procedure.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	Table of adjusted p-values falling under alpha.
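
For comparison, BH adjusted p-values and the corresponding rejection count can be reproduced with base R's `p.adjust`. This is an independent cross-check under the assumption that alpha is 0.05, not the code that `BH` runs internally; the object names are chosen for the example.

```r
set.seed(1)
p <- runif(387)^2                      # same kind of toy p-values as in the examples
alpha <- 0.05

adj <- p.adjust(p, method = "BH")      # BH adjusted p-values, in the original order
rejections <- sum(adj <= alpha)        # hypotheses rejected at level alpha
below_alpha <- sort(adj[adj <= alpha]) # adjusted p-values falling under alpha

# Classical step-up rule: largest k such that p_(k) <= k * alpha / m.
m <- length(p)
k <- max(c(0, which(sort(p) <= seq_len(m) * alpha / m)))
```

Counting the adjusted p-values at or below alpha gives the same number of rejections as the step-up rule, which is why the summary can report both from a single computation.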

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Benjamini Y and Hochberg Y (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57, 289–300.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

Examples



res<-BH(Hedenfalk$x)
summary(res) 

Summary of a Binomial.SGoF object

Description

Summary of the most important results given by the Binomial SGoF procedure.

Usage

## S3 method for class 'Binomial.SGoF'
summary(object, ...)

Arguments

`object`	A Binomial.SGoF object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by the Binomial SGoF procedure.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	Table of adjusted p-values falling under gamma.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Carvajal Rodríguez A, de Uña Álvarez J and Rolán Álvarez E (2009). A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests. BMC Bioinformatics 10:209.

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal 57(1): 108-122. DOI: 10.1002/bimj.201300238.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

Examples




p <- runif(387)^2  # 387 independent, non-uniform p-values (intersection null violated)

res <- Binomial.SGoF(p)
summary(res)   # number of rejected nulls, estimated FDR

Summary of a BY object

Description

Summary of the most important results given by the BY procedure.

Usage

## S3 method for class 'BY'
summary(object, ...)

Arguments

`object`	A BY object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by the BY method.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	Table of adjusted p-values falling under alpha.
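
The BY adjusted p-values are the BH ones inflated by the harmonic sum 1 + 1/2 + ... + 1/m and truncated at 1, which is what makes the procedure valid under dependence at the cost of fewer rejections. The base R sketch below checks this relationship with `p.adjust`; it is a cross-check with example object names and an assumed alpha of 0.05, not the internals of `BY`.

```r
set.seed(1)
p <- runif(387)^2
m <- length(p)
harmonic <- sum(1 / seq_len(m))          # correction factor used by BY

adj_bh <- p.adjust(p, method = "BH")
adj_by <- p.adjust(p, method = "BY")

# BY equals BH times the harmonic sum, capped at 1.
same <- isTRUE(all.equal(adj_by, pmin(1, harmonic * adj_bh)))

rejections_bh <- sum(adj_bh <= 0.05)
rejections_by <- sum(adj_by <= 0.05)     # never more than rejections_bh
```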

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165–1188.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

Examples



res<-BY(Hedenfalk$x)
summary(res) 

Summary of a SGoF object

Description

Summary of the most important results given by the Conservative SGoF procedure.

Usage

## S3 method for class 'SGoF'
summary(object, ...)

Arguments

`object`	A SGoF object.
`...`	Additional arguments affecting the summary produced.

Value

`Rejections`	The number of effects declared by SGoF.
`FDR`	The estimated false discovery rate.
`Adjusted.pvalues`	Table of adjusted p-values falling under gamma.

Author(s)

Irene Castro Conde and Jacobo de Uña Álvarez

References

Castro Conde I and de Uña Álvarez J (2015). Adjusted p-values for SGoF multiple test procedure. Biometrical Journal 57(1): 108-122. DOI: 10.1002/bimj.201300238.

Dalmasso C, Broet P and Moreau T (2005). A simple procedure for estimating the false discovery rate. Bioinformatics 21:660–668.

de Uña Álvarez J (2011). On the statistical properties of SGoF multitesting method. Statistical Applications in Genetics and Molecular Biology, Vol. 10, Iss. 1, Article 18.

Examples


p <- runif(387)^2  # 387 independent, non-uniform p-values (intersection null violated)

res <- SGoF(p)
summary(res)   # number of rejected nulls, estimated FDR
