945 results for joint hypothesis tests
Abstract:
Acknowledgments This work has been undertaken with the support of the A*MIDEX project (no. ANR-11-IDEX-0001-02) funded by the "Investissements d'Avenir" French Government program, managed by the French National Research Agency (ANR). We are grateful to Julian Williams, Editor Badi H. Baltagi, and an anonymous referee for helpful comments. We are responsible for any errors.
Abstract:
A satellite-based observation system can continuously or repeatedly generate a user state vector time series that may contain useful information. One typical example is the collection of International GNSS Service (IGS) station daily and weekly combined solutions. Another example is the epoch-by-epoch kinematic position time series of a receiver derived by a GPS real-time kinematic (RTK) technique. Although some multivariate analysis techniques have been adopted to assess the noise characteristics of multivariate state time series, statistical testing has been limited to univariate time series. After reviewing frequently used hypothesis test statistics in univariate analysis of GNSS state time series, the paper presents a number of T-squared statistics for use in the analysis of multivariate GNSS state time series. These T-squared test statistics take into account the correlation between coordinate components, which is neglected in univariate analysis. Numerical analysis was conducted with the multi-year time series of an IGS station to demonstrate the results of the multivariate hypothesis tests in comparison with their univariate counterparts. The results demonstrate that, in general, testing for multivariate mean shifts and outliers tends to reject fewer data samples than testing for univariate mean shifts and outliers at the same confidence level. It is noted that neither univariate nor multivariate data analysis methods are intended to replace physical analysis. Instead, they should be treated as complementary statistical methods for a priori or a posteriori investigations. Subsequent physical analysis is necessary to refine and interpret the results.
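As an illustration of the T-squared screening this abstract describes, the following minimal Python sketch flags outlier epochs in a multivariate coordinate series with a Hotelling-type T-squared statistic. The data, significance level, and the F-based critical value are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy import stats

def hotelling_t2_outliers(X, alpha=0.01):
    """Flag rows of X (n epochs x p components) whose Hotelling T-squared
    statistic exceeds an F-based critical value (approximate sketch)."""
    n, p = X.shape
    d = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    t2 = np.einsum("ij,jk,ik->i", d, S_inv, d)  # T^2 for each epoch
    # Under H0, T^2 is approximately p(n-1)/(n-p) * F(p, n-p)
    crit = p * (n - 1) / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    return t2 > crit

# Illustrative 3-component (e.g. east/north/up) daily series
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
print(hotelling_t2_outliers(X).sum(), "epochs flagged")
```

Because the statistic uses the full covariance matrix, a jump shared across correlated components is judged jointly, which is why the multivariate test tends to reject fewer samples than three separate univariate tests.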
Abstract:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests are commonly misapplied, misinterpreted, and ignored, by novices and expert researchers alike. At first glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic, compare it to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing are first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
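The mechanics the abstract reviews (hypotheses, test statistic, p-value, decision) can be shown with a one-sample t-test in Python; the travel-time scenario and all numbers below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical data: observed travel times; H0: mean = 30 min, H1: mean != 30
rng = np.random.default_rng(1)
sample = rng.normal(loc=32.0, scale=5.0, size=40)

t_stat, p_value = stats.ttest_1samp(sample, popmean=30.0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Reject H0 at level alpha only if p_value < alpha. Note one of the usual
# precautions: a small p-value indicates statistical significance, not
# that the difference from 30 min is practically important.
```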
Abstract:
In longitudinal data analysis, our primary interest is in the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly for correlated discrete outcome data. Marginal modeling approaches such as generalized estimating equations (GEEs) have received much attention in the context of longitudinal regression. These methods are based on the estimates of the first two moments of the data and the working correlation structure. The confidence regions and hypothesis tests are based on asymptotic normality. The methods are sensitive to misspecification of the variance function and the working correlation structure. Because of such misspecifications, the estimates can be inefficient and inconsistent, and inference may give incorrect results. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of a confidence region for the parameter of interest. We extend our approach to variable selection for high-dimensional longitudinal data with many covariates. In this situation it is necessary to identify a submodel that adequately represents the data. Including redundant variables may compromise the model's accuracy and the efficiency of inference. We propose a penalized empirical likelihood (PEL) variable selection based on GEEs; the variable selection and the estimation of the coefficients are carried out simultaneously. We discuss its characteristics and asymptotic properties, and present an algorithm for optimizing PEL. Simulation studies show that when the model assumptions are correct, our method performs as well as existing methods, and when the model is misspecified, it has clear advantages. We have applied the method to two case examples.
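For readers unfamiliar with the GEE baseline the abstract builds on, here is a minimal sketch using statsmodels; the empirical likelihood procedure itself is the paper's contribution and is not reproduced here, and the Poisson outcome and exchangeable working correlation are illustrative assumptions:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical longitudinal data: 50 subjects, 4 visits each
rng = np.random.default_rng(2)
n_subj, n_visits = 50, 4
groups = np.repeat(np.arange(n_subj), n_visits)              # cluster labels
x = rng.normal(size=n_subj * n_visits)
b = np.repeat(rng.normal(scale=0.5, size=n_subj), n_visits)  # within-subject correlation
y = rng.poisson(np.exp(0.2 + 0.5 * x + b))                   # correlated counts

X = sm.add_constant(x)
gee = sm.GEE(y, X, groups=groups,
             family=sm.families.Poisson(),
             cov_struct=sm.cov_struct.Exchangeable())  # working correlation
print(gee.fit().summary())
```

If the working correlation here is misspecified, the point estimates remain consistent but lose efficiency, which is the weakness the EL approach targets.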
Abstract:
This research investigates how to obtain accurate and reliable positioning results with global navigation satellite systems (GNSS). The work provides a theoretical framework for reliability control in GNSS carrier phase ambiguity resolution, the key technique for precise GNSS positioning at the centimetre level. The proposed approach includes procedures for identifying and excluding unreliable solutions and hypothesis tests, allowing the reliability of solutions to be controlled with respect to mathematical models, integer estimation, and ambiguity acceptance tests. Extensive experimental results with both simulated and observed data sets demonstrate the reliability performance characteristics of the proposed theoretical framework and procedures.
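A common building block of ambiguity acceptance testing is the ratio test. The sketch below is a deliberately simplified version (rounding instead of a full integer least-squares search, with a perturbation-based second-best candidate); the threshold mu and the example numbers are assumptions, not the thesis's framework:

```python
import numpy as np

def simple_ratio_test(a_float, Q_a, mu=3.0):
    """Round float ambiguities to integers and apply a ratio acceptance
    test. The second-best candidate is searched only over one-component
    perturbations of the rounded solution (a simplification)."""
    Q_inv = np.linalg.inv(Q_a)
    best = np.round(a_float)

    def q(z):  # weighted squared distance from the float solution
        d = a_float - z
        return d @ Q_inv @ d

    units = np.vstack([np.eye(len(best)), -np.eye(len(best))])
    q_second = min(q(best + e) for e in units)
    accept = q_second / max(q(best), 1e-12) >= mu  # accept only if clearly best
    return best, accept

a_float = np.array([3.10, -1.85, 7.05])   # hypothetical float ambiguities
Q_a = np.diag([0.01, 0.02, 0.015])        # hypothetical covariance matrix
print(simple_ratio_test(a_float, Q_a))
```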
Abstract:
In this paper, we consider the problem of finding a spectrum hole of a specified bandwidth in a given wide band of interest. We propose a new, simple and easily implementable sub-Nyquist sampling scheme for signal acquisition and a spectrum hole search algorithm that exploits sparsity in the primary spectral occupancy in the frequency domain by testing a group of adjacent subbands in a single test. The sampling scheme deliberately introduces aliasing during signal acquisition, resulting in a signal that is the sum of signals from adjacent subbands. Energy-based hypothesis tests are used to provide an occupancy decision over the group of subbands, and this forms the basis of the proposed algorithm to find contiguous spectrum holes. We extend this framework to a multi-stage sensing algorithm that can be employed in a variety of spectrum sensing scenarios, including non-contiguous spectrum hole search. Further, we provide the analytical means to optimize the hypothesis tests with respect to the detection thresholds, number of samples and group size to minimize the detection delay under a given error rate constraint. Depending on the sparsity and SNR, the proposed algorithms can lead to significantly lower detection delays compared to a conventional bin-by-bin energy detection scheme; the latter is in fact a special case of the group test when the group size is set to 1. We validate our analytical results via Monte Carlo simulations.
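The group-level energy detection step can be sketched as follows; the chi-square threshold assumes real-valued Gaussian noise of known variance, and the sample size, signal, and false-alarm level are illustrative:

```python
import numpy as np
from scipy import stats

def group_energy_test(samples, noise_var, alpha=0.05):
    """Energy detector for one group of aliased subbands.
    H0: group vacant (noise only); H1: some subband occupied.
    The threshold fixes the false-alarm probability at alpha, using the
    chi-square law of the normalized energy for real Gaussian noise."""
    energy = np.sum(samples ** 2) / noise_var
    threshold = stats.chi2.ppf(1 - alpha, df=samples.size)
    return energy > threshold

rng = np.random.default_rng(3)
vacant = rng.normal(scale=1.0, size=256)        # noise-only group
busy = vacant + np.sin(0.3 * np.arange(256))    # group containing a signal
print(group_energy_test(vacant, 1.0), group_energy_test(busy, 1.0))
```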
Abstract:
This paper investigates the use of adaptive group testing to find a spectrum hole of a specified bandwidth in a given wideband of interest. We propose a group testing-based spectrum hole search algorithm that exploits sparsity in the primary spectral occupancy by testing a group of adjacent subbands in a single test. This is enabled by a simple and easily implementable sub-Nyquist sampling scheme for signal acquisition by the cognitive radios (CRs). The sampling scheme deliberately introduces aliasing during signal acquisition, resulting in a signal that is the sum of signals from adjacent subbands. Energy-based hypothesis tests are used to provide an occupancy decision over the group of subbands, and this forms the basis of the proposed algorithm to find contiguous spectrum holes of a specified bandwidth. We extend this framework to a multistage sensing algorithm that can be employed in a variety of spectrum sensing scenarios, including noncontiguous spectrum hole search. Furthermore, we provide the analytical means to optimize the group tests with respect to the detection thresholds, number of samples, group size, and number of stages to minimize the detection delay under a given error probability constraint. Our analysis allows one to identify the sparsity and SNR regimes where group testing can lead to significantly lower detection delays compared with a conventional bin-by-bin energy detection scheme; the latter is, in fact, a special case of the group test when the group size is set to 1 bin. We validate our analytical results via Monte Carlo simulations.
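The search logic itself, separated from the physical-layer detection, reduces to scanning groups of adjacent subbands with one test per group. A toy sketch under the assumption of ideal, error-free group tests:

```python
def find_spectrum_hole(occupied, hole_size):
    """Scan groups of `hole_size` adjacent subbands with one (ideal) test
    per group; return the start index of the first vacant group, i.e. a
    contiguous spectrum hole of the requested width, or None."""
    n = len(occupied)
    for start in range(0, n - hole_size + 1, hole_size):
        # one group test: the group is vacant iff no subband in it is occupied
        if not any(occupied[start:start + hole_size]):
            return start
    return None

# Hypothetical occupancy map over 12 subbands (1 = occupied)
spectrum = [1, 1, 1, 0,  0, 0, 0, 0,  1, 0, 1, 1]
print(find_spectrum_hole(spectrum, hole_size=4))  # -> 4
```

Setting hole_size to 1 recovers the bin-by-bin scheme the abstract mentions as a special case; the sparser the occupancy, the more bins each group test clears at once.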
Abstract:
Command and control regulation programs, particularly input constraints, typically fail to achieve their stated objectives because fishermen may substitute unregulated for regulated inputs. It is thus essential to understand the internal structure of the production technology. A primal formulation is used to estimate a translog production function at the vessel level that includes fishing effort and fisherman's skill. The flexibility of the selected functional form permits the analysis of substitution possibilities among inputs by estimating the elasticity of substitution with no prior constraints. Particular attention is paid to the empirical validation of fishing effort as an aggregate input, which implies either accepting the joint hypothesis that the inputs making up effort are weakly separable from the inputs outside the subgroup, or treating effort as an intermediate input produced by a non-separable two-stage technology. Cross-sectional data from the Spanish purse seine fleet operating in the VIII Division European anchovy fishery provide evidence of limited substitution possibilities among the inputs making up the empirically validated fishing effort translog micro-production function.
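A translog production function is linear in its parameters after taking logs, so it can be estimated by ordinary least squares. A minimal sketch with two hypothetical inputs (the paper's vessel-level data and effort aggregate are not reproduced):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical vessel-level data: log output ly, log inputs lx1, lx2
rng = np.random.default_rng(4)
n = 200
lx1, lx2 = rng.normal(size=n), rng.normal(size=n)
ly = 0.3 * lx1 + 0.6 * lx2 + 0.05 * lx1 * lx2 + rng.normal(scale=0.1, size=n)

# Translog: ln y = a0 + a1*ln x1 + a2*ln x2
#           + 0.5*b11*(ln x1)^2 + 0.5*b22*(ln x2)^2 + b12*ln x1*ln x2
X = sm.add_constant(np.column_stack(
    [lx1, lx2, 0.5 * lx1**2, 0.5 * lx2**2, lx1 * lx2]))
fit = sm.OLS(ly, X).fit()
print(fit.params)
# Weak separability of an "effort" aggregate can then be examined with a
# joint Wald/F test on the relevant second-order coefficients.
```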
Abstract:
DNA microarray, or DNA chip, is a technology that allows us to obtain the expression level of many genes in a single experiment. The fact that numerical expression values can be easily obtained gives us the possibility to use multiple statistical techniques of data analysis. In this project, microarray data are obtained from Gene Expression Omnibus, the repository of the National Center for Biotechnology Information (NCBI). The noise is then removed and the data are normalized; hypothesis tests are used to find the most relevant genes that may be involved in a disease, and machine learning methods such as KNN, Random Forest, and K-means are applied. The analysis is performed with Bioconductor, a set of R packages for the analysis of biological data, and a case study in Alzheimer's disease is conducted. The complete code can be found at https://github.com/alberto-poncelas/bioc-alzheimer
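The project itself uses R and Bioconductor; as a language-neutral illustration of the per-gene hypothesis-testing step, here is a Python sketch with simulated expression values and Benjamini-Hochberg correction:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Simulated expression matrix: rows = genes, columns = samples
rng = np.random.default_rng(5)
expr = rng.normal(size=(1000, 20))
labels = np.array([0] * 10 + [1] * 10)      # e.g. control vs Alzheimer's

# Two-sample t-test per gene, then Benjamini-Hochberg FDR correction
_, p = stats.ttest_ind(expr[:, labels == 0], expr[:, labels == 1], axis=1)
reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
print(reject.sum(), "genes called differentially expressed")
```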
Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data
Abstract:
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
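The log-rank comparison across subtypes can be illustrated with the lifelines package; the subtype assignments, recurrence times, and censoring below are simulated stand-ins, not the TCGA data:

```python
import numpy as np
from lifelines.statistics import multivariate_logrank_test

# Simulated stand-ins: 277 patients assigned to 8 consensus subtypes
rng = np.random.default_rng(6)
subtype = rng.integers(0, 8, size=277)
durations = rng.exponential(scale=1.0 + 0.2 * subtype)  # time to recurrence
observed = rng.integers(0, 2, size=277)                 # 1 = event observed

res = multivariate_logrank_test(durations, subtype, observed)
print(f"log-rank p = {res.p_value:.3g}")
# Correcting for multiple hypothesis tests (death, new tumour events,
# progression, recurrence) would then adjust this p-value, e.g. Bonferroni.
```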
Abstract:
The phylogenetic relationships among the genera of the Ergasilidae are poorly understood. In this study, 14 species from four genera of the Ergasilidae, Sinergasilus, Ergasilus, Pseudergasilus, and Paraergasilus, were collected in China, and their phylogenetic relationships were examined using neighbor-joining, maximum parsimony, maximum likelihood, and Bayesian inference methods based on partial sequences of 18S and 28S ribosomal DNA. All the analyses suggest that Sinergasilus and Paraergasilus are both monophyletic, whereas Ergasilus is polyphyletic rather than monophyletic. Regarding the relationships among the four genera, the phylogenetic analyses and subsequent hypothesis tests all suggest that Pseudergasilus, which clusters with some Ergasilus species, may be more closely related to Sinergasilus than to Paraergasilus. It is proposed that the Sinergasilus and Pseudergasilus species might have evolved from Ergasilus species.
Abstract:
One of the main pillars in the development of inclusive schools is initial teacher training. Before determining whether it is necessary to make changes (and of what type) in training programs or curriculum guides related to attention to diversity and inclusive education, the attitudes of future education professionals in this area should be analyzed, including the identification of the relevant predictors of inclusive attitudes. The research reported in this article pursued this objective with a quantitative survey methodology based on cross-sectional structured data collection and statistical analyses covering the quality of the attitude questionnaire (factor analysis and Cronbach's alpha), descriptive statistics, correlations, hypothesis tests for differences of means, and regression analysis to predict attitudes towards inclusion in education. First, the results show that the participants held very positive attitudes toward the inclusion of students with special educational needs; older respondents, those with longer training and, to a lesser extent, women and those who had been in contact with disabled people stood out in this respect. Second, it is evidenced that self-transcendence values and, more weakly, contact function as robust predictors of the attitudes of future practitioners towards the inclusion of students with special needs. Some applications for the initial professionalization of educators are suggested in the discussion.
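The questionnaire-quality step mentioned above relies on Cronbach's alpha, which is straightforward to compute; a sketch with simulated Likert-style items (the scale length and data are assumptions):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Simulated 6-item scale driven by one latent attitude
rng = np.random.default_rng(7)
latent = rng.normal(size=(120, 1))
scores = latent + rng.normal(scale=0.8, size=(120, 6))
print(f"alpha = {cronbach_alpha(scores):.2f}")
```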
Abstract:
A sample of 445 consumers resident in distinct Lisbon areas was analyzed through direct observation in order to discover the current proportion of each lifestyle, applying the Whitaker Lifestyle™ Method. Hypothesis tests on the population proportions reveal that the Neo-Traditional and Modern Whitaker lifestyles have significantly the highest proportions, while the overall presence of the different lifestyles varies across neighborhoods. The research further demonstrates the validity of the Whitaker observation techniques, media consumption differences among lifestyles, and the importance of style and aesthetics when segmenting consumers by lifestyle. Finally, market opportunities are identified for firms operating in Lisbon.
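A one-sample test on a population proportion, of the kind the study applies, can be run with statsmodels; the count of 167 and the null value 0.25 below are hypothetical, only the sample size 445 comes from the abstract:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical: 167 of the 445 observed consumers classified as
# Neo-Traditional; test H0: population proportion = 0.25 (two-sided)
count, nobs = 167, 445
z, p = proportions_ztest(count, nobs, value=0.25)
print(f"z = {z:.2f}, p = {p:.4f}")
```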
Abstract:
Euclidean distance matrix analysis (EDMA) methods are used to determine whether a significant difference exists between conformational samples of antibody complementarity determining region (CDR) loops: the isolated L1 loop and L1 in a three-loop assembly (L1, L3 and H3) obtained from Monte Carlo simulation. Once a significant difference is detected, the specific inter-Cα distances that contribute to the difference are identified using EDMA. The estimated and improved mean forms of the conformational samples of the isolated L1 loop and of L1 in the three-loop assembly, CDR loops of the antibody binding site, are described using EDMA and distance geometry (DGEOM). To the best of our knowledge, this is the first time the EDMA methods have been used to analyze conformational samples of molecules obtained from Monte Carlo simulations. Therefore, validations of the EDMA methods using both positive control and negative control tests for the conformational samples of the isolated L1 loop and L1 in the three-loop assembly must be done. The EDMA-I bootstrap null hypothesis tests showed false positive results for the comparison of six samples of the isolated L1 loop and true positive results for the comparison of conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The bootstrap confidence interval tests revealed true negative results for comparisons of the six samples of the isolated L1 loop, and false negative results for the conformational comparisons between the isolated L1 loop and L1 in the three-loop assembly. Different conformational sample sizes were further explored, by combining the samples of the isolated L1 loop to increase the sample size, or by clustering the samples using a self-organizing map (SOM) to narrow the conformational distribution of the samples being compared. However, neither the bootstrap null hypothesis tests nor the confidence interval tests improved. These results show that more work is required before EDMA methods can be used reliably for the comparison of samples obtained by Monte Carlo simulations.
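A resampling test of this general kind, comparing mean inter-point distance matrices of two conformational samples, can be sketched as follows. This is a permutation-style approximation, not the EDMA-I procedure itself, and the shapes and data are illustrative:

```python
import numpy as np

def mean_edm(S):
    """Mean Euclidean distance matrix of a sample S of conformations,
    S having shape (n_conformations, n_atoms, 3)."""
    d = np.linalg.norm(S[:, :, None, :] - S[:, None, :, :], axis=-1)
    return d.mean(axis=0)

def form_difference_test(A, B, n_resample=999, seed=8):
    """Permutation approximation of a null hypothesis test: is the max
    difference between mean forms larger than expected under H0?"""
    rng = np.random.default_rng(seed)
    stat = lambda a, b: np.abs(mean_edm(a) - mean_edm(b)).max()
    obs = stat(A, B)
    pooled, n = np.concatenate([A, B]), len(A)
    null = [stat(pooled[i[:n]], pooled[i[n:]])
            for i in (rng.permutation(len(pooled)) for _ in range(n_resample))]
    p = (1 + sum(v >= obs for v in null)) / (n_resample + 1)
    return obs, p

rng = np.random.default_rng(9)
A = rng.normal(size=(30, 5, 3))          # 30 conformations, 5 atoms
B = 1.2 * rng.normal(size=(30, 5, 3))    # rescaled: different mean form
print(form_difference_test(A, B))
```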