960 results for hypothesis tests
Abstract:
A satellite-based observation system can continuously or repeatedly generate a user state vector time series that may contain useful information. One typical example is the collection of International GNSS Service (IGS) station daily and weekly combined solutions. Another example is the epoch-by-epoch kinematic position time series of a receiver derived by a GPS real time kinematic (RTK) technique. Although some multivariate analysis techniques have been adopted to assess the noise characteristics of multivariate state time series, statistical testing is limited to univariate time series. After reviewing the hypothesis test statistics frequently used in univariate analysis of GNSS state time series, the paper presents a number of T-squared multivariate analysis statistics for use in the analysis of multivariate GNSS state time series. These T-squared test statistics take into account the correlation between coordinate components, which is neglected in univariate analysis. Numerical analysis was conducted with the multi-year time series of an IGS station to schematically demonstrate the results from the multivariate hypothesis testing in comparison with the univariate hypothesis testing results. The results have demonstrated that, in general, the testing for multivariate mean shifts and outliers tends to reject fewer data samples than the testing for univariate mean shifts and outliers under the same confidence level. It is noted that neither univariate nor multivariate data analysis methods are intended to replace physical analysis. Instead, these should be treated as complementary statistical methods for a priori or a posteriori investigations. Physical analysis is necessary subsequently to refine and interpret the results.
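The abstract does not give the formulas for its T-squared statistics. As a minimal sketch, assuming they are of the standard one-sample Hotelling form, the statistic for a mean shift is T² = n (x̄ − μ₀)ᵀ S⁻¹ (x̄ − μ₀), where S is the sample covariance of the coordinate components. The function name `hotelling_t2` and the sample data are hypothetical and are not taken from the paper.

```python
import numpy as np


def hotelling_t2(samples, mu0):
    """One-sample Hotelling T-squared statistic and its F-scaled form.

    samples: array of shape (n, p), one row per epoch (e.g. east, north, up).
    mu0: hypothesised mean vector of length p.
    Assumes n > p and a non-singular sample covariance matrix.
    """
    x = np.asarray(samples, dtype=float)
    n, p = x.shape
    diff = x.mean(axis=0) - np.asarray(mu0, dtype=float)
    # The sample covariance carries the correlation between components,
    # which a set of separate univariate tests would ignore.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    t2 = float(n * diff @ np.linalg.solve(cov, diff))
    # Under normality and the null hypothesis this follows F(p, n - p).
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat


if __name__ == "__main__":
    # Hypothetical two-component position residuals, for illustration only.
    residuals = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
    print(hotelling_t2(residuals, [0.0, 0.0]))
```

With a single component (p = 1) the statistic reduces to the square of the ordinary t statistic, which gives a simple check against the univariate case.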
Abstract:
Acknowledgments This work has been undertaken with the support of the A*MIDEX project (No. ANR-11-IDEX-0001-02) funded by the “Investissements d’Avenir” French Government program, managed by the French National Research Agency (ANR). We are grateful to Julian Williams, Editor Badi H. Baltagi and an anonymous referee for helpful comments. We are responsible for any errors.
Abstract:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Although hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests are commonly applied incorrectly, misinterpreted, or ignored, by novices and expert researchers alike. At first glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing are first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
Abstract:
This research investigates how to obtain accurate and reliable positioning results with global navigation satellite systems (GNSS). The work provides a theoretical framework for reliability control in GNSS carrier phase ambiguity resolution, which is the key technique for precise GNSS positioning at the centimetre level. The proposed approach includes identification and exclusion procedures for unreliable solutions and hypothesis tests, allowing the reliability of solutions to be controlled with respect to the mathematical models, integer estimation and ambiguity acceptance tests. Extensive experimental results with both simulated and observed data sets demonstrate the reliability performance characteristics of the proposed theoretical framework and procedures.
Abstract:
In this paper, we consider the problem of finding a spectrum hole of a specified bandwidth in a given wide band of interest. We propose a new, simple and easily implementable sub-Nyquist sampling scheme for signal acquisition and a spectrum hole search algorithm that exploits sparsity in the primary spectral occupancy in the frequency domain by testing a group of adjacent subbands in a single test. The sampling scheme deliberately introduces aliasing during signal acquisition, resulting in a signal that is the sum of signals from adjacent sub-bands. Energy-based hypothesis tests are used to provide an occupancy decision over the group of subbands, and this forms the basis of the proposed algorithm to find contiguous spectrum holes. We extend this framework to a multi-stage sensing algorithm that can be employed in a variety of spectrum sensing scenarios, including non-contiguous spectrum hole search. Further, we provide the analytical means to optimize the hypothesis tests with respect to the detection thresholds, number of samples and group size to minimize the detection delay under a given error rate constraint. Depending on the sparsity and SNR, the proposed algorithms can lead to significantly lower detection delays compared to a conventional bin-by-bin energy detection scheme; the latter is in fact a special case of the group test when the group size is set to 1. We validate our analytical results via Monte Carlo simulations.
Abstract:
This paper investigates the use of adaptive group testing to find a spectrum hole of a specified bandwidth in a given wideband of interest. We propose a group testing-based spectrum hole search algorithm that exploits sparsity in the primary spectral occupancy by testing a group of adjacent subbands in a single test. This is enabled by a simple and easily implementable sub-Nyquist sampling scheme for signal acquisition by the cognitive radios (CRs). The sampling scheme deliberately introduces aliasing during signal acquisition, resulting in a signal that is the sum of signals from adjacent subbands. Energy-based hypothesis tests are used to provide an occupancy decision over the group of subbands, and this forms the basis of the proposed algorithm to find contiguous spectrum holes of a specified bandwidth. We extend this framework to a multistage sensing algorithm that can be employed in a variety of spectrum sensing scenarios, including noncontiguous spectrum hole search. Furthermore, we provide the analytical means to optimize the group tests with respect to the detection thresholds, number of samples, group size, and number of stages to minimize the detection delay under a given error probability constraint. Our analysis allows one to identify the sparsity and SNR regimes where group testing can lead to significantly lower detection delays compared with a conventional bin-by-bin energy detection scheme; the latter is, in fact, a special case of the group test when the group size is set to 1 bin. We validate our analytical results via Monte Carlo simulations.
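The group test described above can be sketched in a few lines. This is an illustrative simplification, not the authors' algorithm: it assumes the aliased measurement has energy equal to the sum of the per-subband energies (reasonable if the subband signals are uncorrelated), uses non-overlapping groups, and takes a fixed threshold rather than an optimized one. The function name `find_vacant_group` and all numbers are hypothetical.

```python
def find_vacant_group(subband_energy, group_size, threshold):
    """Return the start index of the first group declared vacant, or None.

    subband_energy: measured energy per subband (hypothetical input; in a
        sub-Nyquist scheme only the aliased group sum would be observed).
    group_size: number of adjacent subbands covered by a single test.
    threshold: energy level below which a group is declared unoccupied.
    """
    last_start = len(subband_energy) - group_size
    for start in range(0, last_start + 1, group_size):
        # One energy test decides the occupancy of the whole group.
        group_energy = sum(subband_energy[start:start + group_size])
        if group_energy < threshold:
            return start
    return None


if __name__ == "__main__":
    energies = [5.0, 0.1, 0.2, 0.1, 4.0, 0.1]
    print(find_vacant_group(energies, group_size=2, threshold=1.0))
```

Setting `group_size=1` reduces the search to the bin-by-bin energy detection that the abstract names as a special case of the group test.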
Abstract:
DNA microarray, or DNA chip, is a technology that allows us to obtain the expression level of many genes in a single experiment. Because numerical expression values can be obtained easily, multiple statistical data analysis techniques can be applied. In this project, microarray data is obtained from Gene Expression Omnibus, the repository of the National Center for Biotechnology Information (NCBI). The noise is then removed and the data is normalized. We use hypothesis tests to find the most relevant genes that may be involved in a disease, and we apply machine learning methods such as KNN, Random Forest and k-means. For the analysis we use Bioconductor, a set of R packages for the analysis of biological data, and we conduct a case study on Alzheimer's disease. The complete code can be found at https://github.com/alberto-poncelas/bioc-alzheimer
Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data
Abstract:
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
Abstract:
The phylogenetic relationships among the Ergasilidae genera are poorly understood. In this study, 14 species from four genera in the Ergasilidae, namely Sinergasilus, Ergasilus, Pseudergasilus, and Paraergasilus, were collected in China, and their phylogenetic relationships were examined using neighbor-joining, maximum parsimony, maximum likelihood, and Bayesian inference methods based on partial sequences of 18S and 28S ribosomal deoxyribonucleic acid, respectively. All the analyses suggest that Sinergasilus and Paraergasilus are both monophyletic, but that Ergasilus is polyphyletic rather than monophyletic. Considering the relationships among the four genera, the phylogenetic analyses and subsequent hypothesis tests all suggest that Pseudergasilus, which clustered with some Ergasilus species, may be more closely related to Sinergasilus than to Paraergasilus. It is proposed that the Sinergasilus and Pseudergasilus species might have evolved from Ergasilus species.
Abstract:
One of the main pillars in the development of inclusive schools is the initial teacher training. Before determining if it is necessary to make changes (and of what type) in training programs or curriculum guides related to the attention to diversity and inclusive education, the attitudes of future education professionals in this area should be analyzed. This includes the identification of the relevant predictors of inclusive attitudes. The research reported in this article pursued this objective, doing so with a quantitative survey methodology based on the use of cross-sectional structured data collection and statistical analyses related to the quality of the attitude questionnaire (factor analysis and Cronbach's alpha), descriptive statistics, correlations, hypothesis tests for difference of means, and regression analysis in order to predict attitudes towards inclusion in education. Firstly, the results show that the participants held very positive attitudes toward the inclusion of students with special educational needs. Particularly, older respondents, those with a longer training and, to a lesser extent, women and those who had been in touch with disabled people stood out within this attitude. Secondly, it is evidenced that self-transcendence values and, more weakly, contact, function as robust predictors of attitudes of future practitioners towards the inclusion of students with special needs. Some applications for the initial professionalization of educators are suggested in the discussion.
Abstract:
A sample of 445 consumers resident in distinct Lisbon areas was analyzed through direct observations in order to discover each lifestyle’s current proportion, applying the Whitaker Lifestyle™ Method. The findings of the conducted hypothesis tests on the population proportion unveil that Neo-Traditional and Modern Whitaker lifestyles have the significantly highest proportion, while the overall presence of different lifestyles varies across neighborhoods. The research further demonstrates the validity of Whitaker observation techniques, media consumption differences among lifestyles and the importance of style and aesthetics while segmenting consumers by lifestyles. Finally, market opportunities are provided for firms operating in Lisbon.
Abstract:
Euclidean distance matrix analysis (EDMA) methods are used to determine whether a significant difference exists between conformational samples of antibody complementarity determining region (CDR) loops, namely the isolated L1 loop and L1 in a three-loop assembly (L1, L3 and H3), obtained from Monte Carlo simulation. After a significant difference is detected, the specific inter-Cα distance which contributes to the difference is identified using EDMA. The estimated and improved mean forms of the conformational samples of the isolated L1 loop and the L1 loop in the three-loop assembly, CDR loops of the antibody binding site, are described using EDMA and distance geometry (DGEOM). To the best of our knowledge, this is the first time EDMA methods have been used to analyze conformational samples of molecules obtained from Monte Carlo simulations. Therefore, the EDMA methods must be validated using both positive control and negative control tests for the conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The EDMA-I bootstrap null hypothesis tests showed false positive results for the comparison of six samples of the isolated L1 loop and true positive results for the comparison of conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The bootstrap confidence interval tests revealed true negative results for comparisons of six samples of the isolated L1 loop, and false negative results for the conformational comparisons between the isolated L1 loop and L1 in the three-loop assembly. Different conformational sample sizes are further explored by combining the samples of the isolated L1 loop to increase the sample size, or by clustering the samples using a self-organizing map (SOM) to narrow the conformational distribution of the samples being compared. However, neither the bootstrap null hypothesis tests nor the confidence interval tests improved. 
These results show that more work is required before EDMA methods can be used reliably as a method for comparison of samples obtained by Monte Carlo simulations.
Abstract:
This article studies the possibility of introducing unemployment insurance in Colombia. The first part reviews the literature on unemployment insurance, setting out the advantages of coverage against this risk as well as its drawbacks. The second part examines several scenarios for introducing unemployment insurance in Colombia. After presenting the context of the labor market and the regulations that govern it, the article proposes several designs that address the management and administration of unemployment risk in Colombia. It also presents some theoretical considerations for valuing the cost of the insurance, which incorporate the effects of moral hazard on the duration and incidence of unemployment.
Abstract:
Many well-established statistical methods in genetics were developed in a climate of severe constraints on computational power. Recent advances in simulation methodology now bring modern, flexible statistical methods within the reach of scientists having access to a desktop workstation. We illustrate the potential advantages now available by considering the problem of assessing departures from Hardy-Weinberg (HW) equilibrium. Several hypothesis tests of HW have been established, as well as a variety of point estimation methods for the parameter which measures departures from HW under the inbreeding model. We propose a computational, Bayesian method for assessing departures from HW, which has a number of important advantages over existing approaches. The method incorporates the effects of uncertainty about the nuisance parameters (the allele frequencies) as well as the boundary constraints on f (which are functions of the nuisance parameters). Results are naturally presented visually, exploiting the graphics capabilities of modern computer environments to allow straightforward interpretation. Perhaps most importantly, the method is founded on a flexible, likelihood-based modelling framework, which can incorporate the inbreeding model if appropriate, but also allows the assumptions of the model to be investigated and, if necessary, relaxed. Under appropriate conditions, information can be shared across loci and, possibly, across populations, leading to more precise estimation. The advantages of the method are illustrated by application both to simulated data and to data analysed by alternative methods in the recent literature.
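To make the role of f and its boundary constraints concrete, the sketch below computes a simple moment-based point estimate for one biallelic locus. It is not the Bayesian method proposed in the paper. It assumes the standard inbreeding model, in which the heterozygote frequency is 2pq(1 − f), so that f = 1 − H_obs / (2pq), and it uses the lower bound f ≥ −min(p/q, q/p) that follows from requiring non-negative homozygote frequencies. The function name `hw_inbreeding_summary` and the genotype counts are hypothetical.

```python
def hw_inbreeding_summary(n_aa, n_ab, n_bb):
    """Moment estimate of the inbreeding coefficient f at a biallelic locus.

    n_aa, n_ab, n_bb: observed genotype counts (AA, AB, BB).
    Returns (p, f_hat, f_min), where p is the estimated frequency of
    allele A and f_min is the lower bound on f implied by p.
    Assumes both alleles are present in the sample (0 < p < 1).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Observed heterozygosity relative to its HW expectation 2pq.
    f_hat = 1.0 - (n_ab / n) / (2 * p * q)
    # Homozygote frequencies p^2 + f*p*q and q^2 + f*p*q must be >= 0.
    f_min = -min(p / q, q / p)
    return p, f_hat, f_min


if __name__ == "__main__":
    # Hypothetical counts with a deficit of heterozygotes.
    print(hw_inbreeding_summary(40, 20, 40))
```

The dependence of `f_min` on `p` is the boundary constraint the abstract refers to: the admissible range of f changes with the allele frequencies, which is one reason a plug-in estimate like this one can understate the uncertainty.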