965 results for statistical hypotheses
Abstract:
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem, which is then posed as a LASSO (l1-constrained fitting) problem and finally solved by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs is assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random-walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence for integrated computational and experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
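The abstract describes casting structure learning as l1-constrained (LASSO-type) fitting solved as a linear program. Below is a minimal sketch of one standard way to write l1-constrained regression as an LP, assuming a least-absolute-deviations loss so that both the objective and the constraint are linear; the variable splitting (w = w+ − w−, residuals bounded by auxiliary variables) is the usual device, and the paper's exact formulation may differ.

```python
import numpy as np
from scipy.optimize import linprog

def lasso_as_lp(X, y, t):
    """Minimize sum_i |y_i - x_i.w| subject to ||w||_1 <= t, as a linear program.

    Variables are stacked as [w_plus (p), w_minus (p), r (n)], all >= 0,
    with w = w_plus - w_minus and r_i bounding the absolute residuals.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), np.ones(n)])           # minimize sum of r_i
    A_pos = np.hstack([X, -X, -np.eye(n)])                       #  Xw - r <= y
    A_neg = np.hstack([-X, X, -np.eye(n)])                       # -Xw - r <= -y
    A_l1 = np.hstack([np.ones((1, 2 * p)), np.zeros((1, n))])    # ||w||_1 <= t
    A_ub = np.vstack([A_pos, A_neg, A_l1])
    b_ub = np.concatenate([y, -y, [t]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

# Toy usage: recover a sparse weight vector for one target gene
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [2.0, -1.5]
y = X @ w_true + 0.1 * rng.normal(size=40)
print(np.round(lasso_as_lp(X, y, t=4.0), 2))
```

In a network setting, such a regression would be solved once per gene, regressing its profile on those of all other genes, with the l1 budget t enforcing sparsity of the recovered neighbourhood.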
Abstract:
The export of sediments from coastal catchments can have detrimental impacts on estuaries and near-shore reef ecosystems such as the Great Barrier Reef. Catchment management approaches aimed at reducing sediment loads require monitoring to evaluate their effectiveness in reducing loads over time. However, load estimation is not a trivial task due to the complex behaviour of constituents in natural streams, the variability of water flows and often a limited amount of data. Regression is commonly used for load estimation and provides a fundamental tool for trend estimation by standardising for other time-specific covariates such as flow. This study investigates whether load estimates and the resultant power to detect trends can be enhanced by (i) modelling the error structure so that temporal correlation can be better quantified, (ii) making use of predictive variables, and (iii) identifying an efficient and feasible sampling strategy that may be used to reduce sampling error. To achieve this, we propose a new regression model that includes an innovative compounding errors model structure and uses two additional predictive variables (average discounted flow and turbidity). By combining this modelling approach with a new, regularly optimised sampling strategy, which adds uniformity to the event sampling strategy, the predictive power was increased to 90%. Using the enhanced regression model proposed here, it was possible to detect a trend of 20% over 20 years. This result is in stark contrast to previous conclusions presented in the literature.
Abstract:
We consider the development of statistical models for prediction of constituent concentration of riverine pollutants, which is a key step in load estimation from frequent flow rate data and less frequently collected concentration data. We consider how to capture the impacts of past flow patterns via the average discounted flow (ADF), which discounts the past flux based on the time elapsed so that more recent fluxes are given more weight. However, the effectiveness of ADF depends critically on the choice of the discount factor, which reflects the unknown environmental cumulating process of the concentration compounds. We propose to choose the discount factor by maximizing the adjusted R2 value or the Nash-Sutcliffe model efficiency coefficient; the R2 value is adjusted to take account of the number of parameters in the model fit. The resulting optimal discount factor can be interpreted as a measure of constituent exhaustion rate during flood events. To evaluate the performance of the proposed regression estimators, we examine two different sampling scenarios by resampling fortnightly and opportunistically from two real daily datasets, which come from two United States Geological Survey (USGS) gaging stations located in the Des Plaines River and Illinois River basins. The generalized rating-curve approach produces biased estimates of the total sediment loads by -30% to 83%, whereas the new approaches produce much lower biases, ranging from -24% to 35%. This substantial improvement in the estimates of the total load is due to the fact that predictability of concentration is greatly improved by the additional predictors.
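The average discounted flow weights past flows by how recently they occurred, with the discount factor chosen to maximize adjusted R2. The sketch below assumes a simple exponential-discounting recursion for ADF and an ordinary least-squares fit of concentration on flow and ADF; both are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def average_discounted_flow(flow, discount):
    """Exponentially discounted running average of flow (assumed ADF form)."""
    adf = np.empty(len(flow), dtype=float)
    adf[0] = flow[0]
    for t in range(1, len(flow)):
        adf[t] = (1.0 - discount) * flow[t] + discount * adf[t - 1]
    return adf

def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n, p = X1.shape
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

def choose_discount(conc, flow, grid=np.linspace(0.05, 0.99, 50)):
    """Pick the discount factor maximizing adjusted R^2 of conc ~ flow + ADF."""
    scores = [adjusted_r2(conc, np.column_stack([flow, average_discounted_flow(flow, d)]))
              for d in grid]
    return grid[int(np.argmax(scores))]
```

A grid search like this is cheap because each candidate discount factor only requires one pass over the flow series and one small least-squares fit.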
Abstract:
Ordinal qualitative data are often collected for phenotypical measurements in plant pathology and other biological sciences. Statistical methods, such as t tests or analysis of variance, are usually used to analyze ordinal data when comparing two groups or multiple groups. However, the underlying assumptions such as normality and homogeneous variances are often violated for qualitative data. To this end, we investigated an alternative methodology, rank regression, for analyzing the ordinal data. The rank-based methods are essentially based on pairwise comparisons and, therefore, can deal with qualitative data naturally. They require neither normality assumption nor data transformation. Apart from robustness against outliers and high efficiency, the rank regression can also incorporate covariate effects in the same way as the ordinary regression. By reanalyzing a data set from a wheat Fusarium crown rot study, we illustrated the use of the rank regression methodology and demonstrated that the rank regression models appear to be more appropriate and sensible for analyzing nonnormal data and data with outliers.
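Rank regression of the kind referenced above replaces the least-squares criterion with a rank-based dispersion of the residuals. The sketch below implements the classical Jaeckel/Wilcoxon-score version (the estimator behind R's Rfit package) as a generic illustration; it is not necessarily the exact estimator used in the wheat Fusarium study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def jaeckel_dispersion(beta, X, y):
    """Jaeckel's dispersion with Wilcoxon scores: sum_i a(R(e_i)) * e_i."""
    e = y - X @ beta
    ranks = rankdata(e)
    n = len(e)
    scores = np.sqrt(12.0) * (ranks / (n + 1.0) - 0.5)
    return float(np.sum(scores * e))

def rank_regression(X, y):
    """Slopes minimize the rank dispersion; the intercept is the median residual."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares start
    fit = minimize(jaeckel_dispersion, beta0, args=(X, y), method="Nelder-Mead")
    beta = fit.x
    intercept = np.median(y - X @ beta)
    return intercept, beta
```

Because the criterion depends on the residuals only through their ranks, the fit is insensitive to outliers and requires no normality assumption, which is the property the abstract emphasises.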
Abstract:
Power calculation and sample size determination are critical in designing environmental monitoring programs. The traditional approach based on comparing the mean values may become statistically inappropriate and even invalid when substantial proportions of the response values are below the detection limits or censored because strong distributional assumptions have to be made on the censored observations when implementing the traditional procedures. In this paper, we propose a quantile methodology that is robust to outliers and can also handle data with a substantial proportion of below-detection-limit observations without the need of imputing the censored values. As a demonstration, we applied the methods to a nutrient monitoring project, which is a part of the Perth Long-Term Ocean Outlet Monitoring Program. In this example, the sample size required by our quantile methodology is, in fact, smaller than that by the traditional t-test, illustrating the merit of our method.
Abstract:
The charge at which adsorption of organic compounds attains a maximum (σ_M^max) at an electrochemical interface is analysed using several multi-state models in a hierarchical manner. The analysis is based on statistical mechanical results for the following models: (A) two-state site parity, (B) two-state multi-site, and (C) three-state site parity. The coulombic interactions due to permanent and reduced dipole effects (using the mean field approximation), electrostatic field effects and specific substrate interactions have been taken into account. The simplest model in the hierarchy (two-state site parity) yields the explicit dependence of σ_M^max on the permanent dipole moment, the polarizability of the solvent and the adsorbate, lattice spacing, effective coordination number, etc. Other models in the hierarchy bring to light the influence of the solvent structure and the role of substrate interactions, etc. As a result of this approach, the "composition" of σ_M^max in terms of the fundamental molecular constants becomes clear. With a view to using these molecular results to maximum advantage, the derived results for σ_M^max have been converted into those involving experimentally observable parameters like C0, C1, E_N, etc. Wherever possible, some of the earlier phenomenological relations reported for σ_M^max, notably by Parsons, Damaskin and Frumkin, and Trasatti, are shown to have a certain molecular basis, viz. a simple two-state site parity model. As a corollary to the hierarchical modelling, σ_M^max and the potential corresponding to it (E_max) are shown to be constants independent of θ_max or C_org for all models. The implication of our analysis for σ_M^max with respect to that predicted by the generalized surface layer equation (which postulates σ_M^max and E_max variation with θ) is discussed in detail. Finally, we discuss in passing σ_M^max and the electrosorption valency in this context.
Abstract:
A pressed-plate Fe electrode for alkaline storage batteries, designed using a statistical method (fractional factorial technique), is described. Parameters such as the configuration of the base grid, electrode compaction temperature and pressure, binder composition, mixing time, etc. have been optimised using this method. The optimised electrodes have a capacity of 300 ± 5 mA h/g of active material (a mixture of Fe and magnetite) at the 7 h rate to a cut-off voltage of 8.86 V vs. Hg/HgO, OH− (17 refs).
Abstract:
In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions be- come similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that prob- ability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this man- ifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the cor- responding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of- the-art results on a standard object recognition benchmark.
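For reference, the Maximum Mean Discrepancy baseline that the paper argues against can be computed directly from kernel evaluations. Below is a minimal sketch of the biased empirical MMD^2 with an RBF kernel between source and target samples; the manifold-based alternatives proposed in the paper are not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Biased empirical estimate of the squared Maximum Mean Discrepancy."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

MMD-based adaptation methods transform the source data so that this quantity shrinks; the paper's point is that a manifold-aware distance between the two distributions can serve the same role more faithfully.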
Abstract:
This paper presents a Multi-Hypotheses Tracking (MHT) approach that allows resolving ambiguities that arise with previous methods of associating targets and tracks within a highly volatile vehicular environment. The previous approach based on the Dempster–Shafer Theory assumes that associations between tracks and targets are unique; this was shown to allow the formation of ghost tracks when there was too much ambiguity or conflict for the system to make a meaningful decision. The MHT algorithm described in this paper removes this uniqueness condition, allowing the system to accommodate ambiguity and even to refrain from making any decision if the available data are poor. We provide a general introduction to the Dempster–Shafer Theory and present the previously used approach. Then, we explain our MHT mechanism and provide evidence of its increased performance in reducing the number of ghost tracks and false positives processed by the tracking system.
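The Dempster–Shafer machinery underlying such trackers combines evidence from independent sources with Dempster's rule, normalising away the mass assigned to conflicting (empty-intersection) hypotheses. A minimal, generic sketch of that combination step is shown below; the paper's multi-hypothesis track/target association logic is not reproduced here.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic belief assignments (dict: frozenset of hypotheses -> mass)."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb              # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Two sensors reporting on whether a target matches track T1 or T2 (hypothetical example)
m_sensor1 = {frozenset({"T1"}): 0.6, frozenset({"T1", "T2"}): 0.4}
m_sensor2 = {frozenset({"T2"}): 0.5, frozenset({"T1", "T2"}): 0.5}
print(dempster_combine(m_sensor1, m_sensor2))
```

When the conflict term grows large, the normalisation step becomes unreliable; that is precisely the regime in which keeping multiple hypotheses open, as the MHT approach does, avoids committing to a ghost track.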
Abstract:
To facilitate marketing and export, the Australian macadamia industry requires accurate crop forecasts. Each year, two levels of crop predictions are produced for this industry. The first is an overall longer-term forecast based on tree census data of growers in the Australian Macadamia Society (AMS). This data set currently accounts for around 70% of total production, and is supplemented by our best estimates of non-AMS orchards. Given these total tree numbers, average yields per tree are needed to complete the long-term forecasts. Yields from regional variety trials were initially used, but were found to be consistently higher than the average yields that growers were obtaining. Hence, a statistical model was developed using growers' historical yields, also taken from the AMS database. This model accounted for the effects of tree age, variety, year, region and tree spacing, and explained 65% of the total variation in the yield per tree data. The second level of crop prediction is an annual climate adjustment of these overall long-term estimates, taking into account the expected effects on production of the previous year's climate. This adjustment is based on relative historical yields, measured as the percentage deviance between expected and actual production. The dominant climatic variables are observed temperature, evaporation, solar radiation and modelled water stress. Initially, a number of alternate statistical models showed good agreement within the historical data, with jack-knife cross-validation R2 values of 96% or better. However, forecasts varied quite widely between these alternate models. Exploratory multivariate analyses and nearest-neighbour methods were used to investigate these differences. For 2001-2003, the overall forecasts were in the right direction (when compared with the long-term expected values), but were over-estimates. In 2004 the forecast was well under the observed production, and in 2005 the revised models produced a forecast within 5.1% of the actual production. Over the first five years of forecasting, the absolute deviance for the climate-adjustment models averaged 10.1%, just outside the targeted objective of 10%.
Abstract:
The recently introduced generalized pencil of Sudarshan which gives an exact ray picture of wave optics is analysed in some situations of interest to wave optics. A relationship between ray dispersion and statistical inhomogeneity of the field is obtained. A paraxial approximation which preserves the rectilinear propagation character of the generalized pencils is presented. Under this approximation the pencils can be computed directly from the field conditions on a plane, without the necessity to compute the cross-spectral density function in the entire space as an intermediate quantity. The paraxial results are illustrated with examples. The pencils are shown to exhibit an interesting scaling behaviour in the far-zone. This scaling leads to a natural generalization of the Fraunhofer range criterion and of the classical van Cittert-Zernike theorem to planar sources of arbitrary state of coherence. The recently derived results of radiometry with partially coherent sources are shown to be simple consequences of this scaling.
Abstract:
This academic work begins with a compact presentation of the general background to the study, which also includes an autobiographical account of the interest behind this research. The presentation serves readers who know little of the topic of this research, of the structure of the educational system, or of the value given to education in Nigeria. It further concentrates on the dynamic interplay between academic and professional qualification and teachers' job effectiveness in secondary schools in Nigeria in particular, and in Africa in general. The aim of this study is to produce a systematic analysis and a rich theoretical and empirical description of teachers' teaching competencies. The theoretical part comprises a comprehensive literature review that focuses on research conducted in the areas of academic and professional qualification and teachers' job effectiveness, teaching competencies, and the role of teacher education, with particular emphasis on school effectiveness and improvement. This research benefits greatly from the functionalist conception of education, which is built upon two emphases: the application of the scientific method to the objective social world, and the use of an analogy between the individual 'organism' and 'society'. To this end, it offers us an opportunity to define terms systematically and to view problems as always being interrelated with other components of society. The empirical part involves describing and interpreting what educational objectives can be achieved with the help of teachers' teaching competencies in close connection to educational planning, teacher training and development, and achieving them without waste. The data used in this study were collected between 2002 and 2003 from teachers, principals, and supervisors of education from the Ministry of Education and the Post Primary Schools Board in the Rivers State of Nigeria (N=300). The data were collected through interviews, documents, observation, and questionnaires and were analyzed using both qualitative and quantitative methods to strengthen the validity of the findings. The data collected were analyzed to answer the specific research questions and hypotheses posited in this study. The data analysis involved the use of multiple statistical procedures: Percentages Mean Point Value, T-test of Significance, One-Way Analysis of Variance (ANOVA), and Cross Tabulation. The results obtained from the data analysis show that teachers require professional knowledge and professional teaching skills, as well as a broad base of general knowledge (e.g., morality, service, cultural capital, institutional survey). Above all, in order to carry out instructional processes effectively, teachers should be both academically and professionally trained. This study revealed that teachers are not, however, expected to have an extraordinary memory, but rather are looked upon as persons capable of thinking in the right direction. This study may provide a solution to the problem of teacher education and school effectiveness in Nigeria. For this reason, I offer this treatise to anyone seriously committed to improving schools in developing countries in general and in Nigeria in particular, to improve the lives of all its citizens.
In particular, I write this to encourage educational planners, education policy makers, curriculum developers, principals, teachers, and students of education interested in empirical information and methods to conceptualize the issue this study has raised, and to provide them with useful suggestions to help them improve secondary schooling in Nigeria. Multiple audiences, though, exist for any text, and I trust that the academic community will find this piece of work a useful addition to the existing literature on school effectiveness and school improvement. By integrating concepts from a number of disciplines, I aim to describe as holistic a representation as space allows of the components of school effectiveness and quality improvement. The study also offers a new perspective on teachers' professional competencies, one that not only takes into consideration the unique characteristics of the variables used in this study but also recommends attention to their environmental and cultural derivation. In addition, researchers should focus their attention on the ways in which both professional and non-professional teachers construct and apply their methodological competencies, such as their grouping procedures and behaviors, to the schooling of students. Keywords: Professional Training, Academic Training, Professionally Qualified, Academically Qualified, Professional Qualification, Academic Qualification, Job Effectiveness, Job Efficiency, Educational Planning, Teacher Training and Development, Nigeria.
Abstract:
In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data: conditional and full likelihood approaches for diseases with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of the general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is to say, these subjects never get the disease. A statistical model for a variable age at onset with long-term survivors is proposed for studies of familial aggregation, using a latent variable approach, as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation for the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work contains five studies with different statistical models, they all concern data obtained from nationwide registries of T1D and the genetics of T1D. In the analyses of the T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects, and birth cohort information for the Finnish population. Finally, substantial familial variation in susceptibility to T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling in exploring risk factors for complex diseases.
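The "long-term survivors" (non-susceptible) feature mentioned above is commonly handled with a mixture cure model; the standard form is sketched below as a point of reference, with p denoting the non-susceptible fraction. This is the textbook formulation, not necessarily the exact latent-variable model used in these studies.

```latex
% Mixture (cure) model for age at onset with a non-susceptible fraction p:
% a latent indicator Z = 1 marks subjects who never develop the disease.
S(t) = \Pr(T > t) = p + (1 - p)\, S_0(t),
\qquad
f(t) = (1 - p)\, f_0(t),
```

where S_0 and f_0 are the survival and density functions of the age at onset among susceptible subjects; likelihood contributions for affected and unaffected family members follow from f(t) and S(t) respectively.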
Abstract:
Crime analysts have traditionally received little guidance from academic researchers on key tasks in the analysis process, specifically the testing of multiple hypotheses and the evaluation of evidence in a scientific fashion. This article attempts to fill this gap by outlining a method (the Analysis of Competing Hypotheses) for systematically analysing multiple explanations for crime problems. The method is systematic, avoids many cognitive errors common in analysis, and is explicit. It is argued that the implementation of this approach makes analytic products auditable and the reasoning underpinning them transparent, and provides intelligence managers with a rational professional development tool for individual analysts.