952 results for Multivariate statistical method
Abstract:
Researchers in ecology commonly use multivariate analyses (e.g. redundancy analysis, canonical correspondence analysis, Mantel correlation, multivariate analysis of variance) to interpret patterns in biological data and relate these patterns to environmental predictors. There has been, however, little recognition of the errors associated with biological data and the influence that these may have on predictions derived from ecological hypotheses. We present a permutational method that assesses the effects of taxonomic uncertainty on the multivariate analyses typically used in the analysis of ecological data. The procedure is based on iterative randomizations that randomly re-assign unidentified species in each site to any of the other species found in the remaining sites. After each re-assignment of species identities, the multivariate method in question is run and a parameter of interest is calculated. Consequently, one can estimate a range of plausible values for the parameter of interest under different scenarios of re-assigned species identities. We demonstrate the use of our approach in the calculation of two parameters with an example involving tropical tree species from western Amazonia: 1) the Mantel correlation between compositional similarity and environmental distances between pairs of sites, and 2) the variance explained by environmental predictors in redundancy analysis (RDA). We also investigated the effects of increasing taxonomic uncertainty (i.e. number of unidentified species), and the taxonomic resolution at which morphospecies are determined (genus-resolution, family-resolution, or fully undetermined species) on the uncertainty range of these parameters. To achieve this, we performed simulations on a tree dataset from southern Mexico by randomly selecting a portion of the species contained in the dataset and classifying them as unidentified at each level of decreasing taxonomic resolution.
An analysis of covariance showed that both taxonomic uncertainty and resolution significantly influence the uncertainty range of the resulting parameters. Increasing taxonomic uncertainty widens the uncertainty range of the parameters estimated in both the Mantel test and RDA. The effects of increasing taxonomic resolution, however, are not as evident. The method presented in this study improves on traditional approaches to studying compositional change in ecological communities by accounting for some of the uncertainty inherent to biological data. We hope that this approach can be routinely used to estimate any parameter of interest obtained from compositional data tables when faced with taxonomic uncertainty.
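The randomization procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Bray-Curtis similarity, the function names, and the rule used to pick a replacement species (uniformly among identified species present in at least one other site) are all assumptions made for illustration.

```python
import numpy as np

def bray_curtis_similarity(abund):
    """Pairwise Bray-Curtis similarity between sites (rows). Assumed metric."""
    n = abund.shape[0]
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(abund[i] - abund[j]).sum()
            den = (abund[i] + abund[j]).sum()
            sim[i, j] = sim[j, i] = 1.0 - (num / den if den > 0 else 0.0)
    return sim

def mantel_r(a, b):
    """Pearson correlation between the upper triangles of two square matrices."""
    iu = np.triu_indices_from(a, k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]

def reassign_unidentified(abund, unidentified, rng):
    """Move the abundance of each unidentified species in each site to a
    randomly chosen identified species found in the remaining sites."""
    abund = np.asarray(abund, dtype=float)
    unidentified = np.asarray(unidentified, dtype=bool)
    identified = np.flatnonzero(~unidentified)
    out = abund[:, identified].copy()
    for site in range(abund.shape[0]):
        others = np.delete(abund[:, identified], site, axis=0)
        # Hypothetical rule: candidates are identified species seen elsewhere.
        candidates = np.flatnonzero(others.sum(axis=0) > 0)
        for col in np.flatnonzero(unidentified & (abund[site] > 0)):
            target = rng.choice(candidates)
            out[site, target] += abund[site, col]
    return out

def mantel_range(abund, unidentified, env_dist, n_iter=200, seed=0):
    """Range of Mantel correlations over random re-assignments."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_iter)
    for k in range(n_iter):
        table = reassign_unidentified(abund, unidentified, rng)
        values[k] = mantel_r(bray_curtis_similarity(table), env_dist)
    return values.min(), values.max()
```

Here `abund` is a sites-by-species abundance table, `unidentified` is a boolean mask over its columns, and `env_dist` is a square matrix of environmental distances between sites. The same loop could wrap any other statistic, such as the variance explained in RDA, in place of `mantel_r`.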
Abstract:
Columbia University Contributions to Philosophy and Psychology, vol. XVI, no. 3.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation, and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
A procedure for calculating the critical level and power of a likelihood ratio test, based on Monte Carlo simulation, is proposed. General principles for building software to implement it are given. Some examples of its application are shown.
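As a rough illustration of the idea, the sketch below estimates a critical value and the power of a likelihood ratio test by simulation. The abstract does not specify the model, so everything here is assumed: the test is of a zero mean for normal data with unknown variance, "critical level" is read as the critical value at significance level alpha, and the function names are hypothetical.

```python
import numpy as np

def lrt_statistic(x):
    """-2 log likelihood ratio for H0: mean = 0, normal data, unknown variance.
    Each row of x is one simulated sample."""
    n = x.shape[-1]
    var_h0 = np.mean(x ** 2, axis=-1)   # variance MLE with the mean fixed at 0
    var_h1 = np.var(x, axis=-1)         # variance MLE with the mean estimated
    return n * np.log(var_h0 / var_h1)

def mc_critical_value(n, alpha=0.05, n_sim=20000, seed=0):
    """Simulate the statistic under H0 and take its (1 - alpha) quantile."""
    rng = np.random.default_rng(seed)
    stats = lrt_statistic(rng.standard_normal((n_sim, n)))
    return np.quantile(stats, 1 - alpha)

def mc_power(n, mu, crit, n_sim=20000, seed=1):
    """Simulate under the alternative (mean mu) and count rejections."""
    rng = np.random.default_rng(seed)
    stats = lrt_statistic(mu + rng.standard_normal((n_sim, n)))
    return np.mean(stats > crit)
```

Simulating the null distribution directly avoids relying on the asymptotic chi-square approximation, which can be inaccurate for small samples.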
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture proportions of topics, represented by a Dirichlet distributed variable, are assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Abstract:
Reliability has emerged as a critical design constraint, especially in memories. Designers are going to great lengths to guarantee fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques come at a cost of large area, power and performance overheads, which lead many researchers to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method focusing on the confinement of the resulting output error induced by any reliability issues. Focusing on memory faults, rather than correcting every single error, the proposed method exploits the statistical characteristics of any target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method on the proposed enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault tolerance approaches, we are able to reduce runtime and area overhead by 71.3% and 83.3% respectively.
Abstract:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedures to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models are explained. Numerical approximation methods based on the Genz algorithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability.
A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Finally, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that take into account the survey design. All methods are rigorously validated on both real and simulated data.
Abstract:
The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). 
The results show that multivariate approaches may (a) significantly enhance the delineation of flood-prone areas relative to a selected univariate one, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into a continuous representation of FH.
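The first Part of this abstract turns predicted Gumbel parameters into rainfall percentiles. A minimal sketch of that last step, using the standard Gumbel quantile function, is shown below; the function names and the parameter values in the usage note are hypothetical, and nothing here reproduces the ANN models themselves.

```python
import math

def gumbel_quantile(location, scale, p):
    """Quantile of a Gumbel (maximum) distribution at non-exceedance probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return location - scale * math.log(-math.log(p))

def gumbel_return_level(location, scale, return_period):
    """Rainfall depth exceeded on average once every `return_period` years."""
    return gumbel_quantile(location, scale, 1.0 - 1.0 / return_period)
```

For example, `gumbel_return_level(30.0, 8.0, 100)` gives the 100-year depth for a hypothetical gauge with a location parameter of 30 mm and a scale parameter of 8 mm.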
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
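The damped exponential correlation structure mentioned above is commonly written as corr(y_j, y_k) = phi1^(|t_j - t_k|^phi2) for measurement times t_j and t_k. The sketch below builds that matrix for irregular times; the parameter names `phi1` and `phi2` and the function name are assumptions, and the censored Student's t model and its EM algorithm are not reproduced here.

```python
import numpy as np

def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix for irregular observation times.

    Assumes 0 < phi1 < 1 and phi2 >= 0. phi2 = 1 gives a continuous-time
    AR(1) structure; phi2 = 0 gives compound symmetry.
    """
    times = np.asarray(times, dtype=float)
    lag = np.abs(times[:, None] - times[None, :])
    corr = phi1 ** (lag ** phi2)
    # The diagonal must be exactly 1, including when phi2 = 0 (0**0 == 1).
    np.fill_diagonal(corr, 1.0)
    return corr
```

Because the lag enters as a continuous quantity, the same two parameters describe the correlation between any pair of visits, however unevenly they are spaced.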
Abstract:
A flow injection method for the quantitative analysis of ketoconazole in tablets, based on the reaction with iron(III) ions, is presented. Ketoconazole forms a red complex with iron ions in an acid medium, with maximum absorbance at 495 nm. The detection limit was estimated to be 1×10⁻⁴ mol L⁻¹; the quantitation limit is about 3×10⁻⁴ mol L⁻¹ and approximately 30 determinations can be performed in an hour. The results were compared with those obtained with a reference HPLC method. Statistical comparisons were done using the Student's t procedure and the F test. Complete agreement was found at the 0.95 confidence level between the proposed flow injection and the HPLC procedures. The two methods present similar precision, i.e., for HPLC the mean relative standard deviation was ca. 1.2% and for FIA ca. 1.6%.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
An investigation was carried out to study the potential use of the angular distribution of photons scattered by human breast samples for rapid identification of neoplasias of breast tissues. This technique has possible applications as a diagnostic aid for breast cancer. In this work, a commercial powder diffractometer was used to obtain the scattering profiles from breast tissues histopathologically classified as normal breast tissues, fibroadenomas (benign breast diseases) and carcinomas (malignant breast diseases), in the interval 0.02 Å⁻¹ < x < 0.62 Å⁻¹. The experimental methods and data corrections are discussed in detail, and they included background subtraction, polarization, self-attenuation and geometric effects. The experimental procedure was validated through an analysis of a water sample. The results showed that the scattering profile is a unique signature of each type of tissue, being correlated with its microscopic morphological features. Multivariate analysis was applied to these profiles in order to verify whether the information carried by these scattering profiles allows the differentiation between normal, benign and malignant breast tissues. The statistical analysis results showed that 75% of the analyzed samples were correctly identified. The sensitivity and specificity of this method in correctly differentiating between normal and neoplastic samples were 95.6% and 82.3%, respectively, while the values for differentiation between benign and malignant neoplasias were 78.6% and 62.5%. These initial results indicate that a commercial powder diffractometer can feasibly be used to provide a rapid diagnosis with high sensitivity.
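For readers unfamiliar with the two performance figures quoted above, the sketch below shows how sensitivity and specificity are computed from classification counts. The counts in the usage note are invented for illustration and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive samples that the method flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of truly negative samples that the method flags as negative."""
    return true_neg / (true_neg + false_pos)
```

With hypothetical counts of 18 true positives, 2 false negatives, 15 true negatives and 5 false positives, `sensitivity(18, 2)` is 0.9 and `specificity(15, 5)` is 0.75.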
Abstract:
Background: Minimal residual disease is an important independent prognostic factor in childhood acute lymphoblastic leukemia. The classical detection methods such as multiparameter flow cytometry and real-time quantitative polymerase chain reaction analysis are expensive, time-consuming and complex, and require considerable technical expertise. Design and Methods: We analyzed 229 consecutive children with acute lymphoblastic leukemia treated according to the GBTLI-99 protocol at three different Brazilian centers. Minimal residual disease was analyzed in bone marrow samples at diagnosis and on days 14 and 28 by conventional homo/heteroduplex polymerase chain reaction using a simplified approach with consensus primers for IG and TCR gene rearrangements. Results: At least one marker was detected by polymerase chain reaction in 96.4% of the patients. By combining the minimal residual disease results obtained on days 14 and 28, three different prognostic groups were identified: minimal residual disease negative on days 14 and 28, positive on day 14/negative on day 28, and positive on both. Five-year event-free survival rates were 85%, 75.6%, and 27.8%, respectively (p<0.0001). The same pattern of stratification held true for the group of intensively treated children. When analyzed in other subgroups of patients such as those at standard and high risk at diagnosis, those with positive B-derived CD10, patients positive for the TEL/AML1 transcript, and patients in morphological remission on a day 28 marrow, the event-free survival rate was found to be significantly lower in patients with positive minimal residual disease on day 28. Multivariate analysis demonstrated that the detection of minimal residual disease on day 28 is the most significant prognostic factor.
Conclusions: This simplified strategy for detection of minimal residual disease was feasible, reproducible, cheaper and simpler when compared with other methods, and allowed powerful discrimination between children with acute lymphoblastic leukemia with a good and poor outcome.