970 resultados para Statistical Method
Resumo:
Surface flow types (SFT) are advocated as ecologically relevant hydraulic units, often mapped visually from the bankside to characterise rapidly the physical habitat of rivers. SFT mapping is simple, non-invasive and cost-efficient. However, it is also qualitative, subjective and plagued by difficulties in recording accurately the spatial extent of SFT units. Quantitative validation of the underlying physical habitat parameters is often lacking, and does not consistently differentiate between SFTs. Here, we investigate explicitly the accuracy, reliability and statistical separability of traditionally mapped SFTs as indicators of physical habitat, using independent, hydraulic and topographic data collected during three surveys of a c. 50m reach of the River Arrow, Warwickshire, England. We also explore the potential of a novel remote sensing approach, comprising a small unmanned aerial system (sUAS) and Structure-from-Motion photogrammetry (SfM), as an alternative method of physical habitat characterisation. Our key findings indicate that SFT mapping accuracy is highly variable, with overall mapping accuracy not exceeding 74%. Results from analysis of similarity (ANOSIM) tests found that strong differences did not exist between all SFT pairs. This leads us to question the suitability of SFTs for characterising physical habitat for river science and management applications. In contrast, the sUAS-SfM approach provided high resolution, spatially continuous, spatially explicit, quantitative measurements of water depth and point cloud roughness at the microscale (spatial scales ≤1m). Such data are acquired rapidly, inexpensively, and provide new opportunities for examining the heterogeneity of physical habitat over a range of spatial and temporal scales. Whilst continued refinement of the sUAS-SfM approach is required, we propose that this method offers an opportunity to move away from broad, mesoscale classifications of physical habitat (spatial scales 10-100m), and towards continuous, quantitative measurements of the continuum of hydraulic and geomorphic conditions which actually exists at the microscale.
Resumo:
The quality of a heuristic solution to a NP-hard combinatorial problem is hard to assess. A few studies have advocated and tested statistical bounds as a method for assessment. These studies indicate that statistical bounds are superior to the more widely known and used deterministic bounds. However, the previous studies have been limited to a few metaheuristics and combinatorial problems and, hence, the general performance of statistical bounds in combinatorial optimization remains an open question. This work complements the existing literature on statistical bounds by testing them on the metaheuristic Greedy Randomized Adaptive Search Procedures (GRASP) and four combinatorial problems. Our findings confirm previous results that statistical bounds are reliable for the p-median problem, while we note that they also seem reliable for the set covering problem. For the quadratic assignment problem, the statistical bounds has previously been found reliable when obtained from the Genetic algorithm whereas in this work they found less reliable. Finally, we provide statistical bounds to four 2-path network design problem instances for which the optimum is currently unknown.
Resumo:
Thesis (Master's)--University of Washington, 2016-08
Resumo:
La stratégie actuelle de contrôle de la qualité de l’anode est inadéquate pour détecter les anodes défectueuses avant qu’elles ne soient installées dans les cuves d’électrolyse. Des travaux antérieurs ont porté sur la modélisation du procédé de fabrication des anodes afin de prédire leurs propriétés directement après la cuisson en utilisant des méthodes statistiques multivariées. La stratégie de carottage des anodes utilisée à l’usine partenaire fait en sorte que ce modèle ne peut être utilisé que pour prédire les propriétés des anodes cuites aux positions les plus chaudes et les plus froides du four à cuire. Le travail actuel propose une stratégie pour considérer l’histoire thermique des anodes cuites à n’importe quelle position et permettre de prédire leurs propriétés. Il est montré qu’en combinant des variables binaires pour définir l’alvéole et la position de cuisson avec les données routinières mesurées sur le four à cuire, les profils de température des anodes cuites à différentes positions peuvent être prédits. Également, ces données ont été incluses dans le modèle pour la prédiction des propriétés des anodes. Les résultats de prédiction ont été validés en effectuant du carottage supplémentaire et les performances du modèle sont concluantes pour la densité apparente et réelle, la force de compression, la réactivité à l’air et le Lc et ce peu importe la position de cuisson.
Resumo:
The protein lysate array is an emerging technology for quantifying the protein concentration ratios in multiple biological samples. It is gaining popularity, and has the potential to answer questions about post-translational modifications and protein pathway relationships. Statistical inference for a parametric quantification procedure has been inadequately addressed in the literature, mainly due to two challenges: the increasing dimension of the parameter space and the need to account for dependence in the data. Each chapter of this thesis addresses one of these issues. In Chapter 1, an introduction to the protein lysate array quantification is presented, followed by the motivations and goals for this thesis work. In Chapter 2, we develop a multi-step procedure for the Sigmoidal models, ensuring consistent estimation of the concentration level with full asymptotic efficiency. The results obtained in this chapter justify inferential procedures based on large-sample approximations. Simulation studies and real data analysis are used to illustrate the performance of the proposed method in finite-samples. The multi-step procedure is simpler in both theory and computation than the single-step least squares method that has been used in current practice. In Chapter 3, we introduce a new model to account for the dependence structure of the errors by a nonlinear mixed effects model. We consider a method to approximate the maximum likelihood estimator of all the parameters. Using the simulation studies on various error structures, we show that for data with non-i.i.d. errors the proposed method leads to more accurate estimates and better confidence intervals than the existing single-step least squares method.
Resumo:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedure to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models explained. Numerical approximation methods based on the Genz algortithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Eventually, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that takes into account the survey design. All methods are rigorously validated on both real and simulated data.
Resumo:
Background: Body composition is affected by diseases, and affects responses to medical treatments, dosage of medicines, etc., while an abnormal body composition contributes to the causation of many chronic diseases. While we have reliable biochemical tests for certain nutritional parameters of body composition, such as iron or iodine status, and we have harnessed nuclear physics to estimate the body’s content of trace elements, the very basic quantification of body fat content and muscle mass remains highly problematic. Both body fat and muscle mass are vitally important, as they have opposing influences on chronic disease, but they have seldom been estimated as part of population health surveillance. Instead, most national surveys have merely reported BMI and waist, or sometimes the waist/hip ratio; these indices are convenient but do not have any specific biological meaning. Anthropometry offers a practical and inexpensive method for muscle and fat estimation in clinical and epidemiological settings; however, its use is imperfect due to many limitations, such as a shortage of reference data, misuse of terminology, unclear assumptions, and the absence of properly validated anthropometric equations. To date, anthropometric methods are not sensitive enough to detect muscle and fat loss. Aims: The aim of this thesis is to estimate Adipose/fat and muscle mass in health disease and during weight loss through; 1. evaluating and critiquing the literature, to identify the best-published prediction equations for adipose/fat and muscle mass estimation; 2. to derive and validate adipose tissue and muscle mass prediction equations; and 3.to evaluate the prediction equations along with anthropometric indices and the best equations retrieved from the literature in health, metabolic illness and during weight loss. Methods: a Systematic review using Cochrane Review method was used for reviewing muscle mass estimation papers that used MRI as the reference method. Fat mass estimation papers were critically reviewed. Mixed ethnic, age and body mass data that underwent whole body magnetic resonance imaging to quantify adipose tissue and muscle mass (dependent variable) and anthropometry (independent variable) were used in the derivation/validation analysis. Multiple regression and Bland-Altman plot were applied to evaluate the prediction equations. To determine how well the equations identify metabolic illness, English and Scottish health surveys were studied. Statistical analysis using multiple regression and binary logistic regression were applied to assess model fit and associations. Also, populations were divided into quintiles and relative risk was analysed. Finally, the prediction equations were evaluated by applying them to a pilot study of 10 subjects who underwent whole-body MRI, anthropometric measurements and muscle strength before and after weight loss to determine how well the equations identify adipose/fat mass and muscle mass change. Results: The estimation of fat mass has serious problems. Despite advances in technology and science, prediction equations for the estimation of fat mass depend on limited historical reference data and remain dependent upon assumptions that have not yet been properly validated for different population groups. Muscle mass does not have the same conceptual problems; however, its measurement is still problematic and reference data are scarce. The derivation and validation analysis in this thesis was satisfactory, compared to prediction equations in the literature they were similar or even better. Applying the prediction equations in metabolic illness and during weight loss presented an understanding on how well the equations identify metabolic illness showing significant associations with diabetes, hypertension, HbA1c and blood pressure. And moderate to high correlations with MRI-measured adipose tissue and muscle mass before and after weight loss. Conclusion: Adipose tissue mass and to an extent muscle mass can now be estimated for many purposes as population or groups means. However, these equations must not be used for assessing fatness and categorising individuals. Further exploration in different populations and health surveys would be valuable.
Resumo:
As condições de ambiente térmico e aéreo, no interior de instalações para animais, alteram-se durante o dia, devido à influência do ambiente externo. Para que análises estatísticas e geoestatísticas sejam representativas, uma grande quantidade de pontos distribuídos espacialmente na área da instalação deve ser monitorada. Este trabalho propõe que a variação no tempo das variáveis ambientais de interesse para a produção animal, monitoradas no interior de instalações para animais, pode ser modelada com precisão a partir de registros discretos no tempo. O objetivo deste trabalho foi desenvolver um método numérico para corrigir as variações temporais dessas variáveis ambientais, transformando os dados para que tais observações independam do tempo gasto durante a aferição. O método proposto aproximou os valores registrados com retardos de tempo aos esperados no exato momento de interesse, caso os dados fossem medidos simultaneamente neste momento em todos os pontos distribuídos espacialmente. O modelo de correção numérica para variáveis ambientais foi validado para o parâmetro ambiental temperatura do ar, sendo que os valores corrigidos pelo método não diferiram pelo teste Tukey, a 5% de probabilidade dos valores reais registrados por meio de dataloggers.
Resumo:
We investigate the directional distribution of heavy neutral atoms in the heliosphere by using heavy neutral maps generated with the IBEX-Lo instrument over three years from 2009 to 2011. The interstellar neutral (ISN) O&Ne gas flow was found in the first-year heavy neutral map at 601 keV and its flow direction and temperature were studied. However, due to the low counting statistics, researchers have not treated the full sky maps in detail. The main goal of this study is to evaluate the statistical significance of each pixel in the heavy neutral maps to get a better understanding of the directional distribution of heavy neutral atoms in the heliosphere. Here, we examine three statistical analysis methods: the signal-to-noise filter, the confidence limit method, and the cluster analysis method. These methods allow us to exclude background from areas where the heavy neutral signal is statistically significant. These methods also allow the consistent detection of heavy neutral atom structures. The main emission feature expands toward lower longitude and higher latitude from the observational peak of the ISN O&Ne gas flow. We call this emission the extended tail. It may be an imprint of the secondary oxygen atoms generated by charge exchange between ISN hydrogen atoms and oxygen ions in the outer heliosheath.
Resumo:
Purpose: To develop an effective method for evaluating the quality of Cortex berberidis from different geographical origins. Methods: A simple, precise and accurate high performance liquid chromatography (HPLC) method was first developed for simultaneous quantification of four active alkaloids (magnoflorine, jatrorrhizine, palmatine, and berberine) in Cortex berberidis obtained from Qinghai, Tibet and Sichuan Provinces of China. Method validation was performed in terms of precision, repeatability, stability, accuracy, and linearity. Besides, partial least squares discriminant analysis (PLS-DA) and one-way analysis of variance (ANOVA) were applied to study the quality variations of Cortex berberidis from various geographical origins. Results: The proposed HPLC method showed good linearity, precision, repeatability, and accuracy. The four alkaloids were detected in all samples of Cortex berberidis. Among them, magnoflorine (36.46 - 87.30 mg/g) consistently showed the highest amounts in all the samples, followed by berberine (16.00 - 37.50 mg/g). The content varied in the range of 0.66 - 4.57 mg/g for palmatine and 1.53 - 16.26 mg/g for jatrorrhizine, respectively. The total content of the four alkaloids ranged from 67.62 to 114.79 mg/g. Moreover, the results obtained by the PLS-DA and ANOVA showed that magnoflorine level and the total content of these four alkaloids in Qinghai and Tibet samples were significantly higher (p < 0.01) than those in Sichuan samples. Conclusion: Quantification of multi-ingredients by HPLC combined with statistical methods provide an effective approach for achieving origin discrimination and quality evaluation of Cortex berberidis. The quality of Cortex berberidis closely correlates to the geographical origin of the samples, with Cortex berberidis samples from Qinghai and Tibet exhibiting superior qualities to those from Sichuan.
Resumo:
The main purpose of this paper is to propose and test a model to assess the degree of conditions favorability in the adoption of agile methods to develop software where traditional methods predominate. In order to achieve this aim, a survey was applied on software developers of a Brazilian public retail bank. Two different statistical techniques were used in order to assess the quantitative data from the closed questions in the survey. The first, exploratory factorial analysis validated the structure of perspectives related to the agile model of the proposed assessment. The second, frequency distribution analysis to categorize the answers. Qualitative data from the survey opened question were analyzed with the technique of qualitative thematic content analysis. As a result, the paper proposes a model to assess the degree of favorability conditions in the adoption of Agile practices within the context of the proposed study.
Resumo:
The synthetic control method (SCM) is a new, popular method developed for the purpose of estimating the effect of an intervention when only one single unit has been exposed. Other similar, unexposed units are combined into a synthetic control unit intended to mimic the evolution in the exposed unit, had it not been subject to exposure. As the inference relies on only a single observational unit, the statistical inferential issue is a challenge. In this paper, we examine the statistical properties of the estimator, study a number of features potentially yielding uncertainty in the estimator, discuss the rationale for statistical inference in relation to SCM, and provide a Web-app for researchers to aid in their decision of whether SCM is powerful for a specific case study. We conclude that SCM is powerful with a limited number of controls in the donor pool and a fairly short pre-intervention time period. This holds as long as the parameter of interest is a parametric specification of the intervention effect, and the duration of post-intervention period is reasonably long, and the fit of the synthetic control unit to the exposed unit in the pre-intervention period is good.
Resumo:
The Swedish-speaking minority in Finland, often described as an ‘elite minority’, holds a special position in the country. With linguistic rights protected by the constitution of Finland, Swedish-speakers, as a minority of only 5.3%, are often described in public discourse and in academic and statistical studies as happier, healthier and more well off economically than the Finnish-speaking majority. As such, the minority is a unique example of language minorities in Europe. Knowledge derived from qualitatively grounded studies on the topic is however lacking, meaning that there is a gap in understanding of the nature and complexity of the minority. Drawing on ethnographic research conducted in four different locations in Finland over a period of 12 months, this thesis provides a theoretically grounded and empirically informed rich account of the identifications and sites of belonging of this diverse minority. The thesis makes a contribution to theoretical, methodological and empirical research on the Swedish-speaking minority, debates around identity and belonging, and ethnographic methodological approaches. Making use of novel methodology in studying Swedish-speaking Finns, this thesis moves beyond generalisations and simplifications on its nature and character. Drawing on rich ethnographic empirical material, the thesis interrogates various aspects of the lived experience of Swedish-speaking Finns by combining the concepts of belonging and identification. Some of the issues explored are the way in which belonging can be regionally specific, how Swedish-speakers create Swedish-spaces, how language use is situational and variable and acts as a marker of identity, and finally how identifications and sites of belonging among the minority are extremely varied and complex. The thesis concludes that there are various sites of belonging and identification available to Swedish-speakers, and these need to be studied and considered in order to gain an accurate picture of the lived experience of the minority. It also argues that while identifications are based on collective imagery, this imagery can vary among Swedish-speakers and identifications are multiple and situational. Finally, while language is a key commonality for the minority, the meanings attached to it are not only concerned with ‘Finland Swedishness’, but connected to various other factors, such as the context a person grew up in and the region one lives in. The complex issues affecting the lived experience of Swedish-speaking Finns cannot be understood without the contribution of findings from qualitative research. This thesis therefore points towards a new kind of understanding of Swedish-speaking Finns, moving away from stereotypes and simplifications, shifting our gaze towards a richer perception of the minority.
Resumo:
In this thesis we will see that the DNA sequence is constantly shaped by the interactions with its environment at multiple levels, showing footprints of DNA methylation, of its 3D organization and, in the case of bacteria, of the interaction with the host organisms. In the first chapter, we will see that analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence, we can detect epigenetic and structural footprints. In particular, we will see that CG distance distribution allows to distinguish among organisms of different biological complexity, depending on how much CG sites are involved in DNA methylation. Moreover, we will see that CG and TA can be described by the same fitting function, suggesting a relationship between the two. We will also provide an interpretation of the observed trend, simulating a positioning process guided by the presence and absence of memory. In the end, we will focus on TA distance distribution, characterizing deviations from the trend predicted by the best fitting function, and identifying specific patterns that might be related to peculiar mechanical properties of the DNA and also to epigenetic and structural processes. In the second chapter, we will see how we can map the 3D structure of the DNA onto its sequence. In particular, we devised a network-based algorithm that produces a genome assembly starting from its 3D configuration, using as inputs Hi-C contact maps. Specifically, we will see how we can identify the different chromosomes and reconstruct their sequences by exploiting the spectral properties of the Laplacian operator of a network. In the third chapter, we will see a novel method for source clustering and source attribution, based on a network approach, that allows to identify host-bacteria interaction starting from the detection of Single-Nucleotide Polymorphisms along the sequence of bacterial genomes.
Resumo:
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one appears since the early birth of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establish the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 instead we study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically to perform matrix factorization through it.