919 results for Multi-Factor Model, Missing Data
Abstract:
The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data is multiply imputed. Missing data of predictor variables is multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data and patterns of missing data. The distributional properties of average mean, variance and correlations among the predictor variables are assessed after the multiple imputation process. For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure for the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data with this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data.
With all data types, a fully-observed variable included with variables subject to missingness in the multiple imputation process and subsequent statistical analysis provided liberal (larger than nominal values) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
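The abstract does not say how the regression coefficients from the individual imputed datasets were combined. A common choice is Rubin's rules, and the sketch below assumes that approach. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates of the coefficient
    variances: per-imputation squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(variances) / m                              # within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, math.sqrt(total)

# Illustrative numbers only (hypothetical, three imputations).
est, se = pool_rubin([0.10, 0.20, 0.30], [0.04, 0.04, 0.04])
```

The between-imputation term is what widens the standard error to reflect uncertainty about the missing values, which is one reason Type I error rates can differ from a complete-data analysis.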
Abstract:
In this paper we examine multi-objective linear programming problems in the face of data uncertainty both in the objective function and the constraints. First, we derive a formula for the radius of robust feasibility guaranteeing constraint feasibility for all possible scenarios within a specified uncertainty set under affine data parametrization. We then present numerically tractable optimality conditions for minmax robust weakly efficient solutions, i.e., the weakly efficient solutions of the robust counterpart. We also consider highly robust weakly efficient solutions, i.e., robust feasible solutions which are weakly efficient for any possible instance of the objective matrix within a specified uncertainty set, providing lower bounds for the radius of highly robust efficiency guaranteeing the existence of this type of solutions under affine and rank-1 objective data uncertainty. Finally, we provide numerically tractable optimality conditions for highly robust weakly efficient solutions.
Abstract:
The classical problem of agricultural productivity measurement has regained interest owing to recent price hikes in world food markets. At the same time, there is a new methodological debate on the appropriate identification strategies for addressing endogeneity and collinearity problems in production function estimation. We examine the plausibility of four established and innovative identification strategies for the case of agriculture and test a set of related estimators using farm-level panel datasets from seven EU countries. The newly suggested control function and dynamic panel approaches provide attractive conceptual improvements over the received ‘within’ and duality models. Even so, empirical implementation of the conceptual sophistications built into these estimators does not always live up to expectations. This is particularly true for the dynamic panel estimator, which mostly failed to identify reasonable elasticities for the (quasi-) fixed factors. Less demanding proxy approaches represent an interesting alternative for agricultural applications. In our EU sample, we find very low shadow prices for labour, land and fixed capital across countries. The production elasticity of materials is high, so improving the availability of working capital is the most promising way to increase agricultural productivity.
Abstract:
Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
Abstract:
A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis were used to determine the influence of underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was determined by the first four principal components (PCs), implying “significant” structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To consider an improvement over PCA that uses the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors.
Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this in classification analysis.
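The centered log-ratio transform mentioned in this abstract divides each part of a composition by the geometric mean of all parts and takes the logarithm. A minimal sketch follows, using only the standard library; the function name `clr` and the sample values are hypothetical, and the study's actual software is not stated in the abstract.

```python
import math

def clr(parts):
    """Centered log-ratio transform of one composition (all parts must be > 0)."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr needs strictly positive parts; impute censored values first")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    # Subtracting the mean log equals dividing each part by the geometric mean.
    return [value - mean_log for value in logs]

# Hypothetical three-element composition (fractions of a whole).
transformed = clr([0.2, 0.3, 0.5])
```

The transformed values sum to zero and are unchanged if every part is multiplied by the same constant, which is why the transform sidesteps the closure problem. The positivity check also shows why censored values have to be imputed before the transform is applied.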
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
In real-life information sets, parts of the whole set are often unavailable. This problem can arise for various reasons, which give rise to different patterns. In the literature, it is known as Missing Data. The issue can be handled in various ways, from discarding incomplete observations, to guessing what the missing values originally were, to simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any kind of interaction exists between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, meaning that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the outcome produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time restricting the previous problem to a special kind of dataset, the multivariate Time Series. We designed new imputation techniques for this particular domain and combined them with some of the strategies tested and compared in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation, leading to the proposal of an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem.
The databases that characterized this problem had their own original latent values, which provides a real-world benchmark to test the algorithms developed in this thesis.
Abstract:
The use of finite element analysis (FEA) to design electrical motors has increased significantly in the past few years due to the steadily improving performance of modern computers. Even though analytical software remains the most widely used tool, FEA is widely used to refine the analysis and to produce the final design to be prototyped. The power factor, a standard item on motor manufacturers' data sheets, is important because it shows how much reactive power is consumed by the motor. This figure becomes important when the motor is connected to the network. However, calculating the power factor is not an easy task. Owing to saturation, the motor input current has a high level of harmonics that cannot be neglected. In this work, FEA is used to evaluate a proposed (non-limiting) methodology to estimate the power factor or displacement factor of a small single-phase induction motor. Results of simulations and tests are compared.
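The difference between power factor and displacement factor can be made concrete. Assuming a purely sinusoidal supply voltage (an assumption not stated in the abstract), only the fundamental current carries active power, so the true power factor is the displacement factor multiplied by the ratio of fundamental to total RMS current. The sketch below is not the paper's methodology; the function name and the current values are hypothetical.

```python
import math

def power_factors(harmonic_rms, phi1_rad):
    """Return (displacement factor, true power factor).

    harmonic_rms: RMS current of each harmonic in amperes, fundamental first
    phi1_rad: phase lag of the fundamental current behind the voltage
    Assumes a purely sinusoidal supply voltage.
    """
    i1 = harmonic_rms[0]
    i_rms = math.sqrt(sum(i * i for i in harmonic_rms))  # total RMS current
    displacement = math.cos(phi1_rad)                    # cos(phi_1)
    distortion = i1 / i_rms                              # fundamental / total RMS
    return displacement, displacement * distortion

# Hypothetical current: 3.0 A fundamental, 0.9 A third and 0.3 A fifth harmonic.
dpf, pf = power_factors([3.0, 0.9, 0.3], math.radians(40.0))
```

With no harmonics the two factors coincide; any harmonic content pushes the true power factor below the displacement factor, which is why the harmonics cannot be neglected.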
Abstract:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
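The object duplication described in this abstract can be illustrated with a simple uniform-grid partitioning of bounding boxes. This sketch is not the paper's algorithm: the grid scheme, the function names and the sample rectangles are assumptions chosen only to show how multi-assignment copies an object into several tasks.

```python
from collections import defaultdict
from itertools import product

def cells_for(rect, cell_size):
    """Return every grid cell (col, row) that the rectangle overlaps."""
    xmin, ymin, xmax, ymax = rect
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return product(cols, rows)

def partition(rects, cell_size):
    """Multi-assignment: a rectangle is copied into each cell it overlaps."""
    buckets = defaultdict(list)
    for idx, rect in enumerate(rects):
        for cell in cells_for(rect, cell_size):
            buckets[cell].append(idx)
    return buckets

def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def grid_join(left, right, cell_size):
    """Filter step of a spatial join: index pairs whose boxes intersect."""
    left_buckets = partition(left, cell_size)
    right_buckets = partition(right, cell_size)
    pairs = set()  # the set drops pairs reported by more than one cell
    for cell, left_ids in left_buckets.items():  # each cell is an independent task
        for i in left_ids:
            for j in right_buckets.get(cell, []):
                if intersects(left[i], right[j]):
                    pairs.add((i, j))
    return pairs

# Hypothetical boxes: the first left box spans four cells of size 2.
left = [(0, 0, 3, 3)]
right = [(1, 1, 2.5, 2.5), (5, 5, 6, 6)]
candidates = grid_join(left, right, 2)
copies = sum(len(ids) for ids in partition(left, 2).values())
```

Here one box is stored four times and the same candidate pair is found in four cells, which is the extra CPU and communication cost the abstract attributes to multi-assignment.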
Abstract:
There are many methods for the analysis and design of embedded cantilever retaining walls. They involve various different simplifications of the pressure distribution to allow calculation of the limiting equilibrium retained height and the bending moment when the retained height is less than the limiting equilibrium value, i.e. the serviceability case. Recently, a new method for determining the serviceability earth pressure and bending moment has been proposed. This method makes an assumption defining the point of zero net pressure. This assumption implies that the passive pressure is not fully mobilised immediately below the excavation level. The finite element analyses presented in this paper examine the net pressure distribution on walls in which the retained height is less than the limiting equilibrium value. The study shows that for all practical walls, the earth pressure distributions on the front and back of the wall are at their limit values, Kp and Ka respectively, when the lumped factor of safety Fr is less than or equal to 2.0. A rectilinear net pressure distribution is proposed that is intuitively logical. It produces good predictions of the complete bending moment diagram for walls in the service configuration and the proposed method gives results that have excellent agreement with centrifuge model tests. The study shows that the method for determining the serviceability bending moment suggested by Padfield and Mair (1) in the CIRIA Report 104 gives excellent predictions of the maximum bending moment in practical cantilever walls. It provides the missing data that have been needed to verify and justify the CIRIA 104 method.
Abstract:
Dissertation submitted for the degree of Doctor in Electrical and Computer Engineering – Digital and Perceptional Systems at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Abstract:
BACKGROUND: Chest pain is a common complaint in primary care, with coronary heart disease (CHD) being the most concerning of many potential causes. Systematic reviews on the sensitivity and specificity of symptoms and signs summarize the evidence about which of them are most useful in making a diagnosis. Previous meta-analyses are dominated by studies of patients referred to specialists. Moreover, as the analysis is typically based on study-level data, the statistical analyses in these reviews are limited while meta-analyses based on individual patient data can provide additional information. Our patient-level meta-analysis has three unique aims. First, we strive to determine the diagnostic accuracy of symptoms and signs for myocardial ischemia in primary care. Second, we investigate associations between study- or patient-level characteristics and measures of diagnostic accuracy. Third, we aim to validate existing clinical prediction rules for diagnosing myocardial ischemia in primary care. This article describes the methods of our study and six prospective studies of primary care patients with chest pain. Later articles will describe the main results. METHODS/DESIGN: We will conduct a systematic review and IPD meta-analysis of studies evaluating the diagnostic accuracy of symptoms and signs for diagnosing coronary heart disease in primary care. We will perform bivariate analyses to determine the sensitivity, specificity and likelihood ratios of individual symptoms and signs and multivariate analyses to explore the diagnostic value of an optimal combination of all symptoms and signs based on all data of all studies. We will validate existing clinical prediction rules from each of the included studies by calculating measures of diagnostic accuracy separately by study. DISCUSSION: Our study will face several methodological challenges. First, the number of studies will be limited. 
Second, the investigators of original studies defined some outcomes and predictors differently. Third, the studies did not collect the same standard clinical data set. Fourth, missing data, varying from partly missing to fully missing, will have to be dealt with. Despite these limitations, we aim to summarize the available evidence regarding the diagnostic accuracy of symptoms and signs for diagnosing CHD in patients presenting with chest pain in primary care. REVIEW REGISTRATION: Centre for Reviews and Dissemination (University of York): CRD42011001170.
Abstract:
Attrition in longitudinal studies can lead to biased results. The study is motivated by the unexpected observation that alcohol consumption decreased despite increased availability, which may be due to sample attrition of heavy drinkers. Several imputation methods have been proposed, but rarely compared in longitudinal studies of alcohol consumption. The imputation of consumption level measurements is computationally particularly challenging because alcohol consumption is a semi-continuous variable (dichotomous drinking status and continuous volume among drinkers) and because the data in the continuous part are non-normal. Data come from a longitudinal study in Denmark with four waves (2003-2006) and 1771 individuals at baseline. Five techniques for missing data are compared: last value carried forward (LVCF) was used as a single imputation method, and Hotdeck, Heckman modelling, multivariate imputation by chained equations (MICE), and a Bayesian approach as multiple imputation methods. Predictive mean matching was used to account for non-normality, where instead of imputing regression estimates, "real" observed values from similar cases are imputed. Methods were also compared by means of a simulated dataset. The simulation showed that the Bayesian approach yielded the least biased estimates for imputation. The finding of no increase in consumption levels despite a higher availability remained unaltered. Copyright (C) 2011 John Wiley & Sons, Ltd.
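Of the five techniques listed, last value carried forward is the simplest to state in code. The sketch below is a minimal illustration, not the study's implementation; the function name `lvcf` and the consumption values are hypothetical.

```python
def lvcf(series):
    """Last value carried forward: fill each None with the latest observed value.

    Leading gaps stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical consumption over four waves; None marks a missed wave.
waves = [12.0, None, 9.5, None]
completed = lvcf(waves)  # [12.0, 12.0, 9.5, 9.5]
```

LVCF implicitly assumes that a participant's consumption stays unchanged after dropping out, and as a single imputation it does not carry any imputation uncertainty into the analysis.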
Abstract:
Background: A patient's chest pain raises concern for the possibility of coronary heart disease (CHD). An easy-to-use clinical prediction rule has been derived from the TOPIC study in Lausanne. Our objective is to validate this clinical score for ruling out CHD in primary care patients with chest pain. Methods: This secondary analysis used data collected from a one-year follow-up cohort study of patients attending 76 GPs in Germany. Patients attending their GP with chest pain were questioned on their age, gender, duration of chest pain (1-60 min), sternal pain location, increase of pain with exertion, absence of a tender point on palpation, cardiovascular risk factors, and personal history of cardiovascular disease. The area under the ROC curve, sensitivity and specificity of the Lausanne CHD score were calculated for patients with full data. Results: 1190 patients were included. Full data were available for 509 patients (42.8%). Missing data were not related to having CHD (p = 0.397) or having a cardiovascular risk factor (p = 0.275). Of the patients with full data, 76 (14.9%) were diagnosed with CHD. The prevalence of CHD was 68/344 (19.8%), 2/62 (3.2%) and 6/103 (5.8%) in the high, intermediate and low risk categories, respectively. The area under the ROC curve was 72.9 (95% CI 66.8; 78.9). Ruling out patients with low risk has a sensitivity of 92.1% (95% CI 83.0; 96.7) and a specificity of 22.4% (95% CI 18.6%; 26.7%). Conclusion: The Lausanne CHD score shows reasonably good sensitivity and can be used to rule out coronary events in patients with chest pain. Patients at risk of CHD for other rarer reasons should nevertheless also be investigated.
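The reported sensitivity and specificity can be reproduced from the per-category counts given in the Results. The sketch below treats the low risk category as test-negative and everything else as test-positive; the function and variable names are hypothetical.

```python
# (patients with CHD, total patients) per risk category, as reported in the abstract
groups = {"high": (68, 344), "intermediate": (2, 62), "low": (6, 103)}

def rule_out_accuracy(groups, negative=("low",)):
    """Sensitivity and specificity when the `negative` categories count as test-negative."""
    diseased = sum(chd for chd, _ in groups.values())
    healthy = sum(total - chd for chd, total in groups.values())
    false_neg = sum(groups[name][0] for name in negative)
    true_neg = sum(groups[name][1] - groups[name][0] for name in negative)
    sensitivity = (diseased - false_neg) / diseased
    specificity = true_neg / healthy
    return sensitivity, specificity

sens, spec = rule_out_accuracy(groups)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
# prints "sensitivity 92.1%, specificity 22.4%"
```

The sensitivity is 70/76 and the specificity is 97/433, matching the figures in the abstract.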