953 results for Linear multivariate methods


Relevance:

30.00%

Publisher:

Abstract:

Individuals of Hispanic origin are the nation's largest minority (13.4%). Therefore, there is a need for models and methods that are culturally appropriate for mental health research with this burgeoning population. This is an especially salient issue when applying family systems theories to Hispanics, who are heavily influenced by family bonds in a way that appears to differ from the more individualistic non-Hispanic White culture. Bowen asserted that his family systems concept of differentiation of self, which values both individuality and connectedness, could be universally applied. However, there is a paucity of research systematically assessing the applicability of the differentiation of self construct in ethnic minority populations. This dissertation tested a multivariate model of differentiation of self with a Hispanic sample. It examined how the construct of differentiation of self was being assessed and how accurately it represented this particular ethnic minority group's functioning. Additionally, the proposed model included key contextual variables (e.g., anxiety, relationship satisfaction, attachment, and acculturation-related variables) that have been shown to be related to the differentiation process. The results from structural equation modeling (SEM) analyses confirmed and extended previous research, and helped to illuminate the complex relationships between key factors that need to be considered in order to better understand individuals with this cultural background. Overall, results indicated that the manner in which Hispanic individuals negotiate the boundaries of interconnectedness with a sense of individual expression appears to differ from their non-Hispanic White counterparts in some important ways. These findings illustrate the need for research on Hispanic individuals that provides a more culturally sensitive framework.

Relevance:

30.00%

Publisher:

Abstract:

Accurate knowledge of the time since death, or postmortem interval (PMI), has enormous legal, criminological, and psychological impact. In this study, an investigation was made to determine whether the relationship between the degradation of the human cardiac structural protein Cardiac Troponin T and PMI could be used as an indicator of time since death, thus providing a rapid, high-resolution, sensitive, and automated methodology for the determination of PMI. The use of Cardiac Troponin T (cTnT), a protein found in heart tissue, as a selective marker for cardiac muscle damage has shown great promise in the determination of PMI. An optimized conventional immunoassay method was developed to quantify intact and fragmented cTnT. A small sample of cardiac tissue, which is less affected than other tissues by external factors, was taken, homogenized, extracted with magnetic microparticles, separated by SDS-PAGE, and visualized by Western blot, probing with a monoclonal antibody against cTnT, followed by labeling and imaging with available scanners. This conventional immunoassay provides proper detection and quantitation of the cTnT protein in cardiac tissue as a complex matrix; however, it does not provide the analyst with immediate results. Therefore, a competitive separation method using capillary electrophoresis with laser-induced fluorescence (CE-LIF) was developed to study the interaction between human cTnT protein and monoclonal anti-Troponin T antibody. Analysis of the results revealed a linear relationship between the percent of degraded cTnT and the log of the PMI, indicating that intact cTnT can be detected in human heart tissue up to 10 days postmortem at room temperature and beyond two weeks at 4 °C. The data presented demonstrate that this technique can provide an extended time range during which PMI can be estimated more accurately than with currently used methods.
The data demonstrate that this technique represents a major advance in time-of-death determination through a fast and reliable, semi-quantitative measurement of a biochemical marker from an organ protected from outside factors.
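As a rough sketch of how such a calibration could work in practice (with entirely hypothetical numbers, not the study's measurements): fit percent degraded cTnT against log(PMI), then invert the fitted line to estimate time since death from a new measurement.

```python
import numpy as np

# Hypothetical calibration data (illustrative only, not the study's data):
# percent of degraded cTnT observed at known postmortem intervals (hours).
pmi_hours = np.array([12, 24, 48, 96, 144, 192, 240])
pct_degraded = np.array([10, 22, 35, 50, 62, 71, 78])

# The abstract reports a linear relation: pct_degraded ~ a + b * log(PMI).
b, a = np.polyfit(np.log(pmi_hours), pct_degraded, 1)

def estimate_pmi(pct):
    """Invert the calibration line to estimate PMI (hours) from % degraded cTnT."""
    return float(np.exp((pct - a) / b))

est = estimate_pmi(50)  # point estimate of PMI for a 50%-degraded sample
```

A real application would require the laboratory's own calibration data, replicate measurements, and uncertainty bounds on the inverted estimate.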

Relevance:

30.00%

Publisher:

Abstract:

Elemental analysis can become an important piece of evidence in solving a case. The work presented in this dissertation aims to evaluate the evidential value of the elemental composition of three particular matrices: ink, paper and glass. In the first part of this study, the analytical performance of LIBS and LA-ICP-MS methods was evaluated for paper, writing inks and printing inks. A total of 350 ink specimens were examined, including black and blue gel inks, ballpoint inks, inkjets and toners originating from several manufacturing sources and/or batches. The paper collection set consisted of over 200 paper specimens originating from 20 different paper sources produced by 10 different plants. Micro-homogeneity studies show smaller variation of elemental composition within a single source (i.e., sheet, pen or cartridge) than between different sources (i.e., brands, types, batches). Significant and detectable differences in the elemental profiles of the inks and paper were observed between samples originating from different sources (discrimination of 87–100% of samples, depending on the sample set under investigation and the method applied). These results support the use of elemental analysis, by LA-ICP-MS and LIBS, for the examination of documents, providing discrimination beyond the techniques currently used in document examination. In the second part of this study, a direct comparison between four analytical methods (µ-XRF, solution-ICP-MS, LA-ICP-MS and LIBS) was conducted for glass analysis using interlaboratory studies.
The data provided by 21 participants were used to assess the performance of the analytical methods in associating glass samples from the same source and differentiating different sources, as well as the use of different match criteria (confidence intervals (±6s, ±5s, ±4s, ±3s, ±2s), a modified confidence interval, t-tests (sequential univariate, p = 0.05 and p = 0.01), a t-test with Bonferroni correction (for multivariate comparisons), range overlap, and Hotelling's T² test). Error rates (Type 1 and Type 2) are reported for each of these match criteria and depend on the heterogeneity of the glass sources, the repeatability between analytical measurements, and the number of elements measured. The study provided recommendations for analytical performance-based parameters for µ-XRF and LA-ICP-MS, as well as the best-performing match criteria for both techniques, which forensic glass examiners can now apply.
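One of the listed match criteria, range overlap, is simple enough to sketch directly (with made-up replicate measurements, not the interlaboratory data): two fragments are associated only if the replicate measurement ranges overlap for every element.

```python
import numpy as np

def range_overlap_match(known, questioned):
    """Range-overlap match criterion: the known and questioned fragments are
    'associated' only if, for every element, the ranges of the replicate
    measurements overlap. A single non-overlapping element excludes association.
    Inputs: 2-D arrays, rows = replicate measurements, cols = elements."""
    k_lo, k_hi = known.min(axis=0), known.max(axis=0)
    q_lo, q_hi = questioned.min(axis=0), questioned.max(axis=0)
    return bool(np.all((k_lo <= q_hi) & (q_lo <= k_hi)))

# Hypothetical replicates for two elements (arbitrary concentration units):
known    = np.array([[1.00, 10.0], [1.20, 11.0], [1.10, 10.5]])
same_src = np.array([[1.05, 10.2], [1.15, 10.8]])   # ranges overlap -> match
diff_src = np.array([[2.00, 10.2], [2.10, 10.8]])   # element 1 excludes
```

Stricter or looser criteria (e.g., ±4s intervals or Hotelling's T²) trade Type 1 against Type 2 error rates, which is exactly what the interlaboratory comparison quantifies.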

Relevance:

30.00%

Publisher:

Abstract:

The adverse health effects of long-term exposure to lead are well established, with major uptake into the human body occurring mainly through oral ingestion by young children. Lead-based paint was frequently used in homes built before 1978, particularly in inner-city areas. Minority populations experience the effects of lead poisoning disproportionately. Lead-based paint abatement is costly. In the United States, residents of about 400,000 homes, occupied by 900,000 young children, lack the means to correct lead-based paint hazards. The magnitude of this problem demands research on affordable methods of hazard control. One method is encapsulation, defined as any covering or coating that acts as a permanent barrier between the lead-based paint surface and the environment. Two encapsulants were tested for reliability and effective life span through an accelerated lifetime experiment that applied stresses exceeding those encountered under normal use conditions. The resulting time-to-failure data were used to extrapolate the failure time under conditions of normal use. Statistical analysis and models of the test data allow forecasting of long-term reliability relative to the 20-year encapsulation requirement. Typical housing material specimens simulating walls and doors coated with lead-based paint were overstressed before encapsulation; a second, unaged set was also tested. Specimens were monitored after the stress test with a surface chemical testing pad to identify the presence of lead breaking through the encapsulant. Graphical analysis proposed by Shapiro and Meeker and the general log-linear model developed by Cox were used to obtain results. Findings for the 80% reliability time to failure varied, with close to 21 years of life under normal use conditions for encapsulant A. Application of product A on the aged gypsum and aged wood substrates yielded slightly lower times. Encapsulant B had an 80% reliable life of 19.78 years.
This study reveals that encapsulation technologies can offer safe and effective control of lead-based paint hazards and may be less expensive than other options. The U.S. Department of Health and Human Services and the CDC are committed to eliminating childhood lead poisoning by 2010. This ambitious target is feasible, provided there is an efficient application of innovative technology, a goal to which this study aims to contribute.
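The core extrapolation step can be sketched in miniature (everything here is hypothetical: the failure times, the acceleration factor, and the simple empirical-percentile treatment stand in for the Shapiro–Meeker graphical analysis and Cox log-linear model used in the study).

```python
import numpy as np

# Hypothetical accelerated-test failure times (years-equivalent under stress).
accel_failures = np.array([0.8, 1.0, 1.1, 1.3, 1.5, 1.6, 1.9, 2.2, 2.5, 3.0])

# Assumed acceleration factor: chamber stresses age specimens ~12x faster
# than normal use (a made-up value for illustration, not from the study).
AF = 12.0
use_failures = AF * accel_failures   # extrapolated failure times, normal use

# "80% reliable life" = time by which at most 20% of units have failed,
# i.e., the 20th percentile of the failure-time distribution.
life_80 = float(np.percentile(use_failures, 20))
```

In practice a parametric lifetime model (e.g., Weibull or lognormal) would be fitted to the accelerated data and the percentile taken from the fitted distribution, with confidence bounds.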

Relevance:

30.00%

Publisher:

Abstract:

The multiple linear regression model plays a key role in statistical inference and has extensive applications in business, environmental, physical and social sciences. Multicollinearity has long been a considerable problem in multiple regression analysis: when the regressor variables are multicollinear, it becomes difficult to make precise statistical inferences about the regression coefficients. Several statistical methods address this problem; those discussed in this thesis are the ridge regression, Liu, two-parameter biased and LASSO estimators. First, an analytical comparison on the basis of risk was made among the ridge, Liu and LASSO estimators under an orthonormal regression model. I found that LASSO dominates the least squares, ridge and Liu estimators over a significant portion of the parameter space for large dimensions. Second, a simulation study was conducted to compare the performance of the ridge, Liu and two-parameter biased estimators by the mean squared error criterion. I found that the two-parameter biased estimator performs better than its corresponding ridge regression estimator. Overall, the Liu estimator performs better than both the ridge and the two-parameter biased estimator.
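The ridge and Liu estimators have closed forms that are easy to demonstrate on simulated multicollinear data (the design, true coefficients, and tuning parameters below are all hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a multicollinear design: x2 is nearly a copy of x1.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)       # correlation close to 1
X = np.column_stack([x1, x2])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

XtX, Xty = X.T @ X, X.T @ y
beta_ols = np.linalg.solve(XtX, Xty)      # least squares: unstable here

# Ridge estimator: (X'X + kI)^(-1) X'y, shrinkage parameter k > 0.
k = 1.0
beta_ridge = np.linalg.solve(XtX + k * np.eye(2), Xty)

# Liu (1993) estimator: (X'X + I)^(-1) (X'y + d * beta_ols), 0 < d < 1.
d = 0.5
beta_liu = np.linalg.solve(XtX + np.eye(2), Xty + d * beta_ols)

mse = lambda b: float(np.sum((b - beta_true) ** 2))  # squared-error loss
```

Note the Liu estimator interpolates toward least squares: setting d = 1 in the formula recovers the OLS solution exactly, while d = 0 gives maximal shrinkage.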

Relevance:

30.00%

Publisher:

Abstract:

Thiosalt species are unstable, partially oxidized sulfur oxyanions formed in sulfur-rich environments, and also during the flotation and milling of sulfidic minerals, especially those containing pyrite (FeS₂) and pyrrhotite (Fe₍₁₋ₓ₎S, x = 0 to 0.2). Detecting and quantifying the major thiosalt species, such as sulfate (SO₄²⁻), thiosulfate (S₂O₃²⁻), trithionate (S₃O₆²⁻), tetrathionate (S₄O₆²⁻) and higher polythionates (SₓO₆²⁻, where 3 ≤ x ≤ 10), in the milling process and in the treated tailings is important for understanding how thiosalts are generated and provides insight into potential treatment. As these species are unstable, a fast and reliable analytical technique is required for their analysis. Three capillary zone electrophoresis (CZE) methods using indirect UV-vis detection were developed for the simultaneous separation and determination of five thiosalt anions: SO₄²⁻, S₂O₃²⁻, S₃O₆²⁻, S₄O₆²⁻ and S₅O₆²⁻. Both univariate and multivariate experimental design approaches were used to optimize the most critical factors (background electrolyte (BGE) composition and instrumental conditions) to achieve fast separation and quantitative analysis of the thiosalt species. The mathematically predicted responses for the multivariate experiments were in good agreement with the experimental results. Limits of detection (LODs) (S/N = 3) for the methods were between 0.09 and 0.34 μg/mL without a sample stacking technique, with a nearly four-fold improvement in LODs when field-amplified sample stacking was applied.
As direct analysis of thiosalts by mass spectrometry (MS) is limited by their low m/z values and by detection in negative-mode electrospray ionization (ESI), which is typically less sensitive than positive ESI, imidazolium-based (IP-L-Imid and IP-T-Imid) and phosphonium-based (IP-T-Phos) tricationic ion-pairing reagents were used to form stable, high-mass, non-covalent +1 ion-pairs with these species for ESI-MS analysis, and the association constants (Kassoc) were determined for these ion-pairs. Kassoc values were between 6.85 × 10² M⁻¹ and 3.56 × 10⁵ M⁻¹ with the linear IP-L-Imid; 1.89 × 10³ M⁻¹ and 1.05 × 10⁵ M⁻¹ with the trigonal IP-T-Imid ion-pairs; and 7.51 × 10² M⁻¹ and 4.91 × 10⁴ M⁻¹ with the trigonal IP-T-Phos ion-pairs. The highest formation constant was obtained for S₃O₆²⁻ with the imidazolium-based linear ion-pairing reagent (IP-L-Imid), whereas the lowest was for the IP-L-Imid:SO₄²⁻ ion-pair.

Relevance:

30.00%

Publisher:

Abstract:

This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions serve as categories for comparing, contrasting and quantifying biogeochemical processes and biodiversity across ocean regions, both geographically and as they develop under global change. The division obtained is characterized by its dominating components and interpreted in terms of the ruling environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the analysis between surface and bottom variables. The data were taken during the years 1983-2013; some data were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with varying settings were compared to identify the most reasonable method. The multivariate mathematical procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the study area. Several methods for identifying the optimum number of clusters were tested. For the seabed, 8 and 12 clusters were identified as reasonable numbers for clustering the Weddell Sea; for the sea surface, 8 and 13; and for the top/bottom analysis, 8 and 3, respectively. Additionally, results for 20 clusters are presented for the three alternatives, offering the first small-scale environmental regionalization of the Weddell Sea. In particular, the results for 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and from those dominated by river discharge.
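The regionalization step rests on k-means clustering of per-location environmental variables. A minimal, self-contained sketch (with a deterministic farthest-point initialization and synthetic "grid cells" in place of the real standardized variables):

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Minimal k-means for regionalizing locations from environmental
    variables (rows = locations, cols = standardized variables)."""
    # Deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Hypothetical standardized variables for 300 locations in two regimes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (150, 3)), rng.normal(2, 0.5, (150, 3))])
labels, centers = kmeans(X, 2)
```

Choosing k itself (8, 12, 13, 20 in the study) requires separate diagnostics such as elbow or silhouette criteria, which is what the "optimum number of clusters" tests above address.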

Relevance:

30.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, yet uncertainty quantification remains essential in the sciences, where the number of parameters to estimate often exceeds the sample size despite the huge increases in n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is of little relevance outside certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
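The link between latent structure models and low nonnegative tensor rank can be illustrated concretely: a latent class (PARAFAC-type) model for two categorical variables writes the probability table as a nonnegative combination of rank-one pieces. The parameters below are hypothetical, chosen only to make the construction visible.

```python
import numpy as np

# Latent class model for two categorical variables:
#   p(x1, x2) = sum_h nu[h] * lam1[h, x1] * lam2[h, x2]
# Hypothetical parameters: 2 latent classes, 3 and 4 categories.
nu = np.array([0.6, 0.4])                    # latent class weights
lam1 = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6]])           # p(x1 | class), rows sum to 1
lam2 = np.array([[0.4, 0.3, 0.2, 0.1],
                 [0.1, 0.2, 0.3, 0.4]])      # p(x2 | class), rows sum to 1

# The implied 3 x 4 probability table is a sum of 2 rank-one tensors.
P = np.einsum('h,hi,hj->ij', nu, lam1, lam2)
```

With two latent classes the resulting table has (nonnegative) rank at most two; the chapter's results relate this notion of dimension reduction to the sparsity of a log-linear parameterization of the same table.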

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
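The chapter's optimal Gaussian approximation is specific to Diaconis–Ylvisaker priors, but the underlying idea can be illustrated with the generic Laplace approximation: center a Gaussian at the posterior mode with variance from the negative inverse Hessian of the log density. A one-dimensional Beta-posterior example (not the log-linear setting itself):

```python
import numpy as np

def laplace_beta(a, b):
    """Gaussian (Laplace) approximation to a Beta(a, b) posterior:
    mean at the mode, variance from the negative inverse Hessian of the
    log density. Requires a > 1 and b > 1 so the mode is interior."""
    m = (a - 1) / (a + b - 2)                        # posterior mode
    hess = -(a - 1) / m**2 - (b - 1) / (1 - m)**2    # d^2/dtheta^2 log density
    return m, np.sqrt(-1.0 / hess)

# e.g., 50 successes and 50 failures under a flat prior -> Beta(51, 51)
mu, sigma = laplace_beta(51, 51)
```

As the sample grows the approximation error shrinks; the chapter makes this precise for log-linear models via convergence rates and finite-sample Kullback-Leibler bounds.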

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, yet comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.

Relevance:

30.00%

Publisher:

Abstract:

Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; with this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes mixture proportions of topics represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
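The factor-model covariance structure mentioned above is easy to make concrete: with loadings Λ (p × k, k ≪ p) and diagonal noise Ψ, the implied covariance is the low-rank-plus-diagonal matrix ΛΛᵀ + Ψ. The dimensions and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Factor model: x = Lambda @ f + eps, with f ~ N(0, I_k), eps ~ N(0, Psi).
p, k = 10, 2                                  # 10 observed vars, 2 factors
Lam = rng.normal(size=(p, k))                 # hypothetical factor loadings
Psi = np.diag(rng.uniform(0.5, 1.0, p))      # diagonal idiosyncratic noise

# Implied covariance: rank-k part plus a diagonal.
Sigma = Lam @ Lam.T + Psi
```

Estimating Λ and Ψ thus requires only p·k + p parameters instead of the p(p + 1)/2 entries of an unrestricted covariance, which is the dimension reduction the text refers to.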

Relevance:

30.00%

Publisher:

Abstract:

This thesis introduces two related lines of study on classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with its novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map with respect to a predetermined set of classes, addressing real application situations where the image pixels do not necessarily fall into a predetermined set of classes to be identified, detected or classified.

The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives, c) graph spectral transformation via locally linear embedding for dimension reduction, and d) a statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
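Component a) can be sketched in a simplified form (the pipeline's actual "entropy spread" criterion may differ; this variant simply ranks bands by histogram entropy, on a tiny synthetic cube with reflectance assumed scaled to [0, 1]):

```python
import numpy as np

def entropy_per_band(cube, bins=32):
    """Band-wise Shannon entropy for an HSI cube of shape (rows, cols, bands),
    assuming values are scaled to [0, 1]. Low-entropy bands carry little
    discriminative information and are candidates for removal."""
    H = []
    for b in range(cube.shape[-1]):
        counts, _ = np.histogram(cube[..., b], bins=bins, range=(0.0, 1.0))
        p = counts / counts.sum()
        p = p[p > 0]
        H.append(float(-(p * np.log2(p)).sum()))
    return np.array(H)

# Hypothetical 2-band cube: band 0 nearly constant, band 1 highly varied.
rng = np.random.default_rng(3)
cube = np.stack([np.full((20, 20), 0.5) + 1e-6 * rng.normal(size=(20, 20)),
                 rng.uniform(size=(20, 20))], axis=-1)
H = entropy_per_band(cube)
selected = np.argsort(H)[::-1]   # keep highest-entropy bands first
```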

Second, the work extends the HSI classification pipeline from a single HSI data cube to multiple HSI data cubes, where each cube, with its own feature variation, is to be classified into one of multiple classes. The main challenge is deriving the cube-wise classification from pixel-wise classification. The thesis presents an initial attempt to circumvent this challenge and discusses the potential for further improvement.

Relevance:

30.00%

Publisher:

Abstract:

A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A 3D object reconstructed from single-axis, limited-angle tilt-series cryo-electron tomography contains missing regions; this is known as the missing wedge problem. Because of the missing regions in the reconstructed images, analyzing their 3D structures is challenging. Existing methods overcome this problem by aligning and averaging several similarly shaped objects. These schemes work well if the objects are symmetric and several objects with almost identical shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, and polyhedral structure, of these bacterial inclusions in the presence of missing data. These methods work with deformed, non-symmetric objects of varied shapes and do not require multiple objects to handle the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named the polyhedron profile statistic, and (e) Bayes, linear discriminant analysis and support vector machine classifiers for supervised incomplete polyhedral shape classification. Finally, the predicted 3D shapes of these bacterial microstructures belong to the Johnson solids family, and these shapes, along with their other geometric properties, are important for a better understanding of their chemical and biological characteristics.

Relevance:

30.00%

Publisher:

Abstract:

Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to aggregate values into cells defined by different features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.

The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
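The total-preserving property rests on a standard fact: independent Poisson counts conditioned on their sum are multinomial. So once cell rates are fitted, synthetic cells can be drawn from a multinomial with those rates as weights, and the fixed marginal total is reproduced exactly. A sketch with hypothetical rates and total:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical fitted Poisson rates for four cells of a magnitude table,
# and a published marginal total that the synthetic data must preserve.
rates = np.array([5.0, 2.0, 8.0, 1.0])
original_total = 160

# Poissons conditioned on their sum are multinomial with probabilities
# proportional to the rates, so this draw sums to original_total exactly.
synthetic = rng.multinomial(original_total, rates / rates.sum())
```

The thesis uses a *mixture* of Poisson distributions to capture the multivariate cell structure, but the conditioning argument that guarantees the fixed totals is the same.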

The second method is for releasing synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals; its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.

The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., frequently missing) ones. The sub-model structure for the focused variables is more complex than that for the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the side of the focused ones can help improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.


Objectives: To explore socioeconomic differences in four cardiovascular disease (CVD) risk factors (overweight/obesity, smoking, hypertension, height) among manufacturing employees in the Republic of Ireland (ROI). Methods: Cross-sectional analysis of 850 manufacturing employees aged 18–64 years. Education and job position served as socioeconomic indicators. Group-specific differences in prevalence were assessed with the Chi-squared test. Multivariate regression models were used to explore whether education and job position were independent predictors of the CVD risk factors. The Cochran–Armitage test for trend was used to assess the presence of a social gradient. Results: A social gradient was found across educational levels for smoking and height. Employees with the highest education were less likely to smoke than the least educated employees (OR 0.2 [95% CI 0.1–0.4]; p < 0.001). Lower educational attainment was associated with a reduction in mean height. Non-linear differences were found across both educational level and job position for obesity/overweight. Managers were more than twice as likely to be overweight or obese relative to employees in the lowest job position (OR 2.4 [95% CI 1.3–4.6]; p = 0.008). Conclusion: Socioeconomic inequalities in height, smoking and overweight/obesity were highlighted within a sub-section of the working population in ROI.
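The Cochran–Armitage trend test used in the abstract has a simple closed form; a sketch follows, with invented smoking counts across three ordered education levels (the study's actual counts are not given in the abstract):

```python
import numpy as np
from scipy.stats import norm

def cochran_armitage(cases, totals, scores):
    """Two-sided Cochran-Armitage test for trend in proportions.

    cases[i]/totals[i] is the event rate in ordered group i; `scores`
    are the numeric codes for the groups (e.g. education levels).
    """
    cases, totals, s = map(np.asarray, (cases, totals, scores))
    N = totals.sum()
    p = cases.sum() / N                         # pooled event rate
    num = np.sum(s * (cases - totals * p))
    var = p * (1 - p) * (np.sum(totals * s**2) - np.sum(totals * s)**2 / N)
    z = num / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# Hypothetical smokers per 150 employees, low -> high education
z, pval = cochran_armitage(cases=[60, 40, 20],
                           totals=[150, 150, 150],
                           scores=[0, 1, 2])
```

A significantly negative z here would indicate exactly the kind of decreasing social gradient in smoking that the abstract reports.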


The first objective of this research was to develop closed-form and numerical probabilistic methods of analysis that can be applied to otherwise conventional analyses of unreinforced and geosynthetic reinforced slopes and walls. These probabilistic methods explicitly include random variability of soil and reinforcement, spatial variability of the soil, and cross-correlation between soil input parameters, and quantify their influence on probability of failure. The quantitative impact of simultaneously considering random and/or spatial variability in soil properties in combination with cross-correlation between soil properties is investigated for the first time in the research literature. Depending on the magnitude of these statistical descriptors, margins of safety based on conventional notions of safety may be very different from margins of safety expressed in terms of probability of failure (or reliability index). The thesis also shows that intuitive notions of margin of safety based on the conventional factor of safety and on probability of failure can be brought into alignment when cross-correlation between soil properties is considered in a rigorous manner. The second objective of this thesis was to develop a general closed-form solution for the true probability of failure (or reliability index) of a simple linear limit state function with one load term and one resistance term, expressed first in general probabilistic terms and then migrated to a load and resistance factor design (LRFD) format for the purpose of LRFD calibration. The formulation considers contributions to the probability of failure due to model type, uncertainty in bias values, bias dependencies, uncertainty in estimates of nominal values for correlated and uncorrelated load and resistance terms, and the average margin of safety expressed as the operational factor of safety (OFS). Bias is defined as the ratio of measured to predicted value.
Parametric analyses were carried out to show that ignoring possible correlations between random variables can lead to conservative (safe) values of resistance factor in some cases and in other cases to non-conservative (unsafe) values. Example LRFD calibrations were carried out using different load and resistance models for the pullout internal stability limit state of steel strip and geosynthetic reinforced soil walls together with matching bias data reported in the literature.
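As a point of reference for the one-load, one-resistance limit state discussed above, the classical reliability index for lognormal load and resistance can be sketched as follows. This is the standard textbook closed form, not the thesis's extended formulation with bias dependencies, and the OFS and coefficient-of-variation values are invented:

```python
import numpy as np
from scipy.stats import norm

def beta_lognormal(OFS, cov_R, cov_Q):
    """Reliability index for the limit state g = R - Q with lognormal
    resistance R and load Q.

    OFS is the operational factor of safety (mean R / mean Q);
    cov_R, cov_Q are coefficients of variation of R and Q.
    """
    num = np.log(OFS * np.sqrt((1 + cov_Q**2) / (1 + cov_R**2)))
    den = np.sqrt(np.log((1 + cov_R**2) * (1 + cov_Q**2)))
    return num / den

beta = beta_lognormal(OFS=2.0, cov_R=0.3, cov_Q=0.2)
pf = norm.cdf(-beta)   # corresponding probability of failure
```

The example illustrates the abstract's central point: the same nominal factor of safety maps to very different failure probabilities as the variability descriptors change, which is why OFS alone is an incomplete margin of safety.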


Spectral unmixing (SU) is a technique for characterizing mixed pixels in hyperspectral images measured by remote sensors. Most existing spectral unmixing algorithms are developed using linear mixing models. Since the number of endmembers/materials present in each mixed pixel is typically small compared with the total number of endmembers (the dimension of the spectral library), the problem is sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model under two scenarios. In the first scenario, the library of spectral signatures is assumed known, and the main problem is to find the minimum number of endmembers subject to a reasonably small approximation error. Mathematically, this is the $\ell_0$-norm problem, which is NP-hard. The first part of the thesis therefore seeks more accurate and reliable approximations of the $\ell_0$-norm term and proposes sparse unmixing methods based on these approximations. The resulting methods show considerable improvements in reconstructing the fractional abundances of endmembers, such as lower reconstruction errors than state-of-the-art methods. In the second part of the thesis, the first scenario (a dictionary-aided semiblind unmixing scheme) is generalized to a blind unmixing scenario in which the library of spectral signatures is also estimated. We apply nonnegative matrix factorization (NMF) to propose new unmixing methods, since it naturally accommodates the nonnegativity constraints on the two decomposed matrices. Furthermore, we introduce new cost functions based on statistical and physical features of the spectral signatures of materials (SSoM) and of hyperspectral pixels, such as the collaborative property of hyperspectral pixels and a mathematical representation of the energy of SSoM concentrated in the first few subbands.
Finally, we introduce sparse unmixing methods for the blind scenario and evaluate their efficiency via simulations on synthetic and real hyperspectral data sets. The results show considerable improvements in estimating the spectral library of materials and their fractional abundances, including smaller values of spectral angle distance (SAD) and abundance angle distance (AAD).
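The semiblind setting above can be sketched with the most common convex surrogate for the $\ell_0$ term, an $\ell_1$-regularized nonnegative least-squares unmixing solved by proximal gradient descent (ISTA). This is a baseline illustration under assumptions of my own — a random toy library and a pixel mixing three signatures — not the tighter $\ell_0$ approximations the thesis proposes:

```python
import numpy as np

def sparse_unmix(A, y, lam=1e-3, n_iter=5000):
    """Nonnegative l1-regularized unmixing:
        min_x 0.5 * ||A x - y||^2 + lam * ||x||_1   s.t.  x >= 0
    solved by proximal gradient descent (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        # gradient step, then the prox of lam*||.||_1 restricted to x >= 0
        x = np.maximum(x - (grad + lam) / L, 0.0)
    return x

# Toy library of 20 "signatures" over 50 bands; pixel mixes only 3 of them.
rng = np.random.default_rng(0)
A = rng.random((50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [0.5, 0.3, 0.2]
y = A @ x_true
x_hat = sparse_unmix(A, y)
```

The $\ell_1$ penalty drives most abundances exactly to zero, recovering the small active set of endmembers — the behavior the thesis pursues with sharper $\ell_0$ approximations.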