969 results for multivariate methods
Abstract:
National guidance and clinical guidelines recommended multidisciplinary teams (MDTs) for cancer services in order to bring specialists in relevant disciplines together, ensure clinical decisions are fully informed, and to coordinate care effectively. However, the effectiveness of cancer teams was not previously evaluated systematically. A random sample of 72 breast cancer teams in England was studied (548 members in six core disciplines), stratified by region and caseload. Information about team constitution, processes, effectiveness, clinical performance, and members' mental well-being was gathered using appropriate instruments. Two input variables, team workload (P=0.009) and the proportion of breast care nurses (P=0.003), positively predicted overall clinical performance in multivariate analysis using a two-stage regression model. There were significant correlations between individual team inputs, team composition variables, and clinical performance. Some disciplines consistently perceived their team's effectiveness differently from the mean. Teams with shared leadership of their clinical decision-making were most effective. The mental well-being of team members appeared significantly better than in previous studies of cancer clinicians, the NHS, and the general population. This study established that team composition, working methods, and workloads are related to measures of effectiveness, including the quality of clinical care. © 2003 Cancer Research UK.
Abstract:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Abstract:
Objective: In this study, we used a chemometrics-based method to correlate key liposomal adjuvant attributes with in-vivo immune responses based on multivariate analysis. Methods: The liposomal adjuvant composed of the cationic lipid dimethyldioctadecylammonium bromide (DDA) and trehalose 6,6-dibehenate (TDB) was modified with 1,2-distearoyl-sn-glycero-3-phosphocholine at a range of mol% ratios, and the main liposomal characteristics (liposome size and zeta potential) were measured along with their immunological performance as an adjuvant for the novel, postexposure fusion tuberculosis vaccine, Ag85B-ESAT-6-Rv2660c (H56 vaccine). Partial least squares regression analysis was applied to correlate and cluster liposomal adjuvant particle characteristics with in-vivo derived immunological performances (IgG, IgG1, IgG2b, spleen proliferation, IL-2, IL-5, IL-6, IL-10, IFN-γ). Key findings: While a range of factors varied in the formulations, decreasing 1,2-distearoyl-sn-glycero-3-phosphocholine content (and the subsequent zeta potential) together formed the strongest variables in the model. Enhanced DDA and TDB content (and subsequent zeta potential) stimulated a response skewed towards cell-mediated immunity, with the model identifying correlations with IFN-γ, IL-2 and IL-6. Conclusion: This study demonstrates the application of chemometrics-based correlations and clustering, which can inform liposomal adjuvant design.
Abstract:
Individuals of Hispanic origin are the nation's largest minority (13.4%). Therefore, there is a need for models and methods that are culturally appropriate for mental health research with this burgeoning population. This is an especially salient issue when applying family systems theories to Hispanics, who are heavily influenced by family bonds in a way that appears to be different from the more individualistic non-Hispanic White culture. Bowen asserted that his family systems concept of differentiation of self, which values both individuality and connectedness, could be universally applied. However, there is a paucity of research systematically assessing the applicability of the differentiation of self construct in ethnic minority populations.
This dissertation tested a multivariate model of differentiation of self with a Hispanic sample. The manner in which the construct of differentiation of self was being assessed and how accurately it represented this particular ethnic minority group's functioning was examined. Additionally, the proposed model included key contextual variables (e.g., anxiety, relationship satisfaction, attachment and acculturation related variables) which have been shown to be related to the differentiation process.
The results from structural equation modeling (SEM) analyses confirmed and extended previous research, and helped to illuminate the complex relationships between key factors that need to be considered in order to better understand individuals with this cultural background. Overall results indicated that the manner in which Hispanic individuals negotiate the boundaries of interconnectedness with a sense of individual expression appears to be different from their non-Hispanic White counterparts in some important ways. These findings illustrate the need for research on Hispanic individuals that provides a more culturally sensitive framework.
Abstract:
Elemental analysis can become an important piece of evidence to assist the solution of a case. The work presented in this dissertation aims to evaluate the evidential value of the elemental composition of three particular matrices: ink, paper and glass. In the first part of this study, the analytical performance of LIBS and LA-ICP-MS methods was evaluated for paper, writing inks and printing inks. A total of 350 ink specimens were examined including black and blue gel inks, ballpoint inks, inkjets and toners originating from several manufacturing sources and/or batches. The paper collection set consisted of over 200 paper specimens originating from 20 different paper sources produced by 10 different plants. Micro-homogeneity studies show smaller variation of elemental compositions within a single source (i.e., sheet, pen or cartridge) than the observed variation between different sources (i.e., brands, types, batches). Significant and detectable differences in the elemental profile of the inks and paper were observed between samples originating from different sources (discrimination of 87–100% of samples, depending on the sample set under investigation and the method applied). These results support the use of elemental analysis, using LA-ICP-MS and LIBS, for the examination of documents and provide additional discrimination to the currently used techniques in document examination. In the second part of this study, a direct comparison between four analytical methods (µ-XRF, solution-ICP-MS, LA-ICP-MS and LIBS) was conducted for glass analyses using interlaboratory studies. 
The data provided by 21 participants were used to assess the performance of the analytical methods in associating glass samples from the same source and differentiating different sources, as well as the use of different match criteria (confidence interval (±6s, ±5s, ±4s, ±3s, ±2s), modified confidence interval, t-test (sequential univariate, p=0.05 and p=0.01), t-test with Bonferroni correction (for multivariate comparisons), range overlap, and Hotelling's T² tests). Error rates (Type 1 and Type 2) are reported for the use of each of these match criteria and depend on the heterogeneity of the glass sources, the repeatability between analytical measurements, and the number of elements that were measured. The study provided recommendations for analytical performance-based parameters for µ-XRF and LA-ICP-MS as well as the best performing match criteria for both analytical techniques, which can be applied now by forensic glass examiners.
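As an illustration of the interval-based match criteria named above (±ks), here is a minimal sketch. It assumes that a "match" means the questioned sample's mean for every element falls within the known sample's mean ± k·s, where s is the standard deviation of the known replicates. The function name `interval_match` and all concentrations are made up for illustration and are not taken from the study.

```python
from statistics import mean, stdev

def interval_match(known, questioned, k=4.0):
    """Return True if, for every element, the questioned mean lies within
    mean ± k*s of the known replicates (s = sample standard deviation)."""
    for element, known_reps in known.items():
        centre = mean(known_reps)
        half_width = k * stdev(known_reps)
        q_mean = mean(questioned[element])
        if abs(q_mean - centre) > half_width:
            return False  # a single element outside its interval excludes a match
    return True

# Hypothetical replicate concentrations (arbitrary units), illustration only.
known = {"Sr": [100.0, 102.0, 98.0], "Zr": [50.0, 51.0, 49.0]}
same_source = {"Sr": [101.0, 99.0, 100.0], "Zr": [50.5, 49.5, 50.0]}
other_source = {"Sr": [130.0, 131.0, 129.0], "Zr": [50.0, 50.5, 49.5]}

print(interval_match(known, same_source))
print(interval_match(known, other_source))
```

Widening k lowers the false exclusion (Type 1) rate at the cost of more false inclusions (Type 2), which is the trade-off the error rates in the abstract quantify.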
Abstract:
This study subdivides the Weddell Sea, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis uses 28 environmental variables for the sea surface, 25 variables for the seabed and 9 variables for the analysis between surface and bottom variables. The data were taken during the years 1983–2013. Some data were interpolated. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared to identify the most reasonable method. The multivariate mathematical procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested. For the seabed, 8 and 12 clusters were identified as reasonable numbers for clustering the Weddell Sea. For the sea surface the numbers 8 and 13 and for the top/bottom analysis 8 and 3 were identified, respectively. Additionally, the results of 20 clusters are presented for the three alternatives, offering the first small-scale environmental regionalization of the Weddell Sea. In particular, the results of 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.
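To make the k-means step concrete, here is a minimal sketch of Lloyd's algorithm together with the within-cluster sum of squares, one common input to heuristics for choosing the number of clusters. The abstract does not say which selection methods were tested, so this is an assumption for illustration. The evenly spaced initialisation is a simplification to keep the example deterministic, and the data points are made up.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Initial centres are evenly spaced entries of
    `points` (a simplification; real analyses usually use random restarts)."""
    centres = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres

def wcss(points, labels, centres):
    """Within-cluster sum of squared distances to the assigned centre."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))

# Hypothetical 2-D data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

for k in (1, 2, 3):
    labels, centres = kmeans(points, k)
    print(k, round(wcss(points, labels, centres), 3))
```

A sharp drop in the within-cluster sum of squares followed by a plateau is one informal signal of a reasonable k. With real environmental variables, each variable would normally be standardised first so that no single unit dominates the distance.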
Abstract:
Constant technology advances have caused data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore, efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture proportions of topics, represented by a Dirichlet distributed variable, are assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Abstract:
Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum them up according to different cells of features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It is extended from a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of focused ones can help to improve estimation accuracy, which is examined by several simulation studies. The method is applied to data from the American Community Survey.
Abstract:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
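The ANOSIM R statistic mentioned above can be sketched as follows. This is a minimal illustration of the standard rank-based definition, R = (mean between-group rank − mean within-group rank) / (M/2) with M = n(n−1)/2 sample pairs. It is not the UNCTREE or k-R algorithm itself, and the dissimilarity matrix and function names are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def anosim_r(dissim, groups):
    """ANOSIM R from a symmetric dissimilarity matrix and group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)

# Hypothetical dissimilarities for four samples in two groups.
dissim = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.85],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.85, 0.2, 0.0],
]
groups = ["a", "a", "b", "b"]
print(anosim_r(dissim, groups))
```

R reaches 1 when every within-group dissimilarity is smaller than every between-group one, and is near 0 when the grouping carries no information. A divisive method could, under this reading, prefer the binary split that maximises R.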
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
The current anode quality control strategy is inadequate for detecting defective anodes before they are installed in the electrolysis cells. Previous work focused on modeling the anode manufacturing process in order to predict anode properties directly after baking, using multivariate statistical methods. Because of the anode coring strategy used at the partner plant, this model can only be used to predict the properties of anodes baked at the hottest and coldest positions of the baking furnace. The present work proposes a strategy that accounts for the thermal history of anodes baked at any position and makes it possible to predict their properties. It is shown that, by combining binary variables defining the pit and the baking position with routine data measured on the baking furnace, the temperature profiles of anodes baked at different positions can be predicted. These data were also included in the model for predicting anode properties. The prediction results were validated by additional coring, and the model performance is conclusive for apparent and real density, compressive strength, air reactivity and Lc, regardless of baking position.
Abstract:
This dissertation proposes statistical methods to formulate, estimate and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation. The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed and their effectiveness is evaluated with respect to unordered models. Procedures to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and difficulties associated with the estimation of unordered models are explained. Numerical approximation methods based on the Genz algorithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey. The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability.
A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted. Finally, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that take into account the survey design. All methods are rigorously validated on both real and simulated data.
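The abstract does not say how its jack-knife estimates were computed. As a generic illustration only, the sketch below shows the standard leave-one-out jack-knife standard error of a statistic, applied here to a sample mean. The speed values and the function name `jackknife_se` are hypothetical.

```python
from math import sqrt
from statistics import mean

def jackknife_se(data, statistic=mean):
    """Leave-one-out jack-knife standard error of `statistic` on `data`."""
    n = len(data)
    # Recompute the statistic n times, each time omitting one observation.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(leave_one_out)
    return sqrt((n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out))

# Hypothetical free-flow speeds (km/h) on one road section.
speeds = [1.0, 2.0, 3.0, 4.0, 5.0]
print(jackknife_se(speeds))
```

For the sample mean, this reproduces the usual s/√n. The leave-one-out values also show how strongly each single observation moves the estimate, which is one way such estimates can help explain poor predictions.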
Abstract:
Background: The evidence base on end-of-life care in acute stroke is limited, particularly with regard to recognising dying and related decision-making. There is also limited evidence to support the use of end-of-life care pathways (standardised care plans) for patients who are dying after stroke. Aim: This study aimed to explore the clinical decision-making involved in placing patients on an end-of-life care pathway, evaluate predictors of care pathway use, and investigate the role of families in decision-making. The study also aimed to examine experiences of end-of-life care pathway use for stroke patients, their relatives and the multi-disciplinary health care team. Methods: A mixed methods design was adopted. Data were collected in four Scottish acute stroke units. Case-notes were identified prospectively from 100 consecutive stroke deaths and reviewed. Multivariate analysis was performed on case-note data. Semi-structured interviews were conducted with 17 relatives of stroke decedents and 23 healthcare professionals, using a modified grounded theory approach to collect and analyse data. The VOICES survey tool was also administered to the bereaved relatives and data were analysed using descriptive statistics and thematic analysis of free-text responses. Results: Relatives often played an important role in influencing aspects of end-of-life care, including decisions to use an end-of-life care pathway. Some relatives experienced enduring distress with their perceived responsibility for care decisions. Relatives felt unprepared for and were distressed by prolonged dying processes, which were often associated with severe dysphagia. Pro-active information-giving by staff was reported as supportive by relatives. Healthcare professionals generally avoided discussing place of care with families. 
Decisions to use an end-of-life care pathway were not predicted by patients’ demographic characteristics; decisions were generally made in consultation with families and the extended health care team, and were made within regular working hours. Conclusion: Distressing stroke-related issues were more prominent in participants’ accounts than concerns with the end-of-life care pathway used. Relatives sometimes perceived themselves as responsible for important clinical decisions. Witnessing prolonged dying processes was difficult for healthcare professionals and families, particularly in relation to the management of persistent major swallowing difficulties.
Abstract:
Aim: To evaluate the association of oral health status and socio-demographic and behavioral factors with the maturation pattern of normal oral epithelial mucosa. Methods: Exfoliative cytology specimens were collected from 117 men from the border of the tongue and floor of the mouth on opposite sides. Cells were stained with the Papanicolaou method and classified as anucleated cells, superficial cells with nuclei, intermediate cells or parabasal cells. Quantification was made by selecting the first 100 cells on each glass slide. Sociodemographic and behavioral variables were collected from a structured questionnaire. Oral health was analyzed by clinical examination, recording the decayed, missing and filled teeth index (DMFT) and use of prostheses. Multivariable linear regression models were applied. Results: None of the studied variables significantly influenced the maturation pattern of the oral mucosa, except alcohol consumption. Chronic alcohol use was associated with an increase in the surface cell layers of the epithelium. Conclusions: The Papanicolaou cytopathological technique is appropriate for analyzing the maturation pattern in exposed subjects and is strongly recommended for those who use alcohol, a risk factor for oral cancer, in whom a change in the proportion of cell types is easily detected.