863 results for Factor Analysis, Statistical
Abstract:
"UIUCDCS-R-74-653"
Abstract:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
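The EM iteration referred to in this abstract can be sketched in a few lines of NumPy. The following is a minimal illustration of the standard probabilistic PCA updates, not the paper's own code; the function name ppca_em, the random data and the iteration count are assumptions made for the example.

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for probabilistic PCA: X (N x d) is centred, q is the latent dimension."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(size=(d, q))       # factor loadings (principal subspace at convergence)
    sigma2 = 1.0                      # isotropic noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = X @ W @ Minv                          # rows are E[z_n]
        Ezz = N * sigma2 * Minv + Ez.T @ Ez        # sum_n E[z_n z_n^T]
        # M-step: update loadings and noise variance
        W_new = (X.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(X**2)
                  - 2.0 * np.sum(Ez * (X @ W_new))
                  + np.trace(Ezz @ W_new.T @ W_new)) / (N * d)
        W = W_new
    return W, sigma2

# The columns of W span the estimated principal subspace.
X = np.random.default_rng(1).normal(size=(500, 10))
X -= X.mean(axis=0)
W, sigma2 = ppca_em(X, q=2)
```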
Abstract:
In Statnotes 24 and 25, multiple linear regression, a statistical method that examines the relationship between a single dependent variable (Y) and two or more independent variables (X), was described. The principal objective of such an analysis was to determine which of the X variables had a significant influence on Y and to construct an equation that predicts Y from the X variables. ‘Principal components analysis’ (PCA) and ‘factor analysis’ (FA) are also methods of examining the relationships between different variables, but they differ from multiple regression in that no distinction is made between the dependent and independent variables, all variables being essentially treated the same. Originally, PCA and FA were regarded as distinct methods, but in recent times they have been combined into a single analysis, PCA often being the first stage of an FA. The basic objective of a PCA/FA is to examine the relationships between the variables, or the ‘structure’ of the variables, and to determine whether these relationships can be explained by a smaller number of ‘factors’. This statnote describes the use of PCA/FA in the analysis of the differences between the DNA profiles of different MRSA strains introduced in Statnote 26.
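To make the "PCA as the first stage of an FA" idea concrete, here is a hedged sketch using scikit-learn: PCA is used only to suggest how many factors to retain (via the eigenvalue-greater-than-one rule), and a factor analysis is then fitted with that number. The data matrix and the retention rule are illustrative assumptions, not the statnote's DNA-profile data.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Illustrative data: rows are observations (e.g. strains), columns are measured variables
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))

Xs = StandardScaler().fit_transform(X)           # put variables on a common scale

# Stage 1: PCA to see how much variance each component explains
pca = PCA().fit(Xs)
n_factors = int(np.sum(pca.explained_variance_ > 1.0))   # eigenvalue > 1 as one common rule

# Stage 2: factor analysis with the retained number of factors
fa = FactorAnalysis(n_components=max(n_factors, 1)).fit(Xs)
loadings = fa.components_.T                      # variables x factors
print(n_factors, loadings.shape)
```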
Abstract:
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
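A rough sketch of how the three techniques might be chained in practice is given below, assuming NumPy/scikit-learn; the k-nearest-neighbour spatial weights, the smoothing step and the cluster count are illustrative choices, not the method as published.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Illustrative stand-ins: values of lexical variables measured at a set of city locations
rng = np.random.default_rng(0)
n_cities, n_vars = 206, 40
coords = rng.uniform(size=(n_cities, 2))          # city coordinates (arbitrary units)
X = rng.normal(size=(n_cities, n_vars))           # alternation-variable values

# 1. Spatial weights (k nearest neighbours, row-standardised) and global Moran's I per variable
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
k = 8
neighbours = np.argsort(d, axis=1)[:, 1:k + 1]
W = np.zeros((n_cities, n_cities))
np.put_along_axis(W, neighbours, 1.0 / k, axis=1)

def morans_i(y, W):
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

I = np.array([morans_i(X[:, j], W) for j in range(n_vars)])

# 2. Factor analysis of spatially smoothed values to extract common regional patterns
fa = FactorAnalysis(n_components=3).fit(W @ X)
scores = fa.transform(W @ X)                      # city-level factor scores

# 3. Cluster analysis of the factor scores to delimit candidate dialect regions
regions = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print(I.round(2)[:5], np.bincount(regions))
```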
Abstract:
This study is an exploratory analysis of an operational measure for resource development strategies, and an exploratory analysis of internal organizational contingencies influencing choices of these strategies in charitable nonprofit organizations. The study provides conceptual guidance for advancing understanding about resource development in the nonprofit sector. The statistical findings are, however, inconclusive without further rigorous examination. A three-category typology based on organization technology is initially presented to define the strategies. The three dimensions of internal organizational contingencies explored represent organization identity, professional staff, and boards of directors. Based on relevant literature and key informant interviews, an original survey was administered by mail to a national sample of nonprofit organizations. The survey collected data on indicators of the proposed strategy types and selected contingencies. Factor analysis extracted two of the initial categories in the typology. The Building Resource Development Infrastructure Strategy encompasses information technology, personnel, legal structures, and policies facilitating fund development. The second extracted strategy encompasses the mission, service niche, and type of service delivery forming the basis for seeking financial support. Linear regressions with each strategy type as the dependent variable identified distinct and common contingencies which may partly explain choices of strategies. Discriminant analysis suggests the potential predictive accuracy of the contingencies. Follow-up case studies with survey respondents provide additional criteria for operationalizing future measures of resource development strategies, and support and expand the analysis on contingencies. The typology offers a beginning framework for defining alternative approaches to resource development, and for exploring organization capacity specific to each approach. Contingencies that may be integral components of organization capacity are funding, leadership frame, background and experience, staff and volunteer effort, board member support, and relationships in the external environment. Based on these findings, management questions are offered for nonprofit organization stakeholders to consider in planning for resource development. Lessons learned in designing and conducting this study are also provided to enhance future related research.
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the measurement technology, may be continuous values or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; under this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
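The low-rank covariance point can be illustrated with a short simulation: data are drawn from x = Λz + ε and a fitted factor model recovers a covariance of the form ΛΛᵀ + Ψ. The dimensions and the use of scikit-learn's FactorAnalysis are assumptions of this sketch, not the dissertation's models.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulate data from a latent Gaussian factor model: x = Lambda z + noise
rng = np.random.default_rng(0)
n, p, k = 2000, 30, 3
Lambda = rng.normal(size=(p, k))
Psi = rng.uniform(0.5, 1.5, size=p)
Z = rng.normal(size=(n, k))
X = Z @ Lambda.T + rng.normal(size=(n, p)) * np.sqrt(Psi)

# A fitted factor model yields a low-rank-plus-diagonal covariance estimate
fa = FactorAnalysis(n_components=k).fit(X)
cov_fa = fa.components_.T @ fa.components_ + np.diag(fa.noise_variance_)
cov_sample = np.cov(X, rowvar=False)

# The factor estimate uses p*k + p parameters rather than p*(p+1)/2
print(np.linalg.norm(cov_fa - cov_sample) / np.linalg.norm(cov_sample))
```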
Abstract:
Dynamic positron emission tomography (PET) imaging can be used to track the distribution of injected radio-labelled molecules over time in vivo. This is a powerful technique, which provides researchers and clinicians the opportunity to study the status of healthy and pathological tissue by examining how it processes substances of interest. Widely used tracers include 18F-fluorodeoxyglucose, an analog of glucose, which is used as the radiotracer in over ninety percent of PET scans. This radiotracer provides a way of quantifying the distribution of glucose utilisation in vivo. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue function. As the residue represents the amount of tracer remaining in the tissue, it can be thought of as a survival function; such functions have been examined in great detail by the statistics community. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux and volume of distribution. This thesis presents a Markov chain formulation of blood-tissue exchange and explores how this relates to established compartmental forms. A nonparametric approach to the estimation of the residue is examined, and the improvement of this model relative to compartmental models is evaluated using simulations and cross-validation techniques. The reference distribution of the test statistics generated in comparing the models is also studied. We explore these models further with simulation studies and an FDG-PET dataset from subjects with gliomas, which has previously been analysed with compartmental modelling. We also consider the performance of a recently proposed mixture modelling technique in this study.
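The convolution model described here can be sketched numerically. The example below assumes a one-tissue compartment residue R(t) = K1·exp(−k2·t) and an arbitrary arterial input, purely for illustration; the thesis itself develops a nonparametric residue estimate.

```python
import numpy as np

# Time grid (minutes) and an illustrative arterial input function C_a(t)
t = np.linspace(0.0, 60.0, 601)
dt = t[1] - t[0]
Ca = (t / 0.5) * np.exp(1.0 - t / 0.5)            # simple peaked input, arbitrary units

# Residue (survival) function for a one-tissue compartment model: R(t) = K1 * exp(-k2 t)
K1, k2 = 0.1, 0.05                                # assumed kinetic parameters (1/min)
R = K1 * np.exp(-k2 * t)

# Tissue time-course as the linear convolution of arterial input and residue
Ct = np.convolve(Ca, R)[:len(t)] * dt

# Associated functionals: the flow-related term K1 is R(0); volume of distribution is the area under R
VD = np.trapz(R, t)
print(Ct[:5], R[0], VD)
```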
Abstract:
Background: There are innumerable diabetes studies that have investigated associations between risk factors, protective factors, and health outcomes; however, these individual predictors are part of a complex network of interacting forces. Moreover, there is little awareness about resilience or its importance in chronic disease in adulthood, especially diabetes. Thus, this is the first study to: (1) extensively investigate the relationships among a host of predictors and multiple adaptive outcomes; and (2) conceptualise a resilience model among people with diabetes. Methods: This cross-sectional study was divided into two research studies. Study One was to translate two diabetes-specific instruments (Problem Areas In Diabetes, PAID; Diabetes Coping Measure, DCM) into a Chinese version and to examine their psychometric properties for use in Study Two in a convenience sample of 205 outpatients with type 2 diabetes. In Study Two, an integrated theoretical model is developed and evaluated using the structural equation modelling (SEM) technique. A self-administered questionnaire was completed by 345 people with type 2 diabetes from the endocrine outpatient departments of three hospitals in Taiwan. Results: Confirmatory factor analyses confirmed a one-factor structure of the PAID-C which was similar to the original version of the PAID. Strong content validity of the PAID-C was demonstrated. The PAID-C was associated with HbA1c and diabetes self-care behaviours, confirming satisfactory criterion validity. There was a moderate relationship between the PAID-C and the Perceived Stress Scale, supporting satisfactory convergent validity. The PAID-C also demonstrated satisfactory stability and high internal consistency. A four-factor structure and strong content validity of the DCM-C was confirmed. Criterion validity demonstrated that the DCM-C was significantly associated with HbA1c and diabetes self-care behaviours. There was a statistical correlation between the DCM-C and the Revised Ways of Coping Checklist, suggesting satisfactory convergent validity. Test-retest reliability demonstrated satisfactory stability of the DCM-C. The total scale of the DCM-C showed adequate internal consistency. Age, duration of diabetes, diabetes symptoms, diabetes distress, physical activity, coping strategies, and social support were the most consistent factors associated with adaptive outcomes in adults with diabetes. Resilience was positively associated with coping strategies, social support, health-related quality of life, and diabetes self-care behaviours. Results of the structural equation modelling revealed protective factors had a significant direct effect on adaptive outcomes; however, the construct of risk factors was not significantly related to adaptive outcomes. Moreover, resilience can moderate the relationships among protective factors and adaptive outcomes, but there were no interaction effects of risk factors and resilience on adaptive outcomes. Conclusion: This study contributes to an understanding of how risk factors and protective factors work together to influence adaptive outcomes in blood sugar control, health-related quality of life, and diabetes self-care behaviours. Additionally, resilience is a positive personality characteristic and may be importantly involved in the adjustment process among people living with type 2 diabetes.
Abstract:
OBJECTIVES: To develop and validate a wandering typology. ---------- DESIGN: Cross-sectional, correlational descriptive design. ---------- SETTING: Twenty-two nursing homes and six assisted living facilities. ---------- PARTICIPANTS: One hundred forty-two residents with dementia who spoke English, met Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, criteria for dementia, scored less than 24 on the Mini-Mental State Examination (MMSE), were ambulatory (with or without an assistive device), and maintained a stable regime of psychotropic medications were studied. ---------- MEASUREMENTS: Data on wandering were collected using direct observations, plotted serially according to rate and duration to yield 21 parameters, and reduced through factor analysis to four components: high rate, high duration, low to moderate rate and duration, and time of day. Other measures included the MMSE, Minimum Data Set 2.0 mobility items, Cumulative Illness Rating Scale-Geriatric, and tympanic body temperature readings. ---------- RESULTS: Three groups of wanderers were identified through cluster analysis: classic, moderate, and subclinical. MMSE, mobility, and cardiac and upper and lower gastrointestinal problems differed between groups of wanderers and in comparison with nonwanderers. ---------- CONCLUSION: Results have implications for improving identification of wanderers and treatment of possible contributing factors.
Abstract:
The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains, and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia-related complications. Stress electrocardiography/exercise testing is predictive of 10-year risk of CVD events, and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to these data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages, and data summarisation and data abstraction. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population, and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority-class under-sampling and the Kappa statistic, together with misclassification rate and area under the ROC curve (AUC), are used for evaluation of models generated using different prediction algorithms. The performance of models derived from feature-reduced datasets reveals the filter method, Cfs subset evaluation, to be most consistently effective, although Consistency-derived subsets tended towards slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution; this influence can be eliminated by consideration of the AUC or Kappa statistic, as well as by evaluation of subsets with an under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62, with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection reduces MR to between 9.8 and 10.16, with time-segmented summary data (dataset F) at 9.8 and raw time-series summary data (dataset A) at 9.92. However, for all datasets based on time-series data only, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series-only datasets, but models derived from these subsets consist of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-fold cross-validation method.
For models based on Cfs-selected time-series-derived and risk factor (RF) variables, MR ranges from 8.83 to 10.36, with dataset RF_A (raw time-series data and RF) at 8.85 and dataset RF_F (time-segmented time-series variables and RF) at 9.09. The models based on counts of outliers and counts of data points outside the normal range (dataset RF_E), and on variables derived from time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (dataset RF_G), perform the least well, with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, the nearest-neighbour method (NNge) and the support vector machine-based method (SMO) have the highest MR, 10.1 and 10.28, while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are the most comprehensible and clinically relevant. The predictive accuracy increase achieved by adding risk factor variables to models based on time-series variables is significant. The addition of time-series-derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature-reduced anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared with the use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The pre-processing used in this study had limited effect when time-series variables and risk factor variables are used together as model input. In the absence of risk factor input, the use of time-series variables after outlier removal, and of time-series variables based on physiological values falling outside the accepted normal range, is associated with some improvement in model performance.
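For readers unfamiliar with the workflow, a compact stand-in for the thesis's Weka-based pipeline is sketched below in Python: synthetic unbalanced data, majority-class under-sampling, a univariate filter in place of Cfs subset evaluation, and logistic regression and a decision tree scored by misclassification rate, Kappa and AUC. All data, feature counts and model settings are assumptions of the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import cohen_kappa_score, roc_auc_score

# Illustrative unbalanced data standing in for time-series summary + risk-factor variables
X, y = make_classification(n_samples=3000, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Under-sample the majority class to balance the training data
idx_pos = np.flatnonzero(y_tr == 1)
idx_neg = np.random.default_rng(0).choice(np.flatnonzero(y_tr == 0),
                                          size=len(idx_pos), replace=False)
keep = np.concatenate([idx_pos, idx_neg])
X_bal, y_bal = X_tr[keep], y_tr[keep]

# Filter-type feature selection (a univariate stand-in for Cfs subset evaluation)
sel = SelectKBest(f_classif, k=10).fit(X_bal, y_bal)
X_bal_s, X_te_s = sel.transform(X_bal), sel.transform(X_te)

# Compare a logistic regression and a decision tree on misclassification rate, Kappa and AUC
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=4)):
    model.fit(X_bal_s, y_bal)
    pred = model.predict(X_te_s)
    proba = model.predict_proba(X_te_s)[:, 1]
    print(type(model).__name__,
          "MR=%.3f" % (pred != y_te).mean(),
          "Kappa=%.3f" % cohen_kappa_score(y_te, pred),
          "AUC=%.3f" % roc_auc_score(y_te, proba))
```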
Abstract:
Advances in symptom management strategies through a better understanding of cancer symptom clusters depend on the identification of symptom clusters that are valid and reliable. The purpose of this exploratory research was to investigate alternative analytical approaches to identify symptom clusters for patients with cancer, using readily accessible statistical methods, and to justify which methods of identification may be appropriate for this context. Three studies were undertaken: (1) a systematic review of the literature, to identify analytical methods commonly used for symptom cluster identification for cancer patients; (2) a secondary data analysis to identify symptom clusters and compare alternative methods, as a guide to best practice approaches in cross-sectional studies; and (3) a secondary data analysis to investigate the stability of symptom clusters over time. The systematic literature review identified, in the 10 years prior to March 2007, 13 cross-sectional studies implementing multivariate methods to identify cancer-related symptom clusters. The methods commonly used to group symptoms were exploratory factor analysis, hierarchical cluster analysis and principal components analysis. Common factor analysis methods were recommended as the best practice cross-sectional methods for cancer symptom cluster identification. A comparison of alternative common factor analysis methods was conducted in a secondary analysis of a sample of 219 ambulatory cancer patients with mixed diagnoses, assessed within one month of commencing chemotherapy treatment. Principal axis factoring, unweighted least squares and image factor analysis identified five consistent symptom clusters, based on patient self-reported distress ratings of 42 physical symptoms. Extraction of an additional cluster was necessary when using alpha factor analysis to determine clinically relevant symptom clusters. The recommended approaches for symptom cluster identification using data that are not multivariate normal were: principal axis factoring or unweighted least squares for factor extraction, followed by oblique rotation; and use of the scree plot and Minimum Average Partial procedure to determine the number of factors. In contrast to other studies, which typically interpret pattern coefficients alone, in these studies symptom clusters were determined on the basis of structure coefficients. This approach was adopted for the stability of the results, as structure coefficients are correlations between factors and symptoms that are unaffected by the correlations between factors. Symptoms could be associated with multiple clusters, providing a foundation for investigating potential interventions. The stability of these five symptom clusters was investigated in separate common factor analyses 6 and 12 months after chemotherapy commenced. Five qualitatively consistent symptom clusters were identified over time (Musculoskeletal-discomforts/lethargy, Oral-discomforts, Gastrointestinal-discomforts, Vasomotor-symptoms, Gastrointestinal-toxicities), but at 12 months two additional clusters were determined (Lethargy and Gastrointestinal/digestive symptoms). Future studies should include physical, psychological, and cognitive symptoms. Further investigation of the identified symptom clusters is required for validation, to examine causality, and potentially to suggest interventions for symptom management.
Future studies should use longitudinal analyses to investigate change in symptom clusters, the influence of patient related factors, and the impact on outcomes (e.g., daily functioning) over time.
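A bare-bones sketch of iterated principal axis factoring, one of the extraction methods recommended above, is given below using NumPy; the random symptom ratings are placeholders, and the oblique rotation and Minimum Average Partial steps are omitted for brevity (for an unrotated solution the structure and pattern coefficients coincide).

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=50):
    """Iterated principal axis factoring on a correlation matrix R."""
    p = R.shape[0]
    # Initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        Rr[np.diag_indices(p)] = h2               # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        order = np.argsort(vals)[::-1][:n_factors]
        L = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        h2 = np.sum(L**2, axis=1)                 # update communalities
    return L, h2

# Illustrative symptom-distress ratings (rows: patients, columns: symptoms)
rng = np.random.default_rng(0)
X = rng.normal(size=(219, 20))
R = np.corrcoef(X, rowvar=False)

# Scree-style guide to the number of factors (eigenvalues of R, in decreasing order)
eigs = np.sort(np.linalg.eigvalsh(R))[::-1]

L, h2 = principal_axis_factoring(R, n_factors=5)
# With an oblique rotation, structure coefficients = pattern loadings @ factor correlations;
# for the unrotated solution the factor correlation matrix is the identity.
structure = L @ np.eye(L.shape[1])
print(eigs[:6], structure.shape)
```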
Abstract:
Purpose: To undertake rigorous psychometric testing of the newly developed contemporary work environment measure (the Brisbane Practice Environment Measure [B-PEM]) using exploratory factor analysis and confirmatory factor analysis. Methods: Content validity of the 33-item measure was established by a panel of experts. Initial testing involved 195 nursing staff using principal component factor analysis with varimax rotation (orthogonal) and Cronbach's alpha coefficients. Confirmatory factor analysis was conducted using data from a further 983 nursing staff. Results: Principal component factor analysis yielded a four-factor solution with eigenvalues greater than 1 that explained 52.53% of the variance. These factors were then verified using confirmatory factor analysis. Goodness-of-fit indices showed an acceptable fit overall with the full model, explaining 21% to 73% of the variance. Deletion of items took place throughout the evolution of the instrument, resulting in a 26-item, four-factor measure called the Brisbane Practice Environment Measure-Tested. Conclusions: The B-PEM has undergone rigorous psychometric testing, providing evidence of internal consistency and goodness-of-fit indices within acceptable ranges. The measure can be utilised as a subscale or total score reflective of a contemporary nursing work environment. Clinical Relevance: An up-to-date instrument to measure practice environment may be useful for nursing leaders to monitor the workplace and to assist in identifying areas for improvement, facilitating greater job satisfaction and retention.
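The initial-testing steps (principal component extraction with the eigenvalue-greater-than-one rule, plus Cronbach's alpha) can be sketched as follows; the simulated Likert responses are placeholders, and the varimax rotation and confirmatory factor analysis stages are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative Likert-style responses standing in for the 33 measure items
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(195, 33)).astype(float)

# Standardise so that component eigenvalues correspond to the correlation matrix
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pca = PCA().fit(Xz)
eigenvalues = pca.explained_variance_
n_components = int(np.sum(eigenvalues > 1.0))      # components with eigenvalues > 1
explained = pca.explained_variance_ratio_[:n_components].sum()

print(n_components, round(explained, 3), round(cronbach_alpha(X), 3))
```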
Abstract:
Background Outcome expectancies are a key cognitive construct in the etiology, assessment and treatment of Substance Use Disorders. There is a research and clinical need for a cannabis expectancy measure validated in a clinical sample of cannabis users. Method The Cannabis Expectancy Questionnaire (CEQ) was subjected to exploratory (n = 501, mean age 27.45, 78% male) and confirmatory (n = 505, mean age 27.69, 78% male) factor analysis in two separate samples of cannabis users attending an outpatient cannabis treatment program. Weekly cannabis consumption was clinically assessed and patients completed the Severity of Dependence Scale-Cannabis (SDS-C) and the General Health Questionnaire (GHQ-28). Results Two factors representing Negative Cannabis Expectancies and Positive Cannabis Expectancies were identified. These provided a robust statistical and conceptual fit for the data. Internal reliabilities were high. Negative expectancies were associated with greater dependence severity (as measured by the SDS) and positive expectancies with higher consumption. The interaction of positive and negative expectancies was consistently significantly associated with self-reported functioning across all four GHQ-28 scales (Somatic Concerns, Anxiety, Social Dysfunction and Depression). Specifically, within the context of high positive cannabis expectancy, higher negative expectancy was predictive of more impaired functioning. By contrast, within the context of low positive cannabis expectancy, higher negative expectancy was predictive of better functioning. Conclusions The CEQ is the first cannabis expectancy measure to be validated in a sample of cannabis users in treatment. Negative and positive cannabis expectancy domains were uniquely associated with consumption, dependence severity and self-reported mental health functioning.
Abstract:
Analytical expressions are derived for the mean and variance of estimates of the bispectrum of a real time series, assuming a cosinusoidal model. The effects of spectral leakage, inherent in the discrete Fourier transform operation when the modes present in the signal have a nonintegral number of wavelengths in the record, are included in the analysis. A single phase-coupled triad of modes can cause the bispectrum to have a nonzero mean value over the entire region of computation owing to leakage. The variance of bispectral estimates in the presence of leakage has contributions from individual modes and from triads of phase-coupled modes. Time-domain windowing reduces the leakage. The theoretical expressions for the mean and variance of bispectral estimates are derived in terms of a function dependent on an arbitrary symmetric time-domain window applied to the record, the number of data, and the statistics of the phase coupling among triads of modes. The theoretical results are verified by numerical simulations for simple test cases and applied to laboratory data to examine phase coupling in a hypothesis-testing framework.
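A simple segment-averaged bispectrum estimator with a time-domain window, of the kind whose statistics the paper analyses, can be written as follows; the Hann window, segment sizes and the phase-coupled test signal are assumptions of this illustration.

```python
import numpy as np

def bispectrum(x, nseg=64, seglen=256, window=np.hanning):
    """Direct (segment-averaged) bispectrum estimate with a time-domain window."""
    w = window(seglen)
    nfreq = seglen // 2
    B = np.zeros((nfreq, nfreq), dtype=complex)
    for i in range(nseg):
        seg = x[i * seglen:(i + 1) * seglen]
        seg = (seg - seg.mean()) * w               # windowing reduces spectral leakage
        X = np.fft.fft(seg)
        k = np.arange(nfreq)
        B += X[k][:, None] * X[k][None, :] * np.conj(X[k[:, None] + k[None, :]])
    return B / nseg

# Test signal: a phase-coupled triad at f1, f2, f1+f2 plus noise
# (frequencies chosen so the modes have a nonintegral number of wavelengths per segment)
rng = np.random.default_rng(0)
n = 64 * 256
t = np.arange(n)
f1, f2 = 0.0502, 0.1201                            # cycles per sample
x = (np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)
     + 0.5 * np.cos(2 * np.pi * (f1 + f2) * t) + 0.5 * rng.normal(size=n))

B = bispectrum(x)
print(np.abs(B).max())
```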
Abstract:
Many studies into construction procurement methods reveal evidence of a need to change the culture and attitude in the construction industry, transitioning from traditional adversarial relationships to cooperative and collaborative relationships. At the same time there is increasing concern about, and discussion of, alternative procurement methods, involving a movement away from traditional procurement systems. Relational contracting approaches, such as partnering and relationship management, are business strategies that align the objectives of clients, commercial participants and stakeholders. They provide a collaborative environment and a framework for all participants to adapt their behaviour to project objectives, and allow for engagement of subcontractors and suppliers down the supply chain. The efficacy of relationship management in the client and contractor groups is proven and well documented. However, the industry has a history of slow implementation of relational contracting down the supply chain. Furthermore, little research on relationship management has been conducted in the supply chain context. This research aims to explore the association between relational contracting structures and processes and supply chain sustainability in the civil engineering construction industry. It endeavours to shed light on the practices and prerequisites for relationship management implementation success and for supply sustainability to develop. The research methodology is a triangulated approach based on Cheung's (2006) earlier research, in which a questionnaire survey, interviews and case studies were conducted. This new research includes a face-to-face questionnaire survey that was carried out with 100 professionals from 27 contracting organisations in Queensland from June 2008 to January 2009. A follow-up sub-questionnaire, further examining project participants' perspectives, was sent to another group of professionals (as identified in the main questionnaire survey). Statistical analyses including multiple regression, correlation, principal component factor analysis and analysis of variance were used to identify the underlying dimensions and test the relationships among variables. Interviews and case studies were conducted to assist in providing a deeper understanding as well as explaining the findings of the quantitative study. The qualitative approaches also gave the opportunity to critique and validate the research findings. This research presents the implementation of relationship management from the contractor's perspective. Findings show that the adoption of the relational contracting approach in the supply chain is limited; contractors still prefer to keep suppliers and subcontractors at arm's length. This research shows that the degree of match and mismatch between organisational structuring and organisational process has an impact on staff commitment level and performance effectiveness. Key issues affecting performance effectiveness and relationship effectiveness include total influence between parties, access to information, personal acquaintance, communication process, risk identification, timely problem solving and commercial framework. Findings also indicate that alliance and Early Contractor Involvement (ECI) projects achieve higher performance effectiveness at both short-term and long-term levels compared to projects with either no or partial relationship management adopted.