26 results for Automatic Analysis of Multivariate Categorical Data Sets
in University of Queensland eSpace - Australia
Abstract:
This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions to k-means analysis because of the possibility of multiple cluster membership of objects. Traditional clustering methods generate extensional descriptions of groups, that show which objects are members of each cluster. Clustering techniques based on rough sets theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley & Sons, Ltd.
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
Abstract:
The aim of this report is to describe the use of WinBUGS for two datasets that arise from typical population pharmacokinetic studies. The first dataset relates to gentamicin concentration-time data that arose as part of routine clinical care of 55 neonates. The second dataset incorporated data from 96 patients receiving enoxaparin. Both datasets were originally analyzed by using NONMEM. In the first instance, although NONMEM provided reasonable estimates of the fixed effects parameters it was unable to provide satisfactory estimates of the between-subject variance. In the second instance, the use of NONMEM resulted in the development of a successful model, albeit with limited available information on the between-subject variability of the pharmacokinetic parameters. WinBUGS was used to develop a model for both of these datasets. Model comparison for the enoxaparin dataset was performed by using the posterior distribution of the log-likelihood and a posterior predictive check. The use of WinBUGS supported the same structural models tried in NONMEM. For the gentamicin dataset a one-compartment model with intravenous infusion was developed, and the population parameters including the full between-subject variance-covariance matrix were available. Analysis of the enoxaparin dataset supported a two compartment model as superior to the one-compartment model, based on the posterior predictive check. Again, the full between-subject variance-covariance matrix parameters were available. Fully Bayesian approaches using MCMC methods, via WinBUGS, can offer added value for analysis of population pharmacokinetic data.
Abstract:
Background Our aim was to calculate the global burden of disease and risk factors for 2001, to examine regional trends from 1990 to 2001, and to provide a starting point for the analysis of the Disease Control Priorities Project (DCPP). Methods We calculated mortality, incidence, prevalence, and disability adjusted life years (DALYs) for 136 diseases and injuries, for seven income/geographic country groups. To assess trends, we re-estimated all-cause mortality for 1990 with the same methods as for 2001. We estimated mortality and disease burden attributable to 19 risk factors. Findings About 56 million people died in 2001. Of these, 10.6 million were children, 99% of whom lived in low-and-middle-income countries. More than half of child deaths in 2001 were attributable to acute respiratory infections, measles, diarrhoea, malaria, and HIV/AIDS. The ten leading diseases for global disease burden were perinatal conditions, lower respiratory infections, ischaemic heart disease, cerebrovascular disease, HIV/AIDS, diarrhoeal diseases, unipolar major depression, malaria, chronic obstructive pulmonary disease, and tuberculosis. There was a 20% reduction in global disease burden per head due to communicable, maternal, perinatal, and nutritional conditions between 1990 and 2001. Almost half the disease burden in low-and-middle-income countries is now from non-communicable diseases (disease burden per head in Sub-Saharan Africa and the low-and-middle-income countries of Europe and Central Asia increased between 1990 and 2001). Undernutrition remains the leading risk factor for health loss. An estimated 45% of global mortality and 36% of global disease burden are attributable to the joint hazardous effects of the 19 risk factors studied. Uncertainty in all-cause mortality estimates ranged from around 1% in high-income countries to 15-20% in Sub-Saharan Africa. Uncertainty was larger for mortality from specific diseases, and for incidence and prevalence of non-fatal outcomes. 
Interpretation Despite uncertainties about mortality and burden of disease estimates, our findings suggest that substantial gains in health have been achieved in most populations, countered by the HIV/AIDS epidemic in Sub-Saharan Africa and setbacks in adult mortality in countries of the former Soviet Union. Our results on major disease, injury, and risk factor causes of loss of health, together with information on the cost-effectiveness of interventions, can assist in accelerating progress towards better health and reducing the persistent differentials in health between poor and rich countries.
Abstract:
Background & Aims: Steatosis is a frequent histologic finding in chronic hepatitis C (CHC), but it is unclear whether steatosis is an independent predictor for liver fibrosis. We evaluated the association between steatosis and fibrosis and their common correlates in persons with CHC and in subgroup analyses according to hepatitis C virus (HCV) genotype and body mass index. Methods: We conducted a meta-analysis on individual data from 3068 patients with histologically confirmed CHC recruited from 10 clinical centers in Italy, Switzerland, France, Australia, and the United States. Results: Steatosis was present in 1561 patients (50.9%) and fibrosis in 2688 (87.6%). HCV genotype was 1 in 1694 cases (55.2%), 2 in 563 (18.4%), 3 in 669 (21.8%), and 4 in 142 (4.6%). By stepwise logistic regression, steatosis was associated independently with genotype 3, the presence of fibrosis, diabetes, hepatic inflammation, ongoing alcohol abuse, higher body mass index, and older age. Fibrosis was associated independently with inflammatory activity, steatosis, male sex, and older age, whereas HCV genotype 2 was associated with reduced fibrosis. In the subgroup analyses, the association between steatosis and fibrosis invariably was dependent on a simultaneous association between steatosis and hepatic inflammation. Conclusions: In this large and geographically different group of CHC patients, steatosis is confirmed as significantly and independently associated with fibrosis in CHC. Hepatic inflammation may mediate fibrogenesis in patients with liver steatosis. Control of metabolic factors (such as overweight, via lifestyle adjustments) appears important in the management of CHC.
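The counts and percentages reported in the Results of this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, assuming the genotype counts read as 1694 and 142 and that every percentage is taken over the 3068 pooled patients; the variable and function names are illustrative and not from the paper.

```python
# Consistency check of the reported counts (assumption: all percentages
# use the 3068 pooled patients as the denominator).
TOTAL_PATIENTS = 3068

reported_counts = {
    "steatosis": 1561,
    "fibrosis": 2688,
    "genotype_1": 1694,
    "genotype_2": 563,
    "genotype_3": 669,
    "genotype_4": 142,
}


def percent(count: int, total: int = TOTAL_PATIENTS) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


# Percentage for each reported count.
percentages = {name: percent(n) for name, n in reported_counts.items()}

# The four genotype groups should partition the whole sample.
genotype_total = sum(
    n for name, n in reported_counts.items() if name.startswith("genotype_")
)

for name, value in percentages.items():
    print(f"{name}: {value}%")
print("genotype total:", genotype_total)
```

Under these assumptions the four genotype groups add up to the full sample, and the recomputed percentages match the figures quoted in the abstract.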
Abstract:
Historically, few articles have addressed the use of district level mill production data for analysing the effect of varietal change on sugarcane productivity trends. This appears to be due to lack of compiled district data sets and appropriate methods by which to analyse these data. Recently, varietal data on tonnes of sugarcane per hectare (TCH), sugar content (CCS), and their product, tonnes of sugar content per hectare (TSH) on a district basis, have been compiled. This study was conducted to develop a methodology for regular analysis of such data from mill districts to assess productivity trends over time, accounting for variety and variety x environment interaction effects for 3 mill districts (Mulgrave, Babinda, and Tully) from 1958 to 1995. Restricted maximum likelihood methodology was used to analyse the district level data and best linear unbiased predictors for random effects, and best linear unbiased estimates for fixed effects were computed in a mixed model analysis. In the combined analysis over districts, Q124 was the top ranking variety for TCH, and Q120 was top ranking for both CCS and TSH. Overall production for TCH increased over the 38-year period investigated. Some of this increase can be attributed to varietal improvement, although the predictors for TCH have shown little progress since the introduction of Q99 in 1976. Although smaller gains have been made in varietal improvement for CCS, overall production for CCS decreased over the 38 years due to non-varietal factors. Varietal improvement in TSH appears to have peaked in the mid-1980s. Overall production for TSH remained stable over time due to the varietal increase in TCH and the non-varietal decrease in CCS.
Abstract:
For the improvement of genetic material suitable for on farm use under low-input conditions, participatory and formal plant breeding strategies are frequently presented as competing options. A common frame of reference to phrase mechanisms and purposes related to breeding strategies will facilitate clearer descriptions of similarities and differences between participatory plant breeding and formal plant breeding. In this paper an attempt is made to develop such a common framework by means of a statistically inspired language that acknowledges the importance of both on farm trials and research centre trials as sources of information for on farm genetic improvement. Key concepts are the genetic correlation between environments, and the heterogeneity of phenotypic and genetic variance over environments. Classic selection response theory is taken as the starting point for the comparison of selection trials (on farm and research centre) with respect to the expected genetic improvement in a target environment (low-input farms). The variance-covariance parameters that form the input for selection response comparisons traditionally come from a mixed model fit to multi-environment trial data. In this paper we propose a recently developed class of mixed models, namely multiplicative mixed models, also called factor-analytic models, for modelling genetic variances and covariances (correlations). Mixed multiplicative models allow genetic variances and covariances to be dependent on quantitative descriptors of the environment, and confer a high flexibility in the choice of variance-covariance structure, without requiring the estimation of a prohibitively high number of parameters. As a result detailed considerations regarding selection response comparisons are facilitated. The statistical machinery involved is illustrated on an example data set consisting of barley trials from the International Center for Agricultural Research in the Dry Areas (ICARDA). 
Analysis of the example data showed that participatory plant breeding and formal plant breeding are better interpreted as providing complementary rather than competing information.
Abstract:
To be able to determine the grain size obtained from the addition of a grain refining master alloy, the relationship between grain size (d), solute content (defined by the growth restriction factor Q), and the potency and number density of nucleant particles needs to be understood. A study was undertaken on aluminium alloys where additions of TiB2 and Ti were made to eight wrought aluminum alloys covering a range of alloying elements and compositions. It was found from analysis of the data that $d = a/\sqrt[3]{\mathrm{pct\ TiB_2}} + b/Q$. From consideration of the experimental data and from further analysis of previously published data, it is shown that the coefficients a and b relate to characteristics of the nucleant particles added by a grain refiner. The term a is related to the maximum density of active TiB2 nucleant particles within the melt, while b is related to their potency. By using the analysis methodology presented in this article, the performance characteristics of different master alloys were defined and the effects of Zr and Si on the poisoning of grain refinement were illustrated.
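The empirical relation in this abstract, grain size as a cube-root term in the TiB2 addition plus a term inversely proportional to Q, can be evaluated directly. The sketch below is a minimal illustration only: the function name is hypothetical, and the coefficient values in the usage line are made-up placeholders chosen for easy arithmetic, not fitted values from the study. Units are whatever the fitted a and b imply.

```python
def grain_size(pct_tib2: float, q: float, a: float, b: float) -> float:
    """Evaluate d = a / cbrt(pct TiB2) + b / Q.

    pct_tib2 : TiB2 addition level (pct), must be positive
    q        : growth restriction factor Q, must be positive
    a, b     : empirical coefficients (placeholders here, not from the paper)
    """
    if pct_tib2 <= 0 or q <= 0:
        raise ValueError("pct_tib2 and q must be positive")
    return a / pct_tib2 ** (1.0 / 3.0) + b / q


# Hypothetical inputs: cbrt(0.008) = 0.2, so d = 2/0.2 + 50/10.
example_d = grain_size(pct_tib2=0.008, q=10.0, a=2.0, b=50.0)
print(example_d)
```

The form of the relation shows the two levers separately: raising the TiB2 level shrinks only the first term, and only as a cube root, while raising Q shrinks only the second.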
Abstract:
Objective: Five double-blind, randomized, saline-controlled trials (RCTs) were included in the United States marketing application for an intra-articular hyaluronan (IA-HA) product for the treatment of osteoarthritis (OA) of the knee. We report an integrated analysis of the primary Case Report Form (CRF) data from these trials. Method: Trials were similar in design, patient population and outcome measures - all included the Lequesne Algofunctional Index (LI), a validated composite index of pain and function, evaluating treatment over 3 months. Individual patient data were pooled; a repeated measures analysis of covariance was performed in the intent-to-treat (ITT) population. Analyses utilized both fixed and random effects models. Safety data from the five RCTs were summarized. Results: A total of 1155 patients with radiologically confirmed knee OA were enrolled: 619 received three or five IA-HA injections; 536 received placebo saline injections. In the active and control groups, mean ages were 61.8 and 61.4 years; 62.4% and 58.8% were women; baseline total Lequesne scores 11.03 and 11.30, respectively. Integrated analysis of the pooled data set found a statistically significant reduction (P < 0.001) in total Lequesne score with hyaluronan (HA) (-2.68) vs placebo (-2.00); estimated difference -0.68 (95% CI: -0.56 to -0.79), effect size 0.20. Additional modeling approaches confirmed robustness of the analyses. Conclusions: This integrated analysis demonstrates that multiple design factors influence the results of RCTs assessing efficacy of intra-articular (IA) therapies, and that integrated analyses based on primary data differ from meta-analyses using transformed data. (C) 2006 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which render the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
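To make the idea of zero-inflation concrete, the sketch below evaluates the probability mass function of the standard single-level ZIP distribution, in which a count is a structural zero with probability pi and otherwise follows a Poisson distribution with mean lam. This is only the textbook building block, not the multi-level random-effects model or the EM fitting procedure described in the abstract; the function names and parameter values are illustrative assumptions.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson variable.

    lam : mean of the Poisson component (lam > 0)
    pi  : probability of a structural (excess) zero, 0 <= pi < 1
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam: float, pi: float) -> float:
    """Mean of the ZIP distribution: (1 - pi) * lam."""
    return (1.0 - pi) * lam


# Illustrative values only: 30% structural zeros, Poisson mean 2.
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_poisson = zip_pmf(0, lam=2.0, pi=0.0)
print(p_zero_zip, p_zero_poisson)
```

With pi = 0 the function reduces to the ordinary Poisson probabilities, which is the null hypothesis a score test for zero-inflation examines.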