924 results for Automatic Analysis of Multivariate Categorical Data Sets
Abstract:
Chemotherapy-induced oral mucositis is a frequent therapeutic challenge in cancer patients. The purpose of this retrospective study was to estimate the prevalence and risk factors of oral mucositis in 169 acute lymphoblastic leukaemia (ALL) patients treated according to different chemotherapeutic trials at the Darcy Vargas Children's Hospital from 1994 to 2005. Demographic data, clinical history, chemotherapeutic treatment and patients' follow-up were recorded. The association of oral mucositis with age, gender, leucocyte counts at diagnosis and treatment was assessed by the chi-squared test and multivariate regression analysis. Seventy-seven ALL patients (46%) developed oral mucositis during the treatment. Patient age (P = 0.33), gender (P = 0.08) and leucocyte counts at diagnosis (P = 0.34) showed no correlation with the occurrence of oral mucositis. Multivariate regression analysis showed a significant risk for oral mucositis (P = 0.009) for ALL patients treated according to the ALL-BFM-95 protocol. These results strongly suggest the greater stomatotoxic effect of the ALL-BFM-95 trial when compared with Brazilian trials. We concluded that chemotherapy-induced oral mucositis should be systematically analysed prospectively in specialized centres for ALL treatment to establish the degree of toxicity of chemotherapeutic drugs and to improve the quality of life of patients based on more effective therapeutic and prophylactic approaches for prevention of its occurrence. Oral Diseases (2008) 14, 761-766
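The chi-squared test of association mentioned in this abstract can be illustrated for the simplest case, a 2x2 table such as one factor (two levels) against mucositis (yes/no). This is only a sketch: the function name `chi2_2x2` is hypothetical, no continuity correction is applied, all margins are assumed to be non-zero, and the study's actual contingency tables are not given in the abstract, so any counts passed in are the reader's own.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts (hypothetical input layout).
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A table with identical rows gives a statistic of 0 and a P value of 1, which is the "no association" extreme.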
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
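The gene-screening step described in this abstract, a likelihood ratio statistic for one versus two mixture components computed gene by gene, can be sketched as follows. This is not EMMIX-GENE: the sketch substitutes normal components for the t components named in the abstract, uses a plain EM loop with a median-split start and a variance floor, and lists genes with the largest statistic first. All function names are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def loglik_one_component(xs):
    """Maximised log-likelihood of a single normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    return sum(normal_logpdf(x, mean, var) for x in xs)


def loglik_two_components(xs, iters=200):
    """Log-likelihood of a two-component normal mixture fitted by EM."""
    n = len(xs)
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    # Assumption: start from a split at the median with a shared variance.
    means = [sum(lower) / len(lower), sum(upper) / len(upper)]
    overall = sum(xs) / n
    start_var = max(sum((x - overall) ** 2 for x in xs) / n, 1e-6)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    loglik = 0.0
    for _ in range(iters):
        # E-step: responsibilities and the current log-likelihood.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(weights[k]) + normal_logpdf(x, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            loglik += top + math.log(total)
            resp.append([p / total for p in probs])
        # M-step: weighted updates, with floors to avoid degenerate components.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-9)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return loglik


def likelihood_ratio_statistic(xs):
    """-2 log(lambda) for one versus two components."""
    return 2.0 * (loglik_two_components(xs) - loglik_one_component(xs))


def rank_genes(expression):
    """expression: dict mapping gene name to its values across tissues.

    Returns (gene, statistic) pairs, largest statistic first (a choice made here).
    """
    stats = {gene: likelihood_ratio_statistic(vals) for gene, vals in expression.items()}
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)
```

A threshold on the statistic, as the abstract describes, would then be applied to the ranked list; the threshold value itself is not given in the abstract and is left to the caller.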
Abstract:
Avicennia marina is an important mangrove species with a wide geographical and climatic distribution which suggests that large amounts of genetic diversity are available for conservation and breeding programs. In this study we compare the informativeness of AFLPs and SSRs for assessing genetic diversity within and among individuals, populations and subspecies of A. marina in Australia. Our comparison utilized three SSR loci and three AFLP primer sets that were known to be polymorphic, and could be run in a single analysis on a capillary electrophoresis system, using different-colored fluorescent dyes. A total of 120 individuals representing six populations and three subspecies were sampled. At the locus level, SSRs were considerably more variable than AFLPs, with a total of 52 alleles and an average heterozygosity of 0.78. Average heterozygosity for AFLPs was 0.193, but all of the 918 bands scored were polymorphic. Thus, AFLPs were considerably more efficient at revealing polymorphic loci than SSRs despite lower average heterozygosities. SSRs detected more genetic differentiation between populations (19 vs 9%) and subspecies (35 vs 11%) than AFLPs. Principal co-ordinate analysis revealed congruent patterns of genetic relationships at the individual, population and subspecific levels for both data sets. Mantel testing confirmed congruence between AFLP and SSR genetic distances among, but not within, population comparisons, indicating that the markers were segregating independently but that evolutionary groups (populations and subspecies) were similar. Three genetic criteria of importance for defining priorities for ex situ collections or in situ conservation programs (number of alleles, number of locally common alleles and number of private alleles) were correlated between the AFLP and SSR data sets. The congruence between AFLP and SSR data sets suggests that either method, or a combination, is applicable to expanded genetic studies of mangroves.
The codominant nature of SSRs makes them ideal for further population-based investigations, such as mating-system analyses, for which the dominant AFLP markers are less well suited. AFLPs may be particularly useful for monitoring propagation programs and identifying duplicates within collections, since a single PCR assay can reveal many loci at once.
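For readers unfamiliar with the heterozygosity figures quoted above, the following sketch computes expected heterozygosity (gene diversity) at a codominant locus as one minus the sum of squared allele frequencies, then averages over loci. The abstract does not say which estimator the authors used, so this formula, the function names and the input layout are assumptions; no sample-size correction is applied.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """1 - sum(p_i^2) over the allele frequencies p_i at one locus.

    alleles: list of allele labels, one entry per sampled gene copy.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def mean_heterozygosity(loci):
    """Average of the per-locus values; loci is a list of allele lists."""
    values = [expected_heterozygosity(alleles) for alleles in loci]
    return sum(values) / len(values)
```

A locus with many alleles at similar frequencies approaches 1, which is consistent with highly variable SSR loci scoring well above biallelic dominant markers.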
Abstract:
Observations of an insect's movement lead to theory on the insect's flight behaviour and the role of movement in the species' population dynamics. This theory leads to predictions of the way the population changes in time under different conditions. If a hypothesis on movement predicts a specific change in the population, then the hypothesis can be tested against observations of population change. Routine pest monitoring of agricultural crops provides a convenient source of data for studying movement into a region and among fields within a region. Examples of the use of statistical and computational methods for testing hypotheses with such data are presented. The types of questions that can be addressed with these methods and the limitations of pest monitoring data when used for this purpose are discussed. (C) 2002 Elsevier Science B.V. All rights reserved.
Abstract:
In this paper we analyzed the adsorption of gases and vapors on graphitised thermal carbon black by using a modified DFT-lattice theory, in which we assume that the behavior of the first layer in the adsorption film is different from those of second and higher layers. The effects of various parameters on the topology of the adsorption isotherm were first investigated, and the model was then applied in the analysis of adsorption data of numerous substances on carbon black. We have found that the first layer in the adsorption film behaves differently from the second and higher layers in such a way that the adsorbate-adsorbate interaction energy in the first layer is less than that of second and higher layers, and the same is observed for the partition function. Furthermore, the adsorbate-adsorbate and adsorbate-adsorbent interaction energies obtained from the fitting are consistently lower than the corresponding values obtained from the viscosity data and calculated from the Lorentz-Berthelot rule, respectively.
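For reference, the Lorentz-Berthelot rule mentioned in this abstract combines like-pair parameters into cross-interaction parameters. In its standard form, with subscripts chosen here for illustration (s for the solid adsorbent, f for the fluid adsorbate; the abstract does not give the authors' notation), it reads:

```latex
\sigma_{sf} = \frac{\sigma_{ss} + \sigma_{ff}}{2},
\qquad
\varepsilon_{sf} = \sqrt{\varepsilon_{ss}\,\varepsilon_{ff}}
```

Here \sigma denotes the collision diameter and \varepsilon the well depth of the pair potential, so the abstract's comparison is between the fitted adsorbate-adsorbent energy and the geometric-mean value \varepsilon_{sf}.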
Abstract:
The tests that are currently available for the measurement of overexpression of the human epidermal growth factor-2 (HER2) in breast cancer have shown considerable problems in accuracy and interlaboratory reproducibility. Although these problems are partly alleviated by the use of validated, standardised 'kits', there may be considerable cost involved in their use. Prior to testing it may therefore be an advantage to be able to predict from basic pathology data whether a cancer is likely to overexpress HER2. In this study, we have correlated pathology features of cancers with the frequency of HER2 overexpression assessed by immunohistochemistry (IHC) using HercepTest (Dako). In addition, fluorescence in situ hybridisation (FISH) has been used to re-test the equivocal cancers and interobserver variation in assessing HER2 overexpression has been examined by a slide circulation scheme. Of the 1536 cancers, 1144 (74.5%) did not overexpress HER2. Unequivocal overexpression (3+ by IHC) was seen in 186 cancers (12%) and an equivocal result (2+ by IHC) was seen in 206 cancers (13%). Of the 156 IHC 3+ cancers for which complete data was available, 149 (95.5%) were ductal NST and 152 (97%) were histological grade 2 or 3. Only 1 of 124 infiltrating lobular carcinomas (0.8%) showed HER2 overexpression. None of the 49 'special types' of carcinoma showed HER2 overexpression. Re-testing by FISH of a proportion of the IHC 2+ cancers showed that only 25 (23%) of those assessable exhibited HER2 gene amplification, but 46 of the 47 IHC 3+ cancers (98%) were confirmed as showing gene amplification. Circulating slides for the assessment of HER2 score showed a moderate level of agreement between pathologists (kappa 0.4). As a result of this study we would advocate consideration of a triage approach to HER2 testing.
Infiltrating lobular and special types of carcinoma may not need to be routinely tested at presentation nor may grade 1 NST carcinomas in which only 1.4% have been shown to overexpress HER2. Testing of these carcinomas may be performed when HER2 status is required to assist in therapeutic or other clinical/prognostic decision-making. The highest yield of HER2 overexpressing carcinomas is seen in the grade 3 NST subgroup in which 24% are positive by IHC. (C) 2003 Elsevier Science Ltd. All rights reserved.
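The kappa value quoted above measures agreement beyond what chance alone would produce. As a sketch, Cohen's kappa for two raters scoring the same slides can be computed as below. The abstract reports agreement between several pathologists and does not state which kappa variant was used, so the two-rater form, the function name and the score labels are assumptions.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' labels on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.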
Abstract:
The use of a fitted parameter watershed model to address water quantity and quality management issues requires that it be calibrated under a wide range of hydrologic conditions. However, rarely does model calibration result in a unique parameter set. Parameter nonuniqueness can lead to predictive nonuniqueness. The extent of model predictive uncertainty should be investigated if management decisions are to be based on model projections. Using models built for four neighboring watersheds in the Neuse River Basin of North Carolina, the application of the automated parameter optimization software PEST in conjunction with the Hydrologic Simulation Program Fortran (HSPF) is demonstrated. Parameter nonuniqueness is illustrated, and a method is presented for calculating many different sets of parameters, all of which acceptably calibrate a watershed model. A regularization methodology is discussed in which models for similar watersheds can be calibrated simultaneously. Using this method, parameter differences between watershed models can be minimized while maintaining fit between model outputs and field observations. In recognition of the fact that parameter nonuniqueness and predictive uncertainty are inherent to the modeling process, PEST's nonlinear predictive analysis functionality is then used to explore the extent of model predictive uncertainty.
Abstract:
The increasing availability of mobility data and the awareness of its importance and value have motivated many researchers to develop models and tools for analyzing movement data. This paper presents a brief survey of significant research on the modeling, processing and visualization of data about moving objects. We identified some key research fields that will provide better features for online analysis of movement data. As a result of the literature review, we suggest a generic multi-layer architecture for the development of an online analysis processing software tool, which will be used to define the future work of our team.
Abstract:
This paper analyzes musical compositions from the point of view of two mathematical tools, namely entropy and multidimensional scaling (MDS). Fourier analysis reveals fractional dynamics, but the time rhythm variations are diluted along the spectrum. The combination of time-window entropy and MDS copes with the time characteristics and is well suited to treating a large volume of data. The experiments focus on a large number of compositions classified into three sets of musical styles, namely “Classical”, “Jazz”, and “Pop & Rock”. Without loss of generality, the present study describes the application of the tools and the sets of musical compositions in a methodology leading to clear conclusions, but extensions to other possibilities are straightforward. The results reveal significant differences between the musical styles, demonstrating the feasibility of the proposed strategy and motivating further developments toward a dynamical analysis of musical compositions.
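One way to read "time-window entropy" is as the Shannon entropy of the signal's amplitude distribution inside successive windows. The sketch below follows that reading. The uniform quantisation into `bins` levels, the window and step parameters, and the function names are assumptions, since the abstract does not specify how the entropy was computed.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(samples, window, step, bins=16):
    """Entropy of the quantised signal in each window of the given length."""
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    # Map each sample to one of `bins` equal-width amplitude levels.
    symbols = [min(int((s - lo) / span * bins), bins - 1) for s in samples]
    return [shannon_entropy(symbols[i:i + window])
            for i in range(0, len(symbols) - window + 1, step)]
```

The resulting sequence of per-window entropies could then serve as the feature vector on which distances between compositions are computed before applying MDS.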
Abstract:
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
Abstract:
27th Annual Conference of the European Cetacean Society. Setúbal, Portugal, 8-10 April 2013.
Abstract:
OBJECTIVE: To estimate the spatial intensity of urban violence events using wavelet-based methods and emergency room data. METHODS: Information on victims attended at the emergency room of a public hospital in the city of São Paulo, Southeastern Brazil, from January 1, 2002 to January 11, 2003 was obtained from hospital records. The spatial distribution of 3,540 events was recorded and a uniform random procedure was used to allocate records with incomplete addresses. Point processes and wavelet analysis techniques were used to estimate the spatial intensity, defined as the expected number of events per unit area. RESULTS: Of all georeferenced points, 59% were accidents and 40% were assaults. There is a non-homogeneous spatial distribution of the events, with high concentration in two districts and along three large avenues in the southern area of the city of São Paulo. CONCLUSIONS: Hospital records combined with methodological tools to estimate intensity of events are useful to study urban violence. The wavelet analysis is useful in the computation of the expected number of events and their respective confidence bands for any sub-region and, consequently, in the specification of risk estimates that could be used in decision-making processes for public policies.
Abstract:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon, Portugal.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
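The starting point of the approach in this abstract, a finite mixture of multinomial (categorical) distributions fitted by expectation-maximization, can be sketched as below. This is the plain mixture only: it omits the latent feature-relevance variables and the minimum message length criterion that the authors add. The function name, the additive smoothing constant `alpha` and the random soft initialisation are assumptions made for this illustration.

```python
import math
import random


def fit_categorical_mixture(data, n_components, n_categories,
                            iters=50, seed=0, alpha=1.0):
    """EM for a mixture of conditionally independent categorical features.

    data: list of rows, each a list of integer codes in range(n_categories).
    Returns (weights, theta, labels), where theta[k][j][c] estimates
    P(feature j = c | component k) and labels holds the most probable
    component for each row.
    """
    rng = random.Random(seed)
    n, n_features = len(data), len(data[0])
    # Random soft assignments break the symmetry between components.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_components)]
        total = sum(row)
        resp.append([r / total for r in row])
    weights, theta = [], []
    for _ in range(iters):
        # M-step: mixing weights and smoothed category probabilities.
        weights = [max(sum(r[k] for r in resp) / n, 1e-12)
                   for k in range(n_components)]
        theta = []
        for k in range(n_components):
            component = []
            for j in range(n_features):
                counts = [alpha] * n_categories
                for row, r in zip(data, resp):
                    counts[row[j]] += r[k]
                total = sum(counts)
                component.append([c / total for c in counts])
            theta.append(component)
        # E-step: posterior component probabilities, computed in log space.
        resp = []
        for row in data:
            logs = [math.log(weights[k])
                    + sum(math.log(theta[k][j][row[j]]) for j in range(n_features))
                    for k in range(n_components)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
    labels = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return weights, theta, labels
```

In this plain version every feature contributes to every component; a feature whose estimated distribution is nearly identical across components carries little clustering information, which is the kind of feature a relevance-aware variant would aim to discard.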