938 resultados para k-means clustering
Resumo:
In this paper use consider the problem of providing standard errors of the component means in normal mixture models fitted to univariate or multivariate data by maximum likelihood via the EM algorithm. Two methods of estimation of the standard errors are considered: the standard information-based method and the computationally-intensive bootstrap method. They are compared empirically by their application to three real data sets and by a small-scale Monte Carlo experiment.
Resumo:
Osteosarcoma (OS) is the most frequent bone tumor in children and adolescents. Tumor antigens are encoded by genes that are expressed in many types of solid tumors but are silent in normal tissues, with the exception of placenta and male germ-line cells. It has been proposed that antigen tumors are potential tumor markers. The premise of this study is that the identification of novel OS-associated transcripts will lead to a better understanding of the events involved in OS pathogenesis and biology. We analyzed the expression of a panel of seven tumor antigens in OS samples to identify possible tumor markers. After selecting the tumor antigen expressed in most samples of the panel, gene expression profiling was used to identify osteosarcoma-associated molecular alterations. A microarray was employed because of its ability to accurately produce comprehensive expression profiles. PRAME was identified as the tumor antigen expressed in most OS samples; it was detected in 68% of the cases. Microarray results showed differences in expression for genes functioning in cell signaling and adhesion as well as extracellular matrix-related genes, implying that such tumors could indeed differ in regard to distinct patterns of tumorigenesis. The hypothesis inferred in this study was gathered mostly from available data concerning other kinds of tumors. There is circumstantial evidence that PRAME expression might be related to distinct patterns of tumorigenesis. Further investigation is needed to validate the differential expression of genes belonging to tumorigenesis-related pathways in PRAME-positive and PRAME-negative tumors.
Resumo:
In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Resumo:
No literature data above atmospheric pressure could be found for the viscosity of TOTIVI. As a consequence, the present viscosity results could only be compared upon extrapolation of the vibrating wire data to 0.1 MPa. Independent viscosity measurements were performed, at atmospheric pressure, using an Ubbelohde capillary in order to compare with the vibrating wire results, extrapolated by means of the above mentioned correlation. The two data sets agree within +/- 1%, which is commensurate with the mutual uncertainty of the experimental methods. Comparisons of the literature data obtained at atmospheric pressure with the present extrapolated vibrating-wire viscosity measurements have shown an agreement within +/- 2% for temperatures up to 339 K and within +/- 3.3% for temperatures up to 368 K. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
Solvatochromic UV-Vis shifts of four indicators (4-nitroaniline, 4-nitroanisole, 4-nitrophenol and N,N-dimethy-1-4-nitro aniline) have been measured at 298.15 K in the ternary mixture methano1/1-propanol/acetonitrile (MeOH/1-PrOH/MeCN) in a total of 22 mole fractions, along with 18 additional mole fractions for each of the corresponding binary mixtures, MeOH/1-PrOH, 1-PrOH/MeCN and MeOH/MeCN. These values, combined with our previous experimental results for 2,6-dipheny1-4-(2,4,6-triphenylpyridinium-1-yl)phenolate (Reichardt's betaine dye) in the same mixtures, permitted the computation of the Kamlet-Taft solvent parameters, alpha, beta, and pi*. The rationalization of the spectroscopic behavior of each probe within each mixture's whole mole fraction range was achieved through the use of the Bosch and Roses preferential solvation model. The applied model allowed the identification of synergistic behaviors in MeCN/alcohol mixtures and thus to infer the existence of solvent complexes in solution. Also, the addition of small amounts of MeCN to the binary mixtures was seen to cause a significant variation in pi*, whereas the addition of alcohol to MeCN mixtures always lead to a sudden change in a and The behavior of these parameters in the ternary mixture was shown to be mainly determined by the contributions of the underlying binary mixtures. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
The long term goal of this research is to develop a program able to produce an automatic segmentation and categorization of textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to produce a categorization of sequences, such as narrative, argumentative, descriptive and so on. Also, this work aims at investigating a possible convergence between the typological approach developed in particular in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997) and unsupervised statistical learning.
Resumo:
Our purpose is to provide a set-theoretical frame to clustering fuzzy relational data basically based on cardinality of the fuzzy subsets that represent objects and their complementaries, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes which includes a set of fuzzy indexes introduced by Tolias et al, and we analyze under which conditions it is defined a fuzzy proximity relation. Following an original idea due to S. Miyamoto we evaluate the similarity between objects and features by means the same mathematical procedure. Joining these concepts and methods we establish an algorithm to clustering fuzzy relational data. Finally, we present an example to make clear all the process
Resumo:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide the clinicians in the Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) Systems. METHODS It is proposed a novel combination of feature extraction techniques to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the two latter also analysed with a LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and its benefits as: i) linear transformation of the PLS or PCA reduced data, ii) feature reduction technique, and iii) classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by means of k-fold cross-validation yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when a NMSE-PLS-LMNN feature extraction method was used in combination with a SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be a valid solution for the presented problem. One of the advances is the robustness of the LMNN algorithm that not only provides higher separation rate between the classes but it also makes (in combination with NMSE and PLS) this rate variation more stable. In addition, their generalization ability is another advance since several experiments were performed on two image modalities (SPECT and PET).
Resumo:
Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices