979 resultados para clustering techniques
Resumo:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion.
Resumo:
The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in a same cluster and it uses these co-occurrence statistics to derive a similarity matrix, referred to as co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.
Resumo:
Hydatid disease in tropical areas poses a serious diagnostic problem due to the high frequence of cross-reactivity with other endemic helminthic infections. The enzyme-linked-immunosorbent assay (ELISA) and the double diffusion arc 5 showed respectively a sensitivity of 73% and 57% and a specificity of 84-95% and 100%. However, the specificity of ELISA was greatly increased by using ovine serum and phosphorylcholine in the diluent buffer. The hydatic antigen obtained from ovine cyst fluid showed three main protein bands of 64,58 and 30 KDa using SDS PAGE and immunoblotting. Sera from patients with onchocerciasis, cysticercosis, toxocariasis and Strongyloides infection cross-reacted with the 64 and 58 KDa bands by immunoblotting. However, none of the analyzed sera recognized the 30 KDa band, that seems to be specific in this assay. The immunoblotting showed a sensitivity of 80% and a specificity of 100% when used to recognize the 30 KDa band.
Resumo:
Dissertation presented to obtain the degree of Doctor of Philosophy in Electrical Engineering, speciality on Perceptional Systems, by the Universidade Nova de Lisboa, Faculty of Sciences and Technology
Resumo:
In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method - IADJUST - to correct indices of paired agreement, excluding agreement by chance. This new method overcomes previous limitations known in the literature as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the accordance between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets, under a range of scenarios - considering diverse numbers of clusters, clusters overlaps and balances - to discuss the pertinence and the precision of our proposal. Precision is established based on comparisons with the analytical approach for correction specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed when making a detailed comparison between the performance of two classical clustering approaches, namely Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
Resumo:
Dissertação apresentada para obtenção do Grau de Doutor em Engenharia Informática, pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Resumo:
More than ever, there is an increase of the number of decision support methods and computer aided diagnostic systems applied to various areas of medicine. In breast cancer research, many works have been done in order to reduce false-positives when used as a double reading method. In this study, we aimed to present a set of data mining techniques that were applied to approach a decision support system in the area of breast cancer diagnosis. This method is geared to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and even normal tissues, in order to avoid misdiagnosis. In this work a reliable database was used, with 410 images from about 115 patients, containing previous reviews performed by radiologists as microcalcifications, masses and also normal tissue findings. Throughout this work, two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification purposes, we considered various scenarios according to different distinct patterns of injuries and several classifiers in order to distinguish the best performance in each case described. The many classifiers used were Naïve Bayes, Support Vector Machines, k-nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings revealed great percentages of PPV and very good accuracy values. Furthermore, it also presented other related results of classification of breast density and BI-RADS® scale. The best predictive method found for all tested groups was the Random Forest classifier, and the best performance has been achieved through the distinction of microcalcifications. The conclusions based on the several tested scenarios represent a new perspective in breast cancer diagnosis using data mining techniques.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems.
Resumo:
The objective of the present study was to determine the prevalence of certain mycoplasma species, i.e., Mycoplasma hominis, Ureaplasma urealyticum and Mycoplasma penetrans, in urethral swabs from HIV-1 infected patients compared to swabs from a control group. Mycoplasmas were detected by routine culture techniques and by the Polymerase Chain Reaction (PCR) technique, using 16SrRNA generic primers of conserved region and Mycoplasma penetrans specific primers. The positivity rates obtained with the two methods were comparable. Nevertheless, PCR was more sensitive, while the culture techniques allowed the quantification of the isolates. The results showed no significant difference (p < 0.05) in positivity rates between the methods used for mycoplasma detection.
Resumo:
Epidemiologic studies have reported an inverse association between dairy product consumption and cardiometabolic risk factors in adults, but this relation is relatively unexplored in adolescents. We hypothesized that a higher dairy product intake is associated with lower cardiometabolic risk factor clustering in adolescents. To test this hypothesis, a cross-sectional study was conducted with 494 adolescents aged 15 to 18 years from the Azorean Archipelago, Portugal. We measured fasting glucose, insulin, total cholesterol, high-density lipoprotein cholesterol, triglycerides, systolic blood pressure, body fat, and cardiorespiratory fitness. We also calculated homeostatic model assessment and total cholesterol/high-density lipoprotein cholesterol ratio. For each one of these variables, a z score was computed using age and sex. A cardiometabolic risk score (CMRS) was constructed by summing up the z scores of all individual risk factors. High risk was considered to exist when an individual had at least 1 SD from this score. Diet was evaluated using a food frequency questionnaire, and the intake of total dairy (included milk, yogurt, and cheese), milk, yogurt, and cheese was categorized as low (equal to or below the median of the total sample) or “appropriate” (above the median of the total sample).The association between dairy product intake and CMRS was evaluated using separate logistic regression, and the results were adjusted for confounders. Adolescents with high milk intake had lower CMRS, compared with those with low intake (10.6% vs 18.1%, P = .018). Adolescents with appropriate milk intake were less likely to have high CMRS than those with low milk intake (odds ratio, 0.531; 95% confidence interval, 0.302-0.931). No association was found between CMRS and total dairy, yogurt, and cheese intake. Only milk intake seems to be inversely related to CMRS in adolescents.
Resumo:
Paleoparasitology is the study of parasites found in archaeological material. The development of this field of research began with histological identification of helminth eggs in mummy tissues, analysis of coprolites, and recently through molecular biology. An approach to the history of paleoparasitology is reviewed in this paper, with special reference to the studies of ancient DNA identified in archaeological material.
Resumo:
Dissertation presented in fulfilment of the requirements for the Master’s degree in Conservation and Restoration