905 results for Classification Methods
Abstract:
When designing a classification system, the aim is to build a system that can solve the problem domain under study as accurately as possible. In pattern recognition, the classifier is the core of the recognition system. The range of applications for classification is broad: classifiers are needed, among other things, in pattern recognition systems, of which image processing is a good example. Accurate classification is also in high demand in medicine. For instance, diagnosing a patient's symptoms requires a classifier that can infer from measurement results, as accurately as possible, whether or not the patient has the condition in question. This dissertation develops a classifier based on similarity measures and examines its performance on, among others, medical datasets in which the classification task is to identify the nature of a patient's condition. One advantage of the proposed classifier is its simple structure, which makes it easy both to implement and to understand. Another advantage is its accuracy: the classifier can be made to classify many different problems very accurately. This is especially important in medicine, where even a small improvement in classification accuracy is highly valuable. The dissertation studies several different measures of similarity. These measures also have several parameters, for which values suited to the particular classification problem can be sought. This optimization of the parameters to fit the problem domain can be carried out using, for example, evolutionary algorithms; in this work, a genetic algorithm and a differential evolution algorithm were used. A further advantage of the classifier is its flexibility: the similarity measure is easy to replace if it does not suit the problem domain under study. Optimizing the parameters of the different measures can also improve results considerably, and applying different preprocessing methods before classification can improve results further.
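A minimal sketch of such a similarity classifier, with the measure's parameter tuned by a toy differential evolution loop. The similarity measure, the class-mean "ideal vectors", and all settings below are illustrative assumptions, not the dissertation's exact formulation:

```python
import numpy as np

def similarity(x, v, p):
    # mean per-feature similarity (1 - |x_i - v_i|)^p, for data scaled to [0, 1]
    return float(np.mean((1.0 - np.abs(x - v)) ** p))

def fit_ideal_vectors(X, y):
    # one "ideal vector" per class (here simply the class mean)
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(X, ideals, p):
    classes = sorted(ideals)
    sims = np.array([[similarity(x, ideals[c], p) for c in classes] for x in X])
    return np.array([classes[i] for i in sims.argmax(axis=1)])

def accuracy(X, y, ideals, p):
    return float(np.mean(classify(X, ideals, p) == y))

def tune_p(X, y, ideals, pop_size=8, gens=20, F=0.8, CR=0.9, seed=0):
    # toy differential evolution (rand/1 scheme) over the single parameter p
    rng = np.random.default_rng(seed)
    pop = rng.uniform(0.5, 8.0, pop_size)
    fit = np.array([accuracy(X, y, ideals, p) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3,
                                 replace=False)
            trial = pop[a] + F * (pop[b] - pop[c]) if rng.random() < CR else pop[i]
            trial = float(np.clip(trial, 0.1, 10.0))
            f = accuracy(X, y, ideals, trial)
            if f >= fit[i]:  # greedy selection: keep the better candidate
                pop[i], fit[i] = trial, f
    return float(pop[int(fit.argmax())])
```

Swapping in a different similarity measure only requires replacing `similarity`, which mirrors the flexibility argument made in the abstract.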
Abstract:
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%) and percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
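The workflow described above (70:30 split, a CART-style tree, AUC for discrimination, SMR for calibration) can be sketched with scikit-learn's `DecisionTreeClassifier`, which implements a CART-type algorithm. The synthetic cohort, predictor coefficients, and tree settings below are illustrative placeholders, not the study's data or fitted model:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import roc_auc_score

# synthetic cohort loosely echoing the study's common variables
rng = np.random.default_rng(42)
n = 2000
age = rng.uniform(18, 90, n)
glasgow = rng.integers(3, 16, n)        # Glasgow Coma Scale, 3-15
inotropes = rng.integers(0, 2, n)       # inotropic therapy yes/no
logit = -3.0 + 0.04 * age - 0.15 * glasgow + 1.2 * inotropes
died = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

X = np.column_stack([age, glasgow, inotropes])
X_dev, X_val, y_dev, y_val = train_test_split(
    X, died, test_size=0.3, random_state=0)           # 70:30 partition

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(X_dev, y_dev)

prob = tree.predict_proba(X_val)[:, 1]
auc = roc_auc_score(y_val, prob)                      # discrimination
pcc = (tree.predict(X_val) == y_val).mean()           # fraction correctly classified
smr = y_val.sum() / prob.sum()                        # calibration: observed / expected deaths
rules = export_text(tree, feature_names=["age", "glasgow", "inotropes"])
```

`export_text` prints the human-readable decision rules, which is the interpretability advantage the conclusion emphasizes.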
Abstract:
PURPOSE: To evaluate the clinical characteristics of the 3 classifications of vitreous seeds in retinoblastoma (dust, class 1; spheres, class 2; clouds, class 3) and their responses to intravitreal melphalan. DESIGN: Retrospective, bi-institutional cohort study. PARTICIPANTS: A total of 87 patient eyes received 475 intravitreal injections of melphalan (median dose, 30 μg) given weekly, a median of 5 times (range, 1-12 times). METHODS: At presentation, the vitreous seeds were classified into 3 groups: dust, spheres, and clouds. Indirect ophthalmoscopy, fundus photography, ultrasonography, and ultrasonic biomicroscopy were used to evaluate clinical response to weekly intravitreal melphalan injections and time to regression of vitreous seeds. Kaplan-Meier estimates of time to regression, ocular survival, patient survival, and event-free survival (EFS) were calculated and then compared using the Mantel-Cox test. MAIN OUTCOME MEASURES: Time to regression of vitreous seeds, patient survival, ocular survival, and EFS. RESULTS: The time to regression was significantly different for the 3 seed classes (P < 0.0001): the median time to regression was 0.6, 1.7, and 7.7 months for dust, spheres, and clouds, respectively. Eyes with dust received significantly fewer injections and a lower median and cumulative dose of melphalan, whereas eyes with clouds received significantly more injections and a higher median and cumulative dose of melphalan. Overall, the 2-year Kaplan-Meier estimates for ocular survival, patient survival, and EFS (related to target seeds) were 90.4% (95% confidence interval [CI], 79.7-95.6), 100%, and 98.5% (95% CI, 90-99.7), respectively. CONCLUSIONS: The regression and response of vitreous seeds to intravitreal melphalan are different for each seed classification. The vitreous seed classification can be predictive of time to regression and of the number, median dose, and cumulative dose of intravitreal melphalan injections required.
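The median times to regression reported above come from Kaplan-Meier estimation. A minimal pure-Python sketch of the estimator and of reading off a median from the resulting curve (the follow-up times in the test are toy values, not the study's measurements):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate evaluated after each distinct event time.
    times: follow-up in months; events: 1 = regression observed, 0 = censored."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    surv, out_t, out_s = 1.0, [], []
    for t in np.unique(times):
        d = int(((times == t) & (events == 1)).sum())  # events at time t
        n = int((times >= t).sum())                    # at risk just before t
        if d:
            surv *= 1.0 - d / n                        # product-limit step
            out_t.append(float(t))
            out_s.append(surv)
    return out_t, out_s

def median_time(out_t, out_s):
    # first time at which the survival curve drops to 0.5 or below
    for t, s in zip(out_t, out_s):
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up
```

The Mantel-Cox (log-rank) comparison between seed classes would then operate on curves estimated this way, one per class.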
Abstract:
Purpose: Wolfram syndrome is a degenerative, recessive rare disease with an onset in childhood. It is caused by mutations in WFS1 or CISD2 genes. More than 200 different variations in WFS1 have been described in patients with Wolfram syndrome, which complicates the establishment of clear genotype-phenotype correlation. The purpose of this study was to elucidate the role of WFS1 mutations and update the natural history of the disease. Methods: This study analyzed clinical and genetic data of 412 patients with Wolfram syndrome published in the last 15 years. Results: (i) 15% of published patients do not fulfill the current inclusion criterion; (ii) genotypic prevalence differences may exist among countries; (iii) diabetes mellitus and optic atrophy might not be the first two clinical features in some patients; (iv) mutations are nonuniformly distributed in WFS1; (v) age at onset of diabetes mellitus, hearing defects, and diabetes insipidus may depend on the patient's genotypic class; and (vi) disease progression rate might depend on genotypic class. Conclusion: New genotype-phenotype correlations were established, disease progression rate for the general population and for the genotypic classes has been calculated, and new diagnostic criteria have been proposed. The conclusions raised could be important for patient management and counseling as well as for the development of treatments for Wolfram syndrome.
Abstract:
The increase of publicly available sequencing data has allowed for rapid progress in our understanding of genome composition. As new information becomes available we should constantly be updating and reanalyzing existing and newly acquired data. In this report we focus on transposable elements (TEs) which make up a significant portion of nearly all sequenced genomes. Our ability to accurately identify and classify these sequences is critical to understanding their impact on host genomes. At the same time, as we demonstrate in this report, problems with existing classification schemes have led to significant misunderstandings of the evolution of both TE sequences and their host genomes. In a pioneering publication Finnegan (1989) proposed classifying all TE sequences into two classes based on transposition mechanisms and structural features: the retrotransposons (class I) and the DNA transposons (class II). We have retraced how ideas regarding TE classification and annotation in both prokaryotic and eukaryotic scientific communities have changed over time. This has led us to observe that: (1) a number of TEs have convergent structural features and/or transposition mechanisms that have led to misleading conclusions regarding their classification, (2) the evolution of TEs is similar to that of viruses by having several unrelated origins, (3) there might be at least 8 classes and 12 orders of TEs including 10 novel orders. In an effort to address these classification issues we propose: (1) the outline of a universal TE classification, (2) a set of methods and classification rules that could be used by all scientific communities involved in the study of TEs, and (3) a 5-year schedule for the establishment of an International Committee for Taxonomy of Transposable Elements (ICTTE).
Abstract:
The Internet is the backbone of electronic mail and has long been an important source of information for academic users. It has also become a significant information source for commercial companies as they seek to stay in touch with their customers and to monitor their competitors. The growth of the WWW, both in volume and in diversity, has created growing demand for advanced information management services. Such services include clustering and classification, information discovery and filtering, and the personalization and tracking of source usage. Although the amount of scientifically and commercially valuable information available on the WWW has grown considerably in recent years, searching for and finding it still depends on conventional Internet search engines. Satisfying the growing and changing needs of information retrieval has become a complex task for these search engines. Classification and indexing are an essential part of searching for and finding reliable and precise information. This master's thesis presents the most common methods used in classification and indexing, together with applications and projects that use them to address these information retrieval problems.
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious to program by human experts. The tasks for which tools of this kind are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval, and to more general ranking problems, than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares types of learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
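The idea of fast cross-validation for regularized least squares can be illustrated with the well-known exact leave-one-out shortcut for this family of learners: one fit on all the data yields every held-out residual at once. The RBF kernel is a placeholder, not necessarily one of the kernels developed in the thesis:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Z
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def rls_loo_residuals(K, y, lam):
    """Exact leave-one-out residuals of kernel regularized least squares
    from a single fit: with hat matrix H = K (K + lam*I)^-1 and fitted
    values f = H y, the LOO residual is e_i = (y_i - f_i) / (1 - H_ii)."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    f = H @ y
    return (y - f) / (1.0 - np.diag(H))
```

Scanning a grid of regularization values then costs one matrix inversion per value instead of n separate retrainings, which is the kind of speedup a fast cross-validation algorithm aims for.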
Abstract:
Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process for age estimation includes the observation of the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age, or the probability that an individual belongs to a meaningful class of ages, is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework offer users the possibility of coherently dealing with the uncertainty associated with age estimation and of assessing in a transparent and logical way the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a statistical parametric model which links the chronological age and the degree of maturity through specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals. The analysis was performed using a dataset related to the ossification status of the medial clavicular epiphysis, and the results support the conclusion that the classification of individuals does not depend on the choice of the regression model.
Abstract:
In the past few decades, the rise of criminal, civil, and asylum cases involving young people lacking valid identification documents has generated an increase in the demand for age estimation. The chronological age, or the probability that an individual is older or younger than a given age threshold, is generally estimated by means of statistical methods based on observations of specific physical attributes. Among these statistical methods, those developed in the Bayesian framework allow users to provide coherent and transparent assignments which fulfill forensic and medico-legal purposes. The application of the Bayesian approach is facilitated by probabilistic graphical tools, such as Bayesian networks. The aim of this work is to test the performance of the Bayesian network for age estimation recently presented in the scientific literature in classifying individuals as older or younger than 18 years of age. For these exploratory analyses, a sample related to the ossification status of the medial clavicular epiphysis available in the scientific literature was used. The classification results are promising: in the criminal context, the Bayesian network achieved, on average, a rate of correct classifications of approximately 97%, whilst in the civil context the rate is, on average, close to 88%. These results encourage the continuation of the development and testing of the method in order to support its practical application in casework.
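The core of such a Bayesian classification step reduces to Bayes' theorem applied to the observed maturity stage. The likelihood tables below are invented placeholders for illustration only, not the transition-analysis estimates used in the cited work:

```python
def posterior_adult(stage, prior_adult, lik_adult, lik_minor):
    """Posterior probability that the individual is at least 18 years old,
    given the observed ossification stage, by Bayes' theorem."""
    num = lik_adult[stage] * prior_adult
    den = num + lik_minor[stage] * (1.0 - prior_adult)
    return num / den

# hypothetical stage likelihoods P(stage | class) for the medial clavicular
# epiphysis (stages 1-5) -- illustrative numbers only, NOT published estimates
lik_adult = {1: 0.01, 2: 0.04, 3: 0.25, 4: 0.40, 5: 0.30}
lik_minor = {1: 0.35, 2: 0.40, 3: 0.20, 4: 0.04, 5: 0.01}
```

The choice of `prior_adult` is what distinguishes contexts: criminal and civil proceedings may justify different prior probabilities for the same observed stage, which is one reason the reported correct-classification rates differ between the two contexts.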
Abstract:
Abstract This work studies the multi-label classification of turns in Simple English Wikipedia talk pages into dialog acts. The dataset was created and multi-labeled by Ferschke et al. (2012). The first part analyzes the dependencies between labels in order to examine the annotation coherence and to determine a classification method. A multi-label classification is then computed after transforming the problem into binary relevance. Regarding features, whereas Ferschke et al. (2012) use features such as uni-, bi-, and trigrams, the time distance between turns, or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags, and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines. The present paper proposes, as an alternative, to use and extend linear discriminant analysis with Schoenberg transformations which, following the example of kernel methods, transform the original Euclidean distances into other Euclidean distances in a space of high dimensionality.
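The binary relevance transformation mentioned above simply trains one independent binary classifier per label. A minimal sketch with scikit-learn; the base learner (logistic regression) is an arbitrary illustrative choice, not one of the classifiers used in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_binary_relevance(X, Y):
    """Binary relevance: train one independent binary classifier per column
    of the label indicator matrix Y (n_samples x n_labels)."""
    return [LogisticRegression(max_iter=1000).fit(X, Y[:, j])
            for j in range(Y.shape[1])]

def predict_binary_relevance(models, X):
    # stack the per-label predictions back into an indicator matrix
    return np.column_stack([m.predict(X) for m in models])
```

The simplicity comes at a cost: the per-label models ignore the label dependencies analyzed in the first part of the paper, which is why that dependency analysis matters when choosing the transformation.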
Abstract:
Objective: We used demographic and clinical data to design practical classification models for prediction of neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was considered the main variable for study outcomes, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 treatment-experienced patients. In the first group, the variables identified as better predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
Abstract:
Objective To evaluate the performance of diagnostic centers in the classification of mammography reports from an opportunistic screening undertaken by the Brazilian public health system (SUS) in the municipality of Goiânia, GO, Brazil in 2010. Materials and Methods The present ecological study analyzed data reported to the Sistema de Informação do Controle do Câncer de Mama (SISMAMA) (Breast Cancer Management Information System) by diagnostic centers involved in the mammographic screening developed by the SUS. Based on the frequency of mammograms per BI-RADS® category and on the limits established for the present study, the authors calculated the rate of conformity for each diagnostic center. Diagnostic centers with equal rates of conformity were considered as having equal performance. Results Fifteen diagnostic centers performed mammographic studies for the SUS and reported 31,198 screening mammograms. Regarding BI-RADS classification, none of the centers was in conformity for all categories: one center presented conformity in five categories; two centers in four; three centers in three; two centers in two; four centers in one; and three centers in none. Conclusion The results of the present study demonstrate unevenness in the diagnostic centers' performance in the classification of mammograms reported to SISMAMA from the opportunistic screening undertaken by the SUS.
Abstract:
Objective Quantitative analysis of chest radiographs of patients with and without chronic obstructive pulmonary disease (COPD), determining whether the data obtained from such radiographic images can classify individuals according to the presence or absence of the disease. Materials and Methods For this purpose, three groups of chest radiographic images were utilized, namely: group 1, including 25 individuals with COPD; group 2, including 27 individuals without COPD; and group 3 (utilized for the reclassification/validation of the analysis), including 15 individuals with COPD. The COPD classification was based on spirometry. The variables normalized by retrosternal height were the following: pulmonary width (LARGP); levels of right (ALBDIR) and left (ALBESQ) diaphragmatic eventration; costophrenic angle (ANGCF); and right (DISDIR) and left (DISESQ) intercostal distances. Results As the radiographic images of patients with and without COPD were compared, statistically significant differences were observed between the two groups on the variables related to the diaphragm. In the COPD reclassification, the following variables presented the highest indices of correct classification: ANGCF (80%), ALBDIR (73.3%), and ALBESQ (86.7%). Conclusion The radiographic assessment of the chest demonstrated that the variables related to the diaphragm allow a better differentiation between individuals with and without COPD.
Abstract:
Abstract Renal cell carcinoma (RCC) is the seventh most common histological type of cancer in the Western world and has shown a sustained increase in its prevalence. The histological classification of RCCs is of utmost importance, considering the significant prognostic and therapeutic implications of its histological subtypes. Imaging methods play an outstanding role in the diagnosis, staging and follow-up of RCC. Clear cell, papillary and chromophobe are the most common histological subtypes of RCC, and their preoperative radiological characterization, either followed or not by confirmatory percutaneous biopsy, may be particularly useful in cases of poor surgical condition, metastatic disease, central mass in a solitary kidney, and in patients eligible for molecular targeted therapy. New strategies recently developed for treating renal cancer, such as cryo and radiofrequency ablation, molecularly targeted therapy and active surveillance also require appropriate preoperative characterization of renal masses. Less common histological types, although sharing nonspecific imaging features, may be suspected on the basis of clinical and epidemiological data. The present study is aimed at reviewing the main clinical and imaging findings of histological RCC subtypes.
Abstract:
Abstract Objective: To assess the cutoff values established by ROC curves to classify 18F-NaF uptake as normal or malignant. Materials and Methods: PET/CT images were acquired 1 hour after administration of 185 MBq of 18F-NaF. Volumes of interest (VOIs) were drawn on three regions of the skeleton as follows: proximal right humeral diaphysis (HD), proximal right femoral diaphysis (FD) and first vertebral body (VB1), in a total of 254 patients, totalling 762 VOIs. The uptake in the VOIs was classified as normal or malignant on the basis of the radiopharmaceutical distribution pattern and of the CT images. A total of 675 volumes were classified as normal and 52 were classified as malignant. Thirty-five VOIs classified as indeterminate or nonmalignant lesions were excluded from analysis. The standardized uptake values (SUVs) measured in the VOIs were plotted on an ROC curve for each of the three regions. The area under the ROC curve (AUC) as well as the best cutoff SUVs to classify the VOIs were calculated. The best cutoff values were established as the ones with the highest sum of sensitivity and specificity. Results: The AUCs were 0.933, 0.889 and 0.975 for HD, FD and VB1, respectively. The best SUV cutoffs were 9.0 (sensitivity: 73%; specificity: 99%), 8.4 (sensitivity: 79%; specificity: 94%) and 21.0 (sensitivity: 93%; specificity: 95%) for HD, FD and VB1, respectively. Conclusion: The best cutoff value varies according to the bone region analyzed, and it is not possible to establish one value for the whole body.
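The cutoff-selection criterion described above (the highest sum of sensitivity and specificity, i.e. Youden's J statistic) can be sketched in a few lines; the SUV values in the usage test are made-up toy data, not the study's measurements:

```python
import numpy as np

def best_cutoff(suv, malignant):
    """Choose the SUV cutoff maximizing sensitivity + specificity
    (Youden's J statistic), scanning every observed value as a candidate."""
    suv = np.asarray(suv, float)
    malignant = np.asarray(malignant, bool)
    best_t, best_j, best_sens, best_spec = None, -1.0, 0.0, 0.0
    for t in np.unique(suv):
        pred = suv >= t                 # uptake at or above t is called malignant
        sens = float((pred & malignant).sum() / malignant.sum())
        spec = float((~pred & ~malignant).sum() / (~malignant).sum())
        if sens + spec > best_j:
            best_t, best_j, best_sens, best_spec = float(t), sens + spec, sens, spec
    return best_t, best_sens, best_spec
```

Running this separately per skeletal region mirrors the study's finding that the optimal cutoff depends on the region and cannot be replaced by a single whole-body value.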