820 results for Data classification


Relevance: 30.00%

Abstract:

Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which such tools are needed arise in many areas, here especially in bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take into account the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that incorporating prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
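
The abstract does not spell out the fast cross-validation algorithm. One well-known shortcut of this kind can be sketched as follows. It assumes plain kernel regularized least squares (kernel ridge regression) with a linear kernel and NumPy. With the hat matrix H = K(K + lam*I)^(-1), every leave-one-out residual can be obtained from a single fit as (y_i - yhat_i) / (1 - H_ii). The function names are illustrative and are not the thesis's own. The naive loop, which retrains once per held-out example, is included only to check the shortcut.

```python
import numpy as np

def fast_loo_residuals(K, y, lam):
    """Closed-form leave-one-out residuals for kernel RLS via H = K (K + lam*I)^-1."""
    n = K.shape[0]
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    y_hat = H @ y
    return (y - y_hat) / (1.0 - np.diag(H))

def naive_loo_residuals(K, y, lam):
    """Reference: retrain n times, each time holding out one example."""
    n = K.shape[0]
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        a = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
        res[i] = y[i] - K[i, keep] @ a  # prediction for the held-out example
    return res

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
K = X @ X.T  # linear kernel (assumption made for this example)
fast = fast_loo_residuals(K, y, lam=1.0)
slow = naive_loo_residuals(K, y, lam=1.0)
print(np.allclose(fast, slow))
```

The shortcut costs one matrix inversion instead of n separate fits, which is what makes leave-one-out model selection cheap for this family of learners.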

Abstract:

Objective: We used demographic and clinical data to design practical classification models for predicting neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was the main outcome variable, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 treatment-experienced patients. In the first group, the variables identified as the best predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
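
CART grows a tree by repeatedly choosing the split that most reduces node impurity. The sketch below shows a single split search on one numeric predictor using Gini impurity. The CD4 values and NCI labels are invented toy data, not the study's. A full CART implementation would also recurse on each side and prune the resulting tree.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split yet: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: CD4 count (cells/uL) and NCI status (1 = impaired).
cd4 = [150, 220, 310, 480, 520, 610, 700, 820]
nci = [1, 1, 1, 0, 1, 0, 0, 0]
threshold, score = best_split(cd4, nci)
print(threshold, round(score, 3))
```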

Abstract:

Objective: To evaluate the performance of diagnostic centers in the classification of mammography reports from opportunistic screening undertaken by the Brazilian public health system (SUS) in the municipality of Goiânia, GO, Brazil, in 2010. Materials and Methods: This ecological study analyzed data reported to the Sistema de Informação do Controle do Câncer de Mama (SISMAMA) (Breast Cancer Management Information System) by diagnostic centers involved in the mammographic screening provided by the SUS. Based on the frequency of mammograms per BI-RADS® category and on the limits established for the present study, the authors calculated the rate of conformity for each diagnostic center. Diagnostic centers with equal rates of conformity were considered to have equal performance. Results: Fifteen diagnostic centers performed mammographic studies for the SUS and reported 31,198 screening mammograms. Regarding BI-RADS classification, none of the centers was in conformity for all categories. One center was in conformity in five categories, two centers in four, three centers in three, two centers in two, four centers in one, and three centers in none. Conclusion: The results demonstrate uneven performance among the diagnostic centers in classifying the mammograms reported to SISMAMA from the opportunistic screening undertaken by the SUS.
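
The abstract does not give the limits it used. One plausible reading of the "rate of conformity" is that a center conforms in a BI-RADS category when its share of mammograms in that category falls within a reference range. The sketch below follows that assumption. Both the limits and the center's counts are hypothetical placeholders, not the study's values.

```python
def conformity(counts, limits):
    """Return the categories whose share of mammograms lies within [low, high] percent."""
    total = sum(counts.values())
    ok = []
    for cat, (low, high) in limits.items():
        share = 100.0 * counts.get(cat, 0) / total
        if low <= share <= high:
            ok.append(cat)
    return ok

# Hypothetical reference limits (% of screening mammograms per BI-RADS category).
LIMITS = {"0": (0.0, 10.0), "1": (10.0, 60.0), "2": (30.0, 80.0),
          "3": (0.0, 5.0), "4": (0.0, 3.0), "5": (0.0, 1.0)}
# Hypothetical counts reported by one diagnostic center (1,000 mammograms).
center = {"0": 150, "1": 400, "2": 380, "3": 40, "4": 25, "5": 5}
ok = conformity(center, LIMITS)
print(f"{len(ok)} of {len(LIMITS)} categories in conformity: {ok}")
```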

Abstract:

Objective: To quantitatively analyze chest radiographs of patients with and without chronic obstructive pulmonary disease (COPD) and determine whether the data obtained from these images could classify individuals according to the presence or absence of disease. Materials and Methods: Three groups of chest radiographs were used: group 1, 25 individuals with COPD; group 2, 27 individuals without COPD; and group 3 (used for reclassification/validation of the analysis), 15 individuals with COPD. COPD classification was based on spirometry. The following variables, normalized by retrosternal height, were measured: pulmonary width (LARGP); right (ALBDIR) and left (ALBESQ) diaphragmatic eventration levels; costophrenic angle (ANGCF); and right (DISDIR) and left (DISESQ) intercostal distances. Results: When radiographs of patients with and without COPD were compared, statistically significant differences between the two groups were observed for the diaphragm-related variables. In the COPD reclassification, the following variables had the highest rates of correct classification: ANGCF (80%), ALBDIR (73.3%), and ALBESQ (86.7%). Conclusion: Radiographic assessment of the chest showed that the diaphragm-related variables allow better differentiation between individuals with and without COPD.
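
Group 3 contains 15 individuals with COPD, so each reported percentage should correspond to a whole number of correctly reclassified patients. The short check below back-calculates those counts. It is arithmetic on the abstract's figures and assumes every percentage refers to group 3. It is not the authors' analysis.

```python
GROUP3_N = 15  # individuals with COPD in the reclassification/validation group
reported = {"ANGCF": 80.0, "ALBDIR": 73.3, "ALBESQ": 86.7}  # % correctly classified

# Nearest whole number of correctly classified patients for each variable.
counts = {var: round(pct / 100 * GROUP3_N) for var, pct in reported.items()}
for var, correct in counts.items():
    print(f"{var}: {correct}/{GROUP3_N} = {100 * correct / GROUP3_N:.1f}%")
```

The counts come out as 12, 11 and 13 of 15, so the reported percentages are consistent with the group size.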

Abstract:

The World Health Organization (WHO) plans to submit the 11th revision of the International Classification of Diseases (ICD) to the World Health Assembly in 2018. The WHO is working toward a revised classification system that has an enhanced ability to capture health concepts in a manner that reflects current scientific evidence and that is compatible with contemporary information systems. In this paper, we present recommendations made to the WHO by the ICD revision's Quality and Safety Topic Advisory Group (Q&S TAG) for a new conceptual approach to capturing healthcare-related harms and injuries in ICD-coded data. The Q&S TAG has grouped causes of healthcare-related harm and injuries into four categories that relate to the source of the event: (a) medications and substances, (b) procedures, (c) devices and (d) other aspects of care. Under the proposed multiple coding approach, one of these sources of harm must be coded as part of a cluster of three codes to depict, respectively, a healthcare activity as a 'source' of harm, a 'mode or mechanism' of harm and a consequence of the event summarized by these codes (i.e. injury or harm). Use of this framework depends on the implementation of a new and potentially powerful code-clustering mechanism in ICD-11. This new framework for coding healthcare-related harm has great potential to improve the clinical detail of adverse event descriptions, and the overall quality of coded health data.
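
The proposed three-part cluster (source, mode or mechanism, consequence) maps naturally onto a small record type. The sketch below models it in Python. The four source categories come from the text. The field names and example code strings are hypothetical placeholders, not actual ICD-11 codes or WHO data structures.

```python
from dataclasses import dataclass
from enum import Enum

class HarmSource(Enum):
    """The four Q&S TAG source categories of healthcare-related harm."""
    MEDICATION_OR_SUBSTANCE = "medications and substances"
    PROCEDURE = "procedures"
    DEVICE = "devices"
    OTHER_CARE = "other aspects of care"

@dataclass(frozen=True)
class HarmCluster:
    """One coded event: a source of harm, its mode or mechanism, and the consequence."""
    source: HarmSource
    source_code: str       # hypothetical code for the healthcare activity
    mechanism_code: str    # hypothetical code for the mode or mechanism of harm
    consequence_code: str  # hypothetical code for the resulting injury or harm

    def as_codes(self):
        """Return the three codes in the order source, mechanism, consequence."""
        return (self.source_code, self.mechanism_code, self.consequence_code)

event = HarmCluster(HarmSource.DEVICE, "SRC-DEV-01", "MECH-07", "INJ-123")
print(event.source.value, event.as_codes())
```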

Abstract:

Renal cell carcinoma (RCC) is the seventh most common histological type of cancer in the Western world, and its prevalence has increased steadily. The histological classification of RCCs is of utmost importance, given the significant prognostic and therapeutic implications of its histological subtypes. Imaging methods play an outstanding role in the diagnosis, staging, and follow-up of RCC. Clear cell, papillary, and chromophobe are the most common histological subtypes of RCC. Their preoperative radiological characterization, with or without confirmatory percutaneous biopsy, may be particularly useful in cases of poor surgical condition, metastatic disease, or a central mass in a solitary kidney, and in patients eligible for molecularly targeted therapy. Recently developed strategies for treating renal cancer also require appropriate preoperative characterization of renal masses. These include cryoablation and radiofrequency ablation, molecularly targeted therapy, and active surveillance. Less common histological types, although they share nonspecific imaging features, may be suspected on the basis of clinical and epidemiological data. The present study aims to review the main clinical and imaging findings of the histological RCC subtypes.

Abstract:

Myeloid malignancies (MMs) are a heterogeneous group of hematologic malignancies with differing incidence, prognosis, and survival [1–3]. Changing classifications (FAB 1994, WHO 2001, and WHO 2008) and the scarcity of epidemiological data complicate incidence comparisons [4,5]. Taking this into account, the present study had two aims. The first was to calculate the incidence rates and trends of MMs in the Province of Girona, northeastern Spain, between 1994 and 2008 according to the WHO 2001 classification. The second was to predict the number of MM cases in Spain during 2013. Data were extracted from the population-based Girona Cancer Registry (GCR), located in the northeast of Catalonia, Spain, which covers a population of 731,864 inhabitants (2008 census). Cases were registered according to the rules of the European Network of Cancer Registries and the Manual for Coding and Reporting Haematological Malignancies (HAEMACARE project). A retrospective search was performed to ensure complete coverage of MMs in the GCR, especially myeloproliferative neoplasms (MPN) and myelodysplastic syndromes (MDS). ICD-O-2 (1990) codes were converted into their corresponding ICD-O-3 (2000) codes, with MDS, polycythemia vera (PV), and essential thrombocythemia (ET) included as malignant diseases. Crude rates (CR) and European age-standardized incidence rates (ASRE) were expressed per 100,000 inhabitants per year.
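
A crude rate divides cases by the population at risk. A directly age-standardized rate such as the ASRE is a weighted mean of age-specific rates, with weights taken from a standard population. The sketch below illustrates both calculations. The three age bands, their case counts, and the weights are hypothetical; the real European standard population uses many more age bands. Only the total of 731,864 inhabitants comes from the text.

```python
def crude_rate(cases, person_years, per=100_000):
    """Crude incidence rate per `per` person-years."""
    return cases / person_years * per

def age_standardized_rate(cases_by_age, pop_by_age, std_weights, per=100_000):
    """Direct standardization: weighted mean of the age-specific rates."""
    total_w = sum(std_weights)
    return sum(
        (c / p * per) * w
        for c, p, w in zip(cases_by_age, pop_by_age, std_weights)
    ) / total_w

# Hypothetical three-band example for one year.
cases = [2, 10, 40]
pop = [300_000, 300_000, 131_864]   # sums to 731,864 (2008 census figure above)
weights = [50_000, 35_000, 15_000]  # hypothetical standard population weights
cr = crude_rate(sum(cases), sum(pop))
asr = age_standardized_rate(cases, pop, weights)
print(round(cr, 2), round(asr, 2))
```

Standardization removes the effect of a population's age structure, so rates from registries with different age profiles can be compared.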

Abstract:

Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide. It is a highly heterogeneous disease that needs to be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing, using different class discovery methods. Paired samples from 42 breast tumors were sequenced on an Illumina Genome Analyzer, and the data were processed with TopHat2 and htseq-count. As reported previously, breast cancer can be grouped into five main groups, each with a distinctive expression profile: basal epithelial-like, HER2, normal breast-like, and two luminal groups. When the tumor samples were classified with the PAM50 method, the most common subtype was Luminal B, which was significantly associated with high ESR1 and ERBB2 expression. The Luminal A subtype had significantly high ESR1 and SLC39A6 expression. The HER2 subtype had high expression of the ERBB2 and CCNE1 genes and low luminal epithelial gene expression. Basal-like and normal-like subtypes were associated with low expression of ESR1, PgR, and HER2 and had significantly high expression of cytokeratins 5 and 17. Our results were similar to the TCGA breast cancer data and to known studies on breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters. It could also identify marker genes for each subtype and help find better therapies for patients with breast cancer.
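
PAM50 is generally described as a nearest-centroid classifier: each tumor is assigned to the subtype whose centroid its expression profile correlates with best, with Spearman rank correlation as the similarity measure. The sketch below illustrates that rule. The four-gene centroids and the tumor profile are hypothetical toy values, not the published 50-gene centroids.

```python
def ranks(xs):
    """Ranks 1..n, with average ranks for ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

def nearest_centroid(sample, centroids):
    """Return the subtype whose centroid has the highest Spearman correlation."""
    return max(centroids, key=lambda name: spearman(sample, centroids[name]))

# Hypothetical centroids over four genes: ESR1, ERBB2, KRT5, KRT17.
CENTROIDS = {
    "LumA": [2.0, -0.5, -1.0, -1.5],
    "Her2": [-0.5, 2.5, -0.5, 0.0],
    "Basal": [-2.0, -0.5, 2.0, 1.5],
}
tumor = [-1.2, 0.1, 1.8, 2.2]
print(nearest_centroid(tumor, CENTROIDS))
```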

Abstract:

In this work, we study the classification of forest types using mathematically based image analysis of satellite data. We are interested in improving the classification of forest segments when information from two or more different satellites is combined. The experimental part is based on real satellite data from Canada. This thesis summarizes the mathematical basics of image analysis and supervised learning, the methods used in the classification algorithm. Three data sets and four feature sets were investigated. The feature sets were 1) histograms (quantiles), 2) variance, 3) skewness, and 4) kurtosis. Good overall performance was achieved when the ASTERBAND and RADARSAT2 data sets were combined.
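
The four feature sets can be computed per forest segment from the pixel values of one band. The sketch below uses only the Python standard library and hypothetical pixel values. It uses population moments and reports excess kurtosis (kurtosis minus 3). That is one of several common conventions and may differ from the thesis.

```python
import statistics

def segment_features(pixels, n_quantiles=4):
    """Histogram quantiles, variance, skewness and excess kurtosis of one segment."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((x - mean) ** 2 for x in pixels) / n  # population central moments
    m3 = sum((x - mean) ** 3 for x in pixels) / n
    m4 = sum((x - mean) ** 4 for x in pixels) / n
    return {
        "quantiles": statistics.quantiles(pixels, n=n_quantiles),  # n-1 cut points
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis
    }

segment = [12, 15, 15, 18, 20, 22, 22, 23, 40, 41]  # hypothetical band values
print(segment_features(segment))
```

In a classifier, the features from each satellite's bands would be concatenated into one feature vector per segment.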

Abstract:

Identification of clouds from satellite images is now a routine task. Observation of clouds from the ground, however, is still needed to obtain a complete description of cloud conditions. Among the standard meteorological variables, solar radiation is the one most affected by cloud cover. In this note, a method is proposed that uses global and diffuse solar radiation data to classify sky conditions into several classes. A classical maximum-likelihood method is applied to cluster the data. The method is applied to four years of solar radiation data and human cloud observations at a site in Catalonia, Spain. With these data, the accuracy of the solar radiation method compared with human observations is 45% when nine classes of sky conditions are to be distinguished. It rises significantly, to almost 60%, when samples are classified into only five classes. Most errors are explained by limitations in the database; therefore, further work is under way with a more suitable database.
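
The note does not specify its exact maximum-likelihood model. A standard way to cluster data by maximum likelihood is to fit a Gaussian mixture with the EM algorithm, and the sketch below does this with two components on a single feature. The feature is hypothetical: diffuse fraction, taken as diffuse divided by global radiation. The values are also hypothetical, with a low-fraction group resembling clear skies and a high-fraction group resembling overcast skies. This is an illustration, not the authors' method.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, init_means, n_iter=200, min_var=1e-4):
    """Fit a 1-D Gaussian mixture by EM; returns weights, means and variances."""
    k, n = len(init_means), len(xs)
    means = list(init_means)
    mean_all = sum(xs) / n
    variances = [sum((x - mean_all) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: maximum-likelihood updates of weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, xs)) / nj)
    return weights, means, variances

def assign(x, weights, means, variances):
    """Hard assignment of x to its most probable component."""
    return max(range(len(means)),
               key=lambda j: weights[j] * normal_pdf(x, means[j], variances[j]))

kd = [0.12, 0.15, 0.14, 0.18, 0.16, 0.85, 0.90, 0.95, 0.88, 0.92]
w, m, v = em_gmm_1d(kd, init_means=[0.3, 0.7])
labels = [assign(x, w, m, v) for x in kd]
print(labels)
```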

Abstract:

Female sexual dysfunctions, including problems with desire, arousal, orgasm, and pain, have been shown to be highly prevalent among women around the world. The etiology of these dysfunctions is unclear, but associations with health, age, psychological problems, and relationship factors have been identified. Genetic effects explain individual variation in orgasm function to some extent, but until now quantitative behavior genetic analyses have not been applied to other sexual functions. In addition, behavior genetics can be applied to explore the cause of any observed comorbidity between the dysfunctions. Learning more about the etiology of the dysfunctions may further improve the classification systems, which are currently under intense debate. The present thesis had two aims. The first was to evaluate the psychometric properties of a Finnish-language version of a commonly used questionnaire for measuring female sexual function, the Female Sexual Function Index (FSFI), in order to investigate prevalence, comorbidity, and classification. The second was to explore the balance of genetic and environmental factors in the etiology of female sexual functions, as well as their associations with a number of biopsychosocial factors. Female sexual functions were studied with survey methods in a population-based sample of Finnish twins and their female siblings. There were two waves of data collection. The first targeted 5,000 female twins aged 33–43 years. The second targeted 7,680 female twins aged 18–33 years and their female siblings aged over 18 years (n = 3,983). There was no overlap between the two data collections. The combined overall response rate was 53% (n = 8,868), with a better response rate in the second collection (57%) than in the first (45%). Female sexual function was measured with the FSFI.
The FSFI includes 19 items that measure female sexual function during the previous four weeks in six subdomains: desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain. In line with earlier research in clinical populations, a six-factor solution of the Finnish-language version of the FSFI received support. The internal consistencies of the scales were good to excellent. Women who had had no sexual activity during the preceding four weeks were assigned a score of zero, which raised questions about how to avoid overestimating the prevalence of extreme dysfunctions. The prevalence of female sexual dysfunctions per se ranged from 11% for lubrication dysfunction to 55% for desire dysfunction. The prevalence of sexual dysfunction with concomitant sexual distress, in other words sexual disorders, was notably lower, ranging from 7% for lubrication disorder to 23% for desire disorder. The comorbidity between the dysfunctions was substantial, most notably between arousal and lubrication dysfunction. However, these two dysfunctions showed distinct patterns of association with the other dysfunctions. Genetic influences on individual variation in the six FSFI subdomains were modest but significant. Additive genetic effects ranged from 3% to 11% and nonadditive genetic effects from 5% to 18%. The rest of the variation in sexual functions was explained by nonshared environmental influences. A correlated factor model including additive genetic, nonadditive genetic, and nonshared environmental effects had the best fit. All correlations between the genetic factors were significant except the one between lubrication and pain. All correlations between the nonshared environmental factors were also significant. Together, these results show substantial overlap in the genetic and nonshared environmental influences on the different dysfunctions.
In general, psychological problems, poor satisfaction with the relationship, sexual distress, and poor partner compatibility were associated with more sexual dysfunctions. Age was confounded with relationship length. Over and above relationship length, however, age had a negative effect on desire and sexual satisfaction and a positive effect on orgasm and pain functions. Alcohol consumption in general was associated with better desire, arousal, lubrication, and orgasm function. Women pregnant with their first child had fewer pain problems than nulliparous nonpregnant women. Multiparous pregnant women had more orgasm problems than multiparous nonpregnant women. Having children was associated with fewer orgasm and pain problems. The thesis concluded that desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain are separate entities with distinct associations with a number of biopsychosocial factors. However, there is also considerable comorbidity between the dysfunctions, which is explained by overlap in additive genetic, nonadditive genetic, and nonshared environmental influences. Sexual dysfunctions are highly prevalent and are not always associated with sexual distress. This relationship might be moderated by a good relationship and compatibility with the partner. Regarding classification, the results support separate diagnoses for subjective arousal and genital arousal, as well as the inclusion of pain under sexual dysfunctions.
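
FSFI domain scores are usually computed by summing the items in each domain and multiplying by a domain factor. The total score is the sum of the six domain scores. The sketch below assumes the commonly published convention: items 1–2 (desire, ×0.6), 3–6 (arousal, ×0.3), 7–10 (lubrication, ×0.3), 11–13 (orgasm, ×0.4), 14–16 (satisfaction, ×0.4) and 17–19 (pain, ×0.4). Check this against the official scoring sheet before use. The responses are hypothetical. They show how zero scores for women without recent sexual activity pull domain and total scores toward the extreme, which is the concern raised above.

```python
# Item numbers (1-based) and multiplication factors per domain, following the
# commonly published FSFI scoring convention (an assumption; verify before use).
FSFI_DOMAINS = {
    "desire": (range(1, 3), 0.6),
    "arousal": (range(3, 7), 0.3),
    "lubrication": (range(7, 11), 0.3),
    "orgasm": (range(11, 14), 0.4),
    "satisfaction": (range(14, 17), 0.4),
    "pain": (range(17, 20), 0.4),
}

def fsfi_scores(items):
    """Domain scores and total score from a dict {item_number: response}."""
    scores = {
        name: round(sum(items[i] for i in item_range) * factor, 1)
        for name, (item_range, factor) in FSFI_DOMAINS.items()
    }
    scores["total"] = round(sum(scores.values()), 1)
    return scores

# Hypothetical respondent whose activity-dependent items were all scored 0.
inactive = {i: 0 for i in range(1, 20)}
inactive.update({1: 3, 2: 3, 15: 4, 16: 4})
print(fsfi_scores(inactive))
```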

Abstract:

Geographic Information Systems (GIS) are indispensable software tools in forest planning. In forestry transportation, GIS can manage road network data and solve transportation problems such as route planning. The aim of this study was therefore to determine the pattern of the road network and define transport routes using GIS technology. The research was conducted in a forestry company in the state of Minas Gerais, Brazil. The criteria used to classify the pattern of forest roads were horizontal geometry, vertical geometry, and pavement type. To determine transport routes, a network data analysis model was created in ArcGIS using the Network Analyst extension, which made it possible to find routes that were shorter in distance and faster. The results showed a predominance of the horizontal geometry classes average (3) and bad (4), indicating the presence of winding roads. For the vertical geometry criterion, the highly mountainous relief class (4) had the greatest extent of roads. Regarding pavement type, secondary coating was the most frequent (75%), followed by primary coating (20%) and asphalt pavement (5%). The best route was the one that allowed the transport vehicle to travel at the highest specific speed given the road pattern found in the study.
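
Finding the fastest rather than the shortest route amounts to a shortest-path search in which each road segment costs its travel time (length divided by the speed allowed by its road class). The sketch below uses Dijkstra's algorithm in plain Python. It does not use the ArcGIS Network Analyst API. The road graph and the speeds per pavement type are hypothetical.

```python
import heapq

# Hypothetical average speeds (km/h) per pavement type.
SPEED = {"asphalt": 60.0, "primary": 40.0, "secondary": 25.0}

def fastest_route(edges, start, goal):
    """Dijkstra on travel time; edges are (u, v, length_km, pavement) tuples."""
    graph = {}
    for u, v, km, pav in edges:
        hours = km / SPEED[pav]
        graph.setdefault(u, []).append((v, hours))
        graph.setdefault(v, []).append((u, hours))  # roads are two-way
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == goal:
            break
        if t > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, dt in graph.get(node, []):
            nt = t + dt
            if nt < best.get(nxt, float("inf")):
                best[nxt] = nt
                prev[nxt] = node
                heapq.heappush(heap, (nt, nxt))
    path = [goal]  # assumes goal is reachable from start
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]

roads = [
    ("stand", "A", 10.0, "secondary"),  # shorter but slow
    ("A", "mill", 10.0, "secondary"),
    ("stand", "B", 15.0, "asphalt"),    # longer but paved
    ("B", "mill", 15.0, "asphalt"),
]
path, hours = fastest_route(roads, "stand", "mill")
print(path, hours)
```

In this toy network, the 30 km paved route (0.5 h) beats the 20 km unpaved one (0.8 h). This mirrors the study's finding that the best route depends on road pattern and not only on distance.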

Abstract:

One of the challenges of pig farming in today's competitive market is product traceability, which ensures, among other things, animal welfare. Vocalization is a valuable tool for identifying stressful situations in pigs, and it can be used in welfare records for traceability. The objective of this work was to identify stress in piglets from their vocalizations, classifying stress into three levels: no stress, moderate stress, and acute stress. An experiment was conducted on a commercial farm in the municipality of Holambra, São Paulo State. The vocalizations of twenty piglets were recorded during castration. The piglets were separated into two groups: castration without anesthesia and castration with lidocaine-based local anesthesia. To record the acoustic signals, a unidirectional microphone was connected to a digital recorder, and the signals were digitized at a sampling frequency of 44,100 Hz. The sound signals were evaluated with Praat® software, and different data mining algorithms were applied using Weka® software. Attribute selection improved model accuracy, and the best attribute selection was obtained with the Wrapper method. The best classification algorithms were k-NN and Naive Bayes. According to the results, it was possible to classify the level of stress in pigs from their vocalizations.
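
k-NN classifies a new recording by majority vote among the k most similar labeled recordings in feature space. The sketch below is plain Python, not Weka's implementation. The two features and all values are hypothetical: mean intensity in dB divided by 10, and mean pitch in kHz, scaled so that the two features have comparable ranges. In practice, features are usually normalized before distances are computed.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training examples nearest (Euclidean) to query."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (mean intensity in dB / 10, mean pitch in kHz).
train = [
    ((6.0, 0.40), "no stress"), ((6.2, 0.50), "no stress"), ((6.1, 0.45), "no stress"),
    ((7.5, 1.20), "moderate"), ((7.7, 1.10), "moderate"), ((7.6, 1.30), "moderate"),
    ((9.0, 2.50), "acute"), ((9.2, 2.80), "acute"), ((8.9, 2.60), "acute"),
]
print(knn_predict(train, (9.1, 2.4)))
```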

Abstract:

Coffee production was closely linked to the economic development of Brazil, and coffee is still an important product of national agriculture. Minas Gerais State currently accounts for 52% of the total coffee area in Brazil. Remote sensing data can provide information for monitoring and mapping coffee crops faster and more cheaply than conventional methods. In this context, the objective of this study was to assess the effectiveness of coffee crop mapping in the municipality of Monte Santo de Minas, Minas Gerais State, Brazil. The mapping used fraction images derived from MODIS data in both the dry and rainy seasons. The Spectral Linear Mixing Model was used to derive fraction images of soil, coffee, and water/shade. These fraction images served as input data for supervised automatic classification using the Support Vector Machine (SVM) approach. The best results in terms of Overall Accuracy and Kappa Index were obtained for the dry-season classification, with 67% and 0.41, respectively.
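
Overall Accuracy and the Kappa Index are both derived from the classification's confusion matrix. Overall Accuracy is the fraction of correctly classified samples, p_o. Kappa corrects p_o for the agreement p_e expected by chance from the row and column totals, as kappa = (p_o - p_e) / (1 - p_e). The two-class confusion matrix below is hypothetical and is not the study's.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of reference-class-i samples classified as class j."""
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(len(cm))) / n  # observed agreement
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(col) for col in zip(*cm)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical check of a coffee / non-coffee map against reference points.
cm = [[40, 10],
      [15, 35]]
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"OA = {oa:.0%}, kappa = {kappa:.2f}")
```

Because kappa discounts chance agreement, a map can show a moderate overall accuracy but a noticeably lower kappa, as in the dry-season result reported above.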

Abstract:

This study aimed to identify differences in swine vocalization patterns according to the animal's sex and to different stress conditions. A total of 150 barrows (castrated males) and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. The pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI above 74). In the control treatment, animals were kept in a comfort situation (full access to food and water, with environmental THI below 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. The recordings were then analyzed with Praat® 5.1.19 software, which generated the sound spectra. To determine stress conditions, the data were processed with WEKA® 3.5 software using the C4.5 decision tree algorithm (known as J48 in the software) with 10-fold cross-validation. According to the decision tree, the most important acoustic attribute for classifying stress conditions was sound intensity (root node). The tested attributes did not make it possible to identify the animal's sex from its vocalizations. A decision tree was generated to recognize swine hunger, thirst, and heat stress from records of sound intensity, pitch frequency, and Formant 1.