908 results for CLASSIFICATION AND REGRESSION TREE
Abstract:
Background: Three classification trees (CT) were developed, based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies, to calculate the probability of hospital mortality; the results were compared with the APACHE II, SAPS II and MPM II-24 scores and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Discrimination was compared using the area under the ROC curve (AUC, 95% CI) and the percentage of correct classification (PCC, 95% CI); calibration was assessed with the calibration curve and the Standardized Mortality Ratio (SMR, 95% CI). Results: The CTs were produced with different selections of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) was better in the CTs: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: Different CT methodologies generate trees with different selections of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
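The three evaluation measures named in the abstract above (AUC, PCC and SMR) can be illustrated with a minimal Python sketch. It assumes each patient has an observed outcome (1 = died, 0 = survived) and a predicted probability of death; the function names, the 0.5 cut-off for PCC and the sample data are hypothetical and are not taken from the study, and confidence intervals are not computed.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (death, survivor) pairs in which the death received the higher
    predicted probability. Ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pcc(labels, scores, threshold=0.5):
    """Percentage of correct classification at an assumed cut-off."""
    correct = sum(1 for y, s in zip(labels, scores)
                  if (s >= threshold) == (y == 1))
    return 100.0 * correct / len(labels)


def smr(labels, scores):
    """Standardized Mortality Ratio: observed deaths divided by the
    expected deaths (the sum of predicted probabilities)."""
    expected = sum(scores)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(labels) / expected


# Hypothetical toy data, for illustration only.
died = [1, 0, 1, 0, 0, 1, 0, 0]
predicted = [0.8, 0.2, 0.6, 0.4, 0.1, 0.3, 0.5, 0.1]
print(f"AUC={auc(died, predicted):.3f}  "
      f"PCC={pcc(died, predicted):.1f}%  "
      f"SMR={smr(died, predicted):.2f}")
```

An SMR close to 1 indicates that the number of predicted deaths matches the number observed, which is how the calibration figures reported above are usually read.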
Abstract:
The Internet is the basic infrastructure of electronic mail and has long been an important source of information for academic users. It has become a significant source of information for commercial companies as they strive to keep in touch with their customers and to monitor their competitors. The growth of the WWW, both in volume and in diversity, has created a growing demand for advanced information management services. Such services include clustering and classification, information discovery and filtering, and the personalization and tracking of the use of sources. Although the amount of scientifically and commercially valuable information available on the WWW has grown considerably in recent years, searching for and finding it still depends on conventional Internet search engines. Satisfying the growing and changing needs of information retrieval has become a complex task for Internet search engines. Classification and indexing are an important part of searching for and finding reliable and accurate information. This Master's thesis presents the most common methods used in classification and indexing, together with applications and projects that use them and that have sought to solve problems related to information retrieval.
Abstract:
BACKGROUND: Studies that systematically assess change in ulcerative colitis (UC) extent over time in adult patients are scarce. AIM: To assess changes in disease extent over time and to evaluate clinical parameters associated with this change. METHODS: Data from the Swiss IBD cohort study were analysed. We used logistic regression modelling to identify factors associated with a change in disease extent. RESULTS: A total of 918 UC patients (45.3% females) were included. At diagnosis, UC patients presented with the following disease extent: proctitis [199 patients (21.7%)], left-sided colitis [338 patients (36.8%)] and extensive colitis/pancolitis [381 patients (41.5%)]. During a median disease duration of 9 [4-16] years, progression and regression were documented in 145 patients (15.8%) and 149 patients (16.2%), respectively. In addition, 624 patients (68.0%) had a stable disease extent. The following factors were identified as associated with disease progression: treatment with systemic glucocorticoids [odds ratio (OR) 1.704, P = 0.025] and calcineurin inhibitors (OR: 2.716, P = 0.005). No specific factors were found to be associated with disease regression. CONCLUSIONS: Over a median disease duration of 9 [4-16] years, about two-thirds of UC patients maintained the initial disease extent; the remaining one-third had experienced either progression or regression of the disease extent.
Abstract:
The main objective of the study is to form a framework that provides tools to recognise and classify items whose demand is not smooth but varies greatly in size and/or frequency. The framework is then combined with two other classification methods to form a three-dimensional classification model. Forecasting and inventory control of these abnormal demand items is difficult. Therefore, another objective of this study is to find out which statistical forecasting method is most suitable for forecasting abnormal demand items. The accuracy of the different methods is measured by comparing the forecast to the actual demand. Moreover, the study aims at finding suitable alternatives for the inventory control of abnormal demand items. The study is quantitative and the methodology is a case study. The research methods consist of theory, numerical data, current state analysis and testing of the framework in the case company. The results of the study show that the framework makes it possible to recognise and classify the abnormal demand items. The inventory performance of abnormal demand items is also found to differ significantly from the performance of smoothly demanded items. This makes the recognition of abnormal demand items very important.
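The abstract above does not state how demand variability in size and frequency is measured, so the following Python sketch is only one plausible reading. It assumes the average inter-demand interval (ADI) as the frequency measure and the squared coefficient of variation of non-zero demand sizes (CV2) as the size measure, with cut-off values of 1.32 and 0.49 that are commonly cited in the intermittent-demand literature. The function names, class labels and cut-offs are assumptions, not the study's framework. The mean absolute error stands in for "comparing the forecast to the actual demand".

```python
def demand_profile(series):
    """Return (ADI, CV2) for a list of per-period demand quantities.
    ADI is approximated as periods per period-with-demand; CV2 is the
    squared coefficient of variation of the non-zero demand sizes."""
    nonzero = [d for d in series if d > 0]
    if not nonzero:
        raise ValueError("series contains no demand")
    adi = len(series) / len(nonzero)
    mean = sum(nonzero) / len(nonzero)
    variance = sum((d - mean) ** 2 for d in nonzero) / len(nonzero)
    return adi, variance / mean ** 2


def classify_demand(series, adi_cut=1.32, cv2_cut=0.49):
    """Label a demand series; the cut-offs are assumed defaults."""
    adi, cv2 = demand_profile(series)
    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"


def mean_absolute_error(forecast, actual):
    """Average absolute gap between forecast and actual demand."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical demand histories, for illustration only.
print(classify_demand([10, 11, 9, 10, 12, 10]))
print(classify_demand([0, 0, 60, 0, 0, 0, 2, 0, 0, 10]))
```

Under these assumptions, any item that is not labelled "smooth" would be a candidate for the abnormal-demand treatment the study describes.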
Abstract:
The fossil crown wasp Electrostephanus petiolatus Brues comb. rev. (Stephanidae, Electrostephaninae) is re-described from a single male preserved in middle Eocene Baltic amber. The holotype was lost or destroyed around the time of World War II, and subsequent interpretations of its identity have been based solely on the brief descriptive comments provided by Brues in his original account. The new specimen matches the original description and illustration provided by Brues in every detail and we hereby consider them to be conspecific, selecting the specimen as a neotype for the purpose of stabilizing the nomenclature for this fossil species. This neotype exhibits a free first metasomal tergum and sternum, contrary to the assertion of previous workers who indicated these to be fused. Accordingly, this species does indeed belong to the genus Electrostephanus Brues rather than to Denaeostephanus Engel & Grimaldi (Stephaninae). Electrostephanus petiolatus is transferred to a new subgenus, Electrostephanodes n. subgen., based on its elongate pseudo-petiole and slender gaster, but may eventually warrant generic status as the phylogenetic placement of these fossil lineages continues to be clarified. A revised key to the Baltic amber crown wasps is provided.
Abstract:
Renal cell carcinoma (RCC) is the seventh most common histological type of cancer in the Western world and has shown a sustained increase in its prevalence. The histological classification of RCCs is of utmost importance, considering the significant prognostic and therapeutic implications of its histological subtypes. Imaging methods play an outstanding role in the diagnosis, staging and follow-up of RCC. Clear cell, papillary and chromophobe are the most common histological subtypes of RCC, and their preoperative radiological characterization, whether or not followed by confirmatory percutaneous biopsy, may be particularly useful in cases of poor surgical condition, metastatic disease, central mass in a solitary kidney, and in patients eligible for molecular targeted therapy. New strategies recently developed for treating renal cancer, such as cryoablation and radiofrequency ablation, molecularly targeted therapy and active surveillance, also require appropriate preoperative characterization of renal masses. Less common histological types, although sharing nonspecific imaging features, may be suspected on the basis of clinical and epidemiological data. The present study is aimed at reviewing the main clinical and imaging findings of histological RCC subtypes.
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking. The third and fourth applications are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration.
Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
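The pairwise approach described in this abstract can be made concrete with a small Python sketch. It evaluates a regularized pairwise least-squares objective and the pairwise misranking rate for a plain linear scoring function. This is an illustration under stated assumptions, not the RankRLS algorithm itself: the thesis works with kernels and matrix-algebra shortcuts, whereas the brute-force loops, the linear model and all names below are hypothetical.

```python
def linear_score(w, x):
    """Score of one example under a linear model (assumed form)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_ls_objective(w, X, y, lam):
    """Sum over all pairs of the squared gap between the true score
    difference and the predicted score difference, plus an L2 penalty
    weighted by lam."""
    f = [linear_score(w, x) for x in X]
    loss = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            gap = (y[i] - y[j]) - (f[i] - f[j])
            loss += gap * gap
    return loss + lam * sum(wi * wi for wi in w)


def pairwise_error(w, X, y):
    """Fraction of pairs with different true scores that the model
    orders incorrectly; tied predictions count as half an error."""
    f = [linear_score(w, x) for x in X]
    wrong, total = 0.0, 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] == y[j]:
                continue
            total += 1
            agreement = (y[i] - y[j]) * (f[i] - f[j])
            if agreement < 0:
                wrong += 1.0
            elif agreement == 0:
                wrong += 0.5
    return wrong / total if total else 0.0


# Hypothetical toy data: one feature, scores increasing with it.
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.0, 3.0]
print(pairwise_ls_objective([1.0], X, y, lam=0.5), pairwise_error([1.0], X, y))
```

With binary labels, one minus the pairwise error equals the AUC, which is the bipartite ranking special case the abstract mentions.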
Abstract:
Gestational trophoblastic neoplasia (GTN) is the term used to describe a set of malignant placental diseases, including invasive mole, choriocarcinoma, placental site trophoblastic tumor and epithelioid trophoblastic tumor. Both invasive mole and choriocarcinoma respond well to chemotherapy, and cure rates are greater than 90%. Since the advent of chemotherapy, low-risk GTN has been treated with a single agent, usually methotrexate or actinomycin D. Cases of high-risk GTN, however, should be treated with multiagent chemotherapy, and the regimen usually selected is EMA-CO, which combines etoposide, methotrexate, actinomycin D, cyclophosphamide and vincristine. This study reviews the literature on GTN to discuss current knowledge about its diagnosis and treatment.
Abstract:
The horizontal and vertical tree community structure in a lowland Atlantic Rain Forest was investigated through a phytosociological survey in two 0.99 ha plots in the Intervales State Park, São Paulo State. All trees with a diameter at breast height > 5 cm were recorded. A total of 3,078 individuals belonging to 172 species were identified and recorded. The Shannon diversity index was H' = 3.85 nats per individual. The Myrtaceae family showed the greatest floristic richness (38 species) and the highest density (745 individuals) in the stand. Euterpe edulis Mart. had the highest importance value (33.98%), accounting for 21.8% of all individuals recorded. The quantitative similarity index was higher than the qualitative index, showing little structural variation between plots. However, the large number of uncommon species resulted in pronounced floristic differences. A detrended correspondence analysis (DCA) generated three arbitrary vertical strata. Stratum A (> 26 m), where Sloanea guianensis (Aubl.) Benth. and Virola bicuhyba (Schott. ex A.DC.) Warb. were predominant, showed the lowest density. Stratum B (8 m < h < 26 m) had the greatest richness and diversity, and stratum C (< 8 m) showed the highest density. Euterpe edulis, Guapira opposita (Vell.) Reitz, Garcinia gardneriana (Planch. & Triana) Zappi, and Eugenia mosenii (Kausel) Sobral were abundant in strata B and C. The occurrence of strata in tropical forests is discussed and we recommend the use of DCA for other studies of the vertical distribution of tropical forest tree communities.
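The Shannon diversity index reported in this abstract follows the standard formula H' = -sum(p_i * ln p_i), where p_i is the proportion of individuals belonging to species i. The short Python sketch below shows the calculation; the function name and the species counts are hypothetical and are not the survey data.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' in nats, from per-species individual counts.
    Species with a zero count contribute nothing to the sum."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one individual is required")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical community: four species with equal abundance.
print(round(shannon_index([10, 10, 10, 10]), 4))
```

For a community of S equally abundant species the index equals ln(S), so a value of 3.85 sits below the ln(172) ceiling for 172 species, which is consistent with a few dominant species such as Euterpe edulis.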
Abstract:
The aim of this study was to analyze clinical aspects, hearing evolution and efficacy of clinical treatment of patients with sudden sensorineural hearing loss (SSNHL). This was a prospective clinical study of 136 consecutive patients with SSNHL divided into three groups after diagnostic evaluation: patients with defined etiology (DE, N = 13, 10%), concurrent diseases (CD, N = 63, 46.04%) and idiopathic sudden sensorineural hearing loss (ISSHL, N = 60, 43.9%). Initial treatment consisted of prednisone and pentoxifylline. Clinical aspects and hearing evolution for up to 6 months were evaluated. Group CD comprised 73% of patients with metabolic decompensation in the initial evaluation and was significantly older (53.80 years) than groups DE (41.93 years) and ISSHL (39.13 years). Comparison of the mean initial and final hearing loss of the three groups revealed a significant hearing improvement for group CD (P = 0.001) and group ISSHL (P = 0.001). Group DE did not present a significant difference in thresholds. The clinical classification for SSNHL allows the identification of significant differences regarding age, initial and final hearing impairment and likelihood of response to therapy. Elevated age and presence of coexisting disease were associated with a greater initial hearing impact and poorer hearing recovery after 6 months. Patients with defined etiology presented a much more limited response to therapy. The occurrence of decompensated metabolic and cardiovascular diseases and the possibility of first manifestation of auto-immune disease and cerebello-pontine angle tumors justify an adequate protocol for investigation of SSNHL.
Abstract:
The objective of the present study was to evaluate the characteristics of acute kidney injury (AKI) in AIDS patients and the value of RIFLE classification for predicting outcome. The study was conducted on AIDS patients admitted to an infectious diseases hospital in Brazil. The patients with AKI were classified according to the RIFLE classification: R (risk), I (injury), F (failure), L (loss), and E (end-stage renal disease). Univariate and multivariate analyses were used to evaluate the factors associated with AKI. A total of 532 patients with a mean age of 35 ± 8.5 years were included in this study. AKI was observed in 37% of the cases. Patients were classified as "R" (18%), "I" (7.7%) and "F" (11%). Independent risk factors for AKI were thrombocytopenia (OR = 2.9, 95%CI = 1.5-5.6, P < 0.001) and elevation of aspartate aminotransferase (AST) (OR = 3.5, 95%CI = 1.8-6.6, P < 0.001). General mortality was 25.7% and was higher among patients with AKI (40.2 vs 17%, P < 0.001). AKI was associated with death and mortality increased according to RIFLE classification - "R" (OR 2.4), "I" (OR 3.0) and "F" (OR 5.1), P < 0.001. AKI is a frequent complication in AIDS patients and is associated with increased mortality. RIFLE classification is an important indicator of poor outcome for AIDS patients.
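For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in this abstract, the Python sketch below shows how an unadjusted OR and its interval can be obtained from a 2x2 table using the standard log-odds (Woolf) approximation. The study reports ORs from multivariate analysis, which this simple calculation does not reproduce; the function name and the counts are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with an approximate 95% CI.
    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome"""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts, for illustration only.
ratio, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR={ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1, as in the values reported for thrombocytopenia and elevated AST, indicates an association that is unlikely to be due to chance alone.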
Abstract:
This study developed a gluten-free granola and evaluated it during storage by applying multivariate and regression analysis to the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. Hygienic and sanitary conditions remained good during storage, probably due to the low water activity of the formulation, which helped to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Abstract:
Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem-solving ability is usually hindered by its long execution times. In this thesis, GP is applied to real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on par with real-time requirements.