813 results for microarray data classification
Abstract:
The main objective of the study is to form a framework that provides tools to recognise and classify items whose demand is not smooth but varies highly in size and/or frequency. The framework will then be combined with two other classification methods in order to form a three-dimensional classification model. Forecasting and inventory control of these abnormal demand items is difficult. Therefore another objective of this study is to find out which statistical forecasting method is most suitable for forecasting abnormal demand items. The accuracy of the different methods is measured by comparing the forecast to the actual demand. Moreover, the study also aims at finding proper alternatives for the inventory control of abnormal demand items. The study is quantitative and the methodology is a case study. The research methods consist of theory, numerical data, current state analysis and testing of the framework in the case company. The results of the study show that the framework makes it possible to recognise and classify the abnormal demand items. It is also observed that the inventory performance of abnormal demand items differs significantly from the performance of smoothly demanded items. This makes the recognition of abnormal demand items very important.
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which tools of this kind are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.
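The idea behind fast cross-validation for regularized least squares can be illustrated with the standard leave-one-out identity for ridge regression: the held-out residual equals the ordinary residual divided by (1 - h_i), where h_i is the i-th diagonal entry of the hat matrix. The sketch below is not the algorithm from the thesis; it assumes a single feature, no intercept, and toy data chosen only for illustration, and it checks the shortcut against a naive refit.

```python
def ridge_fit(xs, ys, lam):
    """Single-feature ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def loo_residuals_fast(xs, ys, lam):
    """Leave-one-out residuals from ONE fit, using r_i / (1 - h_i)."""
    sxx = sum(x * x for x in xs)
    w = ridge_fit(xs, ys, lam)
    # h_i = x_i^2 / (sum(x^2) + lam) is the i-th hat-matrix diagonal entry.
    return [(y - w * x) / (1.0 - x * x / (sxx + lam)) for x, y in zip(xs, ys)]


def loo_residuals_naive(xs, ys, lam):
    """Reference implementation: refit n times, holding out one example each time."""
    out = []
    for i in range(len(xs)):
        w = ridge_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        out.append(ys[i] - w * xs[i])
    return out


# Toy data (hypothetical, for illustration only).
xs = [0.5, 1.0, 1.5, 2.0, 3.0]
ys = [0.4, 1.1, 1.4, 2.3, 2.9]
fast = loo_residuals_fast(xs, ys, lam=0.1)
naive = loo_residuals_naive(xs, ys, lam=0.1)
```

Both lists agree up to floating-point error, but the fast version needs one fit instead of n.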
Abstract:
Objective: We used demographic and clinical data to design practical classification models for prediction of neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was considered the main variable for study outcomes, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 treatment-experienced patients. In the first group, the variables identified as the best predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
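As a rough illustration of how a CART-style tree picks a split, the sketch below searches for the threshold on one numeric variable that minimizes the weighted Gini impurity of the two resulting nodes. The values and labels are hypothetical toy data, not taken from the study, and the function names are illustrative; the abstract does not state which splitting criterion or software the authors used.

```python
def gini(labels):
    """Gini impurity of a node with binary labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: a CD4-like count and an impairment flag (1 = impaired).
cd4 = [150, 220, 310, 480, 520, 640]
nci = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(cd4, nci)
```

A full tree repeats this search over every variable and recurses into each child node until a stopping rule is met.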
Abstract:
Objective: To evaluate the performance of diagnostic centers in the classification of mammography reports from an opportunistic screening undertaken by the Brazilian public health system (SUS) in the municipality of Goiânia, GO, Brazil in 2010. Materials and Methods: The present ecological study analyzed data reported to the Sistema de Informação do Controle do Câncer de Mama (SISMAMA) (Breast Cancer Management Information System) by diagnostic centers involved in the mammographic screening developed by the SUS. Based on the frequency of mammograms per BI-RADS® category and on the limits established for the present study, the authors calculated the rate of conformity for each diagnostic center. Diagnostic centers with equal rates of conformity were considered as having equal performance. Results: Fifteen diagnostic centers performed mammographic studies for SUS and reported 31,198 screening mammograms. Regarding BI-RADS classification, none of the diagnostic centers was in conformity for all categories: one center presented conformity in five categories, two centers in four categories, three centers in three categories, two centers in two categories, four centers in one category, and three centers in no category. Conclusion: The results of the present study demonstrate unevenness in the diagnostic centers' performance in the classification of mammograms reported to SISMAMA from the opportunistic screening undertaken by SUS.
Abstract:
Objective: To quantitatively analyze chest radiographs of patients with and without chronic obstructive pulmonary disease (COPD) and determine whether the data obtained from such radiographic images could classify such individuals according to the presence or absence of disease. Materials and Methods: Three groups of chest radiographic images were utilized: group 1, including 25 individuals with COPD; group 2, including 27 individuals without COPD; and group 3 (utilized for the reclassification/validation of the analysis), including 15 individuals with COPD. The COPD classification was based on spirometry. The variables normalized by retrosternal height were the following: pulmonary width (LARGP); levels of right (ALBDIR) and left (ALBESQ) diaphragmatic eventration; costophrenic angle (ANGCF); and right (DISDIR) and left (DISESQ) intercostal distances. Results: When the radiographic images of patients with and without COPD were compared, statistically significant differences were observed between the two groups in the variables related to the diaphragm. In the COPD reclassification, the following variables presented the highest indices of correct classification: ANGCF (80%), ALBDIR (73.3%), ALBESQ (86.7%). Conclusion: The radiographic assessment of the chest demonstrated that the variables related to the diaphragm allow a better differentiation between individuals with and without COPD.
Abstract:
The World Health Organization (WHO) plans to submit the 11th revision of the International Classification of Diseases (ICD) to the World Health Assembly in 2018. The WHO is working toward a revised classification system that has an enhanced ability to capture health concepts in a manner that reflects current scientific evidence and that is compatible with contemporary information systems. In this paper, we present recommendations made to the WHO by the ICD revision's Quality and Safety Topic Advisory Group (Q&S TAG) for a new conceptual approach to capturing healthcare-related harms and injuries in ICD-coded data. The Q&S TAG has grouped causes of healthcare-related harm and injuries into four categories that relate to the source of the event: (a) medications and substances, (b) procedures, (c) devices and (d) other aspects of care. Under the proposed multiple coding approach, one of these sources of harm must be coded as part of a cluster of three codes to depict, respectively, a healthcare activity as a 'source' of harm, a 'mode or mechanism' of harm and a consequence of the event summarized by these codes (i.e. injury or harm). Use of this framework depends on the implementation of a new and potentially powerful code-clustering mechanism in ICD-11. This new framework for coding healthcare-related harm has great potential to improve the clinical detail of adverse event descriptions, and the overall quality of coded health data.
Abstract:
Renal cell carcinoma (RCC) is the seventh most common histological type of cancer in the Western world and has shown a sustained increase in its prevalence. The histological classification of RCCs is of utmost importance, considering the significant prognostic and therapeutic implications of its histological subtypes. Imaging methods play an outstanding role in the diagnosis, staging and follow-up of RCC. Clear cell, papillary and chromophobe are the most common histological subtypes of RCC, and their preoperative radiological characterization, whether or not followed by confirmatory percutaneous biopsy, may be particularly useful in cases of poor surgical condition, metastatic disease, central mass in a solitary kidney, and in patients eligible for molecular targeted therapy. New strategies recently developed for treating renal cancer, such as cryoablation and radiofrequency ablation, molecularly targeted therapy and active surveillance, also require appropriate preoperative characterization of renal masses. Less common histological types, although sharing nonspecific imaging features, may be suspected on the basis of clinical and epidemiological data. The present study is aimed at reviewing the main clinical and imaging findings of histological RCC subtypes.
Abstract:
At present, despite extensive laboratory investigations, most cases of porcine abortion remain without an etiological diagnosis. Due to a lack of recent data on the abortigenic effect of the order Chlamydiales, 286 fetuses and their placentae from 113 abortion cases (1-5 fetuses per abortion case) were investigated by polymerase chain reaction (PCR) methods for the family Chlamydiaceae and selected Chlamydia-like organisms such as Parachlamydia acanthamoebae and Waddlia chondrophila. In 0.35% of the cases (1/286 fetuses), the Chlamydiaceae real-time PCR was positive. In the Chlamydiaceae-positive fetus, Chlamydia abortus was detected by a commercial microarray and 16S ribosomal RNA PCR followed by sequencing. The positive fetus had a Porcine circovirus-2 coinfection. By the Parachlamydia real-time PCR, 3.5% (10/286 fetuses of 9 abortion cases) were questionably positive (threshold cycle values: 35.0-45.0). In 2 of these 10 cases, confirmation by Chlamydiales-specific real-time PCR was possible. All samples tested negative by the Waddlia real-time PCR. It seems unlikely that Chlamydiaceae, Parachlamydia, and Waddlia play an important role as abortigenic agents in Swiss sows.
Abstract:
Chlamydial infections in koalas can cause life-threatening diseases leading to blindness and sterility. However, little is known about the systemic spread of chlamydiae in the inner organs of the koala, and data concerning related pathological organ lesions are limited. The aim of this study was to perform a thorough investigation of organs from 23 koalas and to correlate their histopathological lesions with molecular chlamydial detection. To reach this goal, 246 formalin-fixed and paraffin-embedded organ samples from 23 koalas were investigated by histopathology, Chlamydiaceae real-time PCR and immunohistochemistry, ArrayTube Microarray for Chlamydiaceae species identification, as well as Chlamydiales real-time PCR and sequencing. By PCR, two koalas were positive for Chlamydia pecorum, whereas immunohistochemical labelling for Chlamydiaceae was detected in 10 tissues from nine koalas. The majority of these (n=6) had positive labelling in the urogenital tract related to histopathological lesions such as cystitis, endometritis, pyelonephritis and prostatitis. Somewhat unexpected was the positive labelling in the gastrointestinal tract, including the cloaca, as well as in lung and spleen, indicating systemic spread of infection. Uncultured Chlamydiales were detected in several organs of seven koalas by PCR, and four of these suffered from plasmacytic enteritis of unknown aetiology. Whether the finding of Chlamydia-like organisms in the gastrointestinal tract is linked to plasmacytic enteritis is unclear and remains speculative. However, as recently shown in a mouse model, the gastrointestinal tract might play a role as the site of persistent chlamydial infections and as a source of reinfection of the genital tract.
Abstract:
Myeloid malignancies (MMs) are a heterogeneous group of hematologic malignancies presenting different incidence, prognosis and survival.1–3 Changing classifications (FAB 1994, WHO 2001 and WHO 2008) and few available epidemiological data complicate incidence comparisons.4,5 Taking this into account, the aims of the present study were: a) to calculate the incidence rates and trends of MMs in the Province of Girona, northeastern Spain, between 1994 and 2008 according to the WHO 2001 classification; and b) to predict the number of MMs cases in Spain during 2013. Data were extracted from the population-based Girona Cancer Registry (GCR) located in the north-east of Catalonia, Spain, and covering a population of 731,864 inhabitants (2008 census). Cases were registered according to the rules of the European Network for Cancer Registries and the Manual for Coding and Reporting Haematological Malignancies (HAEMACARE project). To ensure the complete coverage of MMs in the GCR, and especially myeloproliferative neoplasms (MPN) and myelodysplastic syndromes (MDS), a retrospective search was performed. The ICD-O-2 (1990) codes were converted into their corresponding ICD-O-3 (2000) codes, including MDS, polycythemia vera (PV) and essential thrombocythemia (ET) as malignant diseases. Results of crude rate (CR) and European standardized incidence rate (ASRE) were expressed per 100,000 inhabitants/year
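The two rates named above can be sketched as follows: the crude rate divides all cases by all person-years, while a directly standardized rate weights each age-specific rate by a standard population. Every number below is hypothetical and chosen only to keep the arithmetic easy to follow; in particular, the weights are not the European standard population and the counts are not from the Girona Cancer Registry.

```python
def crude_rate(cases, person_years, per=100_000):
    """Crude incidence rate: total cases / total person-years, scaled to 'per'."""
    return per * sum(cases) / sum(person_years)


def standardized_rate(cases, person_years, std_weights, per=100_000):
    """Directly age-standardized rate: weighted mean of the age-specific rates."""
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, std_weights))
    return per * weighted / sum(std_weights)


# Hypothetical data for three age bands (young, middle, old).
cases = [2, 10, 30]
person_years = [200_000, 100_000, 50_000]
weights = [50, 30, 20]  # hypothetical standard-population weights

cr = crude_rate(cases, person_years)
asr = standardized_rate(cases, person_years, weights)
```

With these toy inputs the standardized rate is higher than the crude rate, because the hypothetical standard population gives more weight to the older, higher-incidence bands than the observed population does.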
Abstract:
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among females worldwide. It is considered a highly heterogeneous disease and must be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing by using different class discovery methods. Forty-two paired breast tumor samples were sequenced on an Illumina Genome Analyzer, and the data were processed with TopHat2 and htseq-count. As reported previously, breast cancer can be grouped into five main groups, known as the basal epithelial-like group, the HER2 group, the normal breast-like group and two Luminal groups, each with a distinctive expression profile. When breast tumor samples were classified using the PAM50 method, the most common subtype was Luminal B, which was significantly associated with high expression of ESR1 and ERBB2. The Luminal A subtype had significantly high expression of ESR1 and SLC39A6, whereas the HER2 subtype had high expression of the ERBB2 and CCNE1 genes and low luminal epithelial gene expression. Basal-like and normal-like subtypes were associated with low expression of ESR1, PgR and HER2, and had significantly high expression of cytokeratins 5 and 17. Our results were similar to TCGA breast cancer data and to known studies on breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters and, moreover, identify marker genes for each subtype to find a better therapy for patients with breast cancer.
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis on satellite data. We are interested in improving the classification of forest segments when a combination of information from two or more different satellites is used. The experimental part is based on real satellite data originating from Canada. This thesis gives a summary of the mathematical basics of image analysis and supervised learning, the methods that are used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles), 2) variance, 3) skewness and 4) kurtosis. Good overall performance was achieved when a combination of the ASTERBAND and RADARSAT2 data sets was used.
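Three of the four feature sets listed above are standard moments of the pixel values in a segment. A minimal sketch of how they might be computed is given below; the function name and the sample values are hypothetical, and the thesis may use different estimators (for example, bias-corrected ones or excess kurtosis).

```python
def moment_features(pixels):
    """Population variance, skewness and (non-excess) kurtosis of a list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    if m2 == 0:
        # A constant segment has no spread; skewness and kurtosis are undefined.
        return {"variance": 0.0, "skewness": None, "kurtosis": None}
    return {
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical pixel intensities for one forest segment.
features = moment_features([1, 2, 3, 4, 5])
```

Each segment then contributes one such feature vector per band, and vectors from different satellites can be concatenated before classification.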
Abstract:
Identification of clouds from satellite images is now a routine task. Observation of clouds from the ground, however, is still needed to acquire a complete description of cloud conditions. Among the standard meteorological variables, solar radiation is the most affected by cloud cover. In this note, a method for using global and diffuse solar radiation data to classify sky conditions into several classes is suggested. A classical maximum-likelihood method is applied for clustering data. The method is applied to a series of four years of solar radiation data and human cloud observations at a site in Catalonia, Spain. With these data, the accuracy of the solar radiation method as compared with human observations is 45% when nine classes of sky conditions are to be distinguished, and it grows significantly to almost 60% when samples are classified in only five different classes. Most errors are explained by limitations in the database; therefore, further work is under way with a more suitable database.
Abstract:
Currently, numerous high-throughput technologies are available for the study of human carcinomas. In literature, many variations of these techniques have been described. The common denominator for these methodologies is the high amount of data obtained in a single experiment, in a short time period, and at a fairly low cost. However, these methods have also been described with several problems and limitations. The purpose of this study was to test the applicability of two selected high-throughput methods, cDNA and tissue microarrays (TMA), in cancer research. Two common human malignancies, breast and colorectal cancer, were used as examples. This thesis aims to present some practical considerations that need to be addressed when applying these techniques. cDNA microarrays were applied to screen aberrant gene expression in breast and colon cancers. Immunohistochemistry was used to validate the results and to evaluate the association of selected novel tumour markers with the outcome of the patients. The type of histological material used in immunohistochemistry was evaluated especially considering the applicability of whole tissue sections and different types of TMAs. Special attention was put on the methodological details in the cDNA microarray and TMA experiments. In conclusion, many potential tumour markers were identified in the cDNA microarray analyses. Immunohistochemistry could be applied to validate the observed gene expression changes of selected markers and to associate their expression change with patient outcome. In the current experiments, both TMAs and whole tissue sections could be used for this purpose. This study showed for the first time that securin and p120 catenin protein expression predict breast cancer outcome and the immunopositivity of carbonic anhydrase IX associates with the outcome of rectal cancer. 
The predictive value of these proteins was statistically evident also in multivariate analyses with up to a 13.1-fold risk for cancer-specific death in a specific subgroup of patients.
Abstract:
Female sexual dysfunctions, including desire, arousal, orgasm and pain problems, have been shown to be highly prevalent among women around the world. The etiology of these dysfunctions is unclear but associations with health, age, psychological problems, and relationship factors have been identified. Genetic effects explain individual variation in orgasm function to some extent but until now quantitative behavior genetic analyses have not been applied to other sexual functions. In addition, behavior genetics can be applied to exploring the cause of any observed comorbidity between the dysfunctions. Discovering more about the etiology of the dysfunctions may further improve the classification systems which are currently under intense debate. The aims of the present thesis were to evaluate the psychometric properties of a Finnish-language version of a commonly used questionnaire for measuring female sexual function, the Female Sexual Function Index (FSFI), in order to investigate prevalence, comorbidity, and classification, and to explore the balance of genetic and environmental factors in the etiology as well as the associations of a number of biopsychosocial factors with female sexual functions. Female sexual functions were studied through survey methods in a population based sample of Finnish twins and their female siblings. There were two waves of data collection. The first data collection targeted 5,000 female twins aged 33–43 years and the second 7,680 female twins aged 18–33 and their over 18–year-old female siblings (n = 3,983). There was no overlap between the data collections. The combined overall response rate for both data collections was 53% (n = 8,868), with a better response rate in the second (57%) compared to the first (45%). In order to measure female sexual function, the FSFI was used. 
It includes 19 items which measure female sexual function during the previous four weeks in six subdomains: desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain. In line with earlier research in clinical populations, a six-factor solution of the Finnish-language version of the FSFI received support. The internal consistencies of the scales were good to excellent. Some questions were raised about how to avoid overestimating the prevalence of extreme dysfunctions, given that women were allocated a score of zero if they had had no sexual activity during the preceding four weeks. The prevalence of female sexual dysfunctions per se ranged from 11% for lubrication dysfunction to 55% for desire dysfunction. The prevalence rates for sexual dysfunction with concomitant sexual distress, in other words sexual disorders, were notably lower, ranging from 7% for lubrication disorder to 23% for desire disorder. The comorbidity between the dysfunctions was substantial, most notably between arousal and lubrication dysfunction, even if these two dysfunctions showed distinct patterns of associations with the other dysfunctions. Genetic influences on individual variation in the six subdomains of the FSFI were modest but significant, ranging from 3–11% for additive genetic effects and 5–18% for nonadditive genetic effects. The rest of the variation in sexual functions was explained by nonshared environmental influences. A correlated factor model including additive and nonadditive genetic effects and nonshared environmental effects had the best fit. All in all, every correlation between the genetic factors was significant except that between lubrication and pain. All correlations between the nonshared environment factors were significant, showing that there is a substantial overlap in genetic and nonshared environmental influences between the dysfunctions.
In general, psychological problems, poor satisfaction with the relationship, sexual distress, and poor partner compatibility were associated with more sexual dysfunctions. Age was confounded with relationship length but had, over and above relationship length, a negative effect on desire and sexual satisfaction and a positive effect on orgasm and pain functions. Alcohol consumption in general was associated with better desire, arousal, lubrication, and orgasm function. Women pregnant with their first child had fewer pain problems than nulliparous nonpregnant women. Multiparous pregnant women had more orgasm problems compared to multiparous nonpregnant women. Having children was associated with fewer orgasm and pain problems. The conclusions were that desire, subjective arousal, lubrication, orgasm, sexual satisfaction, and pain are separate entities that have distinct associations with a number of different biopsychosocial factors. However, there is also considerable comorbidity between the dysfunctions, which is explained by overlap in additive genetic, nonadditive genetic and nonshared environmental influences. Sexual dysfunctions are highly prevalent and are not always associated with sexual distress, and this relationship might be moderated by a good relationship and compatibility with the partner. Regarding classification, the results support separate diagnoses for subjective arousal and genital arousal as well as the inclusion of pain under sexual dysfunctions.