104 results for microarray data classification


Relevance:

30.00% 30.00%

Publisher:

Abstract:

BACKGROUND: Findings from randomised trials have shown a higher early risk of stroke after carotid artery stenting than after carotid endarterectomy. We assessed whether white-matter lesions affect the perioperative risk of stroke in patients treated with carotid artery stenting versus carotid endarterectomy. METHODS: Patients with symptomatic carotid artery stenosis included in the International Carotid Stenting Study (ICSS) were randomly allocated to receive carotid artery stenting or carotid endarterectomy. Copies of baseline brain imaging were analysed by two investigators, who were masked to treatment, for the severity of white-matter lesions using the age-related white-matter changes (ARWMC) score. Randomisation was done with a computer-generated sequence (1:1). Patients were divided into two groups using the median ARWMC. We analysed the risk of stroke within 30 days of revascularisation using a per-protocol analysis. ICSS is registered with controlled-trials.com, number ISRCTN 25337470. FINDINGS: 1036 patients (536 randomly allocated to carotid artery stenting, 500 to carotid endarterectomy) had baseline imaging available. Median ARWMC score was 7, and patients were dichotomised into those with a score of 7 or more and those with a score of less than 7. In patients treated with carotid artery stenting, those with an ARWMC score of 7 or more had an increased risk of stroke compared with those with a score of less than 7 (HR for any stroke 2·76, 95% CI 1·17-6·51; p=0·021; HR for non-disabling stroke 3·00, 1·10-8·36; p=0·031), but we did not see a similar association in patients treated with carotid endarterectomy (HR for any stroke 1·18, 0·40-3·55; p=0·76; HR for disabling or fatal stroke 1·41, 0·38-5·26; p=0·607). 
Carotid artery stenting was associated with a higher risk of stroke compared with carotid endarterectomy in patients with an ARWMC score of 7 or more (HR for any stroke 2·98, 1·29-6·93; p=0·011; HR for non-disabling stroke 6·34, 1·45-27·71; p=0·014), but there was no risk difference in patients with an ARWMC score of less than 7. INTERPRETATION: The presence of white-matter lesions on brain imaging should be taken into account when selecting patients for carotid revascularisation. Carotid artery stenting should be avoided in patients with more extensive white-matter lesions, but might be an acceptable alternative to carotid endarterectomy in patients with less extensive lesions. FUNDING: Medical Research Council, the Stroke Association, Sanofi-Synthélabo, the European Union Research Framework Programme 5.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We investigate the relevance of morphological operators for the classification of land use in urban scenes using submetric panchromatic imagery. A support vector machine is used for the classification. Six types of filters have been employed: opening and closing, opening and closing by reconstruction, and opening and closing top hat. The type and scale of the filters are discussed, and a feature selection algorithm called recursive feature elimination is applied to decrease the dimensionality of the input data. The analysis performed on two QuickBird panchromatic images showed that simple opening and closing operators are the most relevant for classification at such a high spatial resolution. Moreover, mixed sets combining simple and reconstruction filters provided the best performance. Tests performed on both images, having areas characterized by different architectural styles, yielded similar results for both feature selection and classification accuracy, suggesting the generalization of the feature sets highlighted.
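
The four non-reconstruction filters named in this abstract (opening, closing and their top hats) can be sketched in a few lines. This is only an illustration under stated assumptions: a flat square window, clipping at the image borders, and a toy 5x5 image, none of which come from the study. Opening and closing by reconstruction, the support vector machine and recursive feature elimination are not shown.

```python
# Sketch of flat grayscale morphology on a small image stored as a list of rows.
# The 3x3 window, border clipping and the toy image are illustrative assumptions.

def _window_filter(image, size, reducer):
    """Apply reducer (min or max) over a size x size window, clipped at the borders."""
    rows, cols = len(image), len(image[0])
    r = size // 2
    return [
        [
            reducer(
                image[y][x]
                for y in range(max(0, i - r), min(rows, i + r + 1))
                for x in range(max(0, j - r), min(cols, j + r + 1))
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def erode(image, size=3):
    return _window_filter(image, size, min)


def dilate(image, size=3):
    return _window_filter(image, size, max)


def opening(image, size=3):
    """Erosion then dilation: removes bright details smaller than the window."""
    return dilate(erode(image, size), size)


def closing(image, size=3):
    """Dilation then erosion: fills dark details smaller than the window."""
    return erode(dilate(image, size), size)


def opening_top_hat(image, size=3):
    """Image minus its opening: keeps the bright details the opening removed."""
    opened = opening(image, size)
    return [[p - o for p, o in zip(prow, orow)] for prow, orow in zip(image, opened)]


def closing_top_hat(image, size=3):
    """Closing minus image: keeps the dark details the closing filled."""
    closed = closing(image, size)
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(image, closed)]


# Toy image: a single bright pixel on a dark background.
toy = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# One morphological feature vector for the centre pixel.
features_at_centre = [
    opening(toy)[2][2],
    closing(toy)[2][2],
    opening_top_hat(toy)[2][2],
    closing_top_hat(toy)[2][2],
]
```

Repeating the same filters with several window sizes would give one feature per filter type and scale for each pixel, which is the kind of stacked input a feature selection step could then prune.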

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Summary: This thesis is devoted to the analysis, modeling and visualisation of spatially referenced environmental data using machine learning algorithms. Machine learning can be considered, in a broad sense, a subfield of artificial intelligence that is particularly concerned with the development of techniques and algorithms that allow a machine to learn from data. In this thesis, machine learning algorithms are adapted for application to environmental data and spatial prediction. Why machine learning? Because most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient for modeling. They can solve classification, regression and probability density modeling problems in high-dimensional spaces composed of spatially referenced informative variables ("geo-features") in addition to geographical coordinates. Moreover, they are ideally suited to implementation as decision-support tools for environmental questions ranging from pattern recognition to modeling and prediction, including automatic mapping. Their efficiency is comparable to that of geostatistical models in the space of geographical coordinates, but they are indispensable for high-dimensional data that include geo-features. The most important and most popular machine learning algorithms are presented theoretically and implemented as software for the environmental sciences.
The main algorithms described are the multilayer perceptron (MLP), the best-known algorithm in artificial intelligence; the general regression neural network (GRNN); the probabilistic neural network (PNN); self-organising maps (SOM); Gaussian mixture models (GMM); radial basis function networks (RBF); and mixture density networks (MDN). This range of algorithms covers varied tasks such as classification, regression and probability density estimation. Exploratory data analysis (EDA) is the first step of any data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are treated both with the traditional geostatistical approach, using experimental variography, and according to the principles of machine learning. Experimental variography, which studies the relationships between pairs of points, is a basic tool for the geostatistical analysis of anisotropic spatial correlations and makes it possible to detect the presence of spatial patterns that can be described by a statistic. The machine learning approach to ESDA is presented through the application of the k-nearest neighbours method, which is very simple and has excellent interpretation and visualisation properties. An important part of the thesis deals with topical subjects such as the automatic mapping of spatial data. The general regression neural network is proposed to solve this task efficiently.
The performance of the GRNN is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, on which the GRNN significantly outperforms all other methods, particularly in emergency situations. The thesis is composed of four chapters: theory, applications, software tools and guided examples. An important part of the work consists of a collection of software, Machine Learning Office. This collection has been developed over the last 15 years and has been used for teaching numerous courses, including international workshops in China, France, Italy, Ireland and Switzerland, as well as in fundamental and applied research projects. The case studies considered cover a wide spectrum of real low- and high-dimensional geo-environmental problems, such as air, soil and water pollution by radioactive products and heavy metals, the classification of soil types and hydrogeological units, uncertainty mapping for decision support, and the assessment of natural hazards (landslides, avalanches). Complementary tools for exploratory data analysis and visualisation have also been developed, with care taken to create a user-friendly and easy-to-use interface.
Machine Learning for geospatial data: algorithms, software tools and case studies. Abstract: The thesis is devoted to the analysis, modeling and visualisation of spatial environmental data using machine learning algorithms. In a broad sense, machine learning can be considered a subfield of artificial intelligence. It is mainly concerned with the development of techniques and algorithms that allow computers to learn from data. In this thesis, machine learning algorithms are adapted to learn from spatial environmental data and to make spatial predictions. Why machine learning?
In short, most machine learning algorithms are universal, adaptive, nonlinear, robust and efficient modeling tools. They can find solutions for classification, regression, and probability density modeling problems in high-dimensional geo-feature spaces, composed of geographical space and additional relevant spatially referenced features. They are well suited to be implemented as predictive engines in decision support systems, for the purposes of environmental data mining including pattern recognition, modeling and prediction as well as automatic data mapping. Their efficiency is competitive with that of geostatistical models in low-dimensional geographical spaces, but they are indispensable in high-dimensional geo-feature spaces. The most important and popular machine learning algorithms and models of interest for geo- and environmental sciences are presented in detail, from the theoretical description of the concepts to the software implementation. The main algorithms and models considered are the following: multi-layer perceptron (a workhorse of machine learning), general regression neural networks, probabilistic neural networks, self-organising (Kohonen) maps, Gaussian mixture models, radial basis function networks, and mixture density networks. This set of models covers machine learning tasks such as classification, regression, and density estimation. Exploratory data analysis (EDA) is an initial and very important part of data analysis. In this thesis, the concepts of exploratory spatial data analysis (ESDA) are considered using both the traditional geostatistical approach, such as experimental variography, and machine learning. Experimental variography is a basic tool for the geostatistical analysis of anisotropic spatial correlations, which helps to understand the presence of spatial patterns, at least those described by two-point statistics.
A machine learning approach to ESDA is presented by applying the k-nearest neighbors (k-NN) method, which is simple and has very good interpretation and visualization properties. An important part of the thesis deals with a topical problem, namely the automatic mapping of geospatial data. The general regression neural network (GRNN) is proposed as an efficient model to solve this task. The performance of the GRNN model is demonstrated on the Spatial Interpolation Comparison (SIC) 2004 data, where the GRNN model significantly outperformed all other approaches, especially under emergency conditions. The thesis consists of four chapters and has the following structure: theory, applications, software tools, and how-to-do-it examples. An important part of the work is a collection of software tools, Machine Learning Office. The Machine Learning Office tools were developed during the last 15 years and were used both for many teaching courses, including international workshops in China, France, Italy, Ireland and Switzerland, and for fundamental and applied research projects. The case studies considered cover a wide spectrum of real-life low- and high-dimensional geo- and environmental problems, such as air, soil and water pollution by radionuclides and heavy metals, classification of soil types and hydro-geological units, decision-oriented mapping with uncertainties, and natural hazard (landslides, avalanches) assessment and susceptibility mapping. Complementary tools useful for exploratory data analysis and visualisation were developed as well. The software is user friendly and easy to use.
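
A general regression neural network is the Nadaraya-Watson kernel regression estimator: the prediction at a query location is a kernel-weighted average of the training values. The sketch below assumes an isotropic Gaussian kernel with a single bandwidth `sigma` and uses made-up coordinates and values. It is not the Machine Learning Office implementation, and in practice `sigma` would have to be tuned, for example by cross-validation.

```python
import math


def grnn_predict(train_points, train_values, query, sigma):
    """Kernel-weighted average of train_values at the query location.

    train_points: list of coordinate tuples (for example (x, y)).
    sigma: Gaussian kernel bandwidth, an assumed free parameter.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(train_points, train_values):
        squared_distance = sum((a - b) ** 2 for a, b in zip(point, query))
        weight = math.exp(-squared_distance / (2.0 * sigma ** 2))
        weighted_sum += weight * value
        weight_total += weight
    # If every weight underflows to zero the estimate is undefined.
    return weighted_sum / weight_total if weight_total > 0 else float("nan")


# Made-up measurements at two locations.
points = [(0.0, 0.0), (1.0, 0.0)]
values = [0.0, 10.0]

midway = grnn_predict(points, values, (0.5, 0.0), sigma=0.5)
at_first_point = grnn_predict(points, values, (0.0, 0.0), sigma=0.1)
```

A query midway between the two points receives equal weights and returns their mean, while a small `sigma` makes the prediction at a training location follow that location's own value.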

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different noise and intensity nonuniformities are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. This way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data where a quantitative validation compares the methods' results with an estimated ground truth from manual segmentations by experts. Validity of the various classification methods in the labeling of the image as well as in the tissue volume is estimated with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also demonstrate that partial volume is not perfectly modeled, even though methods that account for mixture classes outperform methods that only consider pure Gaussian classes. Finally, we show that simulated data results can also be extended to real data.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The former are represented as a complete tree to be pruned and the latter are iteratively provided by the user. The proposed active learning algorithm searches for the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to the current pruning's uncertainty, sampling is focused on the most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on the division of clusters showing mixed labels. The model is tested on a very high resolution (VHR) image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Mature T-cell and T/NK-cell neoplasms are both uncommon and heterogeneous among the broad category of non-Hodgkin's lymphomas. Due to the lack of specific genetic alterations in the vast majority of cases, most currently defined entities show overlapping morphologic and immunophenotypic features and therefore pose a challenge to the diagnostic pathologist. The goal of the symposium is to address current criteria for the recognition of specific subtypes of T-cell lymphoma, and to highlight new data regarding emerging immunophenotypic or molecular markers. This activity has been designed to meet the needs of practicing pathologists, and of residents and fellows enrolled in training programs in anatomic and clinical pathology. It should be of particular benefit to those with an interest in hematopathology. Upon completion of this activity, participants should be better able to:
- State the basis for the classification of mature T-cell malignancies involving nodal and extranodal sites.
- Recognize and accurately diagnose the various subtypes of nodal and extranodal peripheral T-cell lymphomas.
- Utilize immunohistochemical and molecular tests to characterize atypical T-cell proliferations.
- Recognize and accurately diagnose T-cell lymphoproliferative lesions involving the skin and gastrointestinal tract, and provide guidance regarding their clinical aggressiveness and management.
- Utilize flow cytometric data to identify diverse functional T-cell subsets.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background: In order to provide a cost-effective tool to analyse pharmacogenetic markers in malaria treatment, DNA microarray technology was compared with sequencing of polymerase chain reaction (PCR) fragments to detect single nucleotide polymorphisms (SNPs) in a larger number of samples. Methods: The microarray was developed to affordably generate SNP data for genes encoding the human cytochrome P450 enzyme family (CYP) and N-acetyltransferase-2 (NAT2), which are involved in antimalarial drug metabolism and have known polymorphisms, i.e. CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, and NAT2. Results: For some SNPs, i.e. CYP2A6*2, CYP2B6*5, CYP2C8*3, CYP2C9*3/*5, CYP2C19*3, CYP2D6*4 and NAT2*6/*7/*14, agreement between both techniques ranged from substantial to almost perfect (kappa index between 0.61 and 1.00), whilst for other SNPs a large variability from slight to substantial agreement (kappa index between 0.39 and 1.00) was found, e.g. CYP2D6*17 (2850C>T), CYP3A4*1B and CYP3A5*3. Conclusion: The major limitation of the microarray technology for this purpose was its lack of robustness, with a large number of missing data or incorrect specificity.
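
Agreement between two genotyping methods on the same samples is commonly summarised with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below assumes that the "kappa index" reported here is Cohen's kappa for two raters. The genotype labels and calls are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equally long lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Proportion of samples on which both methods give the same call.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement from the marginal call frequencies of each method.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1.0:
        # Both methods gave the same single label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented calls for four samples ("wt" = wild type, "mut" = variant allele).
microarray_calls = ["wt", "wt", "mut", "mut"]
sequencing_calls = ["wt", "wt", "mut", "wt"]

kappa = cohen_kappa(microarray_calls, sequencing_calls)
```

Here the two methods agree on three of four samples (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5.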

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The increase of publicly available sequencing data has allowed for rapid progress in our understanding of genome composition. As new information becomes available we should constantly be updating and reanalyzing existing and newly acquired data. In this report we focus on transposable elements (TEs) which make up a significant portion of nearly all sequenced genomes. Our ability to accurately identify and classify these sequences is critical to understanding their impact on host genomes. At the same time, as we demonstrate in this report, problems with existing classification schemes have led to significant misunderstandings of the evolution of both TE sequences and their host genomes. In a pioneering publication Finnegan (1989) proposed classifying all TE sequences into two classes based on transposition mechanisms and structural features: the retrotransposons (class I) and the DNA transposons (class II). We have retraced how ideas regarding TE classification and annotation in both prokaryotic and eukaryotic scientific communities have changed over time. This has led us to observe that: (1) a number of TEs have convergent structural features and/or transposition mechanisms that have led to misleading conclusions regarding their classification, (2) the evolution of TEs is similar to that of viruses by having several unrelated origins, (3) there might be at least 8 classes and 12 orders of TEs including 10 novel orders. In an effort to address these classification issues we propose: (1) the outline of a universal TE classification, (2) a set of methods and classification rules that could be used by all scientific communities involved in the study of TEs, and (3) a 5-year schedule for the establishment of an International Committee for Taxonomy of Transposable Elements (ICTTE).

Relevance:

30.00% 30.00%

Publisher:

Abstract:

PURPOSE: To develop a consensus opinion regarding capturing diagnosis-timing in coded hospital data. METHODS: As part of the World Health Organization International Classification of Diseases-11th Revision initiative, the Quality and Safety Topic Advisory Group is charged with enhancing the capture of quality and patient safety information in morbidity data sets. One such feature is a diagnosis-timing flag. The Group has undertaken a narrative literature review, scanned national experiences focusing on countries currently using timing flags, and held a series of meetings to derive formal recommendations regarding diagnosis-timing reporting. RESULTS: The completeness of diagnosis-timing reporting continues to improve with experience and use; studies indicate that it enhances risk-adjustment and may have a substantial impact on hospital performance estimates, especially for conditions/procedures that involve acutely ill patients. However, studies suggest that its reliability varies, is better for surgical than medical patients (kappa in hip fracture patients of 0.7-1.0 versus kappa in pneumonia of 0.2-0.6) and is dependent on coder training and setting. It may allow simpler and more precise specification of quality indicators. CONCLUSIONS: As the evidence indicates that a diagnosis-timing flag improves the ability of routinely collected, coded hospital data to support outcomes research and the development of quality and safety indicators, the Group recommends that a classification of 'arising after admission' (yes/no), with permitted designations of 'unknown or clinically undetermined', will facilitate coding while providing flexibility when there is uncertainty. Clear coding standards and guidelines with ongoing coder education will be necessary to ensure reliability of the diagnosis-timing flag.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The Commission on Classification and Terminology and the Commission on Epidemiology of the International League Against Epilepsy (ILAE) have charged a Task Force to revise concepts, definition, and classification of status epilepticus (SE). The proposed new definition of SE is as follows: Status epilepticus is a condition resulting either from the failure of the mechanisms responsible for seizure termination or from the initiation of mechanisms that lead to abnormally prolonged seizures (after time point t1). It is a condition that can have long-term consequences (after time point t2), including neuronal death, neuronal injury, and alteration of neuronal networks, depending on the type and duration of seizures. This definition is conceptual, with two operational dimensions: the first is the length of the seizure and the time point (t1) beyond which the seizure should be regarded as "continuous seizure activity." The second time point (t2) is the time of ongoing seizure activity after which there is a risk of long-term consequences. In the case of convulsive (tonic-clonic) SE, both time points (t1 at 5 min and t2 at 30 min) are based on animal experiments and clinical research. This evidence is incomplete, and there is furthermore considerable variation, so these time points should be considered as the best estimates currently available. Data are not yet available for other forms of SE, but as knowledge and understanding increase, time points can be defined for specific forms of SE based on scientific evidence and incorporated into the definition, without changing the underlying concepts. A new diagnostic classification system of SE is proposed, which will provide a framework for clinical diagnosis, investigation, and therapeutic approaches for each patient. There are four axes: (1) semiology; (2) etiology; (3) electroencephalography (EEG) correlates; and (4) age.
Axis 1 (semiology) lists different forms of SE divided into those with prominent motor symptoms, those without prominent motor symptoms, and currently indeterminate conditions (such as acute confusional states with epileptiform EEG patterns). Axis 2 (etiology) is divided into subcategories of known and unknown causes. Axis 3 (EEG correlates) adopts the latest recommendations by consensus panels to use the following descriptors for the EEG: name of pattern, morphology, location, time-related features, modulation, and effect of intervention. Finally, axis 4 divides age groups into neonatal, infancy, childhood, adolescent and adulthood, and elderly.
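
The two operational time points can be read as thresholds on seizure duration. The sketch below encodes only the convulsive (tonic-clonic) values quoted in the abstract (t1 at 5 min, t2 at 30 min). The names, the returned wording and the choice to treat a duration exactly equal to a time point as not yet beyond it are assumptions made for illustration, and no values are supplied for other forms of SE because the abstract gives none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimePoints:
    """Operational time points in minutes for one form of SE."""
    t1_min: float  # beyond this, regard as continuous seizure activity
    t2_min: float  # beyond this, risk of long-term consequences


# Values quoted in the abstract for convulsive (tonic-clonic) SE.
CONVULSIVE_SE = TimePoints(t1_min=5.0, t2_min=30.0)


def describe_duration(duration_min, points=CONVULSIVE_SE):
    """Place a seizure duration relative to t1 and t2 (hypothetical helper)."""
    if duration_min < 0:
        raise ValueError("duration must be non-negative")
    if duration_min > points.t2_min:
        return "beyond t2: risk of long-term consequences"
    if duration_min > points.t1_min:
        return "beyond t1: continuous seizure activity"
    return "before t1"


examples = [describe_duration(minutes) for minutes in (3, 10, 45)]
```

Keeping the time points in a separate `TimePoints` value mirrors the abstract's remark that time points for other forms of SE can be added later without changing the underlying concepts.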

Relevance:

30.00% 30.00%

Publisher:

Abstract:

An important aspect of immune monitoring for vaccine development, clinical trials, and research is the detection, measurement, and comparison of antigen-specific T-cells from subject samples under different conditions. Antigen-specific T-cells compose a very small fraction of total T-cells. Developments in cytometry technology over the past five years have enabled the measurement of single cells in a multivariate and high-throughput manner. This growth in both dimensionality and quantity of data continues to pose a challenge for effective identification and visualization of rare cell subsets, such as antigen-specific T-cells. Dimension reduction and feature extraction play a pivotal role in both identifying and visualizing cell populations of interest in large, multi-dimensional cytometry datasets. However, the automated identification and visualization of rare, high-dimensional cell subsets remains challenging. Here we demonstrate how a systematic and integrated approach combining targeted feature extraction with dimension reduction can be used to identify and visualize biological differences in rare, antigen-specific cell populations. By using OpenCyto to perform semi-automated gating and feature extraction of flow cytometry data, followed by dimensionality reduction with t-SNE, we are able to identify polyfunctional subpopulations of antigen-specific T-cells and visualize treatment-specific differences between them.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The World Health Organization (WHO) plans to submit the 11th revision of the International Classification of Diseases (ICD) to the World Health Assembly in 2018. The WHO is working toward a revised classification system that has an enhanced ability to capture health concepts in a manner that reflects current scientific evidence and that is compatible with contemporary information systems. In this paper, we present recommendations made to the WHO by the ICD revision's Quality and Safety Topic Advisory Group (Q&S TAG) for a new conceptual approach to capturing healthcare-related harms and injuries in ICD-coded data. The Q&S TAG has grouped causes of healthcare-related harm and injuries into four categories that relate to the source of the event: (a) medications and substances, (b) procedures, (c) devices and (d) other aspects of care. Under the proposed multiple coding approach, one of these sources of harm must be coded as part of a cluster of three codes to depict, respectively, a healthcare activity as a 'source' of harm, a 'mode or mechanism' of harm and a consequence of the event summarized by these codes (i.e. injury or harm). Use of this framework depends on the implementation of a new and potentially powerful code-clustering mechanism in ICD-11. This new framework for coding healthcare-related harm has great potential to improve the clinical detail of adverse event descriptions, and the overall quality of coded health data.
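
The proposed three-part cluster (source of harm, mode or mechanism, consequence) maps naturally onto a small record type. The sketch below is a hypothetical data model: the class and field names are invented, and the code strings are placeholders rather than real ICD-11 codes. Only the four source categories and the three-part structure come from the abstract.

```python
from dataclasses import dataclass
from enum import Enum


class HarmSource(Enum):
    """The four source categories listed by the Q&S TAG."""
    MEDICATIONS_AND_SUBSTANCES = "medications and substances"
    PROCEDURES = "procedures"
    DEVICES = "devices"
    OTHER_ASPECTS_OF_CARE = "other aspects of care"


@dataclass(frozen=True)
class HarmCluster:
    """One healthcare-related harm event coded as a cluster of three codes."""
    source: HarmSource
    source_code: str       # healthcare activity that was the source of harm
    mode_code: str         # mode or mechanism of harm
    consequence_code: str  # resulting injury or harm

    def __post_init__(self):
        # All three codes are required for the cluster to be complete.
        for name in ("source_code", "mode_code", "consequence_code"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def as_tuple(self):
        return (self.source_code, self.mode_code, self.consequence_code)


# Placeholder strings only; these are not real ICD-11 codes.
example = HarmCluster(
    source=HarmSource.DEVICES,
    source_code="SOURCE-PLACEHOLDER",
    mode_code="MODE-PLACEHOLDER",
    consequence_code="HARM-PLACEHOLDER",
)
```

Rejecting incomplete clusters at construction time is one way to reflect the requirement that a source of harm be coded together with its mode and consequence.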

Relevance:

30.00% 30.00%

Publisher:

Abstract:

At present, despite extensive laboratory investigations, most cases of porcine abortion remain without an etiological diagnosis. Due to a lack of recent data on the abortigenic effect of the order Chlamydiales, 286 fetuses and their placentae from 113 abortion cases (1-5 fetuses per abortion case) were investigated by polymerase chain reaction (PCR) methods for the family Chlamydiaceae and selected Chlamydia-like organisms such as Parachlamydia acanthamoebae and Waddlia chondrophila. In 0.35% of the fetuses (1/286), the Chlamydiaceae real-time PCR was positive. In the Chlamydiaceae-positive fetus, Chlamydia abortus was detected by a commercial microarray and 16S ribosomal RNA PCR followed by sequencing. The positive fetus had a Porcine circovirus-2 coinfection. By the Parachlamydia real-time PCR, 3.5% (10/286 fetuses of 9 abortion cases) were questionably positive (threshold cycle values: 35.0-45.0). In 2 of these 10 cases, a confirmation by Chlamydiales-specific real-time PCR was possible. All samples tested negative by the Waddlia real-time PCR. It seems unlikely that Chlamydiaceae, Parachlamydia, and Waddlia play an important role as abortigenic agents in Swiss sows.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Chlamydial infections in koalas can cause life-threatening diseases leading to blindness and sterility. However, little is known about the systemic spread of chlamydiae in the inner organs of the koala, and data concerning related pathological organ lesions are limited. The aim of this study was to perform a thorough investigation of organs from 23 koalas and to correlate their histopathological lesions with molecular chlamydial detection. To reach this goal, 246 formalin-fixed and paraffin-embedded organ samples from 23 koalas were investigated by histopathology, Chlamydiaceae real-time PCR and immunohistochemistry, ArrayTube Microarray for Chlamydiaceae species identification, as well as Chlamydiales real-time PCR and sequencing. By PCR, two koalas were positive for Chlamydia pecorum, whereas immunohistochemical labelling for Chlamydiaceae was detected in 10 tissues from nine koalas. The majority of these (n=6) had positive labelling in the urogenital tract related to histopathological lesions such as cystitis, endometritis, pyelonephritis and prostatitis. Somewhat unexpected was the positive labelling in the gastrointestinal tract, including the cloaca, as well as in lung and spleen, indicating systemic spread of infection. Uncultured Chlamydiales were detected in several organs of seven koalas by PCR, and four of these suffered from plasmacytic enteritis of unknown aetiology. Whether the finding of Chlamydia-like organisms in the gastrointestinal tract is linked to plasmacytic enteritis is unclear and remains speculative. However, as recently shown in a mouse model, the gastrointestinal tract might play a role as the site for persistent chlamydial infections and as a source for reinfection of the genital tract.