72 results for microarray data classification

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00%

Publisher:

Abstract:

We propose a new class of neurofuzzy construction algorithms that aim to maximize generalization capability specifically for imbalanced data classification problems, based on leave-one-out (LOO) cross validation. The algorithms have two stages: in the first, an initial rule base is constructed by estimating the Gaussian mixture model with analysis of variance decomposition from input data; in the second, joint weighted least squares parameter estimation and rule selection are carried out using the orthogonal forward subspace selection (OFSS) procedure. We show how different LOO-based rule selection criteria can be incorporated with OFSS, and advocate maximizing either the leave-one-out area under the curve of the receiver operating characteristic or the leave-one-out F-measure if the data sets exhibit an imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.
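
As a minimal sketch of the two selection criteria named in this abstract, the following Python computes the F-measure and the area under the ROC curve from a list of held-out scores. The function names, the 0.5 decision threshold and the toy data are illustrative assumptions, not the authors' implementation, and the leave-one-out scores are taken as given rather than produced by a neurofuzzy model.

```python
def f_measure(labels, scores, threshold=0.5):
    """F-measure (F1) for the positive class, where label 1 is positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out scores for an imbalanced set: 2 positives, 6 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05]

print(f_measure(labels, scores))
print(auc(labels, scores))
```

On this toy set six of the eight instances are classified correctly at the 0.5 threshold, yet only one of the two positives is found, which is the kind of gap that the F-measure and AUC expose and plain accuracy hides.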

Relevance:

90.00%

Publisher:

Abstract:

Background: Microarray-based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems, including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none designed specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test strains against three reference strains simultaneously. Each stage of the process is described. We have compared a number of methods available for characterising bacterial genomic diversity and for calculating the cut-off between gene presence and absence or divergence, and have shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
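
One way a kernel density estimator can yield a presence/absence cut-off is sketched below: estimate the density of the log-ratios and take the lowest point of the valley between the two highest peaks. This is an assumption about how such a cut-off might be derived, not the procedure described by the authors; the fixed bandwidth, the grid search, the function names and the toy log-ratios are all hypothetical.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2)
                          for d in data)

    return density


def kde_cutoff(data, bandwidth, grid_size=200):
    """Grid point of lowest density between the two highest density peaks."""
    density = gaussian_kde(data, bandwidth)
    lo, hi = min(data), max(data)
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    ys = [density(x) for x in xs]
    # Local maxima of the density on the grid, including the two end points.
    peaks = [i for i in range(grid_size)
             if (i == 0 or ys[i] > ys[i - 1])
             and (i == grid_size - 1 or ys[i] >= ys[i + 1])]
    if len(peaks) < 2:
        raise ValueError("density is unimodal; no valley to use as a cut-off")
    left, right = sorted(sorted(peaks, key=lambda i: ys[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: ys[i])
    return xs[valley]


# Hypothetical log-ratios: genes present cluster near 0, absent ones near -3.
log_ratios = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.2, -3.2, -3.1, -3.0, -2.8]
cutoff = kde_cutoff(log_ratios, bandwidth=0.3)
present = [r for r in log_ratios if r > cutoff]
print(cutoff, len(present))
```

Because the cut-off is read from each data set's own density, it moves with the data rather than being fixed in advance, which is one reading of "dynamic" in the abstract.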

Relevance:

80.00%

Publisher:

Abstract:

The consequences of increasing atmospheric carbon dioxide for long-term adaptation of forest ecosystems remain uncertain, with virtually no studies undertaken at the genetic level. A global analysis using cDNA microarrays was conducted following 6 yr exposure of Populus × euramericana (clone I-214) to elevated [CO2] in a FACE (free-air CO2 enrichment) experiment. Gene expression was sensitive to elevated [CO2] but the response depended on the developmental age of the leaves, and < 50 transcripts differed significantly between different CO2 environments. For young leaves most differentially expressed genes were upregulated in elevated [CO2], while in semimature leaves most were downregulated in elevated [CO2]. For transcripts related only to the small subunit of Rubisco, upregulation in LPI 3 and downregulation in LPI 6 leaves in elevated CO2 was confirmed by ANOVA. Similar patterns of gene expression for young leaves were also confirmed independently across year 3 and year 6 microarray data, and using real-time RT–PCR. This study provides the first clues to the long-term genetic expression changes that may occur during long-term plant response to elevated CO2.

Relevance:

80.00%

Publisher:

Abstract:

Background: Somatic embryogenesis (SE) in plants is a process by which embryos are generated directly from somatic cells, rather than from the fused products of male and female gametes. Despite the detailed expression analysis of several somatic-to-embryonic marker genes, a comprehensive understanding of SE at a molecular level is still lacking. The present study was designed to generate high resolution transcriptome datasets for early SE, paving the way for future research to understand the underlying molecular mechanisms that regulate this process. We sequenced Arabidopsis thaliana somatic embryos collected from three distinct developmental time-points (5, 10 and 15 d after in vitro culture) using the Illumina HiSeq 2000 platform. Results: This study yielded a total of 426,001,826 sequence reads mapped to 26,520 genes in the A. thaliana reference genome. Analysis of embryonic cultures after 5 and 10 d showed differential expression of 1,195 genes; these included 778 genes that were more highly expressed after 5 d as compared to 10 d. Moreover, 1,718 genes were differentially expressed in embryonic cultures between 10 and 15 d. Our data also showed at least eight different expression patterns during early SE; the majority of genes are transcriptionally more active in embryos after 5 d. Comparison of transcriptomes derived from somatic embryos and leaf tissues revealed that at least 4,951 genes are transcriptionally more active in embryos than in the leaf; increased expression of genes involved in DNA cytosine methylation and histone deacetylation was noted in embryogenic tissues. In silico expression analysis based on microarray data found that approximately 5% of these genes are transcriptionally more active in somatic embryos than in actively dividing callus and non-dividing leaf tissues. Moreover, this identified 49 genes expressed at a higher level in somatic embryos than in other tissues. This included several genes with unknown function, as well as others related to oxidative and osmotic stress, and auxin signalling. Conclusions: The transcriptome information provided here will form the foundation for future research on genetic and epigenetic control of plant embryogenesis at a molecular level. In follow-up studies, these data could be used to construct a regulatory network for SE; the genes more highly expressed in somatic embryos than in vegetative tissues can be considered as potential candidates to validate these networks.

Relevance:

80.00%

Publisher:

Abstract:

Although the adult brain contains neural stem cells (NSCs) that generate new neurons throughout life, these astrocyte-like populations are restricted to two discrete niches. Despite their terminally differentiated phenotype, adult parenchymal astrocytes can re-acquire NSC-like characteristics following injury, and as such, these 'reactive' astrocytes offer an alternative source of cells for central nervous system (CNS) repair following injury or disease. At present, the mechanisms that regulate the potential of different types of astrocytes are poorly understood. We used in vitro and ex vivo astrocytes to identify candidate pathways important for regulation of astrocyte potential. Using in vitro neural progenitor cell (NPC)-derived astrocytes, we found that exposure of more lineage-restricted astrocytes to either tumor necrosis factor alpha (TNF-α) (via nuclear factor-κB (NFκB)) or the bone morphogenetic protein (BMP) inhibitor, noggin, led to re-acquisition of NPC properties accompanied by transcriptomic and epigenetic changes consistent with a more neurogenic, NPC-like state. Comparative analyses of microarray data from in vitro-derived and ex vivo postnatal parenchymal astrocytes identified several common pathways and upstream regulators associated with inflammation (including transforming growth factor (TGF)-β1 and peroxisome proliferator-activated receptor gamma (PPARγ)) and cell cycle control (including TP53) as candidate regulators of astrocyte phenotype and potential. We propose that inflammatory signalling may control the normal, progressive restriction in potential of differentiating astrocytes as well as under reactive conditions and represent future targets for therapies to harness the latent neurogenic capacity of parenchymal astrocytes.

Relevance:

80.00%

Publisher:

Abstract:

Aims. Protein kinases are potential therapeutic targets for heart failure, but most studies of cardiac protein kinases derive from other systems, an approach that fails to account for specific kinases expressed in the heart and the contractile cardiomyocytes. We aimed to define the cardiomyocyte kinome (i.e. the protein kinases expressed in cardiomyocytes) and identify kinases with altered expression in human failing hearts. Methods and Results. Expression profiling (Affymetrix microarrays) detected >400 protein kinase mRNAs in rat neonatal ventricular myocytes (NVMs) and/or adult ventricular myocytes (AVMs), 32 and 93 of which were significantly upregulated or downregulated (>2-fold), respectively, in AVMs. Data for AGC family members were validated by qPCR. Proteomics analysis identified >180 cardiomyocyte protein kinases, with high relative expression of mitogen-activated protein kinase cascades and other known cardiomyocyte kinases (e.g. CAMKs, cAMP-dependent protein kinase). Other kinases are poorly-investigated (e.g. Slk, Stk24, Oxsr1). Expression of Akt1/2/3, BRaf, ERK1/2, Map2k1, Map3k8, Map4k4, MST1/3, p38-MAPK, PKCδ, Pkn2, Ripk1/2, Tnni3k and Zak was confirmed by immunoblotting. Relative to total protein, Map3k8 and Tnni3k were upregulated in AVMs vs NVMs. Microarray data for human hearts demonstrated variation in kinome expression that may influence responses to kinase inhibitor therapies. Furthermore, some kinases were upregulated (e.g. NRK, JAK2, STK38L) or downregulated (e.g. MAP2K1, IRAK1, STK40) in human failing hearts. Conclusions. This characterization of the spectrum of kinases expressed in cardiomyocytes and the heart (cardiomyocyte and cardiac kinomes) identified novel kinases, some of which are differentially expressed in failing human hearts and could serve as potential therapeutic targets.

Relevance:

40.00%

Publisher:

Abstract:

Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.

Relevance:

40.00%

Publisher:

Abstract:

Airborne Light Detection And Ranging (LIDAR) provides accurate height information for objects on the earth, which has made LIDAR increasingly popular in terrain and land surveying. In particular, LIDAR data offer vital and significant features for land-cover classification, which is an important task in many application domains. In this paper, an unsupervised approach based on an improved fuzzy Markov random field (FMRF) model is developed, by which the LIDAR data, its co-registered images acquired by optical sensors (i.e. an aerial color image and a near-infrared image) and other derived features are fused effectively to improve the ability of the LIDAR system to classify land cover accurately. In the proposed FMRF model-based approach, spatial contextual information is applied by modeling the image as a Markov random field (MRF), and fuzzy logic is introduced simultaneously to reduce the errors caused by hard classification. Moreover, a Lagrange-Multiplier (LM) algorithm is employed to calculate a maximum a posteriori (MAP) estimate for the classification. The experimental results show that fusing the height data and optical images is particularly suited to land-cover classification. The proposed approach works very well for classification from airborne LIDAR data fused with its co-registered optical images, and the average accuracy is improved to 88.9%.

Relevance:

40.00%

Publisher:

Abstract:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams to which smart phones can subscribe or which they can sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system. This emerging area of study has been shown to be feasible in an earlier study using technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
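
For readers unfamiliar with Hoeffding trees, the sketch below shows the Hoeffding bound that such trees typically use to decide when enough stream instances have been seen to commit to a split. It illustrates the general technique only and says nothing about how the PDM agents are implemented; the function names and the example numbers are assumptions.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n)).

    With probability 1 - delta, the mean of n observations of a variable
    with range R lies within epsilon of its true mean.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute leads the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains after 1000 stream instances, with R = 1 and delta = 1e-7.
epsilon = hoeffding_bound(1.0, 1e-7, 1000)
print(epsilon)
print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

The bound shrinks with the square root of the number of instances, so a node that cannot yet separate two close attributes simply waits for more of the stream.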

Relevance:

40.00%

Publisher:

Abstract:

Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study shows that running heterogeneous (different) or homogeneous (similar) data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in performance comparable to batch and centralised learning techniques.

Relevance:

40.00%

Publisher:

Abstract:

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
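
The idea of a rule set that abstains and is pruned as the stream changes can be sketched as follows. This is a toy illustration under stated assumptions, not the eRules algorithm: the rule representation, the `classify` and `prune` helpers, the 0.5 accuracy threshold and the example data are all hypothetical, and rule induction itself is omitted.

```python
def covers(conditions, instance):
    """True if every attribute test in the rule holds for the instance."""
    return all(instance.get(attr) == value for attr, value in conditions.items())


def classify(rules, instance):
    """Label of the first covering rule, or None to leave the instance unclassified."""
    for conditions, label in rules:
        if covers(conditions, instance):
            return label
    return None


def prune(rules, window, min_accuracy=0.5):
    """Drop rules that misclassify too many of the recent labelled instances."""
    kept = []
    for conditions, label in rules:
        covered = [y for x, y in window if covers(conditions, x)]
        accuracy = (sum(1 for y in covered if y == label) / len(covered)
                    if covered else 1.0)  # a rule with no recent evidence is kept
        if accuracy >= min_accuracy:
            kept.append((conditions, label))
    return kept


# Hypothetical rule set and a recent window in which the concept has drifted.
rules = [({"outlook": "sunny"}, "play"),
         ({"outlook": "rain", "wind": "strong"}, "stay")]
window = [({"outlook": "sunny", "wind": "weak"}, "stay"),
          ({"outlook": "sunny", "wind": "strong"}, "stay")]

print(classify(rules, {"outlook": "sunny", "wind": "weak"}))
print(classify(rules, {"outlook": "overcast", "wind": "weak"}))
print(prune(rules, window))
```

Returning `None` instead of a default class is the design choice the abstract highlights: an uncovered instance stays unclassified rather than receiving a label that could be wrong.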

Relevance:

40.00%

Publisher:

Abstract:

Airborne lidar provides accurate height information of objects on the earth and has been recognized as a reliable and accurate surveying tool in many applications. In particular, lidar data offer vital and significant features for urban land-cover classification, which is an important task in urban land-use studies. In this article, we present an effective approach in which lidar data fused with its co-registered images (i.e. aerial colour images containing red, green and blue (RGB) bands and near-infrared (NIR) images) and other derived features are used effectively for accurate urban land-cover classification. The proposed approach begins with an initial classification performed by the Dempster–Shafer theory of evidence with a specifically designed basic probability assignment function. It outputs two results, i.e. the initial classification and pseudo-training samples, which are selected automatically according to the combined probability masses. Second, a support vector machine (SVM)-based probability estimator is adopted to compute the class conditional probability (CCP) for each pixel from the pseudo-training samples. Finally, a Markov random field (MRF) model is established to combine spatial contextual information into the classification. In this stage, the initial classification result and the CCP are exploited. An efficient belief propagation (EBP) algorithm is developed to search for the global minimum-energy solution for the maximum a posteriori (MAP)-MRF framework in which three techniques are developed to speed up the standard belief propagation (BP) algorithm. Lidar and its co-registered data acquired by Toposys Falcon II are used in performance tests. The experimental results prove that fusing the height data and optical images is particularly suited for urban land-cover classification. There is no training sample needed in the proposed approach, and the computational cost is relatively low. An average classification accuracy of 93.63% is achieved.
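
The initial classification stage relies on the Dempster–Shafer theory of evidence. The sketch below shows Dempster's standard rule of combination for two sources over a small set of classes. The class names and mass values are invented for illustration, and the specifically designed basic probability assignment function mentioned in the abstract is not reproduced here.

```python
def combine(m1, m2):
    """Dempster's rule: fuse two mass functions keyed by frozenset of classes."""
    combined = {}
    conflict = 0.0
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            intersection = b & c
            if intersection:
                combined[intersection] = (combined.get(intersection, 0.0)
                                          + mass_b * mass_c)
            else:
                conflict += mass_b * mass_c  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the remaining masses sum to one.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


# Hypothetical evidence for one pixel from a height feature and an NIR feature.
BUILDING = frozenset({"building"})
TREE = frozenset({"tree"})
EITHER = BUILDING | TREE  # ignorance: mass not committed to a single class

height_mass = {BUILDING: 0.6, EITHER: 0.4}
nir_mass = {TREE: 0.3, BUILDING: 0.5, EITHER: 0.2}

fused = combine(height_mass, nir_mass)
print(fused)
```

Pixels whose fused mass for a single class is high could then serve as automatically selected pseudo-training samples, in the spirit of the approach described above.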

Relevance:

40.00%

Publisher:

Abstract:

The bewildering complexity of cortical microcircuits at the single cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach based on simultaneous multi-site recordings using micro-electrode-array chips for investigation of the microcircuitry of rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses, and typical spatial distributions of LFP responses to stimuli in supragranular, granular, and infragranular layers, where the last form a particularly distinct class. Population spikes appear to travel at about 33 cm/s from granular to infragranular layers. Responses within barrel-related columns have different profiles from those in neighbouring columns to the left or, interchangeably, to the right. Variations between slices occur, but can be minimized by strictly following controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series reflecting the location of sources and sinks independent of the stimulus layer. Although the precise correspondences between single cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach in combination with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions to genetically or pharmacologically altered situations based on real cortical microcircuitry.

Relevance:

40.00%

Publisher:

Abstract:

Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees, was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and a combined class of grass and ground, was 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. The reflectance values of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls between those of the other two classes, as might be expected, because this class mixes the contributions of vegetation and ground reflectance properties.