8 results for microarray data classification
in Aston University Research Archive
Abstract:
Clustering techniques such as k-means and hierarchical clustering are commonly used to analyze DNA microarray derived gene expression data. However, the interactions between processes underlying the cell activity suggest that the complexity of the microarray data structure may not be fully represented with discrete clustering methods.
Abstract:
To capture genomic profiles of histone modification, chromatin immunoprecipitation (ChIP) is combined with next-generation sequencing, a technique called ChIP-seq. However, enriched regions generated from ChIP-seq data are evaluated only against the limited knowledge acquired by manually examining the relevant biological literature. This paper proposes a novel framework that integrates multiple knowledge sources such as biological literature, Gene Ontology, and microarray data. In order to analyze ChIP-seq data for histone modification precisely, knowledge integration is based on a unified probabilistic model. The model is employed to re-rank the enriched regions generated by peak finding algorithms. By filtering the re-ranked enriched regions using a predefined threshold, more reliable and precise results can be generated. The combination of multiple knowledge sources with the peak finding algorithm produces a new paradigm for ChIP-seq data analysis. © (2012) Trans Tech Publications, Switzerland.
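As a rough illustration of the re-ranking and thresholding step described in this abstract, the Python sketch below combines per-source evidence scores for each enriched region into one score, sorts the regions, and keeps those at or above a threshold. The abstract does not specify the probabilistic model, so the product of independent scores used here is an assumption, as are the function name `rerank_regions`, the region names, the scores, and the threshold.

```python
from math import prod


def rerank_regions(regions, threshold):
    """Re-rank regions by a combined score and keep those at or above threshold.

    `regions` maps a region id to a dict of per-source scores in [0, 1].
    The combined score is a plain product, which assumes the knowledge
    sources are independent. This is an illustrative assumption, not the
    unified probabilistic model from the paper.
    """
    scored = [(rid, prod(sources.values())) for rid, sources in regions.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rid, score) for rid, score in scored if score >= threshold]


# Hypothetical peak-finder output with made-up evidence scores.
regions = {
    "peak_A": {"literature": 0.9, "gene_ontology": 0.8, "microarray": 0.5},
    "peak_B": {"literature": 0.4, "gene_ontology": 0.5, "microarray": 0.5},
    "peak_C": {"literature": 0.9, "gene_ontology": 0.9, "microarray": 1.0},
}
kept = rerank_regions(regions, threshold=0.3)
for region_id, score in kept:
    print(f"{region_id}: {score:.2f}")
```

In this toy input, `peak_B` falls below the threshold and is dropped, while `peak_C` moves ahead of `peak_A`.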
Abstract:
Oral drug delivery is considered the most popular route of delivery because of the ease of administration, the availability of a wide range of dosage forms and the large surface area for drug absorption via the intestinal membrane. However, besides the unfavourable biopharmaceutical properties of the therapeutic agents, efflux transporters such as P-glycoprotein (P-gp) and multiple resistance proteins (MRP) decrease the overall drug uptake by extruding the drug from the cells. Although prodrugs have been investigated to improve drug partitioning by covalently masking the polar groups with pro-moieties that promote increased uptake, they present significant challenges, including reduced solubility and increased toxicity. The current work investigates the use of amino acids as ion-pairs for three model drugs, indomethacin (weak acid), trimethoprim (weak base) and ciprofloxacin (zwitterion), in an attempt to improve both solubility and uptake. Solubility was studied through salt formation, while the rationale for enhancing absorption was to create new routes for uptake across the membranes via amino acid transporter proteins or dipeptidyl transporters. New salts of the model drugs with the oppositely charged amino acids were prepared by freeze drying and characterised using FTIR, 1H NMR, DSC, SEM, pH solubility profile, solubility and dissolution. Permeability profiles were assessed using an in vitro cell-based method (Caco-2 cells), and the genetic changes occurring across the transporter genes and the various pathways involved in cellular activities were studied using DNA microarrays. Solubility data showed a significant increase in drug solubility upon preparing the new salts with the oppositely charged counter ions (the ciprofloxacin glutamate salt exhibited a 2.9 × 10³-fold enhancement compared to the free drug).
Moreover, permeability studies showed a 3-fold increase in trimethoprim and indomethacin permeabilities upon ion-pairing with amino acids, and a more than 10-fold increase when the zwitterionic drug was paired with glutamic acid. Microarray data revealed that trimethoprim was absorbed actively via OCTN1 transporters, while MRP7 is the main transporter gene that mediates its efflux. The absorption of trimethoprim from trimethoprim glutamic acid ion-paired formulations was affected by the ratio of glutamic acid in the formulation, which was inversely proportional to the degree of expression of OCTN1. Interestingly, ciprofloxacin glutamic acid ion-pairs were found to decrease the up-regulation of ciprofloxacin efflux proteins (P-gp and MRP4) and over-express two solute carrier transporters (PEPT2 and SLCO1A2), suggesting that a high aqueous binding constant (K11aq) enables the ion-paired formulations to be absorbed as one entity. In conclusion, the formation of ion-pairs with amino acids can positively influence the solubility, transfer and gene expression effects of drugs.
Abstract:
Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often result in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experimental results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.
Abstract:
The number of remote sensing platforms and sensors rises almost every year, yet much work on the interpretation of land cover is still carried out using either single images or images from the same source taken at different dates. Two questions could be asked of this proliferation of images: can the information contained in different scenes be used to improve the classification accuracy, and what is the best way to combine the different imagery? Two of these multiple image sources are MODIS on the Terra platform and ETM+ on board Landsat 7, which are suitably complementary. Daily MODIS images with 36 spectral bands at 250-1000 m spatial resolution and seven spectral bands of ETM+ with 30 m and 16 days spatial and temporal resolution respectively are available. In the UK, cloud cover may mean that only a few ETM+ scenes are available for any particular year, and these may not be at the time of year of most interest. The MODIS data may provide information on land cover over the growing season, such as harvest dates, that is not present in the ETM+ data. Therefore, the primary objective of this work is to develop a methodology for the integration of medium spatial resolution Landsat ETM+ images with multi-temporal, multi-spectral, low-resolution MODIS/Terra images, with the aim of improving the classification of agricultural land. Additionally, other data may be incorporated, such as field boundaries from existing maps. When classifying agricultural land cover of the type seen in the UK, where crops are largely sown in homogeneous fields with clear and often mapped boundaries, the classification is greatly improved by using the mapped polygons and utilising the classification of the polygon as a whole as an a priori probability when classifying each individual pixel using a Bayesian approach.
When dealing with multiple images from different platforms and dates, it is highly unlikely that the pixels will be exactly co-registered, and these pixels will contain a mixture of different real world land covers. Similarly, the different atmospheric conditions prevailing during the different days will mean that the same emission from the ground will give rise to different sensor reception. Therefore, a method is presented with a model of the instantaneous field of view and atmospheric effects to enable different remotely sensed data sources to be integrated.
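The idea of using the classification of a whole field polygon as an a priori probability for each pixel can be sketched with Bayes' rule: the posterior for a class is proportional to the pixel's spectral likelihood multiplied by the polygon-level prior. The minimal Python sketch below is only an illustration under that reading; the function name `classify_pixel`, the crop classes, and all the numbers are assumptions and do not come from the work itself.

```python
def classify_pixel(likelihoods, polygon_prior):
    """Return (best_class, posteriors) for one pixel using Bayes' rule.

    likelihoods:   p(pixel spectrum | class) for each class
    polygon_prior: p(class), taken from the classification of the whole
                   mapped field polygon that contains the pixel
    """
    unnormalised = {
        cls: likelihoods[cls] * polygon_prior.get(cls, 0.0) for cls in likelihoods
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("no class has a non-zero posterior")
    posteriors = {cls: value / total for cls, value in unnormalised.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Made-up values: on spectrum alone the pixel looks slightly more like barley,
# but the field polygon as a whole was classified mostly as wheat.
pixel_likelihoods = {"wheat": 0.30, "barley": 0.40, "grass": 0.30}
field_prior = {"wheat": 0.70, "barley": 0.20, "grass": 0.10}

label, posteriors = classify_pixel(pixel_likelihoods, field_prior)
print(label, {cls: round(p, 3) for cls, p in posteriors.items()})
```

With these toy numbers the polygon prior outweighs the per-pixel likelihood, so the pixel takes the label of its field.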
Abstract:
The aims of the project were twofold: 1) To investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) To investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance, maximum likelihood. In addition to these, the following algorithms were developed during the course of the research: deviant distance, look up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. Contextual enhancements investigated were: mode filters, small area replacement and Wharton's CONAN algorithm. Additionally, methods for noise and edge based declassification and contextual reclassification, non-probabilistic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.
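Of the contextual enhancements listed in this abstract, the mode filter is the simplest to illustrate: each pixel of a classified image takes the most common class label in its neighbourhood. The Python sketch below is a generic mode filter, not the implementation from the project; the window size, the edge handling (the window is clipped at the image border) and the tie-breaking rule are assumptions.

```python
from collections import Counter


def mode_filter(labels, radius=1):
    """Replace each label with the most common label in its window.

    `labels` is a 2D list of integer class labels. The window is
    (2 * radius + 1) pixels square and is clipped at the image border.
    On a tie that includes the original label, the original is kept.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # copy so the input is not modified
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)
            top = max(counts.values())
            if counts[labels[r][c]] < top:
                # Smallest label among the most frequent, for determinism.
                out[r][c] = min(k for k, v in counts.items() if v == top)
    return out


# A single isolated pixel of class 2 inside a field of class 1.
classified = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
smoothed = mode_filter(classified)
print(smoothed)
```

The isolated class-2 pixel is relabelled as class 1, which is the kind of speckle removal a mode filter is typically used for.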
Abstract:
We address the important bioinformatics problem of predicting protein function from a protein's primary sequence. We consider the functional classification of G-Protein-Coupled Receptors (GPCRs), whose functions are specified in a class hierarchy. We tackle this task using a novel top-down hierarchical classification system where, for each node in the class hierarchy, the predictor attributes to be used in that node and the classifier to be applied to the selected attributes are chosen in a data-driven manner. Compared with a previous hierarchical classification system selecting classifiers only, our new system significantly reduced processing time without significantly sacrificing predictive accuracy.
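A top-down hierarchical classifier of the kind described in this abstract can be sketched as a tree in which every internal node owns a local classifier, chosen from a set of candidates by its accuracy on validation data, and prediction walks from the root to a leaf. The Python sketch below is a toy illustration only: the class labels, the `length` and `charge` attributes, the candidate rules and the validation examples are all hypothetical. Each candidate rule reads a different attribute, so picking a candidate also picks the attribute used at that node.

```python
class Node:
    """One node of the class hierarchy with its own local classifier."""

    def __init__(self, label, classifier=None, children=()):
        self.label = label
        self.classifier = classifier  # callable: features -> child label
        self.children = {child.label: child for child in children}


def choose_classifier(candidates, validation):
    """Return the candidate with the highest accuracy on validation pairs."""
    def accuracy(clf):
        return sum(clf(x) == y for x, y in validation) / len(validation)
    return max(candidates, key=accuracy)


def predict(root, features):
    """Walk from the root to a leaf, applying each node's own classifier."""
    node, path = root, [root.label]
    while node.children:
        node = node.children[node.classifier(features)]
        path.append(node.label)
    return path


# Hypothetical candidate rules, each using a different attribute.
def root_by_length(x):
    return "A" if x["length"] > 300 else "B"

def root_by_charge(x):
    return "A" if x["charge"] > 0 else "B"

def a_by_charge(x):
    return "A1" if x["charge"] > 0 else "A2"

def a_by_length(x):
    return "A1" if x["length"] > 380 else "A2"

# Made-up validation data: (features, correct child label at that node).
root_validation = [
    ({"length": 350, "charge": -1}, "A"),
    ({"length": 200, "charge": 2}, "B"),
    ({"length": 400, "charge": 1}, "A"),
]
a_validation = [
    ({"length": 350, "charge": 1}, "A1"),
    ({"length": 400, "charge": -1}, "A2"),
]

root_clf = choose_classifier([root_by_charge, root_by_length], root_validation)
a_clf = choose_classifier([a_by_length, a_by_charge], a_validation)

tree = Node("GPCR", root_clf, [
    Node("A", a_clf, [Node("A1"), Node("A2")]),
    Node("B"),
])
print(predict(tree, {"length": 350, "charge": -1}))
```

A real system would train statistical classifiers and select attribute subsets at each node; the fixed rules here only stand in for that selection step.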