75 results for 060102 Bioinformatics


Relevance:

10.00% 10.00%

Publisher:

Abstract:

The problem of learning from imbalanced data is of critical importance in a large number of application domains and can be a bottleneck in the performance of various conventional learning methods that assume the data distribution to be balanced. The class imbalance problem corresponds to dealing with the situation where one class massively outnumbers the other. The imbalance between majority and minority would lead machine learning to be biased and produce unreliable outcomes if the imbalanced data is used directly. There has been increasing interest in this research area and a number of algorithms have been developed. However, independent evaluation of the algorithms is limited. This paper aims at evaluating the performance of five representative data sampling methods namely SMOTE, ADASYN, BorderlineSMOTE, SMOTETomek and RUSBoost that deal with class imbalance problems. A comparative study is conducted and the performance of each method is critically analysed in terms of assessment metrics. © 2013 Springer-Verlag.
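As a rough illustration of the idea behind SMOTE, the first of the sampling methods named in this abstract, the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbours. It is a simplified version under stated assumptions (randomly chosen base points, Euclidean distance, the hypothetical function name `smote`, and made-up toy data), not the implementation evaluated in the paper.

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Simplified sketch: each synthetic point lies on the segment between a
    randomly chosen minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical data)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority points, the new samples stay inside the region already occupied by the minority class.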

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recent technological advances have increased the quantity of movement data being recorded. While valuable knowledge can be gained by analysing such data, its sheer volume creates challenges. Geovisual analytics, which helps the human cognition process by using tools to reason about data, offers powerful techniques to resolve these challenges. This paper introduces such a geovisual analytics environment for exploring movement trajectories, which provides visualisation interfaces, based on the classic space-time cube. Additionally, a new approach, using the mathematical description of motion within a space-time cube, is used to determine the similarity of trajectories and forms the basis for clustering them. These techniques were used to analyse pedestrian movement. The results reveal interesting and useful spatiotemporal patterns and clusters of pedestrians exhibiting similar behaviour.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Cancer is a complex disease that has proven to be difficult to understand on the single-gene level. For this reason a functional elucidation needs to take interactions among genes on a systems-level into account. In this study, we infer a colon cancer network from a large-scale gene expression data set by using the method BC3Net. We provide a structural and a functional analysis of this network and also connect its molecular interaction structure with the chromosomal locations of the genes enabling the definition of cis- and trans-interactions. Furthermore, we investigate the interaction of genes that can be found in close neighborhoods on the chromosomes to gain insight into regulatory mechanisms. To our knowledge this is the first study analyzing the genome-scale colon cancer network.
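One simple way to read the cis/trans distinction mentioned in this abstract is sketched below: an inferred edge is labelled "cis" when both genes sit on the same chromosome and "trans" otherwise. This rule, the function name `classify_interactions`, and the gene and chromosome labels are assumptions for illustration; the abstract does not state the exact definition used in the study.

```python
def classify_interactions(edges, chromosome_of):
    """Label each network edge as 'cis' (same chromosome) or 'trans'."""
    labels = {}
    for gene_a, gene_b in edges:
        same_chromosome = chromosome_of[gene_a] == chromosome_of[gene_b]
        labels[(gene_a, gene_b)] = "cis" if same_chromosome else "trans"
    return labels


# Hypothetical genes and chromosome assignments
chromosome_of = {"GENE_A": "chr1", "GENE_B": "chr1", "GENE_C": "chr7"}
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")]
labels = classify_interactions(edges, chromosome_of)
```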

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Real-world graphs or networks tend to exhibit a well-known set of properties, such as heavy-tailed degree distributions, clustering and community formation. Much effort has been directed into creating realistic and tractable models for unlabelled graphs, which has yielded insights into graph structure and evolution. Recently, attention has moved to creating models for labelled graphs: many real-world graphs are labelled with both discrete and numeric attributes. In this paper, we present Agwan (Attribute Graphs: Weighted and Numeric), a generative model for random graphs with discrete labels and weighted edges. The model is easily generalised to edges labelled with an arbitrary number of numeric attributes. We include algorithms for fitting the parameters of the Agwan model to real-world graphs and for generating random graphs from the model. Using real-world directed and undirected graphs as input, we compare our approach to state-of-the-art random labelled graph generators and draw conclusions about the contribution of discrete vertex labels and edge weights to graph structure.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Ultrasound has long been recognized as a means of effecting change at the cellular and tissue levels [1-3], which may be enhanced in the presence of photosensitive agents [4-6]. During insonation, the presence of bubbles can also play a role, creating strong microstreaming effects in solution and in more dramatic circumstances leading to the formation of energetic microjets [7], plasmas [8], and the production of other highly reactive species [9]. Such sonodynamic activity has generated particular excitement in the medical community as it [...]. Moreover, the dual role for microbubbles as both an adjunct to therapy and a diagnostic echogenicity enhancer has seen industry take a proactive role in their development. In the present paper we studied the role of ultrasound-driven sonoluminescent light on the degradation of a fluorescent test species (rhodamine) in the presence of an archetypal photocatalyst material, TiO2, with a view to exploring its exploitation potential for downstream medical applications. Whilst the efficiency of this process is seen to be low compared with conventional ultra-violet sources, we advocate the further exploration of the sonoluminescent approach given its potential for non-invasive applications. A strategy for enhancing the effect is also suggested.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Acid whey has become a major concern, especially in the dairy industry manufacturing Greek yoghurt. Proper disposal of acid whey is essential as it not only increases the BOD of water but also increases the acidity when disposed of in landfill, rendering soil barren and unsuitable for cultivation. Effluent (acid-whey) treatment increases the cost of production. The vast quantities of acid whey that are produced by the dairy industry make the treatment and safe disposal of effluent very difficult. Hence an economical way to handle this problem is very important. Biogenic glycine betaine and trehalose have many applications in the food and confectionery industry, medicine, the bioprocess industry, agriculture, genetic engineering, animal feeds, etc.; hence their production is of industrial importance. Here we used the extreme, obligate halophile Actinopolyspora halophila (MTCC 263) for fermentative production of glycine betaine and trehalose from acid whey. Maximum yields were obtained by implementation of a sequential media optimization process, identification and addition of rate-limiting enzyme cofactors via a bioinformatics approach, and manipulation of nitrogen substrate supply. The implications of using glycine as a precursor were also investigated. The core factors that affected production were identified and then optimized using orthogonal array design followed by response surface methodology. The maximum production achieved after complete optimization was 9.07 ± 0.25 g/L and 2.49 ± 0.14 g/L for glycine betaine and trehalose, respectively.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Infectious diseases are a leading cause of global human mortality. The use of antimicrobials remains the most common strategy for treatment. However, the isolation of pathogens resistant to virtually all antimicrobials makes it urgent to develop effective therapeutics based on new targets. Here we review a new drug discovery paradigm focusing on identifying and targeting host factors important for infection as well as pathogen determinants involved in disease progression. We summarize innovative strategies which by combining bioinformatics with transcriptomics and chemical genetics have already identified host factors essential for pathogen entry, survival and replication. We describe how the discovery of RNA interference which allows loss-of-function studies has facilitated functional genomic studies in human cells. It is expected that these studies will identify targets to be used as host-directed drug therapy which, together with antimicrobials targeting microbial virulence factors, will efficiently eliminate the invading pathogen.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

One of the major challenges in systems biology is to understand the complex responses of a biological system to external perturbations or internal signalling depending on its biological conditions. Genome-wide transcriptomic profiling of cellular systems under various chemical perturbations allows the manifestation of certain features of the chemicals through their transcriptomic expression profiles. The insights obtained may help to establish the connections between human diseases, associated genes and therapeutic drugs. The main objective of this study was to systematically analyse cellular gene expression data under various drug treatments to elucidate drug-feature specific transcriptomic signatures. We first extracted drug-related information (drug features) from the collected textual description of DrugBank entries using text-mining techniques. A novel statistical method employing orthogonal least square learning was proposed to obtain drug-feature-specific signatures by integrating gene expression with DrugBank data. To obtain robust signatures from noisy input datasets, a stringent ensemble approach was applied with the combination of three techniques: resampling, leave-one-out cross validation, and aggregation. The validation experiments showed that the proposed method has the capacity of extracting biologically meaningful drug-feature-specific gene expression signatures. It was also shown that most of signature genes are connected with common hub genes by regulatory network analysis. The common hub genes were further shown to be related to general drug metabolism by Gene Ontology analysis. Each set of genes has relatively few interactions with other sets, indicating the modular nature of each signature and its drug-feature-specificity. Based on Gene Ontology analysis, we also found that each set of drug feature (DF)-specific genes were indeed enriched in biological processes related to the drug feature. 
The results of these experiments demonstrated the potential of the method for predicting certain features of new drugs using their transcriptomic profiles, providing a useful methodological framework and a valuable resource for drug development and characterization.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Quantile normalization (QN) is a technique for microarray data processing and is the default normalization method in the Robust Multi-array Average (RMA) procedure, which was primarily designed for analysing gene expression data from Affymetrix arrays. Given the abundance of Affymetrix microarrays and the popularity of the RMA method, it is crucially important that the normalization procedure is applied appropriately. In this study we carried out simulation experiments and also analysed real microarray data to investigate the suitability of RMA when it is applied to a dataset with different groups of biological samples. From our experiments, we showed that RMA with QN does not preserve the biological signal included in each group, but rather mixes the signals between the groups. We also showed that the Median Polish method in the summarization step of RMA has a similar mixing effect. RMA is one of the most widely used methods in microarray data processing and has been applied to a vast volume of data in biomedical research. The problematic behaviour of this method suggests that previous studies employing RMA could have been misadvised or adversely affected. Therefore we think it is crucially important that the research community recognizes the issue and starts to address it. The two core elements of the RMA method, quantile normalization and Median Polish, both have the undesirable effect of mixing biological signals between different sample groups, which can be detrimental to drawing valid biological conclusions and to any subsequent analyses. Based on the evidence presented here and that in the literature, we recommend exercising caution when using RMA as a method of processing microarray gene expression data, particularly in situations where there are likely to be unknown subgroups of samples.
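To make the mechanism concrete, the following minimal sketch implements plain quantile normalization: every sample is forced onto the same target distribution, namely the mean of the sorted values at each rank. The function name `quantile_normalize` and the toy values are assumptions for illustration, ties are not handled specially, and this is not a reproduction of the simulations reported in the paper.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per array)."""
    n_values = len(samples[0])
    # Target distribution: mean across samples of the value at each rank
    sorted_samples = [sorted(sample) for sample in samples]
    target = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_values)
    ]
    normalized = []
    for sample in samples:
        # Indices of this sample's values, ordered from smallest to largest
        order = sorted(range(n_values), key=lambda i: sample[i])
        out = [0.0] * n_values
        for rank, index in enumerate(order):
            out[index] = target[rank]
        normalized.append(out)
    return normalized


# Two hypothetical sample groups; group B is shifted upwards by 10 units
group_a = [[2.0, 4.0, 6.0], [3.0, 5.0, 7.0]]
group_b = [[12.0, 14.0, 16.0], [13.0, 15.0, 17.0]]
normalized = quantile_normalize(group_a + group_b)
```

In this toy case all four samples end up with identical values, so the global difference between the two groups disappears after normalization. That is one simple way in which normalizing heterogeneous groups together can alter group-level signal.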

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Radiation induced bystander effects are secondary effects caused by the production of chemical signals by cells in response to radiation. We present a Bio-PEPA model which builds on previous modelling work in this field to predict: the surviving fraction of cells in response to radiation, the relative proportion of cell death caused by bystander signalling, the risk of non-lethal damage and the probability of observing bystander signalling for a given dose. This work provides the foundation for modelling bystander effects caused by biologically realistic dose distributions, with implications for cancer therapies.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Background: Gene expression connectivity mapping has proven to be a powerful and flexible tool for research. Its application has been shown in a broad range of research topics, most commonly as a means of identifying potential small molecule compounds, which may be further investigated as candidates for repurposing to treat diseases. The public release of voluminous data from the Library of Integrated Cellular Signatures (LINCS) programme further enhanced the utilities and potentials of gene expression connectivity mapping in biomedicine. Results: We describe QUADrATiC (http://go.qub.ac.uk/QUADrATiC), a user-friendly tool for the exploration of gene expression connectivity on the subset of the LINCS data set corresponding to FDA-approved small molecule compounds. It enables the identification of compounds for repurposing therapeutic potentials. The software is designed to cope with the increased volume of data over existing tools, by taking advantage of multicore computing architectures to provide a scalable solution, which may be installed and operated on a range of computers, from laptops to servers. This scalability is provided by the use of the modern concurrent programming paradigm provided by the Akka framework. The QUADrATiC Graphical User Interface (GUI) has been developed using advanced JavaScript frameworks, providing novel visualization capabilities for further analysis of connections. There is also a web services interface, allowing integration with other programs or scripts. Conclusions: QUADrATiC has been shown to provide an improvement over existing connectivity map software, in terms of scope (based on the LINCS data set), applicability (using FDA-approved compounds), usability and speed. It offers potential to biological researchers to analyze transcriptional data and generate potential therapeutics for focussed study in the lab.
QUADrATiC represents a step change in the process of investigating gene expression connectivity and provides more biologically-relevant results than previous alternative solutions.
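For readers unfamiliar with connectivity mapping, the sketch below shows one simple scoring scheme: genes in a reference profile are ranked by the magnitude of their differential expression, each rank carries the sign of the change, and a query signature is scored by summing signed rank times query sign. The abstract does not specify QUADrATiC's scoring algorithm, so this scheme, the function name `connection_score`, and the gene names and values are all assumptions for illustration.

```python
def connection_score(reference_profile, query_signature):
    """Score a query gene signature against one reference profile.

    reference_profile: gene -> differential expression value
    query_signature:   gene -> +1 (up-regulated) or -1 (down-regulated)
    Returns a value in [-1, 1]; 1 means the query matches the strongest
    reference changes in the same direction.
    """
    # Rank genes by |expression change|; the largest change gets the highest rank
    ordered = sorted(reference_profile, key=lambda g: abs(reference_profile[g]))
    signed_rank = {
        gene: (position + 1) * (1 if reference_profile[gene] >= 0 else -1)
        for position, gene in enumerate(ordered)
    }
    raw = sum(
        signed_rank[gene] * sign
        for gene, sign in query_signature.items()
        if gene in signed_rank
    )
    # Largest attainable score: query genes occupy the top ranks with matching signs
    n_reference = len(signed_rank)
    max_score = sum(n_reference - i for i in range(len(query_signature)))
    return raw / max_score


# Hypothetical reference profile and query signature
reference = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.3, "GENE_D": -0.1}
query = {"GENE_A": 1, "GENE_B": -1}
score = connection_score(reference, query)
reversed_score = connection_score(reference, {"GENE_A": -1, "GENE_B": 1})
```

A score near 1 suggests the compound behind the reference profile induces changes similar to the query signature, while a score near -1 suggests it reverses them. The latter is the case usually of interest when searching for repurposing candidates.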

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Modern approaches to biomedical research and diagnostics targeted towards precision medicine are generating ‘big data’ across a range of high-throughput experimental and analytical platforms. Integrative analysis of this rich clinical, pathological, molecular and imaging data represents one of the greatest bottlenecks in biomarker discovery research in cancer and other diseases. Following on from the publication of our successful framework for multimodal data amalgamation and integrative analysis, Pathology Integromics in Cancer (PICan), this article will explore the essential elements of assembling an integromics framework from a more detailed perspective. PICan, built around a relational database storing curated multimodal data, is the research tool sitting at the heart of our interdisciplinary efforts to streamline biomarker discovery and validation. While recognizing that every institution has a unique set of priorities and challenges, we will use our experiences with PICan as a case study and starting point, rationalizing the design choices we made within the context of our local infrastructure and specific needs, but also highlighting alternative approaches that may better suit other programmes of research and discovery. Along the way, we stress that integromics is not just a set of tools, but rather a cohesive paradigm for how modern bioinformatics can be enhanced. Successful implementation of an integromics framework is a collaborative team effort that is built with an eye to the future and greatly accelerates the processes of biomarker discovery, validation and translation into clinical practice.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Gene expression connectivity mapping has gained much popularity recently, with a number of successful applications in biomedical research testifying to its utility and promise. Previously, methodological research in connectivity mapping mainly focused on two of the key components in the framework, namely, the reference gene expression profiles and the connectivity mapping algorithms. The other key component in this framework, the query gene signature, has been left to users to construct without much consensus on how this should be done, although it is the issue most relevant to end users. As a key input to the connectivity mapping process, the gene signature is crucially important in returning biologically meaningful and relevant results. This paper intends to formulate a standardized procedure for constructing high quality gene signatures from a user’s perspective.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Urothelial cancer (UC) is highly recurrent and can progress from non-invasive (NMIUC) to a more aggressive muscle-invasive (MIUC) subtype that invades the muscle tissue layer of the bladder. We present a proof of principle study that network-based features of gene pairs can be used to improve classifier performance and the functional analysis of urothelial cancer gene expression data. In the first step of our procedure each individual sample of a UC gene expression dataset is inflated by gene pair expression ratios that are defined based on a given network structure. In the second step an elastic net feature selection procedure for network-based signatures is applied to discriminate between NMIUC and MIUC samples. We performed a repeated random subsampling cross validation in three independent datasets. The network signatures were characterized by a functional enrichment analysis and studied for the enrichment of known cancer genes. We observed that the network-based gene signatures from meta collections of protein-protein interaction (PPI) databases such as CPDB and the PPI databases HPRD and BioGrid improved the classification performance compared to single gene based signatures. The network-based signatures that were derived from PPI databases showed a prominent enrichment of cancer genes (e.g., TP53, TRIM27 and HNRNPA2B1). We provide a novel integrative approach for large-scale gene expression analysis for the identification and development of novel diagnostic targets in bladder cancer. Further, our method allowed us to link cancer gene associations to network-based expression signatures that are not observed in gene-based expression signatures.
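The first step described in this abstract, inflating each sample with gene-pair expression ratios defined by a network, might look roughly like the sketch below. It adds one feature per network edge whose two genes are both measured in the sample. The use of a log2 ratio, the function name `add_pair_ratio_features`, and the gene names, values and edges are assumptions for illustration; the abstract does not give the exact ratio definition.

```python
from math import log2


def add_pair_ratio_features(sample, edges):
    """Return a copy of the sample extended with one log2-ratio feature per edge.

    sample: gene -> expression value (positive)
    edges:  iterable of (gene_a, gene_b) pairs taken from a network
    """
    features = dict(sample)
    for gene_a, gene_b in edges:
        # Only edges whose two genes are both measured contribute a feature
        if gene_a in sample and gene_b in sample:
            features[f"{gene_a}/{gene_b}"] = log2(sample[gene_a] / sample[gene_b])
    return features


# Hypothetical expression values and network edges
sample = {"GENE_A": 8.0, "GENE_B": 2.0, "GENE_C": 4.0}
edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_X")]
inflated = add_pair_ratio_features(sample, edges)
```

The inflated feature dictionaries could then be passed to a sparse classifier such as the elastic net mentioned in the abstract, which selects among single-gene and gene-pair features.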