945 results for Bioinformatics


Abstract:

This paper proposes a technique that supports process participants in making risk-informed decisions, with the aim of reducing process risks. Risk reduction involves decreasing the likelihood and severity of a process fault. Given a process exposed to risks, e.g. a financial process exposed to a risk of reputation loss, we enact this process and, whenever a process participant needs to provide input to the process, e.g. by selecting the next task to execute or by filling out a form, we prompt the participant with the expected risk that a given fault will occur given the particular input. These risks are predicted by traversing decision trees generated from the logs of past process executions and considering process data, involved resources, task durations and contextual information like task frequencies. The approach has been implemented in the YAWL system and its effectiveness evaluated. The results show that the process instances executed in the tests complete with substantially fewer faults and with lower fault severities when the recommendations provided by our technique are taken into account.

Abstract:

Background: Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions, since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive but efficient manner. Methods: In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with fewer than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty. Conclusions: This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
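
The four steps in the Methods paragraph can be illustrated with a minimal sketch for a single seed. This is not the authors' implementation: the additive similarity test, the tolerance `eps`, the function names and the toy data are all assumptions made for illustration, and the abstract's "greater than the minimum threshold" is read here as "at least `min_genes`".

```python
from collections import defaultdict

def candidate_table(expr, seed_gene, seed_cond, eps, max_dissimilar):
    """Build the sorted candidate table for one (gene, condition) seed.

    Assumption: a gene matches the seed on condition c when its profile
    differs from the seed profile by the same additive offset (within eps)
    as it does on the seed condition.
    """
    seed = expr[seed_gene]
    groups = defaultdict(set)
    for gene, row in expr.items():
        offset = row[seed_cond] - seed[seed_cond]
        dissimilar = frozenset(
            c for c in range(len(seed))
            if abs((row[c] - seed[c]) - offset) > eps
        )
        # Step 1: keep genes with fewer than the maximum number of dissimilar conditions.
        if len(dissimilar) < max_dissimilar:
            # Step 2: group genes that share the same dissimilar conditions.
            groups[dissimilar].add(gene)
    # Step 3: sort rows by the number of dissimilar conditions, ascending.
    return sorted(groups.items(), key=lambda item: (len(item[0]), sorted(item[0])))

def biclusters_from_table(table, n_conditions, min_genes):
    """Step 4: consume rows from the top until the table is empty."""
    table = list(table)
    found = []
    while table:
        dissimilar, genes = table.pop(0)
        if len(genes) >= min_genes:  # assumed reading of the threshold test
            conditions = [c for c in range(n_conditions) if c not in dissimilar]
            found.append((sorted(genes), conditions))
    return found

# Hypothetical toy expression matrix: gene -> expression level per condition.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 3.0, 4.0, 5.0],  # g1 shifted by +1 on every condition
    "g3": [0.0, 1.0, 2.0, 9.0],  # g1 shifted by -1, except on condition 3
    "g4": [5.0, 1.0, 7.0, 0.0],  # unrelated profile
}
table = candidate_table(expr, "g1", 0, eps=0.1, max_dissimilar=2)
print(biclusters_from_table(table, n_conditions=4, min_genes=2))
```

The full method repeats this over all seed combinations; how rows are merged across seeds is not described in the abstract and is left out here.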

Abstract:

miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and saved in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are implemented purely in Java. The application can be run on an ordinary personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star

Abstract:

This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
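
As a rough illustration of the sequence k-mer representation mentioned above, the sketch below turns a read into a normalised k-mer frequency profile that could serve as input features for a classifier. The function name, the default `k=3` and the handling of ambiguous bases are assumptions; the abstract does not state the k used or the preprocessing, and the SVM step itself is not shown.

```python
from collections import Counter

def kmer_profile(read, k=3):
    """Return the relative frequency of each k-mer in a read.

    Assumption: windows containing anything other than A, C, G or T
    (for example N) are skipped rather than counted.
    """
    read = read.upper()
    valid = set("ACGT")
    counts = Counter(
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}

# Hypothetical read, for illustration only.
print(kmer_profile("ACGTAC", k=3))
```

Profiles from many reads would typically be aggregated per sequencing project into a fixed-length vector (one entry per possible k-mer) before training a classifier.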

Abstract:

Background: Predicting protein subnuclear localization is a challenging problem. Previous approaches based on non-sequence information, including Gene Ontology annotations and kernel fusion, each have limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information represented as a high-dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. The overall prediction accuracy in leave-one-out cross-validation is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; the accuracy is 71.7% on the multi-localization dataset and 69.8% on the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. Conclusions: Our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

Abstract:

Motivation: Unravelling the genetic architecture of complex traits requires large amounts of data, sophisticated models and large computational resources. The lack of user-friendly software incorporating all these requisites is delaying progress in the analysis of complex traits. Methods: Linkage disequilibrium and linkage analysis (LDLA) is a high-resolution gene mapping approach based on sophisticated mixed linear models, applicable to any population structure. LDLA can use population history information in addition to pedigree and molecular markers to decompose traits into genetic components. Analyses are distributed in parallel over a large public grid of computers in the UK. Results: We have proven the performance of LDLA with analyses of simulated data. There are real gains in statistical power to detect quantitative trait loci when using historical information compared with traditional linkage analysis. Moreover, the use of a grid of computers significantly increases computational speed, hence allowing analyses that would have been prohibitive on a single computer. © The Author 2009. Published by Oxford University Press. All rights reserved.

Abstract:

Currently there are few objective parameters that can quantify the success of one form of prostate surgical removal over another. Accordingly, at Old Dominion University (ODU) we have been developing a process that uses software algorithms to assess the coverage and depth of extra-capsular soft tissue removed with the prostate by the various surgical approaches. Parameters such as the percentage of the capsule that is bare of soft tissue and, where tissue is present, the depth and extent of coverage have been assessed. First, visualization methods and tools are developed for images of prostate slices that are provided to ODU by the Pathology Department at Eastern Virginia Medical School (EVMS). The visualization tools interpolate and present 3D models of the prostates. Measurement algorithms are then applied to determine statistics about extra-capsular tissue coverage. This paper addresses the modeling, visualization, and analysis of prostate gland tissue to aid in quantifying prostate surgery success. Particular attention is paid to the accuracy of these measurements, which is addressed in the analysis discussions.

Abstract:

The sheep (Ovis aries) is commonly used as a large animal model in skeletal research. Although the sheep genome has been sequenced, there are still only a limited number of annotated mRNA sequences in public databases. A complementary DNA (cDNA) library was constructed to provide a generic resource for further exploration of genes that are actively expressed in bone cells in sheep. It was anticipated that the cDNA library would provide molecular tools for further research into the process of fracture repair and bone homeostasis, and add to the existing body of knowledge. One of the hallmarks of cDNA libraries has been the identification of novel genes, and in this library the full open reading frame of the gene C12orf29 was cloned and characterised. This gene codes for a protein of unknown function with a molecular weight of 37 kDa. A literature search showed that no previous studies had been conducted into the biological role of C12orf29, except for some bioinformatics studies that suggested a possible link with cancer. Phylogenetic analyses revealed that C12orf29 had an ancient pedigree, with a homologous gene found in some bacterial taxa. This implied that the gene was present in the last common eukaryotic ancestor, thought to have existed more than 2 billion years ago. This notion was further supported by the fact that the gene is found in taxa belonging to the two major eukaryotic branches, bikonts and unikonts. In the bikont supergroup a C12orf29-like gene was found in the single-celled protist Naegleria gruberi, whereas in the unikont supergroup, encompassing the metazoa, the gene is universal to all chordate and, therefore, vertebrate species. It appears to have been lost in the majority of cnidarian and protostome taxa; however, C12orf29-like genes have been found in the cnidarian freshwater hydra and the protostome Pacific oyster.
The experimental data indicate that C12orf29 has a structural role in skeletal development and tissue homeostasis, whereas in silico analysis of the human C12orf29 promoter region suggests that its expression is potentially under the control of the NOTCH, WNT and TGF-β developmental pathways, as well as SOX9 and BAPX1; these pathways are all heavily involved in skeletogenesis. Taken together, this investigation provides strong evidence that C12orf29 has a very important role in the chordate body plan, in early skeletal development and in cartilage homeostasis, and also suggests a possible link with spina bifida in humans.

Abstract:

Originally developed in bioinformatics, sequence analysis is increasingly used in the social sciences for the study of life-course processes. The methodology generally employed consists of computing dissimilarities between the trajectories and, if typologies are sought, clustering the trajectories according to their similarities or dissimilarities. The choice of an appropriate dissimilarity measure is a major issue in sequence analysis of life sequences. Several dissimilarity measures are available in the literature, but none of them has become the undisputed standard. In this paper, instead of deciding upon one dissimilarity measure, we propose to use an optimal convex combination of different dissimilarities. The optimality is automatically determined by the clustering procedure and is defined with respect to the within-class variance.
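
To make the idea of a convex combination of dissimilarities concrete, the sketch below combines several dissimilarity matrices with non-negative weights that sum to 1 and evaluates a within-class dispersion for a given partition. The function names and the dispersion formula (sum of squared pairwise dissimilarities within each class, divided by twice the class size) are assumptions chosen for illustration; the abstract does not specify the criterion or how the weights are optimised, so no optimisation is shown.

```python
def combine(dissims, weights):
    """Convex combination of square dissimilarity matrices (lists of lists)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(dissims[0])
    return [
        [sum(w * d[i][j] for w, d in zip(weights, dissims)) for j in range(n)]
        for i in range(n)
    ]

def within_class_dispersion(dissim, labels):
    """Assumed criterion: per class, sum of squared pairwise dissimilarities / (2 * size)."""
    classes = {}
    for index, label in enumerate(labels):
        classes.setdefault(label, []).append(index)
    total = 0.0
    for members in classes.values():
        squared = sum(dissim[i][j] ** 2 for i in members for j in members)
        total += squared / (2 * len(members))
    return total

# Two hypothetical dissimilarity matrices over the same two trajectories.
d_a = [[0, 2], [2, 0]]
d_b = [[0, 4], [4, 0]]
mixed = combine([d_a, d_b], [0.5, 0.5])
print(mixed, within_class_dispersion(mixed, [0, 0]))
```

In practice the component dissimilarities would need to be put on comparable scales before weighting, otherwise minimising the dispersion would simply favour the measure with the smallest values.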

Abstract:

Prior to the completion of the Human Genome Project, the human genome was thought to contain a greater number of genes than those of simpler organisms, as it seemed structurally and functionally more complex. This assumption, along with the belief in “one gene, one protein”, was demonstrated to be incorrect. The inequality in the ratio of genes to proteins gave rise to the theory of alternative splicing (AS). AS is a mechanism by which one gene gives rise to multiple protein products. Numerous databases and online bioinformatic tools are available for the detection and analysis of AS. Bioinformatics provides an important approach to studying mRNA and protein diversity through resources such as expressed sequence tag (EST) sequences obtained from completely processed mRNA. Microarrays and deep sequencing approaches also aid in the detection of splicing events. Initially it was postulated that AS occurred in only about 5% of all genes, but it was later found to be more abundant. Using bioinformatic approaches, the level of AS in human genes was found to be fairly high, with 35-59% of genes having at least one AS form. Our ability to determine and predict AS is important, as disorders in splicing patterns may lead to abnormal splice variants resulting in genetic diseases. In addition, the diversity of proteins produced by AS poses a challenge for successful drug discovery, and therefore a greater understanding of AS would be beneficial.

Abstract:

Focal segmental glomerulosclerosis (FSGS) is the consequence of a disease process that attacks the kidney's filtering system, causing serious scarring. More than half of FSGS patients develop chronic kidney failure within 10 years, ultimately requiring dialysis or renal transplantation. Several genes are currently known to cause the hereditary forms of FSGS (ACTN4, TRPC6, CD2AP, INF2, MYO1E and NPHS2). This study involves a large, unique, multigenerational Australian pedigree in which FSGS co-segregates with progressive heart block with apparent X-linked recessive inheritance. Through a classical combined approach of linkage and haplotype analysis, we identified a 21.19 cM interval implicated on the X chromosome. We then used a whole exome sequencing approach to identify two mutated genes, NXF5 and ALG13, which are located within this linkage interval. The two mutations NXF5-R113W and ALG13-T141L segregated perfectly with the disease phenotype in the pedigree and were not found in a large healthy control cohort. Analysis using bioinformatics tools predicted the R113W mutation in the NXF5 gene to be deleterious, and cellular studies support a role in the stability and localization of the protein, suggesting a causative role of this mutation in these co-morbid disorders. Further studies are now required to determine the functional consequences of these novel mutations for the development of FSGS and heart block in this pedigree and to determine whether these mutations have implications for more common forms of these diseases in the general population.

Abstract:

Background: Illumina's Infinium SNP BeadChips are extensively used in both small and large-scale genetic studies. A fundamental step in any analysis is the processing of raw allele A and allele B intensities from each SNP into genotype calls (AA, AB, BB). Various algorithms that make use of different statistical models are available for this task. We compare four methods (GenCall, Illuminus, GenoSNP and CRLMM) on data where the true genotypes are known in advance and on data from a recently published genome-wide association study. Results: In general, differences in accuracy are relatively small between the methods evaluated, although CRLMM and GenoSNP were found to consistently outperform GenCall. The performance of Illuminus is heavily dependent on sample size, with lower no-call rates and improved accuracy as the number of samples available increases. For X chromosome SNPs, methods with sex-dependent models (Illuminus, CRLMM) perform better than methods that ignore gender information (GenCall, GenoSNP). We observe that CRLMM and GenoSNP are more accurate at calling SNPs with low minor allele frequency than GenCall or Illuminus. The sample quality metrics from each of the four methods were found to have a high level of agreement at flagging samples with unusual signal characteristics. Conclusions: CRLMM, GenoSNP and GenCall can be applied with confidence in studies of any size, as their performance was shown to be invariant to the number of samples available. Illuminus, on the other hand, requires a larger number of samples to achieve comparable levels of accuracy, and its use in smaller studies (50 or fewer individuals) is not recommended.

Abstract:

A novel method was developed for studying the genetic relatedness of Pseudomonas aeruginosa isolates from clinical and environmental sources. This bacterium is ubiquitous in the natural environment and is an important pathogen known to infect Cystic Fibrosis (CF) patients. The transmission route of strains has not yet been defined; current theories include acquisition from an environmental source or through patient-to-patient spread. A highly discriminatory, bioinformatics-based DNA typing method was developed to investigate the relatedness of clinical and environmental isolates. This study found a similarity between the environmental and several CF clonal strains and also highlighted the occurrence of environmental P. aeruginosa strains in CF infections.

Abstract:

Effective management of chronic diseases is a global health priority. A healthcare information system offers opportunities to address the challenges of chronic disease management. However, the requirements of health information systems are often not well understood. The accuracy of requirements has a direct impact on the successful design and implementation of a health information system. Our research describes methods used to understand the requirements of health information systems for advanced prostate cancer management. We conducted a survey to identify heterogeneous sources of clinical records. The survey showed that the General Practitioner was the most common source of patients' clinical records (41%), followed by the Urologist (14%) and other clinicians (14%). Our research describes a method to identify diverse data sources and proposes a novel patient journey browser prototype that integrates disparate data sources.

Abstract:

Background: Several lines of evidence suggest that transcription factors are involved in the pathogenesis of Multiple Sclerosis (MS), but a complete mapping of the whole network has been elusive. One of the reasons is that there are several clinical subtypes of MS, and transcription factors that may be involved in one subtype may not be involved in others. We investigated the possibility that this network could be mapped using microarray technologies and modern bioinformatics methods on a dataset from whole blood in 99 untreated MS patients (36 Relapse Remitting MS, 43 Primary Progressive MS, and 20 Secondary Progressive MS) and 45 age-matched healthy controls. Methodology/Principal Findings: We have used two different analytical methodologies, a differential expression analysis and a differential co-expression analysis, which have converged on a significant number of regulatory motifs that seem to be statistically overrepresented in genes that are either differentially expressed or differentially co-expressed in cases and controls (e.g. V$KROX_Q6, p-value < 3.31E-6; V$CREBP1_Q2, p-value < 9.93E-6; V$YY1_02, p-value < 1.65E-5). Conclusions/Significance: Our analysis uncovered a network of transcription factors that potentially dysregulate several genes in MS or one or more of its disease subtypes. Analysing the published literature, we have found that these transcription factors are involved in early T-lymphocyte specification and commitment as well as in oligodendrocyte dedifferentiation and development. The most significant transcription factor motifs were for the Early Growth Response EGR/KROX family, ATF2, YY1 (Yin and Yang 1), E2F-1/DP-1 and E2F-4/DP-2 heterodimers, SOX5, and the CREB and ATF families.