7 resultados para Dataset

em Repositório da Produção Científica e Intelectual da Unicamp


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Congenital muscular dystrophy with laminin α2 chain deficiency (MDC1A) is one of the most severe forms of muscular disease and is characterized by severe muscle weakness and delayed motor milestones. The genetic basis of MDC1A is well known, yet the secondary mechanisms ultimately leading to muscle degeneration and subsequent connective tissue infiltration are not fully understood. In order to obtain new insights into the molecular mechanisms underlying MDC1A, we performed a comparative proteomic analysis of affected muscles (diaphragm and gastrocnemius) from laminin α2 chain-deficient dy(3K)/dy(3K) mice, using multidimensional protein identification technology combined with tandem mass tags. Out of the approximately 700 identified proteins, 113 and 101 proteins, respectively, were differentially expressed in the diseased gastrocnemius and diaphragm muscles compared with normal muscles. A large portion of these proteins are involved in different metabolic processes, bind calcium, or are expressed in the extracellular matrix. Our findings suggest that metabolic alterations and calcium dysregulation could be novel mechanisms that underlie MDC1A and might be targets that should be explored for therapy. Also, detailed knowledge of the composition of fibrotic tissue, rich in extracellular matrix proteins, in laminin α2 chain-deficient muscle might help in the design of future anti-fibrotic treatments. All MS data have been deposited in the ProteomeXchange with identifier PXD000978 (http://proteomecentral.proteomexchange.org/dataset/PXD000978).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber that is native to the Amazon rainforest. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads on the Illumina GAIIx platform. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular marker was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

With a huge amount of printed documents nowadays, identifying their source is useful for criminal investigations and also to authenticate digital copies of a document. In this paper, we propose novel techniques for laser printer attribution. Our solutions do not need very high resolution scanning of the investigated document and explore the multidirectional, multiscale and low-level gradient texture patterns yielded by printing devices. The main contributions of this work are: (1) the description of printed areas using multidirectional and multiscale co-occurring texture patterns; (2) description of texture on low-level gradient areas by a convolution texture gradient filter that emphasizes textures in specific transition areas and (3) the analysis of printer patterns in segments of interest, which we call frames, instead of whole documents or only printed letters. We show by experiments in a well documented dataset that the proposed methods outperform techniques described in the literature and present near-perfect classification accuracy being very promising for deployment in real-world forensic investigations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A fosmid metagenomic library was constructed with total community DNA obtained from a municipal wastewater treatment plant (MWWTP), with the aim of identifying new FeFe-hydrogenase genes encoding the enzymes most important for hydrogen metabolism. The dataset generated by pyrosequencing of a fosmid library was mined to identify environmental gene tags (EGTs) assigned to FeFe-hydrogenase. The majority of EGTs representing FeFe-hydrogenase genes were affiliated with the class Clostridia, suggesting that this group is the main hydrogen producer in the MWWTP analyzed. Based on assembled sequences, three FeFe-hydrogenase genes were predicted based on detection of the L2 motif (MPCxxKxxE) in the encoded gene product, confirming true FeFe-hydrogenase sequences. These sequences were used to design specific primers to detect fosmids encoding FeFe-hydrogenase genes predicted from the dataset. Three identified fosmids were completely sequenced. The cloned genomic fragments within these fosmids are closely related to members of the Spirochaetaceae, Bacteroidales and Firmicutes, and their FeFe-hydrogenase sequences are characterized by the structure type M3, which is common to clostridial enzymes. FeFe-hydrogenase sequences found in this study represent hitherto undetected sequences, indicating the high genetic diversity regarding these enzymes in MWWTP. Results suggest that MWWTP have to be considered as reservoirs for new FeFe-hydrogenase genes.