8 resultados para Automatic Inference
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
Background: A current challenge in gene annotation is to define the gene function in the context of the network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how several components of this network interact with each other and keep their functions stable. However, in general there is no sufficient data to accurately recover the GNs from their expression levels leading to the curse of dimensionality, in which the number of variables is higher than samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. Nowadays, the use of several biological information in inference methods had a significant increase in order to better recover the connections between genes and reduce the false positives. What makes this strategy so interesting is the possibility of confirming the known connections through the included biological data, and the possibility of discovering new relationships between genes when observed the expression data. Although several works in data integration have increased the performance of the network inference methods, the real contribution of adding each type of biological information in the obtained improvement is not clear. Methods: We propose a methodology to include biological information into an inference algorithm in order to assess its prediction gain by using biological information and expression profile together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain in the use of prior biological information in the inference of GNs by considering the eukaryote (P. falciparum) organism. Our results indicates that information based on direct interaction can produce a higher improvement in the gain than data about a less specific relationship as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for the improvement of the inference. We also compared the gain in the inference of the global network and only the hubs. The results indicates that the use of biological information can improve the identification of the most connected proteins.
Resumo:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Resumo:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Resumo:
Background: Arboviral diseases are major global public health threats. Yet, our understanding of infection risk factors is, with a few exceptions, considerably limited. A crucial shortcoming is the widespread use of analytical methods generally not suited for observational data - particularly null hypothesis-testing (NHT) and step-wise regression (SWR). Using Mayaro virus (MAYV) as a case study, here we compare information theory-based multimodel inference (MMI) with conventional analyses for arboviral infection risk factor assessment. Methodology/Principal Findings: A cross-sectional survey of anti-MAYV antibodies revealed 44% prevalence (n = 270 subjects) in a central Amazon rural settlement. NHT suggested that residents of village-like household clusters and those using closed toilet/latrines were at higher risk, while living in non-village-like areas, using bednets, and owning fowl, pigs or dogs were protective. The "minimum adequate" SWR model retained only residence area and bednet use. Using MMI, we identified relevant covariates, quantified their relative importance, and estimated effect-sizes (beta +/- SE) on which to base inference. Residence area (beta(Village) = 2.93 +/- 0.41; beta(Upland) = -0.56 +/- 0.33, beta(Riverbanks) = -2.37 +/- 0.55) and bednet use (beta = -0.95 +/- 0.28) were the most important factors, followed by crop-plot ownership (beta = 0.39 +/- 0.22) and regular use of a closed toilet/latrine (beta = 0.19 +/- 0.13); domestic animals had insignificant protective effects and were relatively unimportant. The SWR model ranked fifth among the 128 models in the final MMI set. Conclusions/Significance: Our analyses illustrate how MMI can enhance inference on infection risk factors when compared with NHT or SWR. MMI indicates that forest crop-plot workers are likely exposed to typical MAYV cycles maintained by diurnal, forest dwelling vectors; however, MAYV might also be circulating in nocturnal, domestic-peridomestic cycles in village-like areas. This suggests either a vector shift (synanthropic mosquitoes vectoring MAYV) or a habitat/habits shift (classical MAYV vectors adapting to densely populated landscapes and nocturnal biting); any such ecological/adaptive novelty could increase the likelihood of MAYV emergence in Amazonia.
Resumo:
There is no consensus regarding the accuracy of bioimpedance for the determination of body composition in older persons. This study aimed to compare the assessment of lean body mass of healthy older volunteers obtained by the deuterium dilution method (reference) with those obtained by two frequently used bioelectrical impedance formulas and one formula specifically developed for a Latin-American population. A cross-sectional study. Twenty one volunteers were studied, 12 women, with mean age 72 +/- 6.7 years. Urban community, Ribeiro Preto, Brazil. Fat free mass was determined, simultaneously, by the deuterium dilution method and bioelectrical impedance; results were compared. In bioelectrical impedance, body composition was calculated by the formulas of Deuremberg, Lukaski and Bolonchuck and Valencia et al. Lean body mass of the studied volunteers, as determined by bioelectrical impedance was 37.8 +/- 9.2 kg by the application of the Lukaski e Bolonchuk formula, 37.4 +/- 9.3 kg (Deuremberg) and 43.2 +/- 8.9 kg (Valencia et. al.). The results were significantly correlated to those obtained by the deuterium dilution method (41.6 +/- 9.3 Kg), with r=0.963, 0.932 and 0.971, respectively. Lean body mass obtained by the Valencia formula was the most accurate. In this study, lean body mass of older persons obtained by the bioelectrical impedance method showed good correlation with the values obtained by the deuterium dilution method. The formula of Valencia et al., developed for a Latin-American population, showed the best accuracy.
Resumo:
In this manuscript, an automatic setup for screening of microcystins in surface waters by employing photometric detection is described. Microcystins are toxins delivered by cyanobacteria within an aquatic environment, which have been considered strongly poisonous for humans. For that reason, the World Health Organization (WHO) has proposed a provisional guideline value for drinking water of 1 mu g L-1. In this work, we developed an automated equipment setup, which allows the screening of water for concentration of microcystins below 0.1 mu g V. The photometric method was based on the enzyme-linked immunosorbent assay (ELISA) and the analytical signal was monitored at 458 nm using a homemade LED-based photometer. The proposed system was employed for the detection of microcystins in rivers and lakes waters. Accuracy was assessed by processing samples using a reference method and applying the paired t-test between results. No significant difference at the 95% confidence level was observed. Other useful features including a linear response ranging from 0.05 up to 2.00 mu g L-1 (R-2 =0.999) and a detection limit of 0.03 mu g L-1 microcystins were achieved. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
This paper considers likelihood-based inference for the family of power distributions. Widely applicable results are presented which can be used to conduct inference for all three parameters of the general location-scale extension of the family. More specific results are given for the special case of the power normal model. The analysis of a large data set, formed from density measurements for a certain type of pollen, illustrates the application of the family and the results for likelihood-based inference. Throughout, comparisons are made with analogous results for the direct parametrisation of the skew-normal distribution.
Resumo:
Abstract Background Atherosclerosis causes millions of deaths, annually yielding billions in expenses round the world. Intravascular Optical Coherence Tomography (IVOCT) is a medical imaging modality, which displays high resolution images of coronary cross-section. Nonetheless, quantitative information can only be obtained with segmentation; consequently, more adequate diagnostics, therapies and interventions can be provided. Since it is a relatively new modality, many different segmentation methods, available in the literature for other modalities, could be successfully applied to IVOCT images, improving accuracies and uses. Method An automatic lumen segmentation approach, based on Wavelet Transform and Mathematical Morphology, is presented. The methodology is divided into three main parts. First, the preprocessing stage attenuates and enhances undesirable and important information, respectively. Second, in the feature extraction block, wavelet is associated with an adapted version of Otsu threshold; hence, tissue information is discriminated and binarized. Finally, binary morphological reconstruction improves the binary information and constructs the binary lumen object. Results The evaluation was carried out by segmenting 290 challenging images from human and pig coronaries, and rabbit iliac arteries; the outcomes were compared with the gold standards made by experts. The resultant accuracy was obtained: True Positive (%) = 99.29 ± 2.96, False Positive (%) = 3.69 ± 2.88, False Negative (%) = 0.71 ± 2.96, Max False Positive Distance (mm) = 0.1 ± 0.07, Max False Negative Distance (mm) = 0.06 ± 0.1. Conclusions In conclusion, by segmenting a number of IVOCT images with various features, the proposed technique showed to be robust and more accurate than published studies; in addition, the method is completely automatic, providing a new tool for IVOCT segmentation.