7 resultados para Distribution Selection
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Discriminating Different Classes of Biological Networks by Analyzing the Graphs Spectra Distribution
Resumo:
The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e. g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a "fingerprint". Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the "uncertainty" of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them to (1) protein-protein interaction networks of different species and (2) on networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed.
Resumo:
A data set of a commercial Nellore beef cattle selection program was used to compare breeding models that assumed or not markers effects to estimate the breeding values, when a reduced number of animals have phenotypic, genotypic and pedigree information available. This herd complete data set was composed of 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single trait analyses were performed by MTDFREML software to estimate fixed and random effects solutions using this complete data. The additive effects estimated were assumed as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype composed of the additive and residual parts of observed phenotype was used as dependent variable for models' comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes on animals' rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only markers effects and model 3 included both polygenic and markers effects. Bayesian inference via Markov chain Monte Carlo methods performed by TM software was used to analyze the data for model comparison. Two different priors were adopted for markers effects in models 2 and 3, the first prior assumed was a uniform distribution (U) and, as a second prior, was assumed that markers effects were distributed as normal (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating a greater similarity of these models animals' rank and the rank based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due prior assumed to markers effects in models 2 and 3 could be attributed to the better ability of normal prior in handle with collinear effects. The models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduce number of animals have phenotypic, genotypic and pedigree information. It could be attributed to the variation retained by markers and polygenic effects assumed together and the normal prior assumed to markers effects, that deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Most biological systems are formed by component parts that are to some degree interrelated. Groups of parts that are more associated among themselves and are relatively autonomous from others are called modules. One of the consequences of modularity is that biological systems usually present an unequal distribution of the genetic variation among traits. Estimating the covariance matrix that describes these systems is a difficult problem due to a number of factors such as poor sample sizes and measurement errors. We show that this problem will be exacerbated whenever matrix inversion is required, as in directional selection reconstruction analysis. We explore the consequences of varying degrees of modularity and signal-to-noise ratio on selection reconstruction. We then present and test the efficiency of available methods for controlling noise in matrix estimates. In our simulations, controlling matrices for noise vastly improves the reconstruction of selection gradients. We also perform an analysis of selection gradients reconstruction over a New World Monkeys skull database to illustrate the impact of noise on such analyses. Noise-controlled estimates render far more plausible interpretations that are in full agreement with previous results.
Resumo:
Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score taking into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, thus corroborating the introduced methodology. Conclusion The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of such genes identified by the proposed method may be useful to generate classifiers.
Resumo:
Abstract Background Invasive cervical cancer is the second most common malignant tumor affecting Brazilian women. Knowledge on Human Papillomavirus (HPV) genotypes in invasive cervical cancer cases is crucial to guide the introduction and further evaluate the impact of new preventive strategies based on HPV. We aimed to provide updated comprehensive data about the HPV types’ distribution in patients with invasive cervical cancer. Methods Fresh tumor tissue samples of histologically confirmed invasive cervical cancer were collected from 175 women attending two cancer reference hospitals from São Paulo State: ICESP and Hospital de Câncer de Barretos. HPV detection and genotyping were performed by the Linear Array HPV Genotyping Test (Roche Molecular Diagnostics, Pleasanton,USA). Results 170 out of 172 valid samples (99%) were HPV DNA positive. The most frequent types were HPV16 (77.6%), HPV18 (12.3%), HPV31 (8.8%), HPV33 (7.1%) and HPV35 (5.9%). Most infections (75%) were caused by individual HPV types. Women with adenocarcinoma were not younger than those with squamous cell carcinoma, as well, as women infected with HPV33 were older than those infected by other HPV types. Some differences between results obtained in the two hospitals were observed: higher overall prevalence of HPV16, absence of single infection by HPV31 and HPV45 was verified in HC-Barretos in comparison to ICESP patients. Conclusions To our knowledge, this is one of the largest studies made with fresh tumor tissues of invasive cervical cancer cases in Brazil. This study depicted a distinct HPV genotype distribution between two centers that may reflect the local epidemiology of HPV transmission among these populations. Due to the impact of these findings on cervical cancer preventive strategies, extension of this investigation to routine screening populations is warranted.
Resumo:
Variable rate sprinklers (VRS) have been developed to promote localized water application of irrigated areas. In Precision Irrigation, VRS permits better control of flow adjustment and, at the same time, provides satisfactory radial distribution profiles for various pressures and flow rates are really necessary. The objective of this work was to evaluate the performance and radial distribution profiles of a developed VRS which varies the nozzle cross sectional area by moving a pin in or out using a stepper motor. Field tests were performed under different conditions of service pressure, rotation angles imposed on the pin and flow rate which resulted in maximal water throw radiuses ranging from 7.30 to 10.38 m. In the experiments in which the service pressure remained constant, the maximal throw radius varied from 7.96 to 8.91 m. Averages were used of repetitions performed under conditions without wind or with winds less than 1.3 m s-1. The VRS with the four stream deflector resulted in greater water application throw radius compared to the six stream deflector. However, the six stream deflector had greater precipitation intensities, as well as better distribution. Thus, selection of the deflector to be utilized should be based on project requirements, respecting the difference in the obtained results. With a small opening of the nozzle, the VRS produced small water droplets that visually presented applicability for foliar chemigation. Regarding the comparison between the estimated and observed flow rates, the stepper motor produced excellent results.