947 results for genetics, statistical genetics, variable models
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; the cost of computing it grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem on a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
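As a rough illustration of the max-flow step mentioned in this abstract, the sketch below runs the Edmonds-Karp variant of Ford-Fulkerson (augmenting paths found by breadth-first search) on a toy graph. The abstract does not describe how the tree supremum is mapped to a graph, so the graph, node names and the function `max_flow` are purely hypothetical and are not the authors' construction.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with BFS-chosen augmenting paths.

    capacity: dict mapping node -> {neighbour: edge capacity}.
    """
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical toy network, not derived from any context tree.
toy = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
result = max_flow(toy, "s", "t")
```

Breadth-first path selection is one common way to guarantee a polynomial running time for Ford-Fulkerson; the abstract does not say which variant the authors used.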
Abstract:
Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied to cluster pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by modifying the Independent Component Analysis Mixture Model (ICAMM). These modifications were proposed by considering some of the model's limitations and by analyzing how it could be made more efficient. Moreover, a pre-processing methodology is also proposed, which combines Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments of this work, EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results for the proposals presented herein. (C) 2008 Published by Elsevier B.V.
Abstract:
The Brazilian Atlantic Forest is one of the richest biodiversity hotspots of the world. Paleoclimatic models have predicted two large stability regions in its northern and central parts, whereas southern regions might have suffered strong instability during Pleistocene glaciations. Molecular phylogeographic and endemism studies show, nevertheless, contradictory results: although some results validate these predictions, other data suggest that paleoclimatic models fail to predict stable rainforest areas in the south. Most studies, however, have surveyed species with relatively high dispersal rates, whereas taxa with lower dispersal capabilities should be better predictors of habitat stability. Here, we have used two land planarian species as model organisms to analyse the patterns and levels of nucleotide diversity at a locality within the Southern Atlantic Forest. We find that both species harbour high levels of genetic variability without exhibiting the molecular footprint of recent colonization or population expansions, suggesting a long-term stability scenario. The results indicate, therefore, that paleoclimatic models may fail to detect refugia in the Southern Atlantic Forest, and that model organisms with low dispersal capability can improve the resolution of these models.
Abstract:
This paper presents both the theoretical and the experimental approaches to the development of a mathematical model to be used in multi-variable control system designs of an active suspension for a sport utility vehicle (SUV), in this case a light pickup truck. A complete seven-degree-of-freedom model is identified successfully and quickly, with very satisfactory results in simulations and in real experiments conducted with the pickup truck. The novelty of the proposed methodology is the use of commercial software in the early stages of the identification to speed up the process and to minimize the need for a large number of costly experiments. The paper also presents major contributions to the identification of uncertainties in vehicle suspension models and to the development of identification methods using sequential quadratic programming, where an innovation regarding the calculation of the objective function is proposed and implemented. Results from simulations of and practical experiments with the real SUV are presented, analysed, and compared, showing the potential of the method.
Abstract:
The aim of this project was to evaluate protein extraction from soybean flour in dairy whey using a multivariate statistical method based on a 2³ factorial design. The influence of three variables was considered: temperature, pH and percentage of sodium chloride, against the process-specific response variable (percentage of protein extraction). When protein extraction was followed against time and temperature, the treatments at 80 °C for 2 h gave high values of total protein (5.99%). The increase in the percentage of protein extraction depended mainly on the heating time. The maximum of the function representing protein extraction was therefore analysed by the 2³ factorial experiment. The results indicated that all the variables were important for extraction. The statistical analyses showed that the parameters pH, temperature and percentage of sodium chloride were not sufficient to describe the extraction process, since it was not possible to obtain the inflection point of the mathematical function; on the other hand, the mathematical model was significant as well as predictive.
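For readers unfamiliar with the 2³ factorial layout used in this abstract, the following minimal sketch enumerates the eight runs for three two-level factors and estimates main effects. The response values are synthetic (generated from an invented linear rule), and the names `factors`, `design` and `main_effects` are illustrative only; none of this reproduces the study's data.

```python
from itertools import product

# Three factors, each coded at a low (-1) and a high (+1) level.
factors = ["temperature", "pH", "NaCl"]
design = list(product([-1, 1], repeat=len(factors)))  # 2**3 = 8 runs

def main_effects(design, response):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    half = len(design) / 2
    effects = []
    for j in range(len(design[0])):
        high = sum(y for run, y in zip(design, response) if run[j] == 1)
        low = sum(y for run, y in zip(design, response) if run[j] == -1)
        effects.append((high - low) / half)
    return effects

# Synthetic response (NOT the paper's data): depends on the first two factors.
response = [5.0 + 1.0 * t + 0.5 * p for t, p, s in design]
effects = dict(zip(factors, main_effects(design, response)))
```

A design with only two levels per factor supports linear and interaction terms; estimating curvature, and hence a maximum, generally needs additional levels such as centre or axial points.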
Abstract:
Mixed models have become important in analyzing the results of experiments, particularly those that require more complicated models (e.g., those that involve longitudinal data). This article describes a method for deriving the terms in a mixed model. Our approach extends an earlier method by Brien and Bailey to explicitly identify terms for which autocorrelation and smooth trend arising from longitudinal observations need to be incorporated in the model. At the same time we retain the principle that the model used should include, at least, all the terms that are justified by the randomization. This is done by dividing the factors into sets, called tiers, based on the randomization and determining the crossing and nesting relationships between factors. The method is applied to formulate mixed models for a wide range of examples. We also describe the mixed model analysis of data from a three-phase experiment to investigate the effect of time of refinement on Eucalyptus pulp from four different sources. Cubic smoothing splines are used to describe differences in the trend over time and unstructured covariance matrices between times are found to be necessary.
Abstract:
In this paper, we present various diagnostic methods for polyhazard models. Polyhazard models are a flexible family for fitting lifetime data. Their main advantage over single hazard models, such as the Weibull and the log-logistic models, is that they include a wide range of nonmonotone hazard shapes, such as bathtub and multimodal curves. Some influence methods, such as the local influence and total local influence of an individual, are derived, analyzed and discussed. A discussion of the computation of the likelihood displacement as well as the normal curvature in the local influence method is presented. Finally, an example with real data is given for illustration.
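To make the claim about nonmonotone hazard shapes concrete, here is a small sketch that assumes the usual polyhazard construction, in which the overall hazard is the sum of the hazards of latent competing causes. The abstract itself does not spell out this definition, and the two Weibull components and their parameter values are invented for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def polyhazard(t, components):
    """Assumed polyhazard form: sum of the component hazards at time t."""
    return sum(weibull_hazard(t, shape, scale) for shape, scale in components)

# Hypothetical bi-Weibull: one decreasing hazard (shape < 1) plus one
# increasing hazard (shape > 1) gives a bathtub-shaped total hazard.
components = [(0.5, 1.0), (3.0, 2.0)]
early, middle, late = (polyhazard(t, components) for t in (0.1, 1.0, 5.0))
```

With these values the hazard is high early on, dips in the middle and rises again, which a single Weibull hazard cannot do because it is monotone.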
Abstract:
Leaf wetness duration (LWD) models based on empirical approaches offer practical advantages over physically based models in agricultural applications, but their spatial portability is questionable because they may be biased to the climatic conditions under which they were developed. In our study, spatial portability of three LWD models with empirical characteristics - a RH threshold model, a decision tree model with wind speed correction, and a fuzzy logic model - was evaluated using weather data collected in Brazil, Canada, Costa Rica, Italy and the USA. The fuzzy logic model was more accurate than the other models in estimating LWD measured by painted leaf wetness sensors. The fraction of correct estimates for the fuzzy logic model was greater (0.87) than for the other models (0.85-0.86) across 28 sites where painted sensors were installed, and the k statistic, which measures the degree of agreement between the model and painted sensors, was greater for the fuzzy logic model (0.71) than for the other models (0.64-0.66). Values of the k statistic for the fuzzy logic model were also less variable across sites than those of the other models. When model estimates were compared with measurements from unpainted leaf wetness sensors, the fuzzy logic model had a lower mean absolute error (2.5 h day⁻¹) than the other models (2.6-2.7 h day⁻¹) after the model was calibrated for the unpainted sensors. The results suggest that the fuzzy logic model has greater spatial portability than the other models evaluated and merits further validation in comparison with physical models under a wider range of climate conditions. (C) 2010 Elsevier B.V. All rights reserved.
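The two agreement measures quoted in this abstract can be sketched as follows, under the assumption that the k statistic is Cohen's kappa computed on hourly wet/dry classifications; the abstract does not state the formula. The function name `agreement` and the eight-hour series are made up for illustration.

```python
def agreement(observed, estimated):
    """Fraction of correct estimates and Cohen's kappa for 0/1 (dry/wet) series."""
    n = len(observed)
    # Observed agreement: share of hours where model and sensor match.
    p_correct = sum(o == e for o, e in zip(observed, estimated)) / n
    # Agreement expected by chance from the marginal wet/dry frequencies.
    obs_wet = sum(observed) / n
    est_wet = sum(estimated) / n
    p_chance = obs_wet * est_wet + (1 - obs_wet) * (1 - est_wet)
    kappa = (p_correct - p_chance) / (1 - p_chance)
    return p_correct, kappa

# Hypothetical hourly data: 1 = wet, 0 = dry (not from the study).
sensor = [1, 1, 1, 0, 0, 0, 0, 1]
model = [1, 1, 0, 0, 0, 0, 1, 1]
fraction_correct, kappa = agreement(sensor, model)
```

Kappa discounts the agreement that would occur by chance, which is why it can separate models whose fractions of correct estimates are nearly identical.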
Abstract:
Despite its importance to agriculture, the genetic basis of heterosis is still not well understood. The main competing hypotheses include dominance, overdominance, and epistasis. NC design III is an experimental design that has been used for estimating the average degree of dominance of quantitative trait loci (QTL) and also for studying heterosis. In this study, we first develop a multiple-interval mapping (MIM) model for design III that provides a platform to estimate the number, genomic positions, augmented additive and dominance effects, and epistatic interactions of QTL. The model can be used for parents with any generation of selfing. We apply the method to two data sets, one for maize and one for rice. Our results show that heterosis in maize is mainly due to dominant gene action, although overdominance of individual QTL could not be completely ruled out because of the mapping resolution and limitations of NC design III. For rice, the estimated QTL dominant effects could not explain the observed heterosis. There is evidence that additive × additive epistatic effects of QTL could be the main cause of the heterosis in rice. The difference in the genetic basis of heterosis seems to be related to open or self pollination of the two species. The MIM model for NC design III is implemented in Windows QTL Cartographer, a freely distributed software package.
Abstract:
Traditionally the basal ganglia have been implicated in motor behavior, as they are involved in both the execution of automatic actions and the modification of ongoing actions in novel contexts. With respect to cognition, the role of the basal ganglia has not been defined as explicitly. Relative to linguistic processes, contemporary theories of subcortical participation in language have endorsed a role for the globus pallidus internus (GPi) in the control of lexical-semantic operations. However, attempts to empirically validate these postulates have been largely limited to neuropsychological investigations of verbal fluency abilities subsequent to pallidotomy. We evaluated the impact of bilateral posteroventral pallidotomy (BPVP) on language function across a range of general and high-level linguistic abilities, and validated/extended working theories of pallidal participation in language. Comprehensive linguistic profiles were compiled up to 1 month before and 3 months after BPVP in 6 subjects with Parkinson's disease (PD). Commensurate linguistic profiles were also gathered over a 3-month period for a nonsurgical control cohort of 16 subjects with PD and a group of 16 non-neurologically impaired controls (NC). Nonparametric between-groups comparisons were conducted and reliable change indices calculated, relative to baseline/3-month follow-up difference scores. Group-wise statistical comparisons between the three groups failed to reveal significant postoperative changes in language performance. Case-by-case data analysis relative to clinically consequential change indices revealed reliable alterations in performance across several language variables as a consequence of BPVP. These findings lend support to models of subcortical participation in language, which promote a role for the GPi in lexical-semantic manipulation mechanisms.
Concomitant improvements and decrements in postoperative performance were interpreted within the context of additive and subtractive postlesional effects. Relative to parkinsonian cohorts, clinically reliable versus statistically significant changes on a case by case basis may provide the most accurate method of characterizing the way in which pathophysiologically divergent basal ganglia linguistic circuits respond to BPVP.
Abstract:
Three main models of parameter setting have been proposed: the Variational model proposed by Yang (2002; 2004), the Structured Acquisition model endorsed by Baker (2001; 2005), and the Very Early Parameter Setting (VEPS) model advanced by Wexler (1998). The VEPS model contends that parameters are set early. The Variational model supposes that children employ statistical learning mechanisms to decide among competing parameter values, so this model anticipates delays in parameter setting when critical input is sparse, and gradual setting of parameters. On the Structured Acquisition model, delays occur because parameters form a hierarchy, with higher-level parameters set before lower-level parameters. Assuming that children freely choose the initial value, children will sometimes mis-set parameters. However, when that happens, the input is expected to trigger a precipitous rise in one parameter value and a corresponding decline in the other value. We will point to the kind of child language data that is needed in order to adjudicate among these competing models.
Abstract:
Successful fertilization in free-spawning marine organisms depends on the interactions between genes expressed on the surfaces of eggs and sperm. Positive selection frequently characterizes the molecular evolution of such genes, raising the possibility that some common deterministic process drives the evolution of gamete recognition genes and may even be important for understanding the evolution of prezygotic isolation and speciation in the marine realm. One hypothesis is that gamete recognition genes are subject to selection for prezygotic isolation, namely reinforcement. In a previous study, positive selection on the gene coding for the acrosomal sperm protein M7 lysin was demonstrated among allopatric populations of mussels in the Mytilus edulis species group (M. edulis, M. galloprovincialis, and M. trossulus). Here, we expand sampling to include M7 lysin haplotypes from populations where mussel species are sympatric and hybridize to determine whether there is a pattern of reproductive character displacement, which would be consistent with reinforcement driving selection on this gene. We do not detect a strong pattern of reproductive character displacement; there are no unique haplotypes in sympatry nor is there consistently greater population structure in comparisons involving sympatric populations. One distinct group of haplotypes, however, is strongly affected by natural selection and this group of haplotypes is found within M. galloprovincialis populations throughout the Northern Hemisphere concurrent with haplotypes common to M. galloprovincialis and M. edulis. We suggest that balancing selection, perhaps resulting from sexual conflicts between sperm and eggs, maintains old allelic diversity within M. galloprovincialis.
Abstract:
Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We previously evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach to delineate breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.
Abstract:
This book provides a comprehensive and critical overview of the immunological aspects of autoimmune neurological disease. These diseases include common conditions such as multiple sclerosis, the Guillain–Barré syndrome and myasthenia gravis. The introductory chapters on antigen recognition and self–non-self discrimination, and on neuroimmunology, are followed by chapters on specific diseases. These are presented in a standardized format with sections on clinical features, genetics, neuropathology, pathophysiology, immunology and therapy. Each chapter has a concluding section which summarizes key points and suggests directions for future research. Animal models of autoimmune neurological disease are also covered in detail because of their importance in understanding the human diseases. The book is suitable for clinicians and neurologists managing patients with these diseases, and for immunologists, neuroscientists and neurologists investigating the pathogenesis and pathophysiology of these disorders.
Abstract:
This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.