972 results for Maximum entropy statistical estimate
Abstract:
Bayesian algorithms pose a limit to the performance learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution and what properties do the intermediate stages have? While this question is too general to permit any answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples for neural network classifiers, which may or may not include a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and expand the results. The principal result is not just that the evolution is towards a Bayesian limit; that limit is in fact essentially reached. In addition, we find that evolution is driven by the discovery of useful structures or combinations of variables and operators. In different runs the temporal order of the discovery of such combinations is unique. The main result is that combinations that signal the surprise brought by an example always arise before combinations that serve to gauge the performance of the learning algorithm. These latter structures can be used to implement annealing schedules. The temporal ordering can also be understood analytically by doing the functional optimization in restricted functional spaces. We also show that there is data suggesting that the appearance of these traits follows the same temporal ordering in biological systems. © 2006 American Institute of Physics.
Abstract:
Multitype branching processes (MTBP) model branching structures, where the nodes of the resulting tree are particles of different types. Usually such a process is not observable in the sense of the whole tree, but only as the “generation” at a given moment in time, which consists of the number of particles of every type. This requires an EM-type algorithm to obtain a maximum likelihood (ML) estimate of the parameters of the branching process. Using a version of the inside-outside algorithm for stochastic context-free grammars (SCFG), such an estimate could be obtained for the offspring distribution of the process.
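The observation scheme described above can be illustrated with a minimal simulation sketch. Everything specific in it is an assumption made for illustration, not taken from the abstract: the two particle types, the offspring distributions in `OFFSPRING`, and the function name `simulate_generations`. It does not implement the EM or inside-outside estimation; it only shows that the recorded data are the per-type counts of each generation, while the underlying tree is discarded.

```python
import numpy as np

# Hypothetical offspring distributions for two particle types.
# Each entry is (probability, (children of type 0, children of type 1)).
OFFSPRING = {
    0: [(0.5, (0, 0)), (0.3, (1, 1)), (0.2, (2, 0))],
    1: [(0.4, (0, 0)), (0.6, (0, 2))],
}

def simulate_generations(initial, n_generations, rng):
    """Return an array whose row t holds the number of particles of each type in generation t."""
    counts = np.array(initial, dtype=np.int64)
    history = [counts.copy()]
    for _ in range(n_generations):
        nxt = np.zeros_like(counts)
        for ptype, n_particles in enumerate(counts):
            probs = [p for p, _ in OFFSPRING[ptype]]
            outcomes = np.array([o for _, o in OFFSPRING[ptype]], dtype=np.int64)
            # Number of particles of this type that realised each offspring outcome.
            picks = rng.multinomial(n_particles, probs)
            nxt += picks @ outcomes
        counts = nxt
        history.append(counts.copy())
    return np.array(history)

if __name__ == "__main__":
    print(simulate_generations([5, 3], 6, np.random.default_rng(1)))
```

Only the rows of the returned array would be available to an estimator; which parent produced which child is lost, which is why a latent-variable method such as EM is needed.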
Abstract:
2000 Mathematics Subject Classification: 62F35, 62F15
Abstract:
Speciation can be understood as a continuum occurring at different levels, from population to species. The recent molecular revolution in population genetics has opened a pathway towards understanding species evolution. At the same time, speciation patterns can be better explained by incorporating a geographic context, through the use of geographic information systems (GIS). Phaedranassa (Amaryllidaceae) is a genus restricted to one of the world’s most biodiverse hotspots, the Northern Andes. I studied seven Phaedranassa species from Ecuador. Six of these species are endemic to the country. The topographic complexity of the Andes, which creates local microhabitats ranging from moist slopes to dry valleys, might explain the patterns of Phaedranassa species differentiation. With a Bayesian individual assignment approach, I assessed the genetic structure of the genus throughout Ecuador using twelve microsatellite loci. I also used bioclimatic variables and species geographic coordinates under a Maximum Entropy algorithm to generate distribution models of the species. My results show that Phaedranassa species are genetically well-differentiated. Furthermore, with the exception of two species, all Phaedranassa showed non-overlapping distributions. Phaedranassa viridiflora and P. glauciflora were the only species in which the model predicted a broad species distribution, but genetic evidence indicates that these findings are likely an artifact of species delimitation issues. Both genetic differentiation and non-overlapping geographic distribution suggest that allopatric divergence could be the general model of genetic differentiation. Evidence of sympatric speciation was found in two geographically and genetically distinct groups of P. viridiflora. Additionally, I report the first record of natural hybridization for the genus.
The findings of this research show that the genetic differentiation of species in a landscape as intricate as the Andes does not necessarily follow a single trend. Although allopatric speciation is the most common form of speciation, I found evidence of sympatric speciation and hybridization. These results show that the processes of speciation in the Andes have followed several pathways. The mixture of these processes contributes to the high biodiversity of the region.
Abstract:
The genus Hemidactylus Oken, 1817 has a cosmopolitan distribution, with three species occurring in Brazil: two native, H. brasilianus and H. agrius, and one exotic, H. mabouia. Considering the studies on lizard ecology conducted in the Ecological Station of the Seridó (ESEC Seridó) from 2001 to 2011, this study aimed (1) to re-evaluate the occurrence of the species of Hemidactylus in this ESEC; (2) to analyze ecological and biological aspects of the H. agrius population; and (3) to investigate the current and potential distribution of the native species of the genus in northeastern Brazil, analyzing the suitability of the ESEC for this taxon. For the first two objectives, a sampling area consisting of five transects of 200 x 20 m was inspected in alternating daily shifts for three consecutive days, from August 2012 to August 2013. For the latter objective, occurrence points of H. agrius and H. brasilianus from the literature and from the databases of the Herpetological Collections of the UFRN and the UNICAMP were consulted to build predictive maps via the Maximum Entropy algorithm (MaxEnt). In ESEC Seridó, 62 H. agrius individuals were collected (25 females, 18 males and 19 juveniles), and two neonates were obtained from a communal nest incubated in the laboratory. No records were made of the other two species of the genus. Hemidactylus agrius proved to be a nocturnal species specialized in habitats with rocky outcrops, although it is a generalist regarding microhabitat use. In the population studied, females had a greater average body length than males and showed higher frequencies of caudal autotomy. Regarding diet, H. agrius is a moderately generalist species that consumes arthropods, especially insect larvae, Isoptera and Araneae, as well as vertebrates, with a case of cannibalism recorded in the population. With respect to seasonal differences, only the number of food items ingested differed between seasons.
The diet was similar between sexes, but ontogenetic differences were recorded for the total volume and maximum length of the food items. Significant relationships were found between lizard body/head size measurements and the maximum length of prey consumed. Cases of polydactyly and tail bifurcation were recorded in the population, with frequencies of 1.6% and 3.1%, respectively. In relation to the occurrence points of the native species, 27 were identified, 14 for H. agrius and 13 for H. brasilianus. The first species presented a restricted distribution, while the second showed a wide distribution. In both models generated, the ESEC Seridó area showed medium to high suitability. The results of this study confirm the absence of H. brasilianus and H. mabouia in this ESEC, and reveal H. agrius as a dietary opportunist and cannibalistic species. Further, the results confirm the distribution patterns shown by the native species of Hemidactylus, and point to ESEC Seridó as an area of probable occurrence for the species of the genus; the establishment of H. brasilianus and H. mabouia is probably limited by biotic factors, a fact still poorly understood.
Abstract:
Marine spatial planning and ecological research call for high-resolution species distribution data. However, those data are still not available for most large marine vertebrates. The dynamic nature of oceanographic processes and the wide-ranging behavior of many marine vertebrates create further difficulties, as distribution data must incorporate both the spatial and temporal dimensions. Cetaceans play an essential role in structuring and maintaining marine ecosystems and face increasing threats from human activities. The Azores hold a high diversity of cetaceans, but information about the spatial and temporal patterns of distribution for this marine megafauna group in the region is still very limited. To tackle this issue, we created monthly predictive cetacean distribution maps for spring and summer months, using data collected by the Azores Fisheries Observer Programme between 2004 and 2009. We then combined the individual predictive maps to obtain species richness maps for the same period. Our results reflect a great heterogeneity in distribution among species and within species among different months. This heterogeneity reflects a contrasting influence of oceanographic processes on the distribution of cetacean species. However, some persistent areas of increased species richness could also be identified from our results. We argue that policies aimed at effectively protecting cetaceans and their habitats must include the principle of dynamic ocean management coupled with other area-based management such as marine spatial planning.
Abstract:
Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information from experiments, making the problem an ill-posed one in the terminology of Hadamard. The ill-posed nature of the problem comes from the fact that it has no unique solution. Multiple or even an infinite number of solutions may exist. To avoid the ill-posed nature, the problem needs to be regularized by making assumptions, which inevitably introduce biases into the result.
Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations we reduced the problem to determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which enjoys a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result, which represents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
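The maximum entropy construction described above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not from the abstract: the 3D rotational space is replaced by a discretized circle, the constraints are the first two trigonometric moments, and the names `gibbs`, `maxent_fit`, `features`, and `targets` are hypothetical. The solution has the exponential form p(θ) ∝ exp(Σ_k λ_k f_k(θ)), and the multipliers are found by plain gradient descent on the convex dual objective log Z(λ) − λ·b.

```python
import numpy as np

def gibbs(features, lam):
    """Exponential-family distribution p_i proportional to exp(sum_k lam_k * f_k(i))."""
    logits = features @ lam
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()

def maxent_fit(features, targets, lr=0.5, steps=3000):
    """Find Lagrange multipliers so that the expected features match the targets.

    features: (n_states, n_constraints) values of the constraint functions f_k
    targets:  (n_constraints,) required expectations b_k
    """
    lam = np.zeros(features.shape[1])
    for _ in range(steps):
        # Gradient of the convex dual log Z(lam) - lam . b is E_p[f] - b.
        grad = features.T @ gibbs(features, lam) - targets
        lam -= lr * grad
    return lam, gibbs(features, lam)

# Toy problem (assumed for illustration): moments generated from known multipliers.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
features = np.column_stack(
    [np.cos(theta), np.sin(theta), np.cos(2 * theta), np.sin(2 * theta)]
)
true_lam = np.array([1.0, 0.0, 0.0, 0.5])
targets = features.T @ gibbs(features, true_lam)

lam, p = maxent_fit(features, targets)
```

Because the dual is convex and the bounded trigonometric features keep its curvature small, a fixed step size is enough in this toy setting; a real application would likely need a line search or a quasi-Newton method.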
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
We present a study on the detection and characterization of volcano-tectonic and long-period seismic events in seismic records generated by the Cotopaxi volcano. The proposed sequential detection structure makes it possible, within a seismic record, to maximize the probability of detecting the presence of an event and to minimize that of missing it. Detection is performed in the time domain in quasi-real time while maintaining a constant false alarm rate. The spectral content of the events is then studied using classical spectral estimators such as the periodogram and parametric ones such as Burg's maximum entropy method. This makes it possible to categorize the detected events as volcano-tectonic, long-period, or other when they lack the characteristics of the first two types, as is the case for lightning strikes.
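Burg's maximum entropy method, mentioned above as the parametric spectral estimator, can be sketched as follows. This is a generic textbook version of the algorithm, not the authors' implementation; the function names `burg_ar` and `burg_psd`, the model order, the sampling rate `fs`, and the FFT length `nfft` are all assumptions chosen for illustration.

```python
import numpy as np

def burg_ar(x, order):
    """Estimate AR coefficients a (with a[0] == 1) and the noise variance by Burg's method."""
    x = np.asarray(x, dtype=float)
    ef = x[1:].copy()    # forward prediction errors
    eb = x[:-1].copy()   # backward prediction errors, delayed by one sample
    a = np.array([1.0])
    e = np.dot(x, x) / x.size
    for _ in range(order):
        # Reflection coefficient minimising forward plus backward error power.
        k = -2.0 * np.dot(ef, eb) / (np.dot(ef, ef) + np.dot(eb, eb))
        # Levinson-type update of the AR polynomial.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # Update the error sequences and shrink them by one sample.
        ef, eb = (ef + k * eb)[1:], (eb + k * ef)[:-1]
        e *= 1.0 - k * k
    return a, e

def burg_psd(x, order, fs=1.0, nfft=1024):
    """Maximum entropy power spectral density from the fitted AR model."""
    a, e = burg_ar(x, order)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    denom = np.abs(np.fft.rfft(a, nfft)) ** 2
    return freqs, e / (fs * denom)
```

For a short event window, `freqs[np.argmax(psd)]` gives the dominant frequency, which is the kind of spectral feature that could separate event classes. The thresholds for doing so are not given in the abstract.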
Abstract:
We study the problem of detecting sentences describing adverse drug reactions (ADRs) and frame the problem as binary classification. We investigate different neural network (NN) architectures for ADR classification. In particular, we propose two new neural network models, Convolutional Recurrent Neural Network (CRNN) by concatenating convolutional neural networks with recurrent neural networks, and Convolutional Neural Network with Attention (CNNA) by adding attention weights into convolutional neural networks. We evaluate various NN architectures on a Twitter dataset containing informal language and an Adverse Drug Effects (ADE) dataset constructed by sampling from MEDLINE case reports. Experimental results show that all the NN architectures outperform the traditional maximum entropy classifiers trained from n-grams with different weighting strategies considerably on both datasets. On the Twitter dataset, all the NN architectures perform similarly. But on the ADE dataset, CNN performs better than other more complex CNN variants. Nevertheless, CNNA allows the visualisation of attention weights of words when making classification decisions and hence is more appropriate for the extraction of word subsequences describing ADRs.
Abstract:
We describe the homozygous variant c.320-2A>G of TGM1 in two sisters with autosomal recessive congenital ichthyosis. Cloning of the transcripts generated by this variant allowed the identification of three alternative splicing molecular mechanisms.
Abstract:
Knowledge of the geographical distribution of timber tree species in the Amazon is still scarce. This is especially true at the local level, thereby limiting natural resource management actions. Forest inventories are key sources of information on the occurrence of such species. However, areas with approved forest management plans are mostly located near access roads and the main industrial centers. The present study aimed to assess the spatial scale effects of forest inventories used as sources of occurrence data in the interpolation of potential species distribution models. The occurrence data of a group of six forest tree species were divided into four geographical areas during the modeling process. Several sampling schemes were then tested applying the maximum entropy algorithm, using the following predictor variables: elevation, slope, exposure, normalized difference vegetation index (NDVI) and height above the nearest drainage (HAND). The results revealed that using occurrence data from only one geographical area with unique environmental characteristics increased both model overfitting to input data and omission error rates. The use of a diagonal systematic sampling scheme and lower threshold values led to improved model performance. Forest inventories may be used to predict areas with a high probability of species occurrence, provided they are located in forest management plan regions representative of the environmental range of the model projection area.
Abstract:
Non-linear methods for estimating variability in time series are currently in widespread use. Among such methods are approximate entropy (ApEn) and sample approximate entropy (SampEn). The applicability of ApEn and SampEn in analyzing data is evident and their use is increasing. However, consistency is a point of concern in these tools, i.e., the classification of the temporal organization of a data set might indicate a relatively less ordered series in relation to another when the opposite is true. As highlighted by their proponents themselves, ApEn and SampEn might present incorrect results due to this lack of consistency. In this study, we present a method which gains consistency by using ApEn repeatedly in a wide range of combinations of window lengths and matching error tolerance. The tool is called volumetric approximate entropy, vApEn. We analyze nine artificially generated prototypical time series with different degrees of temporal order (combinations of sine waves, logistic maps with different control parameter values, random noises). While ApEn/SampEn clearly fail to consistently identify the temporal order of the sequences, vApEn does so correctly. In order to validate the tool we performed shuffled and surrogate data analysis. Statistical analysis confirmed the consistency of the method. (C) 2008 Elsevier Ltd. All rights reserved.
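As a rough illustration of the idea above, the sketch below computes standard approximate entropy and then evaluates it over a grid of window lengths `m` and tolerances `r`. The abstract does not say how vApEn combines these values, so `apen_grid` (a hypothetical name) only returns the raw grid. Expressing the tolerance as a fraction of the series' standard deviation is a common convention, assumed here.

```python
import numpy as np

def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r), with tolerance r given in units of the series' std."""
    x = np.asarray(series, dtype=float)
    n = x.size
    tol = r * np.std(x)

    def phi(dim):
        # All overlapping templates of length dim.
        templates = np.array([x[i:i + dim] for i in range(n - dim + 1)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Fraction of templates within tolerance (self-matches included).
        c = np.mean(dist <= tol, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)

def apen_grid(series, ms, rs):
    """ApEn evaluated on every combination of window length m and tolerance r."""
    return np.array([[approximate_entropy(series, m, r) for r in rs] for m in ms])
```

Comparing two series on a single (m, r) pair is where the inconsistency described above can arise; inspecting the whole grid makes that dependence on the parameters visible.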
Abstract:
Background: Aerobic fitness, assessed by measuring VO2max in maximum cardiopulmonary exercise testing (CPX) or by estimating VO2max through the use of equations in exercise testing, is a predictor of mortality. However, the error resulting from this estimate in a given individual can be high, affecting clinical decisions. Objective: To determine the error of estimate of VO2max in cycle ergometry in a population attending clinical exercise testing laboratories, and to propose sex-specific equations to minimize that error. Methods: This study assessed 1715 adults (18 to 91 years, 68% men) undertaking maximum CPX in a lower-limb cycle ergometer (LLCE) with ramp protocol. The percentage error (E%) between measured VO2max and that estimated from the modified ACSM equation (Lang et al. MSSE, 1992) was calculated. Then, estimation equations were developed: 1) for all the population tested (C-GENERAL); and 2) separately by sex (C-MEN and C-WOMEN). Results: Measured VO2max was higher in men than in women: 29.4 ± 10.5 and 24.2 ± 9.2 mL·(kg·min)⁻¹ (p < 0.01). The equations for estimating VO2max [in mL·(kg·min)⁻¹] were: C-GENERAL = [final workload (W)/body weight (kg)] × 10.483 + 7; C-MEN = [final workload (W)/body weight (kg)] × 10.791 + 7; and C-WOMEN = [final workload (W)/body weight (kg)] × 9.820 + 7. The E% for men was: -3.4 ± 13.4% (modified ACSM); 1.2 ± 13.2% (C-GENERAL); and -0.9 ± 13.4% (C-MEN) (p < 0.01). For women: -14.7 ± 17.4% (modified ACSM); -6.3 ± 16.5% (C-GENERAL); and -1.7 ± 16.2% (C-WOMEN) (p < 0.01). Conclusion: The error of estimate of VO2max by use of sex-specific equations was reduced, but not eliminated, in exercise tests on LLCE.
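The three estimation equations reported above translate directly into a small helper. The coefficients come from the abstract; the function names, the `sex` argument, and the sign convention of the percentage error (estimated minus measured, relative to measured) are assumptions, since the abstract does not define E% explicitly.

```python
# Slopes of the C-GENERAL, C-MEN and C-WOMEN equations from the abstract.
SLOPES = {"general": 10.483, "men": 10.791, "women": 9.820}

def estimate_vo2max(final_workload_w, body_weight_kg, sex="general"):
    """Estimated VO2max in mL/(kg.min) from final workload (W) and body weight (kg)."""
    return final_workload_w / body_weight_kg * SLOPES[sex] + 7.0

def percentage_error(estimated, measured):
    """Percentage error E% (assumed convention: negative means underestimation)."""
    return (estimated - measured) / measured * 100.0

if __name__ == "__main__":
    # Hypothetical subject: 200 W final workload, 80 kg body weight.
    for group in SLOPES:
        print(group, round(estimate_vo2max(200.0, 80.0, group), 2))
```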
Abstract:
We extend pseudo maximum likelihood (PML) theory to account for information on the conditional moments up to order four, but without assuming a parametric model, to avoid a risk of misspecification of the conditional distribution. The key statistical tool is the quartic exponential family, which allows us to generalize the PML2 and QGPML1 methods proposed in Gourieroux et al. (1984) to PML4 and QGPML2 methods, respectively. An asymptotic theory is developed. The key numerical tool that we use is the Gauss-Freud integration scheme, which solves a computational problem that has previously been raised in several fields. Simulation exercises demonstrate the feasibility and robustness of the methods. [Authors]