936 results for Maximum entropy
Abstract:
We present and analyze three different online algorithms for learning in discrete Hidden Markov Models (HMMs) and compare their performance with the Baldi-Chauvin algorithm. Using the Kullback-Leibler divergence as a measure of the generalization error, we draw learning curves in simplified situations and compare the results. The performance of one of the presented algorithms in learning drifting concepts is analyzed and compared with that of the Baldi-Chauvin algorithm in the same situations. A brief discussion of learning and symmetry breaking based on our results is also presented. © 2006 American Institute of Physics.
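As an illustration of the error measure named in this abstract, here is a minimal sketch of the Kullback-Leibler divergence between two discrete distributions. The function name and the "teacher" and "student" probabilities are hypothetical; the abstract does not say how the authors evaluate the divergence for full HMMs.

```python
import math

def kl_divergence(p, q):
    """Return D(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical emission probabilities of a "teacher" and a "student" model.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
print(kl_divergence(teacher, student))
```

The divergence is zero only when the two distributions coincide, which is why it can serve as a generalization error that a learning curve drives toward zero.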
Abstract:
Bayesian algorithms pose a limit to the performance learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution and what properties do the intermediate stages have? While this question is too general to permit any answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples for neural network classifiers, which may or may not include a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and expand the results. The principal result is not just that evolution proceeds towards the Bayesian limit; that limit is essentially reached. In addition, we find that evolution is driven by the discovery of useful structures or combinations of variables and operators. In different runs, the temporal order of the discovery of such combinations is unique. The main result is that combinations that signal the surprise brought by an example always arise before combinations that serve to gauge the performance of the learning algorithm. These latter structures can be used to implement annealing schedules. The temporal ordering can also be understood analytically by carrying out the functional optimization in restricted functional spaces. We also present data suggesting that the appearance of these traits follows the same temporal ordering in biological systems. © 2006 American Institute of Physics.
Abstract:
Speciation can be understood as a continuum occurring at different levels, from population to species. The recent molecular revolution in population genetics has opened a pathway towards understanding species evolution. At the same time, speciation patterns can be better explained by incorporating a geographic context, through the use of geographic information systems (GIS). Phaedranassa (Amaryllidaceae) is a genus restricted to one of the world’s most biodiverse hotspots, the Northern Andes. I studied seven Phaedranassa species from Ecuador. Six of these species are endemic to the country. The topographic complexity of the Andes, which creates local microhabitats ranging from moist slopes to dry valleys, might explain the patterns of Phaedranassa species differentiation. With a Bayesian individual assignment approach, I assessed the genetic structure of the genus throughout Ecuador using twelve microsatellite loci. I also used bioclimatic variables and species geographic coordinates under a Maximum Entropy algorithm to generate distribution models of the species. My results show that Phaedranassa species are genetically well-differentiated. Furthermore, with the exception of two species, all Phaedranassa showed non-overlapping distributions. Phaedranassa viridiflora and P. glauciflora were the only species for which the model predicted a broad species distribution, but genetic evidence indicates that these findings are likely an artifact of species delimitation issues. Both genetic differentiation and non-overlapping geographic distribution suggest that allopatric divergence could be the general model of genetic differentiation. Evidence of sympatric speciation was found in two geographically and genetically distinct groups of P. viridiflora. Additionally, I report the first record of natural hybridization for the genus.
The findings of this research show that the genetic differentiation of species in a landscape as intricate as the Andes does not necessarily follow a single trend. Although allopatric speciation is the most common form of speciation, I found evidence of sympatric speciation and hybridization. These results show that the processes of speciation in the Andes have followed several pathways. The mixture of these processes contributes to the high biodiversity of the region.
Abstract:
The genus Hemidactylus Oken, 1817 has a cosmopolitan distribution, with three species occurring in Brazil, two of them native, H. brasilianus and H. agrius, and one exotic, H. mabouia. Considering the studies on lizard ecology conducted in the Ecological Station of the Seridó (ESEC Seridó) from 2001 to 2011, this study aimed (1) to re-evaluate the occurrence of the species of Hemidactylus in this ESEC; (2) to analyze ecological and biological aspects of the H. agrius population; and (3) to investigate the current and potential distribution of the native species of the genus in northeastern Brazil, analyzing the suitability of the ESEC for this taxon. For the first two objectives, a sampling area consisting of five transects of 200 x 20 m was inspected in alternating daily shifts for three consecutive days, from August 2012 to August 2013. For the latter objective, occurrence points of H. agrius and H. brasilianus from the literature and from the databases of the Herpetological Collections of the UFRN and the UNICAMP were consulted to build predictive maps via the Maximum Entropy algorithm (MaxEnt). In ESEC Seridó, 62 H. agrius individuals were collected (25 females, 18 males and 19 juveniles), and two neonates were obtained from a communal nest incubated in the laboratory. No record was made of the other two species of the genus. Hemidactylus agrius proved to be a nocturnal species specialized in habitats with rocky outcrops, but a generalist regarding microhabitat use. In the population studied, females had a greater average body length than males and showed higher frequencies of caudal autotomy. Regarding diet, H. agrius is a moderately generalist species that consumes arthropods, especially insect larvae, Isoptera and Araneae, as well as vertebrates, with a case of cannibalism recorded in the population. With respect to seasonal differences, only the number of food items ingested differed between seasons.
The diet was similar between sexes, but ontogenetic differences were recorded for the total volume and maximum length of the food items. Significant relationships were found between lizard body/head size measurements and the maximum length of prey consumed. Cases of polydactyly and tail bifurcation were recorded in the population, with frequencies of 1.6% and 3.1%, respectively. Regarding occurrence points of the native species, 27 were identified, 14 for H. agrius and 13 for H. brasilianus. The former showed a restricted distribution, while the latter showed a wide one. In both models generated, the ESEC Seridó area showed medium to high suitability. The results of this study confirm the absence of H. brasilianus and H. mabouia in this ESEC, and reveal H. agrius to be a dietary opportunist and a cannibalistic species. Further, the results confirm the distribution patterns shown by native species of Hemidactylus and point to ESEC Seridó as an area of probable occurrence for the species of the genus; the establishment of H. brasilianus and H. mabouia is probably limited by biotic factors, a fact still little understood.
Abstract:
Marine spatial planning and ecological research call for high-resolution species distribution data. However, those data are still not available for most large marine vertebrates. The dynamic nature of oceanographic processes and the wide-ranging behavior of many marine vertebrates create further difficulties, as distribution data must incorporate both the spatial and temporal dimensions. Cetaceans play an essential role in structuring and maintaining marine ecosystems and face increasing threats from human activities. The Azores holds a high diversity of cetaceans, but information about the spatial and temporal patterns of distribution of this marine megafauna group in the region is still very limited. To tackle this issue, we created monthly predictive cetacean distribution maps for spring and summer months, using data collected by the Azores Fisheries Observer Programme between 2004 and 2009. We then combined the individual predictive maps to obtain species richness maps for the same period. Our results reflect a great heterogeneity in distribution among species and within species among different months. This heterogeneity reflects a contrasting influence of oceanographic processes on the distribution of cetacean species. However, some persistent areas of increased species richness could also be identified from our results. We argue that policies aimed at effectively protecting cetaceans and their habitats must include the principle of dynamic ocean management coupled with other area-based management such as marine spatial planning.
Abstract:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Abstract:
Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information from experiments, making the problem an ill-posed one in the terminology of Hadamard. The ill-posed nature of the problem comes from the fact that it has no unique solution. Multiple or even an infinite number of solutions may exist. To avoid the ill-posed nature, the problem needs to be regularized by making assumptions, which inevitably introduce biases into the result.
Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to the determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates the alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which relies on a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result, which represents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
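The maximum entropy recipe described here (an exponential-family solution, a convex objective in the Lagrange multipliers, gradient descent) can be sketched on a toy discrete problem. This is only an illustration under stated assumptions: the six-sided die, the single mean constraint, and the names `gibbs` and `maxent_fit` are hypothetical and are not the RDC formulation of the thesis.

```python
import numpy as np

def gibbs(features, lam):
    """Distribution with p_i proportional to exp(features[i] @ lam)."""
    logits = features @ lam
    logits = logits - logits.max()  # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum()

def maxent_fit(features, targets, lr=0.1, steps=5000):
    """Minimise the convex dual log Z(lam) - lam . targets by gradient descent."""
    features = np.asarray(features, dtype=float)  # shape (n_states, n_constraints)
    targets = np.asarray(targets, dtype=float)    # required expectation values
    lam = np.zeros(features.shape[1])
    for _ in range(steps):
        p = gibbs(features, lam)
        grad = p @ features - targets  # model expectations minus targets
        lam -= lr * grad
    return gibbs(features, lam), lam

# Toy problem (assumption): a die whose average face value must equal 4.5.
faces = np.arange(1, 7, dtype=float)
p, lam = maxent_fit(faces[:, None], [4.5])
print(p, lam)
```

The gradient of the dual is the mismatch between model and target expectations, so the descent stops exactly when the constraints are met; the fixed step size used here is a choice made for this toy problem only.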
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
We present a study of the detection and characterization of volcano-tectonic and long-period seismic events in seismic records generated by Cotopaxi volcano. The proposed sequential detection structure makes it possible, for a given seismic record, to maximize the probability of detecting an event that is present and to minimize the probability of missing it. Detection is performed in the time domain in quasi real time while maintaining a constant false-alarm rate. The spectral content of the events is then studied using classical spectral estimators such as the periodogram and parametric ones such as Burg's maximum entropy method. This makes it possible to categorize the detected events as volcano-tectonic, long-period, or other, the last category covering events such as lightning that lack the characteristics of the first two types.
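For readers unfamiliar with Burg's maximum entropy method, the following is a minimal sketch of the standard Burg recursion for fitting an autoregressive model and evaluating its power spectrum. The function names and the sign convention (x[n] + a[1] x[n-1] + ... = noise) are choices made here; the study's own implementation and model order are not given in the abstract.

```python
import numpy as np

def burg_ar(x, order):
    """Estimate AR coefficients a (with a[0] = 1) and noise variance by Burg's method."""
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()    # forward prediction errors
    b = x[:-1].copy()   # backward prediction errors, delayed by one sample
    a = np.array([1.0])
    noise_var = np.dot(x, x) / len(x)
    for _ in range(order):
        # Reflection coefficient minimising forward plus backward error power.
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]             # Levinson-style coefficient update
        noise_var *= 1.0 - k * k
        f_new = f + k * b
        b_new = b + k * f
        f, b = f_new[1:], b_new[:-1]    # realign errors for the next order
    return a, noise_var

def burg_psd(a, noise_var, freqs):
    """AR power spectrum at frequencies given in cycles per sample."""
    lags = np.arange(len(a))
    transfer = np.exp(-2j * np.pi * np.outer(freqs, lags)) @ a
    return noise_var / np.abs(transfer) ** 2
```

A typical use is to call `burg_ar` on a detected event window and locate the dominant peak of `burg_psd`; compared with the periodogram, the parametric estimate tends to give smoother and sharper peaks on short windows, at the cost of having to choose the model order.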
Abstract:
We study the problem of detecting sentences describing adverse drug reactions (ADRs) and frame the problem as binary classification. We investigate different neural network (NN) architectures for ADR classification. In particular, we propose two new neural network models: the Convolutional Recurrent Neural Network (CRNN), which concatenates convolutional neural networks with recurrent neural networks, and the Convolutional Neural Network with Attention (CNNA), which adds attention weights to convolutional neural networks. We evaluate various NN architectures on a Twitter dataset containing informal language and an Adverse Drug Effects (ADE) dataset constructed by sampling from MEDLINE case reports. Experimental results show that, on both datasets, all the NN architectures considerably outperform traditional maximum entropy classifiers trained on n-grams with different weighting strategies. On the Twitter dataset, all the NN architectures perform similarly. On the ADE dataset, however, CNN performs better than the more complex CNN variants. Nevertheless, CNNA allows the visualisation of the attention weights of words when making classification decisions and is hence more appropriate for the extraction of word subsequences describing ADRs.
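To make the baseline concrete: a binary maximum entropy classifier is equivalent to logistic regression, and one trained on n-grams can be sketched as below. Everything in the block is an assumption made for illustration, including the count-based unigram and bigram features, the training settings, and the four invented toy sentences; none of it is taken from the paper's datasets or its weighting strategies.

```python
import math

def ngram_features(text, n_max=2):
    """Count unigrams and bigrams of a whitespace-tokenised sentence."""
    tokens = text.lower().split()
    feats = {}
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            feats[gram] = feats.get(gram, 0.0) + 1.0
    return feats

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(model, text):
    """Probability that the sentence belongs to class 1."""
    weights, bias = model
    feats = ngram_features(text)
    z = bias + sum(weights.get(g, 0.0) * v for g, v in feats.items())
    return sigmoid(z)

def train_maxent(examples, epochs=200, lr=0.1, l2=0.01):
    """Stochastic gradient ascent on the L2-regularised log-likelihood."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            err = label - predict_proba((weights, bias), text)
            for g, v in ngram_features(text).items():
                w = weights.get(g, 0.0)
                weights[g] = w + lr * (err * v - l2 * w)
            bias += lr * err
    return weights, bias

# Invented toy sentences (1 = mentions an adverse reaction, 0 = does not).
TOY = [
    ("this drug gave me a terrible headache", 1),
    ("severe nausea after taking the pill", 1),
    ("i took the pill this morning", 0),
    ("the drug is taken twice a day", 0),
]
model = train_maxent(TOY)
print(predict_proba(model, "terrible headache after the pill"))
```

Because each n-gram gets its own weight, such a model cannot share information between related word sequences, which is one plausible reason neural architectures do better on this task.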
Abstract:
We describe the homozygous variant c.320-2A>G of TGM1 in two sisters with autosomal recessive congenital ichthyosis. Cloning of the transcripts generated by this variant made it possible to identify three alternative molecular splicing mechanisms.
Abstract:
Knowledge of the geographical distribution of timber tree species in the Amazon is still scarce. This is especially true at the local level, thereby limiting natural resource management actions. Forest inventories are key sources of information on the occurrence of such species. However, areas with approved forest management plans are mostly located near access roads and the main industrial centers. The present study aimed to assess the spatial scale effects of forest inventories used as sources of occurrence data in the interpolation of potential species distribution models. The occurrence data of a group of six forest tree species were divided into four geographical areas during the modeling process. Several sampling schemes were then tested applying the maximum entropy algorithm, using the following predictor variables: elevation, slope, exposure, normalized difference vegetation index (NDVI) and height above the nearest drainage (HAND). The results revealed that using occurrence data from only one geographical area with unique environmental characteristics increased both model overfitting to input data and omission error rates. The use of a diagonal systematic sampling scheme and lower threshold values led to improved model performance. Forest inventories may be used to predict areas with a high probability of species occurrence, provided they are located in forest management plan regions representative of the environmental range of the model projection area.
Abstract:
Molecular dynamics simulations have been performed on monatomic sorbates confined within zeolite NaY to obtain the dependence of entropy and self-diffusivity on the sorbate diameter. Previously, molecular dynamics simulations by Santikary and Yashonath [J. Phys. Chem. 98, 6368 (1994)], theoretical analysis by Derouane [J. Catal. 110, 58 (1988)], as well as experiments by Kemball [Adv. Catal. 2, 233 (1950)] found that certain sorbates in certain adsorbents exhibit unusually high self-diffusivity. Experiments showed that the loss of entropy for certain sorbates in specific adsorbents was minimal. Kemball suggested that such sorbates will have high self-diffusivity in these adsorbents. The entropy of the adsorbed phase has been evaluated from the trajectory information by two alternative methods: two-phase and multiparticle expansion. The results show that an anomalous maximum in entropy is also seen as a function of the sorbate diameter. Further, the experimental observation of Kemball that a minimum loss of entropy is associated with a maximum in self-diffusivity is found to be true for the system studied here. A suitably scaled dimensionless self-diffusivity shows an exponential dependence on the excess entropy of the adsorbed phase, analogous to the excess entropy scaling rules seen in many bulk and confined fluids. The two trajectory-based estimators for the entropy show good semiquantitative agreement and provide some interesting microscopic insights into the entropy changes associated with confinement.
Abstract:
Using generalized bosons, we construct the fuzzy sphere S_F^2 and monopoles on S_F^2 in a reducible representation of SU(2). The corresponding quantum states are naturally obtained using the GNS construction. We show that there is an emergent nonabelian unitary gauge symmetry which is in the commutant of the algebra of observables. The quantum states are necessarily mixed and have non-vanishing von Neumann entropy, which increases monotonically under a bistochastic Markov map. The maximum value of the entropy has a simple relation to the degeneracy of the irreps that constitute the reducible representation that underlies the fuzzy sphere.
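The general fact that von Neumann entropy cannot decrease under a bistochastic map can be checked numerically on a small example. The sketch below uses a single qubit and a mixture of two unitaries, which is an assumption chosen for simplicity; it is not the fuzzy-sphere construction or the specific Markov map of the paper.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), in nats, from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(evals * np.log(evals)))

def random_unitary_channel(rho, unitaries, probs):
    """Bistochastic map: a convex mixture of unitary conjugations."""
    out = np.zeros_like(rho, dtype=complex)
    for p, u in zip(probs, unitaries):
        out += p * (u @ rho @ u.conj().T)
    return out

identity = np.eye(2, dtype=complex)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)

# Start from a mixed qubit state and apply the map repeatedly.
rho = np.array([[0.9, 0.0], [0.0, 0.1]], dtype=complex)
entropies = [von_neumann_entropy(rho)]
for _ in range(10):
    rho = random_unitary_channel(rho, [identity, pauli_x], [0.8, 0.2])
    entropies.append(von_neumann_entropy(rho))
print(entropies)
```

In this toy case the entropy rises toward log 2, the maximum for a two-level system, because the map preserves both the trace and the identity operator.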
Abstract:
Minimization problems with respect to a one-parameter family of generalized relative entropies are studied. These relative entropies, which we term relative alpha-entropies (denoted I-alpha), arise as redundancies under mismatched compression when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the usual relative entropy (Kullback-Leibler divergence). Just like relative entropy, these relative alpha-entropies behave like squared Euclidean distance and satisfy the Pythagorean property. Minimizers of these relative alpha-entropies on closed and convex sets are shown to exist. Such minimizations generalize the maximum Renyi or Tsallis entropy principle. The minimizing probability distribution (termed forward I-alpha-projection) for a linear family is shown to obey a power-law. Other results in connection with statistical inference, namely subspace transitivity and iterated projections, are also established. In a companion paper, a related minimization problem of interest in robust statistics that leads to a reverse I-alpha-projection is studied.