910 results for Bayesian maximum entropy
Abstract:
Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and their documentation on the Web remains a challenging task even with the best resources available. In this paper we cast the detection of Web API documentation as a text classification problem: classifying a given Web page as Web API related or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for the automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labelled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that feaLDA outperforms three strong supervised baselines (naive Bayes, support vector machines, and the maximum entropy model) by over 3% in classification accuracy. In addition, feaLDA gives superior performance when compared against other existing supervised topic models.
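feaLDA itself is not described in implementable detail here, but the maximum entropy baseline it is compared against corresponds to multinomial logistic regression over text features. A minimal sketch of such a baseline, with hypothetical placeholder pages and labels, might look like this:

```python
# Sketch of a maximum entropy baseline (logistic regression over n-grams),
# of the kind the abstract compares feaLDA against. The pages and labels
# are hypothetical placeholders, not the authors' dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pages = ["GET /v1/users returns a JSON list of users ...",
         "Welcome to my travel blog about Lisbon ..."]   # Web page texts
labels = [1, 0]                                          # 1 = API documentation

# Maximum entropy classification is equivalent to multinomial logistic regression.
maxent = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
maxent.fit(pages, labels)
print(maxent.predict(["POST /v2/orders creates an order; returns 201"]))
```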
Abstract:
The genus Hemidactylus Oken, 1817 has a cosmopolitan distribution, with three species occurring in Brazil: two native, H. brasilianus and H. agrius, and one exotic, H. mabouia. Building on studies of lizard ecology conducted in the Ecological Station (ESEC) of Seridó from 2001 to 2011, this study aimed (1) to re-evaluate the occurrence of Hemidactylus species in this ESEC; (2) to analyze ecological and biological aspects of the H. agrius population; and (3) to investigate the current and potential distribution of the native species of the genus in northeastern Brazil, assessing the suitability of the ESEC for this taxon. For the first two objectives, a sampling area consisting of five 200 x 20 m transects was inspected in alternating daily shifts for three consecutive days, from August 2012 to August 2013. For the third objective, occurrence points of H. agrius and H. brasilianus from the literature and from the databases of the herpetological collections of UFRN and UNICAMP were used to build predictive maps via the Maximum Entropy algorithm (MaxEnt). In ESEC Seridó, 62 H. agrius individuals were collected (25 females, 18 males and 19 juveniles), and two neonates were obtained from a communal nest incubated in the laboratory. No record was made of the other two species of the genus. Hemidactylus agrius proved to be a nocturnal species specialized in habitats with rocky outcrops, but a generalist regarding microhabitat use. In the population studied, females had a greater average body length than males and showed higher frequencies of caudal autotomy. Regarding diet, H. agrius is a moderately generalist species that consumes arthropods, especially insect larvae, Isoptera and Araneae, as well as vertebrates, with one case of cannibalism recorded in the population. With respect to seasonality, only the number of food items ingested differed between seasons. The diet was similar between sexes, but ontogenetic differences were recorded in the total volume and maximum length of food items. Significant relationships were found between lizard body/head size measurements and the maximum length of prey consumed. Cases of polydactyly and tail bifurcation were recorded in the population, at frequencies of 1.6% and 3.1%, respectively. As for the occurrence points of the native species, 27 were identified: 14 for H. agrius and 13 for H. brasilianus. The former species presented a restricted distribution, while the latter showed a wide distribution. In both models generated, the ESEC Seridó area showed medium to high suitability. The results of this study confirm the absence of H. brasilianus and H. mabouia in this ESEC and reveal H. agrius to be a dietary opportunist and an occasional cannibal. Further, the results confirm the distribution patterns shown by the native species of Hemidactylus and point to ESEC Seridó as an area of probable occurrence for species of the genus; the establishment of H. brasilianus and H. mabouia there is probably limited by biotic factors that remain poorly understood.
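The MaxEnt step can be illustrated in miniature. The study used the MaxEnt software itself; the sketch below instead uses the common presence-versus-background logistic regression approximation, with synthetic covariates standing in for the real environmental layers:

```python
# Minimal presence/background sketch of MaxEnt-style distribution modelling.
# Logistic regression over presences vs. random background points is a widely
# used approximation to MaxEnt; all values here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
presence_env = rng.normal(loc=1.0, size=(27, 4))      # covariates at the 27 records
background_env = rng.normal(loc=0.0, size=(1000, 4))  # covariates at background points

X = np.vstack([presence_env, background_env])
y = np.concatenate([np.ones(len(presence_env)), np.zeros(len(background_env))])

model = LogisticRegression(max_iter=1000).fit(X, y)
suitability = model.predict_proba(background_env)[:, 1]  # relative suitability
print(suitability[:5])
```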
Abstract:
Marine spatial planning and ecological research call for high-resolution species distribution data. However, such data are still unavailable for most large marine vertebrates. The dynamic nature of oceanographic processes and the wide-ranging behavior of many marine vertebrates create further difficulties, as distribution data must incorporate both the spatial and the temporal dimensions. Cetaceans play an essential role in structuring and maintaining marine ecosystems and face increasing threats from human activities. The Azores hold a high diversity of cetaceans, but information about the spatial and temporal distribution patterns of this marine megafauna group in the region is still very limited. To tackle this issue, we created monthly predictive cetacean distribution maps for the spring and summer months, using data collected by the Azores Fisheries Observer Programme between 2004 and 2009. We then combined the individual predictive maps to obtain species richness maps for the same period. Our results reflect great heterogeneity in distribution among species, and within species among months, reflecting the contrasting influence of oceanographic processes on the distribution of different cetacean species. Nevertheless, some persistent areas of increased species richness could be identified. We argue that policies aimed at effectively protecting cetaceans and their habitats must embrace the principle of dynamic ocean management, coupled with other area-based management tools such as marine spatial planning.
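One simple way to combine per-species predictive maps into richness maps, consistent with the description above, is to stack the per-cell presence probabilities: summing them gives expected richness, while thresholding first gives a binary richness count. A sketch with synthetic arrays:

```python
# Sketch of combining per-species monthly prediction maps into richness maps.
# Summing presence probabilities over species gives the expected species
# richness per grid cell. Arrays are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_species, ny, nx = 12, 50, 80
prob_maps = rng.uniform(size=(n_species, ny, nx))  # P(presence) per species/cell

expected_richness = prob_maps.sum(axis=0)          # expected number of species
binary_richness = (prob_maps > 0.5).sum(axis=0)    # richness after thresholding
print(expected_richness.max(), binary_richness.max())
```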
Abstract:
This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.
The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. The distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.
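The Bingham special case mentioned above can be sketched concretely. Its density on the unit hypersphere is p(x) ∝ exp(x^T M Z M^T x), with M orthogonal and Z diagonal; since the quadratic form is antipodally symmetric, q and -q (the same rotation) receive equal density, which is what makes it suitable for unit quaternions. The parameters below are illustrative, and the mirrored normal-Bingham construction itself is not reproduced here:

```python
# Sketch of the Bingham building block (the hyperspherical special case the
# abstract's distribution generalizes), evaluated without its normalizer.
import numpy as np

def bingham_unnormalized(x, M, Z):
    """Unnormalized Bingham density exp(x^T M diag(Z) M^T x) at unit vector x."""
    C = M @ np.diag(Z) @ M.T
    return np.exp(np.einsum('...i,ij,...j->...', x, C, x))

M = np.linalg.qr(np.random.default_rng(2).normal(size=(4, 4)))[0]  # orthogonal
Z = np.array([-10.0, -5.0, -1.0, 0.0])   # concentrations (one zero by convention)

q = np.array([1.0, 0.0, 0.0, 0.0])       # a unit quaternion
assert np.isclose(bingham_unnormalized(q, M, Z),
                  bingham_unnormalized(-q, M, Z))  # antipodal symmetry holds
```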
Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.
Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.
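The core idea of warping features rather than images can be sketched as applying the homography channel-by-channel to a feature map. This is only the simplest form of the idea; the HOG-specific methods described above additionally handle gradient-orientation changes. The homography and feature map below are illustrative:

```python
# Sketch of applying an inverse perspective mapping to a feature map instead
# of the image: warp each feature channel with the same homography H.
import numpy as np
import cv2

H = np.array([[1.0, 0.2,   0.0],     # illustrative homography
              [0.0, 1.0,   0.0],
              [0.0, 0.001, 1.0]])

features = np.random.rand(64, 64, 8).astype(np.float32)  # H x W x channels
warped = np.dstack([
    cv2.warpPerspective(features[:, :, c], H, (64, 64))   # per-channel warp
    for c in range(features.shape[2])
])
print(warped.shape)  # (64, 64, 8)
```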
The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.
Abstract:
Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstructing biomolecular dynamics from experimental observables requires determining a conformational probability distribution. Unfortunately, such distributions cannot be fully constrained by the limited information available from experiments, making the problem ill-posed in the terminology of Hadamard: it has no unique solution, and multiple or even infinitely many solutions may exist. To overcome this ill-posedness, the problem must be regularized by making assumptions, which inevitably introduce biases into the result.
Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to determining a distribution on the 3D rotation space from residual dipolar couplings (RDCs). We derived an analytical equation that relates the alignment tensors of adjacent domains, which serves as the foundation of both methods. In the first approach, the ill-posed nature of the problem is avoided by introducing a continuous distribution model that incorporates a smoothness assumption. To find the optimal distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions and is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids any distribution model by employing the maximum entropy principle. This 'model-free' approach delivers the least biased result consistent with our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers; to determine the multipliers, a convex objective function is constructed, so the maximum entropy solution can be found easily by gradient descent methods. Both algorithms are applicable to biomolecular RDC data in general, including data from RNA and DNA molecules.
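The maximum entropy machinery described above can be sketched on a discrete grid: the solution is p ∝ exp(Σ_k λ_k f_k), and the multipliers λ minimize the convex dual log Z(λ) − λ·c, where c holds the experimentally measured feature averages. The features and "measurements" below are synthetic stand-ins for the RDC-derived quantities:

```python
# Sketch of the maximum entropy solution via its convex dual, on a discrete
# grid of states. F holds feature values f_k per state; c holds measured
# feature averages (synthetic here). The dual gradient is E_p[f] - c, so
# minimizing it enforces the moment constraints.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n_states, n_feats = 500, 6
F = rng.normal(size=(n_states, n_feats))            # feature values on grid
c = F[rng.integers(0, n_states, 50)].mean(axis=0)   # "experimental" averages

def dual(lam):
    logits = F @ lam
    logZ = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
    return logZ - lam @ c

res = minimize(dual, np.zeros(n_feats), method='BFGS')
p = np.exp(F @ res.x); p /= p.sum()        # maximum entropy distribution
print((F * p[:, None]).sum(axis=0) - c)    # moment constraints ~ satisfied
```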
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
We present a study of the detection and characterization of volcano-tectonic and long-period seismic events in seismic records generated by the Cotopaxi volcano. The proposed sequential detection structure maximizes the probability of detecting an event present in a seismic record while minimizing missed detections. Detection is performed in the time domain, in quasi-real time, while maintaining a constant false alarm rate. The spectral content of the detected events is then studied using classical spectral estimators, such as the periodogram, and parametric estimators, such as Burg's maximum entropy method, making it possible to categorize the detected events as volcano-tectonic, long-period, or other (such as lightning) when they lack the characteristics of the first two types.
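The spectral comparison described above can be sketched with a classical periodogram next to a Burg (maximum entropy) autoregressive estimate. The compact Burg recursion below is a generic textbook implementation, and the signal is a synthetic stand-in for a seismic trace:

```python
# Sketch of the spectral step: periodogram vs. Burg's maximum entropy
# (autoregressive) estimate, via a compact Burg recursion.
import numpy as np
from scipy.signal import periodogram, freqz

def burg(x, order):
    """Burg's method: AR coefficients a (a[0]=1) and noise variance E."""
    f = np.asarray(x, float).copy()
    b = f.copy()
    a = np.array([1.0])
    E = np.mean(f ** 2)
    for _ in range(order):
        fp, bp = f[1:], b[:-1]
        k = -2.0 * (fp @ bp) / (fp @ fp + bp @ bp)   # reflection coefficient
        a = np.append(a, 0.0) + k * np.append(a, 0.0)[::-1]
        f, b = fp + k * bp, bp + k * fp
        E *= 1.0 - k ** 2
    return a, E

fs = 100.0
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(4).normal(size=t.size)

f_per, P_per = periodogram(x, fs=fs)                  # classical estimator
a, E = burg(x, order=12)                              # parametric estimator
w, h = freqz([1.0], a, worN=1024, fs=fs)              # h = 1 / A(e^jw)
P_burg = (E / fs) * np.abs(h) ** 2                    # AR power spectral density
print(f_per[np.argmax(P_per)], w[np.argmax(P_burg)])  # both peak near 5 Hz
```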
Abstract:
We study the problem of detecting sentences describing adverse drug reactions (ADRs) and frame it as binary classification. We investigate different neural network (NN) architectures for ADR classification. In particular, we propose two new neural network models: the Convolutional Recurrent Neural Network (CRNN), obtained by concatenating convolutional neural networks with recurrent neural networks, and the Convolutional Neural Network with Attention (CNNA), obtained by adding attention weights to convolutional neural networks. We evaluate the various NN architectures on a Twitter dataset containing informal language and on an Adverse Drug Effects (ADE) dataset constructed by sampling from MEDLINE case reports. Experimental results show that, on both datasets, all the NN architectures considerably outperform traditional maximum entropy classifiers trained on n-grams with different weighting strategies. On the Twitter dataset, all the NN architectures perform similarly, but on the ADE dataset the plain CNN performs better than the more complex CNN variants. Nevertheless, CNNA allows the attention weights of words to be visualised when making classification decisions and is hence more appropriate for extracting word subsequences describing ADRs.
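The CNNA idea, attention weights over token positions computed from convolutional features and used for pooling, can be sketched in a few lines of PyTorch. The dimensions and layer choices below are illustrative, not the authors' exact architecture:

```python
# Sketch of a CNN-with-attention text classifier: attention scores over
# token positions pool the convolutional features, so the weights can be
# visualized per word when making a decision.
import torch
import torch.nn as nn

class CNNA(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, n_filters=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=kernel // 2)
        self.attn = nn.Linear(n_filters, 1)   # one attention score per position
        self.out = nn.Linear(n_filters, 2)    # binary ADR / non-ADR

    def forward(self, tokens):                # tokens: (batch, seq_len)
        h = self.conv(self.emb(tokens).transpose(1, 2)).relu()  # (B, F, L)
        h = h.transpose(1, 2)                                   # (B, L, F)
        w = torch.softmax(self.attn(h).squeeze(-1), dim=1)      # (B, L)
        pooled = (w.unsqueeze(-1) * h).sum(dim=1)               # weighted sum
        return self.out(pooled), w            # logits + visualizable weights

model = CNNA(vocab_size=5000)
logits, weights = model(torch.randint(0, 5000, (4, 30)))
print(logits.shape, weights.shape)            # (4, 2) and (4, 30)
```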
Abstract:
We describe the homozygous variant c.320-2A>G of TGM1 in two sisters with autosomal recessive congenital ichthyosis. Cloning of the transcripts generated by this variant allowed the identification of three alternative molecular splicing mechanisms.
Abstract:
Knowledge of the geographical distribution of timber tree species in the Amazon is still scarce. This is especially true at the local level, limiting natural resource management actions. Forest inventories are key sources of information on the occurrence of such species. However, areas with approved forest management plans are mostly located near access roads and the main industrial centers. The present study aimed to assess the spatial scale effects of forest inventories used as sources of occurrence data in the interpolation of potential species distribution models. The occurrence data of a group of six forest tree species were divided into four geographical areas during the modeling process. Several sampling schemes were then tested by applying the maximum entropy algorithm with the following predictor variables: elevation, slope, exposure, normalized difference vegetation index (NDVI) and height above the nearest drainage (HAND). The results revealed that using occurrence data from only one geographical area with unique environmental characteristics increased both model overfitting to the input data and omission error rates. The use of a diagonal systematic sampling scheme and lower threshold values led to improved model performance. Forest inventories may be used to predict areas with a high probability of species occurrence, provided they are located in forest management plan regions representative of the environmental range of the model projection area.
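The role of the threshold can be illustrated directly: the omission error rate is the fraction of test presences whose predicted suitability falls below the threshold, so lowering the threshold reduces omission at the cost of commission. A sketch with synthetic suitability values:

```python
# Sketch of threshold evaluation for a species distribution model: the
# omission rate is the share of test presences below the cutoff.
# Values are synthetic placeholders, not the study's MaxEnt output.
import numpy as np

rng = np.random.default_rng(5)
test_suitability = rng.beta(4, 2, size=60)   # predicted suitability at test presences

for thr in (0.1, 0.3, 0.5):
    omission = np.mean(test_suitability < thr)
    print(f"threshold={thr:.1f}  omission rate={omission:.2f}")
```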
Abstract:
The development and testing of an iterative reconstruction algorithm for emission tomography based on Bayesian statistical concepts is described. The algorithm uses the entropy of the generated image as a prior distribution, can be accelerated by the choice of an exponent, and converges uniformly to feasible images through the choice of a single adjustable parameter. A feasible image is defined as one that is consistent with the initial data, i.e. an image that, if it were truly the source of radiation in a patient, could have generated the initial data through the Poisson process that governs radioactive disintegration. The fundamental ideas of Bayesian reconstruction are discussed, along with the use of an entropy prior with an adjustable contrast parameter, the use of the likelihood with data increment parameters as the conditional probability, and the development of the new fast maximum a posteriori with entropy (FMAPE) algorithm by the successive substitution method. It is shown that, in the maximum likelihood estimator (MLE) and FMAPE algorithms, the only correct choice of initial image for the iterative procedure in the absence of a priori knowledge about the image configuration is a uniform field.
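The flavor of such an iteration can be sketched with the standard MLEM multiplicative update plus a one-step-late entropy-prior correction. This is a generic illustration, not the authors' exact FMAPE update; the uniform starting image follows the abstract's remark, while beta, the tiny system matrix and the counts are synthetic stand-ins:

```python
# Generic sketch: MLEM update for emission tomography with a one-step-late
# entropy-prior term. Not the FMAPE algorithm itself.
import numpy as np

rng = np.random.default_rng(6)
A = rng.uniform(size=(40, 25))        # system matrix: pixels -> detector bins
x_true = rng.uniform(1, 5, size=25)
y = rng.poisson(A @ x_true)           # measured counts (Poisson)

beta = 0.01                           # prior weight (illustrative)
sens = A.sum(axis=0)                  # sensitivity image A^T 1
x = np.full(25, y.sum() / A.sum())    # uniform initial image, as recommended
for _ in range(200):
    ratio = y / np.maximum(A @ x, 1e-12)
    d_entropy = -(np.log(x) + 1.0)    # gradient of S(x) = -sum x log x
    x = x * (A.T @ ratio) / np.maximum(sens - beta * d_entropy, 1e-12)

print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
```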
Abstract:
In recent decades, increased interest has been evident in research on multi-scale hierarchical modelling in the field of mechanics, including wood products and timber engineering. One of the main motivations for hierarchical modelling is to understand how properties, composition and structure at lower scale levels may influence, and be used to predict, material properties at the macroscopic and structural engineering scales. This chapter presents the applicability of statistical and probabilistic methods, such as the maximum likelihood method and Bayesian methods, to representing the mechanical properties of timber and to inference that accounts for prior information obtained at different scales of importance. These methods allow distinct reference properties of timber to be analysed, such as density, bending stiffness and strength, and allow information obtained through different non-destructive, semi-destructive or destructive tests to be considered hierarchically. The bases and fundamentals of the methods are described, and recommendations and limitations are discussed. The methods may be used in several contexts; however, they require expert knowledge to assess the correct statistical fit and to define the correlation structure between properties.
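The hierarchical updating idea can be shown in its simplest conjugate form: a prior on mean bending strength, taken for example from grade data or a lower-scale model, is updated with destructive test results. All numbers below are illustrative:

```python
# Sketch of a conjugate Bayesian update for timber properties: normal prior
# on the mean bending strength, normal likelihood with known scatter.
import numpy as np

mu0, tau0 = 40.0, 5.0            # prior mean and std (MPa), e.g. from grade data
sigma = 6.0                      # assumed known test scatter (MPa)
tests = np.array([35.2, 43.1, 38.7, 41.5])   # destructive bending tests (MPa)

n = len(tests)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau0**2 + tests.sum() / sigma**2)
print(f"posterior mean = {post_mean:.1f} MPa, std = {post_var**0.5:.1f} MPa")
```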
Abstract:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) is a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. A further complication stems from the observation that, in some cases, certain numbers of contributors may be incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that output a single, fixed number of contributors can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other is probabilistic, using Bayes' theorem to provide a probability distribution over the number of contributors based on the set of observed alleles and their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued, since quantitative information such as peak area and height data is not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the divergence between an agreed value for N, the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports analyses using simulation techniques and graphical models (i.e., Bayesian networks) to show that, for the conditions assumed in this study, setting the number of contributors to a mixed crime stain in probabilistic terms is preferable to a decision policy based on categorical assumptions about N.
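The probabilistic procedure can be sketched for a single locus: Bayes' theorem combines a prior over N with a likelihood for the number of distinct alleles observed, here estimated by simulating genotypes from illustrative allele frequencies:

```python
# Sketch of a posterior over the number of contributors N at one locus,
# with the likelihood estimated by simulation. Frequencies, prior and the
# observed allele count are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(7)
freqs = np.array([0.4, 0.3, 0.2, 0.1])   # allele frequencies at one locus
observed_distinct = 3                     # distinct alleles seen in the stain

def likelihood(n, sims=20000):
    """P(# distinct alleles = observed | N = n contributors), by simulation."""
    draws = rng.choice(len(freqs), size=(sims, 2 * n), p=freqs)
    distinct = np.array([len(set(row)) for row in draws])
    return np.mean(distinct == observed_distinct)

Ns = np.arange(1, 6)
prior = np.ones(len(Ns)) / len(Ns)                   # uniform prior on N
post = prior * np.array([likelihood(n) for n in Ns])
post /= post.sum()
for n, p in zip(Ns, post):
    print(f"P(N={n} | data) = {p:.3f}")   # N=1 is impossible here (prob 0)
```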
Abstract:
In this paper we present a Bayesian image reconstruction algorithm with an entropy prior (FMAPE) that uses a space-variant hyperparameter. The spatial variation of the hyperparameter allows different degrees of resolution in areas with different statistical characteristics, thus avoiding the large residuals produced by algorithms that use a constant hyperparameter. In the first implementation of the algorithm, we begin by segmenting a maximum likelihood estimator (MLE) reconstruction. The segmentation method is based on a wavelet decomposition and a self-organizing neural network, and yields a predetermined number of extended regions plus a small region for each star or bright object. To assign a different value of the hyperparameter to each extended region and star, we use either feasibility tests or cross-validation methods. Once the set of hyperparameters is obtained, we carry out the final Bayesian reconstruction, which exhibits decreased bias and excellent visual characteristics. The method has been applied to data from the non-refurbished Hubble Space Telescope and can also be applied to ground-based images.