966 resultados para Maximum entropy methods


Relevância:

90.00% 90.00%

Publicador:

Resumo:

RESUMEN El apoyo a la selección de especies a la restauración de la vegetación en España en los últimos 40 años se ha basado fundamentalmente en modelos de distribución de especies, también llamados modelos de nicho ecológico, que estiman la probabilidad de presencia de las especies en función de las condiciones del medio físico (clima, suelo, etc.). Con esta tesis se ha intentado contribuir a la mejora de la capacidad predictiva de los modelos introduciendo algunas propuestas metodológicas adaptadas a los datos disponibles actualmente en España y enfocadas al uso de los modelos en la selección de especies. No siempre se dispone de datos a una resolución espacial adecuada para la escala de los proyectos de restauración de la vegetación. Sin embrago es habitual contar con datos de baja resolución espacial para casi todas las especies vegetales presentes en España. Se propone un método de recalibración que actualiza un modelo de regresión logística de baja resolución espacial con una nueva muestra de alta resolución espacial. El método permite obtener predicciones de calidad aceptable con muestras relativamente pequeñas (25 presencias de la especie) frente a las muestras mucho mayores (más de 100 presencias) que requería una estrategia de modelización convencional que no usara el modelo previo. La selección del método estadístico puede influir decisivamente en la capacidad predictiva de los modelos y por esa razón la comparación de métodos ha recibido mucha atención en la última década. Los estudios previos consideraban a la regresión logística como un método inferior a técnicas más modernas como las de máxima entropía. Los resultados de la tesis demuestran que esa diferencia observada se debe a que los modelos de máxima entropía incluyen técnicas de regularización y la versión de la regresión logística usada en las comparaciones no. Una vez incorporada la regularización a la regresión logística usando penalización, las diferencias en cuanto a capacidad predictiva desaparecen. La regresión logística penalizada es, por tanto, una alternativa más para el ajuste de modelos de distribución de especies y está a la altura de los métodos modernos con mejor capacidad predictiva como los de máxima entropía. A menudo, los modelos de distribución de especies no incluyen variables relativas al suelo debido a que no es habitual que se disponga de mediciones directas de sus propiedades físicas o químicas. La incorporación de datos de baja resolución espacial proveniente de mapas de suelo nacionales o continentales podría ser una alternativa. Los resultados de esta tesis sugieren que los modelos de distribución de especies de alta resolución espacial mejoran de forma ligera pero estadísticamente significativa su capacidad predictiva cuando se incorporan variables relativas al suelo procedente de mapas de baja resolución espacial. La validación es una de las etapas fundamentales del desarrollo de cualquier modelo empírico como los modelos de distribución de especies. Lo habitual es validar los modelos evaluando su capacidad predictiva especie a especie, es decir, comparando en un conjunto de localidades la presencia o ausencia observada de la especie con las predicciones del modelo. Este tipo de evaluación no responde a una cuestión clave en la restauración de la vegetación ¿cuales son las n especies más idóneas para el lugar a restaurar? Se ha propuesto un método de evaluación de modelos adaptado a esta cuestión que consiste en estimar la capacidad de un conjunto de modelos para discriminar entre las especies presentes y ausentes de un lugar concreto. El método se ha aplicado con éxito a la validación de 188 modelos de distribución de especies leñosas orientados a la selección de especies para la restauración de la vegetación en España. Las mejoras metodológicas propuestas permiten mejorar la capacidad predictiva de los modelos de distribución de especies aplicados a la selección de especies en la restauración de la vegetación y también permiten ampliar el número de especies para las que se puede contar con un modelo que apoye la toma de decisiones. SUMMARY During the last 40 years, decision support tools for plant species selection in ecological restoration in Spain have been based on species distribution models (also called ecological niche models), that estimate the probability of occurrence of the species as a function of environmental predictors (e.g., climate, soil). In this Thesis some methodological improvements are proposed to contribute to a better predictive performance of such models, given the current data available in Spain and focusing in the application of the models to selection of species for ecological restoration. Fine grained species distribution data are required to train models to be used at the scale of the ecological restoration projects, but this kind of data are not always available for every species. On the other hand, coarse grained data are available for almost every species in Spain. A recalibration method is proposed that updates a coarse grained logistic regression model using a new fine grained updating sample. The method allows obtaining acceptable predictive performance with reasonably small updating sample (25 occurrences of the species), in contrast with the much larger samples (more than 100 occurrences) required for a conventional modeling approach that discards the coarse grained data. The choice of the statistical method may have a dramatic effect on model performance, therefore comparisons of methods have received much interest in the last decade. Previous studies have shown a poorer performance of the logistic regression compared to novel methods like maximum entropy models. The results of this Thesis show that the observed difference is caused by the fact that maximum entropy models include regularization techniques and the versions of logistic regression compared do not. Once regularization has been added to the logistic regression using a penalization procedure, the differences in model performance disappear. Therefore, penalized logistic regression may be considered one of the best performing methods to model species distributions. Usually, species distribution models do not consider soil related predictors because direct measurements of the chemical or physical properties are often lacking. The inclusion of coarse grained soil data from national or continental soil maps could be a reasonable alternative. The results of this Thesis suggest that the performance of the models slightly increase after including soil predictors form coarse grained soil maps. Model validation is a key stage of the development of empirical models, such as species distribution models. The usual way of validating is based on the evaluation of model performance for each species separately, i.e., comparing observed species presences or absence to predicted probabilities in a set of sites. This kind of evaluation is not informative for a common question in ecological restoration projects: which n species are the most suitable for the environment of the site to be restored? A method has been proposed to address this question that estimates the ability of a set of models to discriminate among present and absent species in a evaluation site. The method has been successfully applied to the validation of 188 species distribution models used to support decisions on species selection for ecological restoration in Spain. The proposed methodological approaches improve the predictive performance of the predictive models applied to species selection in ecological restoration and increase the number of species for which a model that supports decisions can be fitted.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Species distribution models (SDM) predict species occurrence based on statistical relationships with environmental conditions. The R-package biomod2 which includes 10 different SDM techniques and 10 different evaluation methods was used in this study. Macroalgae are the main biomass producers in Potter Cove, King George Island (Isla 25 de Mayo), Antarctica, and they are sensitive to climate change factors such as suspended particulate matter (SPM). Macroalgae presence and absence data were used to test SDMs suitability and, simultaneously, to assess the environmental response of macroalgae as well as to model four scenarios of distribution shifts by varying SPM conditions due to climate change. According to the averaged evaluation scores of Relative Operating Characteristics (ROC) and True scale statistics (TSS) by models, those methods based on a multitude of decision trees such as Random Forest and Classification Tree Analysis, reached the highest predictive power followed by generalized boosted models (GBM) and maximum-entropy approaches (Maxent). The final ensemble model used 135 of 200 calculated models (TSS > 0.7) and identified hard substrate and SPM as the most influencing parameters followed by distance to glacier, total organic carbon (TOC), bathymetry and slope. The climate change scenarios show an invasive reaction of the macroalgae in case of less SPM and a retreat of the macroalgae in case of higher assumed SPM values.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The texture segmentation techniques are diversified by the existence of several approaches. In this paper, we propose fuzzy features for the segmentation of texture image. For this purpose, a membership function is constructed to represent the effect of the neighboring pixels on the current pixel in a window. Using these membership function values, we find a feature by weighted average method for the current pixel. This is repeated for all pixels in the window treating each time one pixel as the current pixel. Using these fuzzy based features, we derive three descriptors such as maximum, entropy, and energy for each window. To segment the texture image, the modified mountain clustering that is unsupervised and fuzzy c-means clustering have been used. The performance of the proposed features is compared with that of fractal features.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A formalism for modelling the dynamics of Genetic Algorithms (GAs) using methods from statistical mechanics, originally due to Prugel-Bennett and Shapiro, is reviewed, generalized and improved upon. This formalism can be used to predict the averaged trajectory of macroscopic statistics describing the GA's population. These macroscopics are chosen to average well between runs, so that fluctuations from mean behaviour can often be neglected. Where necessary, non-trivial terms are determined by assuming maximum entropy with constraints on known macroscopics. Problems of realistic size are described in compact form and finite population effects are included, often proving to be of fundamental importance. The macroscopics used here are cumulants of an appropriate quantity within the population and the mean correlation (Hamming distance) within the population. Including the correlation as an explicit macroscopic provides a significant improvement over the original formulation. The formalism is applied to a number of simple optimization problems in order to determine its predictive power and to gain insight into GA dynamics. Problems which are most amenable to analysis come from the class where alleles within the genotype contribute additively to the phenotype. This class can be treated with some generality, including problems with inhomogeneous contributions from each site, non-linear or noisy fitness measures, simple diploid representations and temporally varying fitness. The results can also be applied to a simple learning problem, generalization in a binary perceptron, and a limit is identified for which the optimal training batch size can be determined for this problem. The theory is compared to averaged results from a real GA in each case, showing excellent agreement if the maximum entropy principle holds. Some situations where this approximation brakes down are identified. In order to fully test the formalism, an attempt is made on the strong sc np-hard problem of storing random patterns in a binary perceptron. Here, the relationship between the genotype and phenotype (training error) is strongly non-linear. Mutation is modelled under the assumption that perceptron configurations are typical of perceptrons with a given training error. Unfortunately, this assumption does not provide a good approximation in general. It is conjectured that perceptron configurations would have to be constrained by other statistics in order to accurately model mutation for this problem. Issues arising from this study are discussed in conclusion and some possible areas of further research are outlined.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This work explores the use of statistical methods in describing and estimating camera poses, as well as the information feedback loop between camera pose and object detection. Surging development in robotics and computer vision has pushed the need for algorithms that infer, understand, and utilize information about the position and orientation of the sensor platforms when observing and/or interacting with their environment.

The first contribution of this thesis is the development of a set of statistical tools for representing and estimating the uncertainty in object poses. A distribution for representing the joint uncertainty over multiple object positions and orientations is described, called the mirrored normal-Bingham distribution. This distribution generalizes both the normal distribution in Euclidean space, and the Bingham distribution on the unit hypersphere. It is shown to inherit many of the convenient properties of these special cases: it is the maximum-entropy distribution with fixed second moment, and there is a generalized Laplace approximation whose result is the mirrored normal-Bingham distribution. This distribution and approximation method are demonstrated by deriving the analytical approximation to the wrapped-normal distribution. Further, it is shown how these tools can be used to represent the uncertainty in the result of a bundle adjustment problem.

Another application of these methods is illustrated as part of a novel camera pose estimation algorithm based on object detections. The autocalibration task is formulated as a bundle adjustment problem using prior distributions over the 3D points to enforce the objects' structure and their relationship with the scene geometry. This framework is very flexible and enables the use of off-the-shelf computational tools to solve specialized autocalibration problems. Its performance is evaluated using a pedestrian detector to provide head and foot location observations, and it proves much faster and potentially more accurate than existing methods.

Finally, the information feedback loop between object detection and camera pose estimation is closed by utilizing camera pose information to improve object detection in scenarios with significant perspective warping. Methods are presented that allow the inverse perspective mapping traditionally applied to images to be applied instead to features computed from those images. For the special case of HOG-like features, which are used by many modern object detection systems, these methods are shown to provide substantial performance benefits over unadapted detectors while achieving real-time frame rates, orders of magnitude faster than comparable image warping methods.

The statistical tools and algorithms presented here are especially promising for mobile cameras, providing the ability to autocalibrate and adapt to the camera pose in real time. In addition, these methods have wide-ranging potential applications in diverse areas of computer vision, robotics, and imaging.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstruction of biomolecular dynamics from experimental observables requires the determination of a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information from experiments, making the problem an ill-posed one in the terminology of Hadamard. The ill-posed nature of the problem comes from the fact that it has no unique solution. Multiple or even an infinite number of solutions may exist. To avoid the ill-posed nature, the problem needs to be regularized by making assumptions, which inevitably introduce biases into the result.

Here, I present two continuous probability density function approaches to solve an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations we reduced the problem to determination of a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates alignment tensors of adjacent domains, which serves as the foundation of the two methods. In the first approach, the ill-posed nature of the problem was avoided by introducing a continuous distribution model, which enjoys a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. We observed good performance of the method when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing maximum entropy principles. This 'model-free' approach delivers the least biased result which presents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed. Consequently, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The use of human brain electroencephalography (EEG) signals for automatic person identi cation has been investigated for a decade. It has been found that the performance of an EEG-based person identication system highly depends on what feature to be extracted from multi-channel EEG signals. Linear methods such as Power Spectral Density and Autoregressive Model have been used to extract EEG features. However these methods assumed that EEG signals are stationary. In fact, EEG signals are complex, non-linear, non-stationary, and random in nature. In addition, other factors such as brain condition or human characteristics may have impacts on the performance, however these factors have not been investigated and evaluated in previous studies. It has been found in the literature that entropy is used to measure the randomness of non-linear time series data. Entropy is also used to measure the level of chaos of braincomputer interface systems. Therefore, this thesis proposes to study the role of entropy in non-linear analysis of EEG signals to discover new features for EEG-based person identi- cation. Five dierent entropy methods including Shannon Entropy, Approximate Entropy, Sample Entropy, Spectral Entropy, and Conditional Entropy have been proposed to extract entropy features that are used to evaluate the performance of EEG-based person identication systems and the impacts of epilepsy, alcohol, age and gender characteristics on these systems. Experiments were performed on the Australian EEG and Alcoholism datasets. Experimental results have shown that, in most cases, the proposed entropy features yield very fast person identication, yet with compatible accuracy because the feature dimension is low. In real life security operation, timely response is critical. The experimental results have also shown that epilepsy, alcohol, age and gender characteristics have impacts on the EEG-based person identication systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present a method of estimating HIV incidence rates in epidemic situations from data on age-specific prevalence and changes in the overall prevalence over time. The method is applied to women attending antenatal clinics in Hlabisa, a rural district of KwaZulu/Natal, South Africa, where transmission of HIV is overwhelmingly through heterosexual contact. A model which gives age-specific prevalence rates in the presence of a progressing epidemic is fitted to prevalence data for 1998 using maximum likelihood methods and used to derive the age-specific incidence. Error estimates are obtained using a Monte Carlo procedure. Although the method is quite general some simplifying assumptions are made concerning the form of the risk function and sensitivity analyses are performed to explore the importance of these assumptions. The analysis shows that in 1998 the annual incidence of infection per susceptible woman increased from 5.4 per cent (3.3-8.5 per cent; here and elsewhere ranges give 95 per cent confidence limits) at age 15 years to 24.5 per cent (20.6-29.1 per cent) at age 22 years and declined to 1.3 per cent (0.5-2.9 per cent) at age 50 years; standardized to a uniform age distribution, the overall incidence per susceptible woman aged 15 to 59 was 11.4 per cent (10.0-13.1 per cent); per women in the population it was 8.4 per cent (7.3-9.5 per cent). Standardized to the age distribution of the female population the average incidence per woman was 9.6 per cent (8.4-11.0 per cent); standardized to the age distribution of women attending antenatal clinics, it was 11.3 per cent (9.8-13.3 per cent). The estimated incidence depends on the values used for the epidemic growth rate and the AIDS related mortality. To ensure that, for this population, errors in these two parameters change the age specific estimates of the annual incidence by less than the standard deviation of the estimates of the age specific incidence, the AIDS related mortality should be known to within +/-50 per cent and the epidemic growth rate to within +/-25 per cent, both of which conditions are met. In the absence of cohort studies to measure the incidence of HIV infection directly, useful estimates of the age-specific incidence can be obtained from cross-sectional, age-specific prevalence data and repeat cross-sectional data on the overall prevalence of HIV infection. Several assumptions were made because of the lack of data but sensitivity analyses show that they are unlikely to affect the overall estimates significantly. These estimates are important in assessing the magnitude of the public health problem, for designing vaccine trials and for evaluating the impact of interventions. Copyright (C) 2001 John Wiley & Sons, Ltd.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Rabies virus (RABV) isolates from two species of canids and three species of bats were analyzed by comparing the C-terminal region of the G gene and the G-L intergenic region of the virus genome. Intercluster identities for the genetic sequences of the isolates showed both regions to be poorly conserved. Phylogenetic trees were generated by the neighbor-joining and maximum parsimony methods, and the results were found to agree between the two methods for both regions. Putative amino acid sequences obtained from the G gene were also analyzed, and genetic markers were identified. Our results suggest that different genetic lineages of RABV are adapted to different animal species in Brazil.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Multi-environment trials (METs) used to evaluate breeding lines vary in the number of years that they sample. We used a cropping systems model to simulate the target population of environments (TPE) for 6 locations over 108 years for 54 'near-isolines' of sorghum in north-eastern Australia. For a single reference genotype, each of 547 trials was clustered into 1 of 3 'drought environment types' (DETs) based on a seasonal water stress index. Within sequential METs of 2 years duration, the frequencies of these drought patterns often differed substantially from those derived for the entire TPE. This was reflected in variation in the mean yield of the reference genotype. For the TPE and for 2-year METs, restricted maximum likelihood methods were used to estimate components of genotypic and genotype by environment variance. These also varied substantially, although not in direct correlation with frequency of occurrence of different DETs over a 2-year period. Combined analysis over different numbers of seasons demonstrated the expected improvement in the correlation between MET estimates of genotype performance and the overall genotype averages as the number of seasons in the MET was increased.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

25th Annual Conference of the European Cetacean Society, Cadiz, Spain 21-23 March 2011.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

26th Annual Conference of the European Cetacean Society, Galway, Ireland 26-28 March 2012.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertação de Mestrado, Estudos Integrados dos Oceanos, 25 de Março de 2013, Universidade dos Açores.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Copyright © 2014 Entomological Society of America.