891 resultados para Hierarchical clustering model


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Twitter System is the biggest social network in the world, and everyday millions of tweets are posted and talked about, expressing various views and opinions. A large variety of research activities have been conducted to study how the opinions can be clustered and analyzed, so that some tendencies can be uncovered. Due to the inherent weaknesses of the tweets - very short texts and very informal styles of writing - it is rather hard to make an investigation of tweet data analysis giving results with good performance and accuracy. In this paper, we intend to attack the problem from another aspect - using a two-layer structure to analyze the twitter data: LDA with topic map modelling. The experimental results demonstrate that this approach shows a progress in twitter data analysis. However, more experiments with this method are expected in order to ensure that the accurate analytic results can be maintained.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mathematical models are increasingly used in environmental science thus increasing the importance of uncertainty and sensitivity analyses. In the present study, an iterative parameter estimation and identifiability analysis methodology is applied to an atmospheric model – the Operational Street Pollution Model (OSPMr). To assess the predictive validity of the model, the data is split into an estimation and a prediction data set using two data splitting approaches and data preparation techniques (clustering and outlier detection) are analysed. The sensitivity analysis, being part of the identifiability analysis, showed that some model parameters were significantly more sensitive than others. The application of the determined optimal parameter values was shown to succesfully equilibrate the model biases among the individual streets and species. It was as well shown that the frequentist approach applied for the uncertainty calculations underestimated the parameter uncertainties. The model parameter uncertainty was qualitatively assessed to be significant, and reduction strategies were identified.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Le processus de planification forestière hiérarchique présentement en place sur les terres publiques risque d’échouer à deux niveaux. Au niveau supérieur, le processus en place ne fournit pas une preuve suffisante de la durabilité du niveau de récolte actuel. À un niveau inférieur, le processus en place n’appuie pas la réalisation du plein potentiel de création de valeur de la ressource forestière, contraignant parfois inutilement la planification à court terme de la récolte. Ces échecs sont attribuables à certaines hypothèses implicites au modèle d’optimisation de la possibilité forestière, ce qui pourrait expliquer pourquoi ce problème n’est pas bien documenté dans la littérature. Nous utilisons la théorie de l’agence pour modéliser le processus de planification forestière hiérarchique sur les terres publiques. Nous développons un cadre de simulation itératif en deux étapes pour estimer l’effet à long terme de l’interaction entre l’État et le consommateur de fibre, nous permettant ainsi d’établir certaines conditions pouvant mener à des ruptures de stock. Nous proposons ensuite une formulation améliorée du modèle d’optimisation de la possibilité forestière. La formulation classique du modèle d’optimisation de la possibilité forestière (c.-à-d., maximisation du rendement soutenu en fibre) ne considère pas que le consommateur de fibre industriel souhaite maximiser son profit, mais suppose plutôt la consommation totale de l’offre de fibre à chaque période, peu importe le potentiel de création de valeur de celle-ci. Nous étendons la formulation classique du modèle d’optimisation de la possibilité forestière afin de permettre l’anticipation du comportement du consommateur de fibre, augmentant ainsi la probabilité que l’offre de fibre soit entièrement consommée, rétablissant ainsi la validité de l’hypothèse de consommation totale de l’offre de fibre implicite au modèle d’optimisation. Nous modélisons la relation principal-agent entre le gouvernement et l’industrie à l’aide d’une formulation biniveau du modèle optimisation, où le niveau supérieur représente le processus de détermination de la possibilité forestière (responsabilité du gouvernement), et le niveau inférieur représente le processus de consommation de la fibre (responsabilité de l’industrie). Nous montrons que la formulation biniveau peux atténuer le risque de ruptures de stock, améliorant ainsi la crédibilité du processus de planification forestière hiérarchique. Ensemble, le modèle biniveau d’optimisation de la possibilité forestière et la méthodologie que nous avons développée pour résoudre celui-ci à l’optimalité, représentent une alternative aux méthodes actuellement utilisées. Notre modèle biniveau et le cadre de simulation itérative représentent un pas vers l’avant en matière de technologie de planification forestière axée sur la création de valeur. L’intégration explicite d’objectifs et de contraintes industrielles au processus de planification forestière, dès la détermination de la possibilité forestière, devrait favoriser une collaboration accrue entre les instances gouvernementales et industrielles, permettant ainsi d’exploiter le plein potentiel de création de valeur de la ressource forestière.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Post inhibitory rebound is a nonlinear phenomenon present in a variety of nerve cells. Following a period of hyper-polarization this effect allows a neuron to fire a spike or packet of spikes before returning to rest. It is an important mechanism underlying central pattern generation for heartbeat, swimming and other motor patterns in many neuronal systems. In this paper we consider how networks of neurons, which do not intrinsically oscillate, may make use of inhibitory synaptic connections to generate large scale coherent rhythms in the form of cluster states. We distinguish between two cases i) where the rebound mechanism is due to anode break excitation and ii) where rebound is due to a slow T-type calcium current. In the former case we use a geometric analysis of a McKean type model to obtain expressions for the number of clusters in terms of the speed and strength of synaptic coupling. Results are found to be in good qualitative agreement with numerical simulations of the more detailed Hodgkin-Huxley model. In the second case we consider a particular firing rate model of a neuron with a slow calcium current that admits to an exact analysis. Once again existence regions for cluster states are explicitly calculated. Both mechanisms are shown to prefer globally synchronous states for slow synapses as long as the strength of coupling is sufficiently large. With a decrease in the duration of synaptic inhibition both systems are found to break into clusters. A major difference between the two mechanisms for cluster generation is that anode break excitation can support clusters with several groups, whilst slow T-type calcium currents predominantly give rise to clusters of just two (anti-synchronous) populations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Growing models have been widely used for clustering or topology learning. Traditionally these models work on stationary environments, grow incrementally and adapt their nodes to a given distribution based on global parameters. In this paper, we present an enhanced unsupervised self-organising network for the modelling of visual objects. We first develop a framework for building non-rigid shapes using the growth mechanism of the self-organising maps, and then we define an optimal number of nodes without overfitting or underfitting the network based on the knowledge obtained from information-theoretic considerations. We present experimental results for hands and we quantitatively evaluate the matching capabilities of the proposed method with the topographic product.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three level logistic regression model for clinical mastitis in dairy cattle; cross validation, a prediction using the full posterior predictive distribution and two “mixed” predictive methods that incorporate higher level random effects simulated from the underlying model distribution. Cross validation is considered a gold standard method but is computationally intensive and thus a comparison is made between posterior predictive assessments and cross validation. The analyses revealed that mixed prediction methods produced results close to cross validation whilst the full posterior predictive assessment gave predictions that were over-optimistic (closer to the observed disease rates) compared with cross validation. A mixed prediction method that simulated random effects from both higher levels was best at identifying the outlying level two (farm-year) units of interest. It is concluded that this mixed prediction method, simulating random effects from both higher levels, is straightforward and may be of value in model criticism of multilevel logistic regression, a technique commonly used for animal health data with a hierarchical structure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Mestrado em Engenharia Florestal e dos Recursos Naturais - Instituto Superior de Agronomia - UL

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Free-riding behaviors exist in tourism and they should be analyzed from a comprehensive perspective; while the literature has mainly focused on free riders operating in a destination, the destinations themselves might also free ride when they are under the umbrella of a collective brand. The objective of this article is to detect potential free-riding destinations by estimating the contribution of the different individual destinations to their collective brands, from the point of view of consumer perception. We argue that these individual contributions can be better understood by reflecting the various stages that tourists follow to reach their final decision. A hierarchical choice process is proposed in which the following choices are nested (not independent): “whether to buy,” “what collective brand to buy,” and “what individual brand to buy.” A Mixed Logit model confirms this sequence, which permits estimation of individual contributions and detection of free riders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Overrecentdecades,remotesensinghasemergedasaneffectivetoolforimprov- ing agriculture productivity. In particular, many works have dealt with the problem of identifying characteristics or phenomena of crops and orchards on different scales using remote sensed images. Since the natural processes are scale dependent and most of them are hierarchically structured, the determination of optimal study scales is mandatory in understanding these processes and their interactions. The concept of multi-scale/multi- resolution inherent to OBIA methodologies allows the scale problem to be dealt with. But for that multi-scale and hierarchical segmentation algorithms are required. The question that remains unsolved is to determine the suitable scale segmentation that allows different objects and phenomena to be characterized in a single image. In this work, an adaptation of the Simple Linear Iterative Clustering (SLIC) algorithm to perform a multi-scale hierarchi- cal segmentation of satellite images is proposed. The selection of the optimal multi-scale segmentation for different regions of the image is carried out by evaluating the intra- variability and inter-heterogeneity of the regions obtained on each scale with respect to the parent-regions defined by the coarsest scale. To achieve this goal, an objective function, that combines weighted variance and the global Moran index, has been used. Two different kinds of experiment have been carried out, generating the number of regions on each scale through linear and dyadic approaches. This methodology has allowed, on the one hand, the detection of objects on different scales and, on the other hand, to represent them all in a sin- gle image. Altogether, the procedure provides the user with a better comprehension of the land cover, the objects on it and the phenomena occurring.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Planning, navigation, and search are fundamental human cognitive abilities central to spatial problem solving in search and rescue, law enforcement, and military operations. Despite a wealth of literature concerning naturalistic spatial problem solving in animals, literature on naturalistic spatial problem solving in humans is comparatively lacking and generally conducted by separate camps among which there is little crosstalk. Addressing this deficiency will allow us to predict spatial decision making in operational environments, and understand the factors leading to those decisions. The present dissertation is comprised of two related efforts, (1) a set of empirical research studies intended to identify characteristics of planning, execution, and memory in naturalistic spatial problem solving tasks, and (2) a computational modeling effort to develop a model of naturalistic spatial problem solving. The results of the behavioral studies indicate that problem space hierarchical representations are linear in shape, and that human solutions are produced according to multiple optimization criteria. The Mixed Criteria Model presented in this dissertation accounts for global and local human performance in a traditional and naturalistic Traveling Salesman Problem. The results of the empirical and modeling efforts hold implications for basic and applied science in domains such as problem solving, operations research, human-computer interaction, and artificial intelligence.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis we present a mathematical formulation of the interaction between microorganisms such as bacteria or amoebae and chemicals, often produced by the organisms themselves. This interaction is called chemotaxis and leads to cellular aggregation. We derive some models to describe chemotaxis. The first is the pioneristic Keller-Segel parabolic-parabolic model and it is derived by two different frameworks: a macroscopic perspective and a microscopic perspective, in which we start with a stochastic differential equation and we perform a mean-field approximation. This parabolic model may be generalized by the introduction of a degenerate diffusion parameter, which depends on the density itself via a power law. Then we derive a model for chemotaxis based on Cattaneo's law of heat propagation with finite speed, which is a hyperbolic model. The last model proposed here is a hydrodynamic model, which takes into account the inertia of the system by a friction force. In the limit of strong friction, the model reduces to the parabolic model, whereas in the limit of weak friction, we recover a hyperbolic model. Finally, we analyze the instability condition, which is the condition that leads to aggregation, and we describe the different kinds of aggregates we may obtain: the parabolic models lead to clusters or peaks whereas the hyperbolic models lead to the formation of network patterns or filaments. Moreover, we discuss the analogy between bacterial colonies and self gravitating systems by comparing the chemotactic collapse and the gravitational collapse (Jeans instability).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype, distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition we also test a number of different versions of the SABRE method, compare conjugate and semi-conjugate prior specifications and an alternative to the spike and slab prior; the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residue and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method; block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, leading it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experiment investigation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Introducción: El Cáncer es prevenible en algunos casos, si se evita la exposición a sustancias cancerígenas en el medio ambiente. En Colombia, Cundinamarca es uno de los departamentos con mayores incrementos en la tasa de mortalidad y en el municipio de Sibaté, habitantes han manifestado preocupación por el incremento de la enfermedad. En el campo de la salud ambiental mundial, la georreferenciación aplicada al estudio de fenómenos en salud, ha tenido éxito con resultados válidos. El estudio propuso usar herramientas de información geográfica, para generar análisis de tiempo y espacio que hicieran visible el comportamiento del cáncer en Sibaté y sustentaran hipótesis de influencias ambientales sobre concentraciones de casos. Objetivo: Obtener incidencia y prevalencia de casos de cáncer en habitantes de Sibaté y georreferenciar los casos en un periodo de 5 años, con base en indagación de registros. Metodología: Estudio exploratorio descriptivo de corte transversal,sobre todos los diagnósticos de cáncer entre los años 2010 a 2014, encontrados en los archivos de la Secretaria de Salud municipal. Se incluyeron unicamente quienes tuvieron residencia permanente en el municipio y fueron diagnosticados con cáncer entre los años de 2010 a 2104. Sobre cada caso se obtuvo género, edad, estrato socioeconómico, nivel académico, ocupación y estado civil. Para el análisis de tiempo se usó la fecha de diagnóstico y para el análisis de espacio, la dirección de residencia, tipo de cáncer y coordenada geográfica. Se generaron coordenadas geográficas con un equipo GPS Garmin y se crearon mapas con los puntos de la ubicación de las viviendas de los pacientes. Se proceso la información, con Epi Info 7 Resultados: Se encontraron 107 casos de cáncer registrados en la Secretaria de Salud de Sibaté, 66 mujeres, 41 hombres. Sin división de género, el 30.93% de la población presento cáncer del sistema reproductor, el 18,56% digestivo y el 17,53% tegumentario. Se presentaron 2 grandes casos de agrupaciones espaciales en el territorio estudiado, una en el Barrio Pablo Neruda con 12 (21,05%) casos y en el casco Urbano de Sibaté con 38 (66,67%) casos. Conclusión: Se corroboro que el análisis geográfico con variables espacio temporales y de exposición, puede ser la herramienta para generar hipótesis sobre asociaciones de casos de cáncer con factores ambientales.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Our goal in this paper is to extend previous results obtained for Newtonian and secondgrade fluids to third-grade fluids in the case of an axisymmetric, straight, rigid and impermeable tube with constant cross-section using a one-dimensional hierarchical model based on the Cosserat theory related to fluid dynamics. In this way we can reduce the full threedimensional system of equations for the axisymmetric unsteady motion of a non-Newtonian incompressible third-grade fluid to a system of equations depending on time and on a single spatial variable. Some numerical simulations for the volume flow rate and the the wall shear stress are presented.