33 results for MML


Abstract:

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. In this context, however, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion (Wallace and Bolton, 1986) to select the number of clusters. For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach lies in integrating model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with that of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while running faster than BIC and ICL, which is especially relevant when dealing with large data sets.
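The abstract does not spell out the update rule, so the following is only a minimal sketch of the mixing-weight step commonly associated with the Figueiredo and Jain (2002) approach, in which a component whose accumulated responsibility falls below half its parameter count is annihilated (its weight is set to zero). The function names, the example numbers, and the assumption that each component is a product of independent multinomials (so it has the sum of C_j - 1 free parameters) are illustrative and are not taken from the paper.

```python
def multinomial_params_per_component(categories_per_variable):
    """Free parameters of one component, ASSUMING the categorical
    variables are independent within a cluster: sum of (C_j - 1)."""
    return sum(c - 1 for c in categories_per_variable)


def fj_weight_update(resp_sums, params_per_component):
    """Mixing-weight update with component annihilation.

    resp_sums[k] is the sum over all observations of the posterior
    responsibility of component k (computed in the E-step).
    A component whose sum does not exceed params_per_component / 2
    receives weight 0 and is thereby removed from the mixture.
    """
    half = params_per_component / 2.0
    penalised = [max(0.0, s - half) for s in resp_sums]
    total = sum(penalised)
    if total == 0.0:
        raise ValueError("all components were annihilated")
    return [p / total for p in penalised]


# Hypothetical example: 100 observations, three candidate clusters,
# three categorical variables with 3, 3 and 2 categories.
n_params = multinomial_params_per_component([3, 3, 2])   # 2 + 2 + 1 = 5
weights = fj_weight_update([60.0, 38.0, 2.0], n_params)
print(weights)  # the third component is annihilated (weight 0.0)
```

Because weak components drop out during estimation, the number of clusters is chosen inside a single run instead of by fitting and scoring one model per candidate value.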

Abstract:

Tourism is an economic activity essential to the development of the country and of its territorial entities, regions and provinces, and it also serves a social function. Tourism is the responsibility of the different levels of the State within their areas of competence and is carried out by private and state-owned companies. The aim of this work was to offer an academic proposal to the Municipality of Sesquilé (Cundinamarca), identifying alternatives with potential among the productive factors for generating employment in the municipality's tourism sector. The study used the Logical Framework Matrix (Matriz de Marco Lógico, MML) methodology, supported by field visits in the municipality, to identify the problems affecting the development of tourism activities. From these problems, objectives were set and the alternatives with tourism potential presented in this work were identified.

Abstract:

Efficiently inducing precise causal models that accurately reflect given data sets is the ultimate goal of causal discovery. The algorithms proposed by Dai et al. have demonstrated the ability of the Minimum Message Length (MML) principle to discover linear causal models from training data. To further explore ways of improving efficiency, this paper incorporates Hoeffding bounds into the learning process. At each step of causal discovery, if a small number of data items is enough to distinguish the better model from the rest, the computational cost is reduced by ignoring the remaining data items. Experiments with data sets from related benchmark models indicate that the new algorithm achieves a speedup over previous work in terms of learning efficiency while preserving discovery accuracy.
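The stopping idea can be illustrated with the standard Hoeffding bound: after n observations of a quantity with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the sample mean with probability at least 1 - delta. The sketch below assumes, purely for illustration, that candidate models are compared by a bounded per-item score where lower is better (for example a per-item message length); the function names and numbers are hypothetical and do not reproduce the paper's algorithm.

```python
import math


def hoeffding_epsilon(value_range, n, delta):
    """Hoeffding bound: with probability >= 1 - delta the true mean is
    within this distance of the mean of n samples of range value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_stop_early(mean_best, mean_second, value_range, n, delta=0.05):
    """True when the gap between the two best candidates (lower score is
    better) exceeds the bound, so the remaining data items can be skipped."""
    return (mean_second - mean_best) > hoeffding_epsilon(value_range, n, delta)


# Hypothetical scores after 200 data items, with scores known to lie in [0, 1].
print(can_stop_early(0.40, 0.60, value_range=1.0, n=200))  # gap 0.20
print(can_stop_early(0.40, 0.45, value_range=1.0, n=200))  # gap 0.05
```

With these numbers the bound is roughly 0.087, so a gap of 0.20 justifies stopping while a gap of 0.05 requires more data.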

Abstract:

Determining the causal relations among attributes in a domain is a key task in data mining and knowledge discovery. The Minimum Message Length (MML) principle has demonstrated its ability to discover linear causal models from training data. To explore ways of improving efficiency, this paper proposes a novel Markov blanket identification algorithm based on the Lasso estimator. For each variable, the algorithm first generates a Lasso tree, which represents a pruned candidate set of possible feature sets. The MML principle is then employed to evaluate all of these candidate feature sets, and the feature set with the minimum message length is chosen as the Markov blanket. Our experimental results demonstrate the ability of this algorithm. In addition, the algorithm can be used to prune the search space of causal discovery and thereby further reduce the computational cost of score-based causal discovery algorithms.

Abstract:

A critical question in data mining is whether we can always trust, unconditionally, what a data mining system discovers. The answer is obviously no. If not, when can we trust the discovery? What factors affect the reliability of the discovery, and how do they affect it? These are some of the interesting questions to be investigated. In this chapter we first provide a definition and measurements of reliability and analyse the factors that affect it. We then examine the impact of model complexity, weak links, varying sample sizes and the ability of different learners on the reliability of graphical model discovery. The experimental results reveal that (1) the larger the sample size used for the discovery, the higher the reliability; (2) the stronger a graph link is, the easier the discovery and thus the higher the reliability that can be achieved; and (3) the complexity of a graph also plays an important role: the higher the complexity of a graph, the more difficult it is to induce and the lower the reliability. We also examine the performance differences between discovery algorithms, which reveals the impact of the discovery process. The experimental results show the superior reliability and robustness of the MML method over standard significance tests in the recovery of graph links with small samples and weak links.

Abstract:

This paper presents a Minimal Causal Model Inducer that can be used for reliable knowledge discovery. The minimal-model semantics of causal discovery is an essential concept for identifying a best-fitting model, in the sense of a model that is satisfactorily consistent with the given data and is the simpler, less expressive one. Consistency is one of the major measures of reliability in knowledge discovery. Developing an algorithm able to derive a minimal model is therefore an interesting topic in the area of reliable knowledge discovery. The various causal induction algorithms and tools developed so far cannot guarantee that the derived model is minimal and consistent. It has been proved that the MML induction approach introduced by Wallace, Keven and Honghua Dai is a minimal causal model learner. In this paper, we further prove that the developed minimal causal model learner is reliable in the sense of satisfactory consistency. The experimental results, obtained from tests on a number of both artificial and real models, confirm this theoretical result.

Abstract:

Pasture degradation is one of the greatest problems related to land use in the Amazon region, forcing farmers to open new forest areas. Many studies have identified the causes and factors involved in this degradation process in an attempt to reverse the situation. The purpose of this study was to examine the relationship between pasture degradation and some soil properties in order to identify the soil features most significant in the degradation process. A cattle-raising farm in the eastern Amazon region, with pastures of different ages and degrees of degradation, was used as the study site: a primary forest area (PN); three Guinea grass (Panicum maximum Jacq.) pastures in an increasingly degraded sequence (P1, P2 and P3); and one Gamba grass (Andropogon gayanus Kunth) pasture following an extremely degraded Guinea grass pasture (P4). Aboveground phytomass data showed differences between the pastures, reflecting the initially observed degradation levels. Grass biomass decreased sharply from P1 to P2 and disappeared at P3. Pasture recovery with Gamba grass at P4 was very successful, with grass biomass higher than at P1 and weed biomass lower than at P2 and P3. Root biomass also decreased with pasture degradation. Soil bulk density in the topsoil layer increased with pasture degradation. Results from the soil chemical analysis showed no signs of a decrease in organic carbon and total nitrogen after the forest was converted into pasture. In all pastures, degraded or not, the soil pH, the sum of bases and the saturation degree were higher than in the forest soil. The extractable phosphorus content, lower in forest soil, remained quite stable in pasture soils, but it could become a limiting factor for the maintenance of Guinea grass. The results indicate that pasture degradation does not seem to be directly related to changes in the chemical features of the soils. (C) 2004 Elsevier B.V. All rights reserved.

Abstract:

It is increasingly common for a single computer system to be used across different devices (personal computers, cell phones and others) and software platforms (graphical user interface systems, Web systems and others). Depending on the technologies involved, different software architectures may be employed. For example, Web systems use a client-server architecture, usually extended to three layers, while systems with graphical interfaces commonly use an MVC-style architecture. The use of architectures with different styles hinders the interoperability of multi-platform systems. A further aggravating factor is that the user interface often has a different structure, appearance and behaviour on each device, which leads to low usability. Finally, building user interfaces specific to each device, each with distinct features and technologies, is work that has to be done individually and does not scale. This study sought to address some of these problems by presenting a platform-independent reference architecture that allows the user interface to be built from an abstract specification described in a user interface specification language, the MML. This solution is designed to offer greater interoperability between different platforms, greater consistency between user interfaces, and greater flexibility and scalability for the incorporation of new devices.

Abstract:

Includes bibliography.

Abstract:

Paper submitted to MML 2013, 6th International Workshop on Machine Learning and Music, Prague, September 23, 2013.

Abstract:

Cylindrospermopsin (CYN), a potent cyanobacterial hepatotoxin produced by Cylindrospermopsis raciborskii and other cyanobacteria, is regularly found in water supplies in many parts of the world and has been associated with the intoxication of humans and livestock. Water treatment via chlorination can degrade the toxin effectively but results in the production of several byproducts. In this study, male and female Balb/c mice were injected via the intraperitoneal (IP) route with a single dose of 10 mg/kg 5-chlorouracil and 10 mg/kg 5-chloro-6-hydroxymethyluracil; these two compounds are the predicted chlorinated degradation products of CYN. DNA was isolated from the mouse livers and examined for strand breakage by alkaline gel electrophoresis (pH 12). The median molecular length (MML) of the DNA distributed in the gel was determined by estimating the midpoint of the DNA size distribution by densitometry. The toxicity of 5-chlorouracil (as measured by DNA strand breakage) was significantly influenced by time from dosing. There was no significant difference in MML between mice dosed with 5-chloro-6-hydroxymethyluracil and the controls. In another experiment, mice were dosed with 0, 0.1, 1, 10 and 100 mg/kg body weight 5-chlorouracil and 0, 0.1, 1, 10 and 20 mg/kg 5-chloro-6-hydroxymethyluracil via IP injection. The heart, liver, kidney, lung and spleen were removed, fixed and examined under electron microscopy (EM). The liver was the main target organ. The EM results revealed marked distortion of the nuclear membrane of liver cells in mice dosed with 1.0 mg/kg or more of 5-chlorouracil, or 10 mg/kg or more of 5-chloro-6-hydroxymethyluracil.
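The abstract does not give the exact densitometric procedure, so the following is only one plausible reading of "the midpoint of the DNA size distribution": the fragment length at which the cumulative densitometric intensity first reaches half of the total, that is, an intensity-weighted median. The function name, the binning and the numbers are hypothetical.

```python
def median_molecular_length(lengths, intensities):
    """Intensity-weighted median of fragment lengths.

    lengths[i] is the fragment length assigned to gel position i and
    intensities[i] the densitometric signal measured there (ASSUMED
    inputs; the study's actual calibration is not described).
    """
    pairs = sorted(zip(lengths, intensities))
    total = sum(weight for _, weight in pairs)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    cumulative = 0.0
    for length, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2.0:
            return length


# Hypothetical lanes: more strand breakage shifts signal to shorter fragments.
lengths_kb = [1, 2, 4, 8, 16]
control = median_molecular_length(lengths_kb, [1, 2, 4, 2, 1])
damaged = median_molecular_length(lengths_kb, [4, 3, 2, 1, 0])
print(control, damaged)  # the damaged lane has the smaller MML
```

Under this reading, a lower MML in a treated lane than in the control lane indicates more strand breakage.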

Abstract:

We have recently developed a principled approach to interactive non-linear hierarchical visualization [8] based on the Generative Topographic Mapping (GTM). Hierarchical plots are needed when a single visualization plot is not sufficient (e.g. when dealing with large quantities of data). In this paper we extend our system by giving the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [8], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of GTMs is used. The latter is particularly useful when the plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a data set of 2300 18-dimensional points and mention an extension of our system to accommodate discrete data types.

Abstract:

An interactive hierarchical Generative Topographic Mapping (HGTM) [HGTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in three directions: (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in [Kabanpami]. (2) We give the user a choice of initializing the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user interactively selects "regions of interest" as in [HGTM], whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool for improving our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.