856 resultados para Data Driven Clustering
Resumo:
The Microarray technique is rather powerful, as it allows to test up thousands of genes at a time, but this produces an overwhelming set of data files containing huge amounts of data, which is quite difficult to pre-process, separate, classify and correlate for interesting conclusions to be extracted. Modern machine learning, data mining and clustering techniques based on information theory, are needed to read and interpret the information contents buried in those large data sets. Independent Component Analysis method can be used to correct the data affected by corruption processes or to filter the uncorrectable one and then clustering methods can group similar genes or classify samples. In this paper a hybrid approach is used to obtain a two way unsupervised clustering for a corrected microarray data.
Resumo:
Several activities in service oriented computing, such as automatic composition, monitoring, and adaptation, can benefit from knowing properties of a given service composition before executing them. Among these properties we will focus on those related to execution cost and resource usage, in a wide sense, as they can be linked to QoS characteristics. In order to attain more accuracy, we formulate execution costs / resource usage as functions on input data (or appropriate abstractions thereof) and show how these functions can be used to make better, more informed decisions when performing composition, adaptation, and proactive monitoring. We present an approach to, on one hand, synthesizing these functions in an automatic fashion from the definition of the different orchestrations taking part in a system and, on the other hand, to effectively using them to reduce the overall costs of non-trivial service-based systems featuring sensitivity to data and possibility of failure. We validate our approach by means of simulations of scenarios needing runtime selection of services and adaptation due to service failure. A number of rebinding strategies, including the use of cost functions, are compared.
Resumo:
The conformance of semantic technologies has to be systematically evaluated to measure and verify the real adherence of these technologies to the Semantic Web standards. Currente valuations of semantic technology conformance are not exhaustive enough and do not directly cover user requirements and use scenarios, which raises the need for a simple, extensible and parameterizable method to generate test data for such evaluations. To address this need, this paper presents a keyword-driven approach for generating ontology language conformance test data that can be used to evaluate semantic technologies, details the definition of a test suite for evaluating OWL DL conformance using this approach,and describes the use and extension of this test suite during the evaluation of some tools.
Resumo:
Nowadays, data mining is based on low-level specications of the employed techniques typically bounded to a specic analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specic platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the underlying-platform technical details. These tasks are now entrusted to the model-transformations scaffolding.
Resumo:
Data mining is one of the most important analysis techniques to automatically extract knowledge from large amount of data. Nowadays, data mining is based on low-level specifications of the employed techniques typically bounded to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Bearing in mind this situation, we propose a model-driven approach which is based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (that is deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting efforts on low-level programming tasks related to the underlying-platform technical details. These time consuming tasks are now entrusted to the model-transformations scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.
Resumo:
"May 1980."
Resumo:
Stable social aggregations are rarely recorded in lizards, but have now been reported from several species in the Australian scincid genus Egernia. Most of those examples come from species using rock crevice refuges that are relatively easy to observe. But for many other Egernia species that occupy different habitats and are more secretive, it is hard to gather the observational data needed to deduce their social structure. Therefore, we used genotypes at six polymorphic microsatellite DNA loci of 229 individuals of Egernia frerei, trapped in 22 sampling sites over 3500 ha of eucalypt forest on Fraser Island, Australia. Each sampling site contained 15 trap locations in a 100 x 50 m grid. We estimated relatedness among pairs of individuals and found that relatedness was higher within than between sites. Relatedness of females within sites was higher than relatedness of males, and was higher than relatedness between males and females. Within sites we found that juvenile lizards were highly related to other juveniles and to adults trapped at the same location, or at adjacent locations, but relatedness decreased with increasing trap separation. We interpreted the results as suggesting high natal philopatry among juvenile lizards and adult females. This result is consistent with stable family group structure previously reported in rock dwelling Egernia species, and suggests that social behaviour in this genus is not habitat driven.
Resumo:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Resumo:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Resumo:
It is generally believed that the structural reforms that were introduced in India following the macro-economic crisis of 1991 ushered in competition and forced companies to become more efficient. However, whether the post-1991 growth is an outcome of more efficient use of resources or greater use of factor inputs remains an open empirical question. In this paper, we use plant-level data from 1989–1990 and 2000–2001 to address this question. Our results indicate that while there was an increase in the productivity of factor inputs during the 1990s, most of the growth in value added is explained by growth in the use of factor inputs. We also find that median technical efficiency declined in all but one of the industries between 1989–1990 and 2000–2001, and that change in technical efficiency explains a very small proportion of the change in gross value added.
Resumo:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.