16 results for correlation-based feature selection

in University of Queensland eSpace - Australia


Relevance:

100.00%

Publisher:

Abstract:

Identifying and predicting non-technical losses (NTL) are important tasks for many utilities. Data from a customer information system (CIS) can be used for NTL analysis. However, to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed analysis can be carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of results through faster processing, higher accuracy and simpler results with fewer features. A detailed feature selection analysis is presented in the paper. Both time-domain and load shape data are compared based on the accuracy, consistency and statistical dependencies between features.
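The abstract does not say which selection algorithm was used, so the following is only a minimal sketch of the general idea behind the search term, correlation-based feature selection: favour features that correlate with the target while penalising features that correlate with each other. It assumes Pearson correlation and a greedy forward search using the standard CFS merit score; the feature names and numbers are made up for illustration and are not CIS data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, columns, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean absolute feature-target correlation (relevance).
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, target, min_gain=1e-9):
    """Add one feature at a time while the merit improves by more than min_gain."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scored = [(cfs_merit(selected + [f], columns, target), f) for f in remaining]
        merit, feature = max(scored)
        if merit - best <= min_gain:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best

# Hypothetical toy data: one relevant feature, an exact rescaled copy, and noise.
columns = {
    "monthly_kwh": [1, 2, 8, 9, 2, 7],
    "monthly_kwh_x2": [2, 4, 16, 18, 4, 14],
    "noise": [5, 1, 4, 2, 3, 3],
}
target = [0, 0, 1, 1, 0, 1]
selected, merit = greedy_cfs(columns, target)
print(selected, round(merit, 3))
```

In this toy case the rescaled copy adds no merit once its twin is selected, and the noise column lowers the merit, so the search stops after one feature.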

Relevance:

100.00%

Publisher:

Abstract:

An investigation was conducted to evaluate the impact of experimental designs and spatial analyses (single-trial models) on the response to selection for grain yield in the northern grains region of Australia (Queensland and northern New South Wales). Two sets of multi-environment experiments were considered. One set, based on 33 trials conducted from 1994 to 1996, was used to represent the testing system of the wheat breeding program and is referred to as the multi-environment trial (MET). The second set, based on 47 trials conducted from 1986 to 1993, sampled a more diverse set of years and management regimes and was used to represent the target population of environments (TPE). There were 18 genotypes in common between the MET and TPE sets of trials. From indirect selection theory, the phenotypic correlation coefficient between the MET and TPE single-trial adjusted genotype means [r(p(MT))] was used to determine the effect of the single-trial model on the expected indirect response to selection for grain yield in the TPE based on selection in the MET. Five single-trial models were considered: randomised complete block (RCB), incomplete block (IB), spatial analysis (SS), spatial analysis with a measurement error (SSM) and a combination of spatial analysis and experimental design information to identify the preferred (PF) model. Bootstrap-resampling methodology was used to construct multiple MET data sets, ranging in size from 2 to 20 environments per MET sample. The size and environmental composition of the MET and the single-trial model influenced the r(p(MT)). On average, the PF model resulted in a higher r(p(MT)) than the IB, SS and SSM models, which were in turn superior to the RCB model for MET sizes based on fewer than ten environments. For METs based on ten or more environments, the r(p(MT)) was similar for all single-trial models.
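The bootstrap step described in this abstract can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: it assumes environments are drawn with replacement, that genotype means are averaged across the sampled trials, and that r(p(MT)) is a plain Pearson correlation against fixed TPE genotype means. The function name `bootstrap_r` and all numbers are hypothetical.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def bootstrap_r(met_trials, tpe_means, n_env, n_boot=200, seed=0):
    """Mean MET-TPE correlation over bootstrap MET samples of n_env environments.

    met_trials: list of trials, each a list of adjusted genotype means
                (same genotype order as tpe_means).
    """
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_boot):
        # Draw n_env environments with replacement to form one MET sample.
        sample = rng.choices(met_trials, k=n_env)
        # Average each genotype across the sampled environments.
        met_means = [statistics.fmean(values) for values in zip(*sample)]
        correlations.append(pearson(met_means, tpe_means))
    return statistics.fmean(correlations)

# Made-up yields for four genotypes in five MET trials, plus made-up TPE means.
met_trials = [
    [3.0, 2.6, 4.1, 3.3],
    [2.8, 2.9, 3.7, 3.9],
    [3.4, 2.1, 3.9, 3.5],
    [2.9, 2.7, 3.6, 3.8],
    [3.3, 2.2, 4.2, 3.1],
]
tpe_means = [3.1, 2.4, 4.0, 3.6]
for n_env in (2, 5, 10):
    print(n_env, round(bootstrap_r(met_trials, tpe_means, n_env), 3))
```

Running the loop over several MET sizes mirrors the comparison in the abstract, where the correlation was examined as a function of the number of environments per MET sample.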

Relevance:

100.00%

Publisher:

Abstract:

Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161.] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation method may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still being able to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.

Relevance:

100.00%

Publisher:

Abstract:

Ecological processes are central to the formation of new species when barriers to gene flow (reproductive isolation) evolve between populations as a result of ecologically-based divergent selection. Although laboratory and field studies provide evidence that 'ecological speciation' can occur, our understanding of the details of the process is incomplete. Here we review ecological speciation by considering its constituent components: an ecological source of divergent selection, a form of reproductive isolation, and a genetic mechanism linking the two. Sources of divergent selection include differences in environment or niche, certain forms of sexual selection, and the ecological interaction of populations. We explore the evidence for the contribution of each to ecological speciation. Forms of reproductive isolation are diverse and we discuss the likelihood that each may be involved in ecological speciation. Divergent selection on genes affecting ecological traits can be transmitted directly (via pleiotropy) or indirectly (via linkage disequilibrium) to genes causing reproductive isolation and we explore the consequences of both. Along with these components, we also discuss the geography and the genetic basis of ecological speciation. Throughout, we provide examples from nature, critically evaluate their quality, and highlight areas where more work is required.

Relevance:

100.00%

Publisher:

Abstract:

Recently, private health insurance rates have declined in many countries. In places requiring community rating in their health insurance premiums, a major cause is age-based adverse selection. However, even in countries without community rating, a de facto type of partial community rating tends to occur. In this note, a modified version of Pauly et al.'s guaranteed renewability model (Pauly et al., 1995), which addresses the problem of age-based adverse selection, is presented. Their model is extended from three to 35 periods. Also, probabilities are allowed to increase by age for low-risk types using actual age-based probabilities. This extension of their work shows that the private health insurance contracts available stray far from the optimal contracts that deal with age-based adverse selection. This suggests that government actions to affect private insurance options are warranted.

Relevance:

100.00%

Publisher:

Abstract:

In the last few decades, private health insurance rates have declined in many countries. In countries and states with community rating, a major cause is adverse selection. In order to address age-based adverse selection, Australia has recently begun a novel approach which imposes stiff penalties for buying private insurance later in life, when expected costs are higher. In this paper, we analyze Australia's Lifetime Cover in the context of a modified version of the Rothschild-Stiglitz insurance model (Rothschild and Stiglitz, 1976). We allow empirically-based probabilities to increase by age for low-risk types. The model highlights the shortcomings of the Australian plan. Based on empirically-based probabilities of illness, we predict that Lifetime Cover will not arrest adverse selection. The model has many policy implications for government regulation encouraging long-term health coverage.

Relevance:

100.00%

Publisher:

Abstract:

Although the aim of conservation planning is the persistence of biodiversity, current methods trade off ecological realism at a species level in favour of including multiple species and landscape features. For conservation planning to be relevant, the impact of landscape configuration on population processes and the viability of species needs to be considered. We present a novel method for selecting reserve systems that maximize persistence across multiple species, subject to a conservation budget. We use a spatially explicit metapopulation model to estimate extinction risk, a function of the ecology of the species and the amount, quality and configuration of habitat. We compare our new method with more traditional, area-based reserve selection methods, using a ten-species case study, and find that the expected loss of species is reduced 20-fold. Unlike previous methods, we avoid designating arbitrary weightings between reserve size and configuration; rather, our method is based on population processes and is grounded in ecological theory.

Relevance:

100.00%

Publisher:

Abstract:

In this paper we explore the use of text-mining methods for the identification of the author of a text. We apply the support vector machine (SVM) to this problem, as it is able to cope with half a million inputs, requires no feature selection and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM was able to reject other authors, and it detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs and adjectives and replaced them by grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even if the author wrote about different topics.
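To make "the frequency vector of all words of a text" concrete, here is a minimal sketch of how such vectors might be built before being passed to a classifier. The abstract does not describe the tokenisation or normalisation, so the regular-expression tokeniser, the use of relative frequencies and the function names are assumptions; case is preserved because the abstract refers to full word forms. The two sample texts are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a text into word tokens, keeping case (full word forms)."""
    return re.findall(r"\w+", text)

def build_vocabulary(texts):
    """Sorted list of every distinct word form seen in the corpus."""
    vocabulary = set()
    for text in texts:
        vocabulary.update(tokenize(text))
    return sorted(vocabulary)

def word_frequency_vector(text, vocabulary):
    """Relative frequency of each vocabulary word in the text, in vocabulary order."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in vocabulary]

# Invented sample texts, one vector per text with no feature selection applied.
texts = ["der Hund und der Mann", "die Frau"]
vocabulary = build_vocabulary(texts)
vectors = [word_frequency_vector(text, vocabulary) for text in texts]
print(vocabulary)
print(vectors)
```

Each text becomes one vector with an entry for every word form in the corpus, which is the kind of high-dimensional input the abstract says the SVM can handle directly.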

Relevance:

100.00%

Publisher:

Abstract:

Conventionally, document classification research focuses on improving the learning capabilities of classifiers. Nevertheless, according to our observation, the effectiveness of classification is limited by the suitability of the document representation. Intuitively, the more features that are used in a representation, the more comprehensively documents are represented. However, if a representation contains too many irrelevant features, the classifier suffers not only from the curse of high dimensionality but also from overfitting. To address this problem of the suitability of document representations, we present a classifier-independent approach to measure the effectiveness of document representations. Our approach utilises a labelled document corpus to estimate the distribution of documents in the feature space. By looking at documents in this way, we can clearly identify the contributions made by different features toward document classification. Some experiments have been performed to show how the effectiveness is evaluated. Our approach can be used as a tool to assist feature selection, dimensionality reduction and document classification.

Relevance:

40.00%

Publisher:

Abstract:

This article investigates the expression patterns of 160 genes that are expressed during early mouse development. The cDNAs were isolated from 7.5 d postcoitum (dpc) endoderm, a region that comprises visceral endoderm (VE), definitive endoderm, and the node; these tissues are required for the initial steps of axial specification and tissue patterning in the mouse. To avoid examining the same gene more than once, and to exclude potentially ubiquitously expressed housekeeping genes, cDNA sequence was derived from 1978 clones of the Endoderm library. These yielded 1440 distinct cDNAs, of which 123 proved to be novel in the mouse. In situ hybridization analysis was carried out on 160 of the cDNAs, and of these, 29 (18%) proved to have restricted expression patterns.

Relevance:

40.00%

Publisher:

Abstract:

Understanding and predicting the distribution of organisms in heterogeneous environments lies at the heart of ecology, and the theory of density-dependent habitat selection (DDHS) provides ecologists with an inferential framework linking evolution and population dynamics. Current theory does not allow for temporal variation in habitat quality, a serious limitation when confronted with real ecological systems. We develop both a stochastic equivalent of the ideal free distribution to study how spatial patterns of habitat use depend on the magnitude and spatial correlation of environmental stochasticity and also a stochastic habitat selection rule. The emerging patterns are confronted with deterministic predictions based on isodar analysis, an established empirical approach to the analysis of habitat selection patterns. Our simulations highlight some consistent patterns of habitat use, indicating that it is possible to make inferences about the habitat selection process based on observed patterns of habitat use. However, isodar analysis gives results that are contingent on the magnitude and spatial correlation of environmental stochasticity. Hence, DDHS is better revealed by a measure of habitat selectivity than by empirical isodars. The detection of DDHS is but a small component of isodar theory, which remains an important conceptual framework for linking evolutionary strategies in behavior and population dynamics.