960 results for Maximum-entropy selection criterion


Relevance:

40.00%

Publisher:

Abstract:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data drawn from official statistics shows its usefulness.

Relevance:

40.00%

Publisher:

Abstract:

In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.

Relevance:

40.00%

Publisher:

Abstract:

The localization of Last Glacial Maximum (LGM) refugia is crucial information to understand a species' history and predict its reaction to future climate changes. However, many phylogeographical studies lack sampling designs intensive enough to precisely localize these refugia. The hairy land snail Trochulus villosus has a small range centred on Switzerland, which could be intensively covered by sampling 455 individuals from 52 populations. Based on mitochondrial DNA sequences (COI and 16S), we identified two divergent lineages with distinct geographical distributions. Bayesian skyline plots suggested that both lineages expanded at the end of the LGM. To find where the origin populations were located, we applied the principles of ancestral character reconstruction and identified a candidate refugium for each mtDNA lineage: the French Jura and Central Switzerland, both ice-free during the LGM. Additional refugia, however, could not be excluded, as suggested by the microsatellite analysis of a population subset. Modelling the LGM niche of T. villosus, we showed that suitable climatic conditions were expected in the inferred refugia, but potentially also in the nunataks of the alpine ice shield. In a model selection approach, we compared several alternative recolonization scenarios by estimating the Akaike information criterion for their respective maximum-likelihood migration rates. The 'two refugia' scenario received by far the best support given the distribution of genetic diversity in T. villosus populations. Provided that fine-scale sampling designs and various analytical approaches are combined, it is possible to refine our understanding of species responses to environmental changes.

Relevance:

40.00%

Publisher:

Abstract:

This study addresses feature selection in classification problems. Feature selection methods aim to keep the important features and discard redundant and irrelevant ones; we investigated this using fuzzy entropy measures. We developed a fuzzy entropy based feature selection method using Yu's similarity and tested it with a similarity classifier, also based on Yu's similarity, on a real-world dermatological data set. Performing feature selection based on fuzzy entropy measures before classification gave very promising empirical results: the highest classification accuracy achieved was 98.83%. The results were then compared with results previously obtained using different similarity classifiers, and they show better accuracy than those achieved before. The methods used helped to reduce the dimensionality of the data set and to speed up the computation time of the learning algorithm, and have therefore simplified the classification task.
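
As a rough illustration of ranking features by fuzzy entropy, the sketch below uses the De Luca-Termini entropy. This is an assumption: the abstract does not say which fuzzy entropy measure was used, and the function names and similarity values are hypothetical. Features whose similarity values sit near 0 or 1 score low, while features with ambiguous values near 0.5 score high and are candidates for removal.

```python
import math


def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy of membership degrees in [0, 1]."""
    total = 0.0
    for mu in memberships:
        # Degrees of exactly 0 or 1 contribute nothing (x*log(x) -> 0).
        if 0.0 < mu < 1.0:
            total -= mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)
    return total


def rank_features(similarity_by_feature):
    """Return feature names sorted from lowest to highest fuzzy entropy."""
    return sorted(
        similarity_by_feature,
        key=lambda name: fuzzy_entropy(similarity_by_feature[name]),
    )


# Hypothetical similarity values (not taken from the study).
similarities = {
    "feature_a": [0.95, 0.90, 0.05, 0.10],  # near-crisp values, low entropy
    "feature_b": [0.50, 0.55, 0.45, 0.50],  # ambiguous values, high entropy
}
ranking = rank_features(similarities)
print(ranking)
```

Under this assumption, the feature at the end of the ranking would be the first one considered for removal.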

Relevance:

40.00%

Publisher:

Abstract:

In this paper, we propose a novel filter for feature selection. The filter relies on the estimation of the mutual information between features and classes. We bypass the estimation of the probability density function with the aid of the entropic-graphs approximation of Rényi entropy, and the subsequent approximation of Shannon entropy. The complexity of this bypassing process does not depend on the number of dimensions but on the number of patterns/samples, and thus the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method both in the contexts of image classification and microarray data classification.
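
For reference, the quantities named in this abstract have the following standard definitions. The notation ($f$ for the density of the feature vector $X$, $C$ for the class label, $\alpha$ for the Rényi order) is introduced here for illustration and is not taken from the paper.

```latex
% Renyi entropy of order \alpha (\alpha > 0, \alpha \neq 1) of a density f
H_\alpha(f) = \frac{1}{1-\alpha} \log \int f^{\alpha}(x)\, dx

% Shannon entropy is recovered in the limit \alpha \to 1
H(f) = \lim_{\alpha \to 1} H_\alpha(f) = -\int f(x) \log f(x)\, dx

% Mutual information between features X and a discrete class label C
I(X; C) = H(X) - \sum_{c} p(c)\, H(X \mid C = c)
```

The last identity shows why an entropy estimate is enough to score a feature subset: only the entropy of the whole sample and the entropy within each class are needed.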

Relevance:

40.00%

Publisher:

Abstract:

The use of chemical control measures to reduce the impact of parasite and pest species has frequently resulted in the development of resistance. Thus, resistance management has become a key concern in human and veterinary medicine, and in agricultural production. Although it is known that factors such as gene flow between susceptible and resistant populations, drug type, application methods, and costs of resistance can affect the rate of resistance evolution, less is known about the impacts of density-dependent eco-evolutionary processes that could be altered by drug-induced mortality. The overall aim of this thesis was to take an experimental evolution approach to assess how life history traits respond to drug selection, using a free-living dioecious worm (Caenorhabditis remanei) as a model. In Chapter 2, I defined the relationship between C. remanei survival and Ivermectin dose over a range of concentrations, in order to control the intensity of selection used in the selection experiment described in Chapter 4. The dose-response data were also used to appraise curve-fitting methods, using Akaike Information Criterion (AIC) model selection to compare a series of nonlinear models. The type of model fitted to the dose response data had a significant effect on the estimates of LD50 and LD99, suggesting that failure to fit an appropriate model could give misleading estimates of resistance status. In addition, simulated data were used to establish that a potential cost of resistance could be predicted by comparing survival at the upper asymptote of dose-response curves for resistant and susceptible populations, even when differences were as low as 4%. This approach to dose-response modeling ensures that the maximum amount of useful information relating to resistance is gathered in one study. In Chapter 3, I asked how simulations could be used to inform important design choices used in selection experiments. 
Specifically, I focused on the effects of both within- and between-line variation on estimated power, when detecting small, medium and large effect sizes. Using mixed-effect models on simulated data, I demonstrated that commonly used designs with realistic levels of variation could be underpowered for substantial effect sizes. Thus, use of simulation-based power analysis provides an effective way to avoid under- or overpowering study designs that incorporate variation due to random effects. In Chapter 4, I investigated how Ivermectin dosage and changes in population density affect the rate of resistance evolution. I exposed replicate lines of C. remanei to two doses of Ivermectin (high and low) to assess relative survival of lines selected in drug-treated environments compared to untreated controls over 10 generations. Additionally, I maintained lines where mortality was imposed randomly to control for differences in density between drug treatments and to distinguish between the evolutionary consequences of drug treatment versus ecological processes affected by changes in density-dependent feedback. Intriguingly, both drug-selected and random-mortality lines showed an increase in survivorship when challenged with Ivermectin; the magnitude of this increase varied with the intensity of selection and life-history stage. The results suggest that interactions between density-dependent processes and life history may mediate evolved changes in susceptibility to control measures, which could result in misleading conclusions about the evolution of heritable resistance following drug treatment. In Chapter 5, I investigated whether the apparent changes in drug susceptibility found in Chapter 4 were related to evolved changes in life-history of C. remanei populations after selection in drug-treated and random-mortality environments. Rapid passage of lines in the drug-free environment had no effect on the measured life-history traits. 
In the drug-free environment, adult size and fecundity of drug-selected lines increased compared to the controls but drug selection did not affect lifespan. In the treated environment, drug-selected lines showed increased lifespan and fecundity relative to controls. Adult size of randomly culled lines responded in a similar way to drug-selected lines in the drug-free environment, but no change in fecundity or lifespan was observed in either environment. The results suggest that life histories of nematodes can respond to selection as a result of the application of control measures. Failure to take these responses into account when applying control measures could result in adverse outcomes, such as larger and more fecund parasites, as well as over-estimation of the development of genetically controlled resistance. In conclusion, my thesis shows that there may be a complex relationship between drug selection, density-dependent regulatory processes and life history of populations challenged with control measures. This relationship could have implications for how resistance is monitored and managed if life histories of parasitic species show such eco-evolutionary responses to drug application.
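
To make the dose-response part of this abstract concrete, here is a minimal sketch of one nonlinear model of the kind such comparisons often include, a three-parameter log-logistic curve. The choice of model and all parameter values are assumptions; the abstract does not list the models that were actually fitted. In this parameterisation, `upper` is survival without drug (the upper asymptote mentioned in the abstract) and LD50/LD99 follow in closed form.

```python
def survival(dose, upper, ld50, slope):
    """Three-parameter log-logistic survival curve; `upper` is survival at zero dose."""
    if dose <= 0:
        return upper
    return upper / (1.0 + (dose / ld50) ** slope)


def lethal_dose(p, ld50, slope):
    """Dose at which a fraction p of the drug-free survival has been lost."""
    return ld50 * (p / (1.0 - p)) ** (1.0 / slope)


# Hypothetical parameter values (not estimates from the thesis).
upper, ld50, slope = 0.96, 2.0, 3.0
ld99 = lethal_dose(0.99, ld50, slope)
print(f"LD50 = {ld50:.2f}, LD99 = {ld99:.2f}")
print(f"survival at LD99 = {survival(ld99, upper, ld50, slope):.4f}")
```

Because LD99 depends on the slope as well as LD50, two models that agree on LD50 can still disagree on LD99, which is consistent with the abstract's point that the fitted model affects both estimates.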

Relevance:

40.00%

Publisher:

Abstract:

This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available.
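
The AIC step of such a model search can be sketched as follows for least-squares fits with Gaussian errors, where AIC reduces (up to an additive constant) to n·ln(RSS/n) + 2k. The candidate predictor sets, residual sums of squares and sample size below are hypothetical, and counting the intercept and the error variance as parameters is a convention assumed here.

```python
import math


def gaussian_aic(rss, n_obs, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


# Hypothetical candidates: predictor set -> residual sum of squares.
candidates = {
    ("THI",): 50.0,
    ("THI", "ESI"): 40.5,
    ("THI", "ESI", "ETI"): 40.4,
}
n_obs = 54  # hypothetical number of observations

# Parameters counted: one slope per predictor, plus intercept and error variance.
scores = {
    predictors: gaussian_aic(rss, n_obs, len(predictors) + 2)
    for predictors, rss in candidates.items()
}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

In this made-up example the third predictor barely lowers the residual sum of squares, so its penalty of 2 outweighs the gain and the two-predictor model is preferred.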

Relevance:

40.00%

Publisher:

Abstract:

The thesis deals with the problem of Model Selection (MS) motivated by information and prediction theory, focusing on parametric time series (TS) models. The main contribution of the thesis is the extension to the multivariate case of the Misspecification-Resistant Information Criterion (MRIC), a criterion introduced recently that solves Akaike’s original research problem posed 50 years ago, which led to the definition of the AIC. The importance of MS is witnessed by the huge amount of literature devoted to it and published in scientific journals of many different disciplines. Despite such a widespread treatment, the contributions that adopt a mathematically rigorous approach are not so numerous and one of the aims of this project is to review and assess them. Chapter 2 discusses methodological aspects of MS from information theory. Information criteria (IC) for the i.i.d. setting are surveyed along with their asymptotic properties; and the cases of small samples, misspecification, further estimators. Chapter 3 surveys criteria for TS. IC and prediction criteria are considered for: univariate models (AR, ARMA) in the time and frequency domain, parametric multivariate (VARMA, VAR); nonparametric nonlinear (NAR); and high-dimensional models. The MRIC answers Akaike’s original question on efficient criteria, for possibly-misspecified (PM) univariate TS models in multi-step prediction with high-dimensional data and nonlinear models. Chapter 4 extends the MRIC to PM multivariate TS models for multi-step prediction introducing the Vectorial MRIC (VMRIC). We show that the VMRIC is asymptotically efficient by proving the decomposition of the MSPE matrix and the consistency of its Method-of-Moments Estimator (MoME), for Least Squares multi-step prediction with univariate regressor. Chapter 5 extends the VMRIC to the general multiple regressor case, by showing that the MSPE matrix decomposition holds, obtaining consistency for its MoME, and proving its efficiency. 
The chapter concludes with a digression on the conditions for PM VARX models.

Relevance:

30.00%

Publisher:

Abstract:

Background: The criteria and timing for nerve surgery in infants with obstetric brachial plexopathy remain controversial. Our aim was to develop a new method for early prognostic assessment to assist this decision process. Methods: Fifty-four patients with unilateral obstetric brachial plexopathy who were ten to sixty days old underwent bilateral motor-nerve-conduction studies of the axillary, musculocutaneous, proximal radial, distal radial, median, and ulnar nerves. The ratio between the amplitude of the compound muscle action potential of the affected limb and that of the healthy side was called the axonal viability index. The patients were followed and classified in three groups according to the clinical outcome. We analyzed the receiver operating characteristic curve of each index to define the best cutoff point to detect patients with a poor recovery. Results: The best cutoff points on the axonal viability index for each nerve (and its sensitivity and specificity) were <10% (88% and 89%, respectively) for the axillary nerve, 0% (88% and 73%) for the musculocutaneous nerve, <20% (82% and 97%) for the proximal radial nerve, <50% (82% and 97%) for the distal radial nerve, and <50% (59% and 97%) for the ulnar nerve. The indices from the proximal radial, distal radial, and ulnar nerves had better specificities compared with the most frequently used clinical criterion: absence of biceps function at three months of age. Conclusions: The axonal viability index yields an earlier and more specific prognostic estimation of obstetric brachial plexopathy than does the clinical criterion of biceps function, and we believe it may be useful in determining surgical indications in these patients.
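
The index and the cutoff evaluation described above can be sketched as follows. The amplitudes, outcomes and function names are hypothetical; only the definition of the index (affected-side amplitude divided by healthy-side amplitude) and the idea of flagging values below a cutoff come from the abstract.

```python
def axonal_viability_index(affected_amplitude, healthy_amplitude):
    """CMAP amplitude of the affected limb as a percentage of the healthy side."""
    if healthy_amplitude <= 0:
        raise ValueError("healthy-side amplitude must be positive")
    return 100.0 * affected_amplitude / healthy_amplitude


def sensitivity_specificity(indices, poor_recovery, cutoff):
    """Flag index < cutoff as predicted poor recovery and score against outcomes."""
    tp = fn = tn = fp = 0
    for index, poor in zip(indices, poor_recovery):
        flagged = index < cutoff
        if poor and flagged:
            tp += 1
        elif poor:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical index values (%) and outcomes, not data from the study.
indices = [5.0, 8.0, 15.0, 40.0, 60.0, 9.0]
poor_recovery = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(indices, poor_recovery, cutoff=10.0)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Repeating the second function over a range of cutoffs gives the points of a receiver operating characteristic curve, from which a best cutoff can be chosen.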

Relevance:

30.00%

Publisher:

Abstract:

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e.g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point to the best solution for each application. Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the effectiveness of the software in two distinct types of biological problems. Moreover, the environment can be used in different pattern recognition applications, although its main focus is bioinformatics tasks.

Relevance:

30.00%

Publisher:

Abstract:

This study presents a decision-making method for maintenance policy selection of power plant equipment. The method is based on risk analysis concepts. The first step of the method consists of identifying critical equipment both for power plant operational performance and availability based on risk concepts. The second step involves the proposal of a potential maintenance policy that could be applied to critical equipment in order to increase its availability. The costs associated with each potential maintenance policy must be estimated, including the maintenance costs and the cost of failure that measures the consequences of critical equipment failure for the power plant operation. Once the failure probabilities and the costs of failures are estimated, a decision-making procedure is applied to select the best maintenance policy. The decision criterion is to minimize the equipment cost of failure, considering the costs and likelihood of occurrence of failure scenarios. The method is applied to the analysis of a lubrication oil system used in gas turbine journal bearings. The turbine has more than 150 MW nominal output and is installed in an open cycle thermoelectric power plant. A design modification with the installation of a redundant oil pump is proposed to improve the availability of the lubricating oil system. (C) 2009 Elsevier Ltd. All rights reserved.
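
The decision step can be sketched as an expected-cost comparison. All figures below are hypothetical, and adding the maintenance cost to the probability-weighted failure cost is an assumed reading of the criterion rather than the authors' exact procedure.

```python
def expected_cost(policy):
    """Maintenance cost plus the probability-weighted cost of each failure scenario."""
    failure_cost = sum(prob * cost for prob, cost in policy["scenarios"])
    return policy["maintenance_cost"] + failure_cost


# Hypothetical policies: each scenario is (probability of failure, cost of failure).
policies = {
    "run_to_failure": {"maintenance_cost": 0.0, "scenarios": [(0.10, 500_000.0)]},
    "preventive": {"maintenance_cost": 20_000.0, "scenarios": [(0.03, 500_000.0)]},
    "redundant_pump": {"maintenance_cost": 30_000.0, "scenarios": [(0.005, 500_000.0)]},
}

best_policy = min(policies, key=lambda name: expected_cost(policies[name]))
for name, policy in policies.items():
    print(f"{name}: {expected_cost(policy):,.0f}")
print("selected:", best_policy)
```

With these made-up numbers the redundant pump costs the most to maintain but lowers the failure probability enough to give the smallest expected cost.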

Relevance:

30.00%

Publisher:

Abstract:

A total of 152,145 weekly test-day milk yield records from 7317 first lactations of Holstein cows distributed in 93 herds in southeastern Brazil were analyzed. Test-day milk yields were classified into 44 weekly classes of DIM. The contemporary groups were defined as herd-year-week of test-day. The model included direct additive genetic, permanent environmental and residual effects as random effects, and the fixed effects of contemporary group and age of cow at calving as a covariable (linear and quadratic effects). Mean trends were modeled by a cubic regression on orthogonal polynomials of DIM. Additive genetic and permanent environmental random effects were estimated by random regression on orthogonal Legendre polynomials. Residual variances were modeled using third to seventh-order variance functions or a step function with 1, 6, 13, 17 and 44 variance classes. Results from Akaike's information criterion and Schwarz's Bayesian information criterion suggested that a model considering a 7th-order Legendre polynomial for the additive effect, a 12th-order polynomial for the permanent environmental effect and a step function with 6 classes for residual variances fitted best. However, a parsimonious model, with a 6th-order Legendre polynomial for additive effects and a 7th-order polynomial for permanent environmental effects, yielded very similar genetic parameter estimates. (C) 2008 Elsevier B.V. All rights reserved.
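
The covariates of a random regression on Legendre polynomials can be built as sketched below. The function names are hypothetical, the polynomials are left unnormalized, and reading "order" as the highest polynomial degree is an assumption (some random regression literature counts the number of coefficients instead). The 44 weekly classes are taken from the abstract.

```python
def legendre_basis(x, order):
    """Values P_0(x), ..., P_order(x) of the unnormalized Legendre polynomials."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # Bonnet recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def standardize_week(week, first_week=1, last_week=44):
    """Map a lactation week onto [-1, 1], where the polynomials are orthogonal."""
    return -1.0 + 2.0 * (week - first_week) / (last_week - first_week)


# Covariates for week 22 of lactation with a polynomial of degree 3.
x = standardize_week(22)
print([round(value, 4) for value in legendre_basis(x, 3)])
```

Each animal's additive genetic and permanent environmental curves are then linear combinations of these covariates, with one random coefficient per polynomial.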

Relevance:

30.00%

Publisher:

Abstract:

Marine invertebrate sperm proteins are particularly interesting because they are characterized by positive selection and are likely to be involved in prezygotic isolation and, thus, speciation. Here, we present the first survey of inter- and intraspecific variation of a bivalve sperm protein among a group of species that regularly hybridize in nature. M7 lysin is found in sperm acrosomes of mussels and dissolves the egg vitelline coat, permitting fertilization. We sequenced multiple alleles of the mature protein-coding region of M7 lysin from allopatric populations of mussels in the Mytilus edulis species group (M. edulis, M. galloprovincialis, and M. trossulus). A significant McDonald-Kreitman test showed an excess of fixed amino acid-replacing substitutions between species, consistent with positive selection. In addition, Kolmogorov-Smirnov tests showed significant heterogeneity in polymorphism to divergence ratios for both synonymous variation and combined synonymous and non-synonymous variation within M. galloprovincialis. These results indicate that there has been adaptive evolution at M7 lysin and, furthermore, show that positive selection on sperm proteins can occur even when post-zygotic reproductive isolation is incomplete.

Relevance:

30.00%

Publisher:

Abstract:

Coffee cultivation is one of the agribusiness activities of greatest socioeconomic importance among those linked to world agricultural trade. One of the major contributions of quantitative genetics to plant breeding is the ability to predict genetic gains. When different selection criteria are considered, predicting the gains associated with each criterion is very important, because it guides breeders on how to use the available genetic material to obtain the greatest possible gains for the traits of interest. The present study was established in July 2004 at the Bananal do Norte Experimental Farm, run by Incaper, in the district of Pacotuba, municipality of Cachoeiro de Itapemirim, in the southern region of the state, with the aim of selecting the best plants among and within half-sib progenies of Coffea canephora using different selection criteria. Individual and joint analyses of variance were carried out for 26 half-sib progenies of Coffea canephora. The experimental design was randomized blocks with four additional checks, four replications, and plots of five plants, at a spacing of 3.0 m x 1.2 m. Data from the last five harvests were considered. The traits measured were: flowering, maturation, bean size, weight, plant size, vigor, rust, cercospora leaf spot, dieback of branch tips, overall score, percentage of floating fruits, and leaf miner. All statistical analyses were performed with the computational application in genetics and statistics (GENES). Selection gains were estimated for a selection percentage of 20% among and within families, kept the same for all traits. 
All traits were subjected to selection in the positive direction, except flowering, plant size, rust, cercospora leaf spot, dieback of branch tips, percentage of floating fruits, and leaf miner, for which a decrease in the original means was sought. The selection criteria studied were: conventional selection among and within families, combined selection index, mass selection, and stratified mass selection. This dissertation comprises two chapters, in which biometric analyses were performed, such as the estimation of genetic parameters. For most of the traits studied, significant differences (P<0.05) were found among genotypes, which, together with the genotypic coefficients of variation, the genotypic coefficient of determination, and the CVg/CVe ratio, indicate the existence of genetic variability in the genetic material for most traits and favorable conditions for obtaining genetic gains through selection. These traits were also correlated. The data were subjected to analysis of variance and multivariate analysis, applying the clustering technique and UPGMA, means tests, and correlation analysis. In the clustering technique, the Mahalanobis generalized distance was used as the dissimilarity measure, and Tocher's method was used to delimit the groups. Genetic diversity was found for the traits associated with physiological quality, seed reserve mobilization, and seedling dimensions and biomass. Four groups of genotypes could be formed. Seed dry mass, seed reserve reduction, and seedling dry mass are positively correlated with one another, whereas seed reserve reduction and the efficiency of converting these reserves into seedlings are negatively correlated. 
According to the results obtained, all traits showed different levels of genetic variability, and the selection criteria used proved efficient for breeding; the combined selection index gave the best results in terms of gains and is indicated as the most appropriate criterion for the genetic improvement of the population studied. In the correlation studies, in 70% of cases the phenotypic correlation was higher than the genotypic correlation, showing a greater influence of environmental factors relative to genotypic ones and conditions favorable to the improvement of the different traits. In the genetic divergence study, clustering of the genotypes by Tocher's technique indicated that the genotypes were distributed into three groups.