Biblioteca Digital

944 results for selection methods

Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases

Relevance:

70.00%

Publisher:

Abstract:

Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining these data with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some work was further extended so that it could be implemented on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
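The nested cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's pipeline: the mean-difference filter, the nearest-centroid classifier, the candidate subset sizes in `k_grid`, and all function names are assumptions chosen only to keep the example self-contained. The point it shows is that feature selection and the choice of subset size happen inside the training portion of each outer fold, so the held-out fold never influences the model it scores.

```python
import random

def mean(values):
    return sum(values) / len(values) if values else 0.0

def k_folds(indices, n_folds):
    # Split a list of row indices into n_folds interleaved folds.
    return [indices[i::n_folds] for i in range(n_folds)]

def top_features(X, y, rows, k):
    # Filter score (an illustrative choice): absolute difference between
    # the class means of each feature, computed on the given rows only.
    scores = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(X[0])), key=lambda j: -scores[j])[:k]

def accuracy(X, y, train, test, k):
    # Select k features on the training rows, fit a nearest-centroid
    # classifier on them, and score it on the test rows.
    feats = top_features(X, y, train, k)
    cents = {c: [mean([X[i][j] for i in train if y[i] == c]) for j in feats]
             for c in (0, 1)}
    hits = []
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, cents[c]))
                for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        hits.append(1.0 if pred == y[i] else 0.0)
    return mean(hits)

def nested_cv(X, y, k_grid=(1, 2, 5), outer=5, inner=3, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for held_out in k_folds(idx, outer):
        held = set(held_out)
        train = [i for i in idx if i not in held]
        inner_folds = k_folds(train, inner)

        def inner_score(k):
            # Inner loop: estimate accuracy for subset size k using
            # only the outer-training rows.
            scores = []
            for val in inner_folds:
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                scores.append(accuracy(X, y, fit, val, k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)
        # Outer loop: score the tuned procedure on rows it has never seen.
        outer_scores.append(accuracy(X, y, train, held_out, best_k))
    return mean(outer_scores)
```

Selecting features on the full data set before splitting would let information from the test rows leak into the model, which is one way the overly optimistic accuracies described above can arise.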

Bayesian estimation and selection of nonlinear vector error correction models: The case of the sugar-ethanol-oil nexus in Brazil

Relevance:

70.00%

Publisher:

Abstract:

Nonlinear adjustment toward long-run price equilibrium relationships in the sugar-ethanol-oil nexus in Brazil is examined. We develop generalized bivariate error correction models that allow for cointegration between sugar, ethanol, and oil prices, where dynamic adjustments are potentially nonlinear functions of the disequilibrium errors. A range of models are estimated using Bayesian Monte Carlo Markov Chain algorithms and compared using Bayesian model selection methods. The results suggest that the long-run drivers of Brazilian sugar prices are oil prices and that there are nonlinearities in the adjustment processes of sugar and ethanol prices to oil price but linear adjustment between ethanol and sugar prices.

An objective, niche-based approach to indicator species selection

Relevance:

70.00%

Publisher:

Abstract:

1. Species-based indices are frequently employed as surrogates for wider biodiversity health and measures of environmental condition. Species selection is crucial in determining an indicator's metric value and hence the validity of the interpretation of ecosystem condition and function it provides, yet an objective process to identify appropriate indicator species is frequently lacking. 2. An effective indicator needs to (i) be representative, reflecting the status of wider biodiversity; (ii) be reactive, acting as an early-warning system for detrimental changes in environmental conditions; (iii) respond to change in a predictable way. We present an objective, niche-based approach for species selection, founded on a coarse categorisation of species' niche space and key resource requirements, which ensures the resultant indicator has these key attributes. 3. We use UK farmland birds as a case study to demonstrate this approach, identifying an optimal indicator set containing 12 species. In contrast to the 19 species included in the farmland bird index (FBI), a key UK biodiversity indicator that contributes to one of the UK Government's headline indicators of sustainability, the niche space occupied by these species fully encompasses that occupied by the wider community of 62 species. 4. We demonstrate that the response of these 12 species to land-use change is a strong correlate of that of the wider farmland bird community. Furthermore, the temporal dynamics of the index based on their population trends closely match the population dynamics of the wider community. However, in both analyses, the magnitude of the change in our indicator was significantly greater, allowing this indicator to act as an early-warning system.
5. Ecological indicators are embedded in environmental management, sustainable development and biodiversity conservation policy and practice, where they act as metrics against which progress towards national, regional and global targets can be measured. Adopting this niche-based approach for objective selection of indicator species will facilitate the development of sensitive and representative indices for a range of taxonomic groups, habitats and spatial scales.

A comparative review of dimension reduction methods in approximate Bayesian computation

Relevance:

70.00%

Publisher:

Abstract:

Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
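For orientation, the information criteria and the ridge penalty named in the abstract above have the following standard textbook forms. How the article adapts them to summary statistic selection in ABC is not stated in the abstract, so the symbols here (k fitted parameters, n observations, maximized likelihood L-hat, penalty weight lambda) are the generic ones and should be read as assumptions rather than the authors' notation.

```latex
\begin{align}
  \mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
  \mathrm{BIC} &= k\ln n - 2\ln\hat{L}, \\
  \hat{\beta}_{\text{ridge}} &= \arg\min_{\beta}\;
      \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
      \qquad \lambda \ge 0 .
\end{align}
```

In a best subset search, each candidate subset is scored by one of the criteria and the subset with the smallest value is kept; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7, because ln n is then greater than 2.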

Selection of an ivermectin-resistant strain of Rhipicephalus microplus (Acari: Ixodidae) in Brazil

Relevance:

70.00%

Publisher:

Abstract:

Resistance to ivermectin (IVM) in field populations of Rhipicephalus microplus in Brazil has been observed since 2001. In this work, four selection methods (infestations with: (1) IVM-treated larvae, (2) larvae from IVM-treated adult female ticks, (3) larvae from IVM-treated adult female ticks on an IVM-treated host, and (4) larvae obtained from IVM-treated females that produced eggs with a high eclosion rate) were used on a field population with an initial IVM resistance ratio at LC50 (RR50) of 1.37, with the objective of experimentally obtaining a highly resistant strain. After ten generations, using these methods combined, the final RR50 was 8.06. This work shows for the first time that it was possible to increase IVM resistance in R. microplus under laboratory conditions. The establishment of a drug-resistant R. microplus strain is a fundamental first step for further research into the mechanisms of ivermectin resistance in R. microplus and, potentially, methods to control this resistance. (C) 2009 Elsevier B.V. All rights reserved.
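As a reading aid, the resistance ratio is conventionally the LC50 of the tested population divided by the LC50 of a susceptible reference strain. The abstract does not name the reference strain, so that part of the definition is an assumption. Under it, the reported values give the following fold change over the ten generations:

```latex
\begin{align}
  \mathrm{RR}_{50} &= \frac{\mathrm{LC}_{50}(\text{tested population})}
                          {\mathrm{LC}_{50}(\text{susceptible reference})}, \\[4pt]
  \frac{\mathrm{RR}_{50}^{\text{final}}}{\mathrm{RR}_{50}^{\text{initial}}}
      &= \frac{8.06}{1.37} \approx 5.9 .
\end{align}
```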

Towards improving cluster-based feature selection with a simplified silhouette filter

Relevance:

70.00%

Publisher:

Abstract:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.
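The simplified silhouette named in the title can be sketched as follows. It differs from the ordinary silhouette in that distances are measured to cluster centroids rather than averaged over all members, which makes it cheaper to compute. The abstract does not say which dissimilarity the paper uses between features or how it searches over partitions, so the Euclidean distance, the generic vectors, and the function names below are assumptions for illustration only.

```python
import math

def centroid(points):
    # Component-wise mean of a list of equal-length vectors.
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def simplified_silhouette(points, labels):
    """Mean simplified silhouette of a partition.

    For each item, a is the distance to its own cluster centroid and b is
    the distance to the nearest other centroid; the item's score is
    (b - a) / max(a, b).
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids = {lab: centroid(ps) for lab, ps in clusters.items()}
    scores = []
    for p, lab in zip(points, labels):
        a = euclidean(p, centroids[lab])
        b = min(euclidean(p, c) for other, c in centroids.items() if other != lab)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

One plausible use, again an assumption rather than the paper's stated procedure, is to evaluate partitions with different numbers of clusters and keep the one with the highest score, which would fix the number of selected features without a user-supplied parameter.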

Isolate® and Optiprep® minigradients as alternatives for sperm selection in bovine in vitro embryo production

Relevance:

70.00%

Publisher:

Abstract:

The objective of this study was to evaluate small-volume alternatives to the conventional Percoll® gradient with respect to semen quality, in vitro embryo production, sex ratio and embryo survival after vitrification. Thawed semen was randomly allocated to one of four density gradient selection methods: (1) conventional Percoll® (P), (2) MiniPercoll (MP), (3) MiniIsolate (MI), and (4) MiniOptiprep (MO). Sperm kinetics and quality were evaluated. Use of P, MP and MI gradients did not affect sperm motility (P > 0.05). However, there was a decrease in total and progressive sperm motility in MO (70.8 and 51.3% vs. 87.3 and 69.5% for P; 87.3 and 73% for MP; 92.3 and 78.8% for MI; P < 0.05). The MO group had lower membrane integrity compared with P, MP and MI (39.7 vs. 70.5, 72.3, 63.8%, respectively, P < 0.05). The percentage of blastocysts produced was higher in MI than in MP and MO (21.1 vs. 16.1 and 16.9%, P < 0.05) and similar to P (18.4%; P > 0.05). Sex ratio and embryo survival after vitrification were similar among groups (P > 0.05). Semen selected by Isolate and Optiprep gradients, at the concentrations and small volumes used, demonstrated similar characteristics and in vitro embryo production to the conventional Percoll® gradient.

Early selection in open-pollinated Eucalyptus families based on competition covariates

Relevance:

70.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Regularized logistic regression and multi-objective variable selection for classifying MEG data

Relevance:

70.00%

Publisher:

Abstract:

This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from magnetoencephalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.

Automatic blood glucose classification for gestational diabetes with feature selection: decision trees vs neural networks

Relevance:

70.00%

Publisher:

Abstract:

Automatic blood glucose classification may help specialists to provide a better interpretation of blood glucose data downloaded directly from patients' glucose meters, and will contribute to the development of decision support systems for gestational diabetes. This paper presents an automatic blood glucose classifier for gestational diabetes that compares 6 different feature selection methods for two machine learning algorithms: neural networks and decision trees. Three search algorithms (Greedy, Best First and Genetic) were combined with two different evaluators (CSF and Wrapper) for the feature selection. The study was carried out with 6080 blood glucose measurements from 25 patients. Decision trees with a feature set selected with the Wrapper evaluator and the Best First search algorithm obtained the best accuracy: 95.92%.

A survey of feature selection in Internet traffic characterization

Relevance:

70.00%

Publisher:

Abstract:

In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which face even greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, nowadays it's impossible to handle hundreds of features in big data scenarios, as this leads only to unacceptable processing time and misleading classification results due to redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. In this paper a survey of feature selection methods is presented: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, future application of feature selection to a concrete project is proposed.

Computational Methods for Identifying Hypertrophic Cardiomyopathy using 12-lead ECG

Relevance:

70.00%

Publisher:

Abstract:

Hypertrophic cardiomyopathy (HCM) is a cardiovascular disease where the heart muscle is partially thickened and blood flow is - potentially fatally - obstructed. It is one of the leading causes of sudden cardiac death in young people. Electrocardiography (ECG) and Echocardiography (Echo) are the standard tests for identifying HCM and other cardiac abnormalities. The American Heart Association has recommended using a pre-participation questionnaire for young athletes instead of ECG or Echo tests due to considerations of cost and time involved in interpreting the results of these tests by an expert cardiologist. Initially we set out to develop a classifier for automated prediction of young athletes’ heart conditions based on the answers to the questionnaire. Classification results and further in-depth analysis using computational and statistical methods indicated significant shortcomings of the questionnaire in predicting cardiac abnormalities. Automated methods for analyzing ECG signals can help reduce cost and save time in the pre-participation screening process by detecting HCM and other cardiac abnormalities. Therefore, the main goal of this dissertation work is to identify HCM through computational analysis of 12-lead ECG. ECG signals recorded on one or two leads have been analyzed in the past for classifying individual heartbeats into different types of arrhythmia as annotated primarily in the MIT-BIH database. In contrast, we classify complete sequences of 12-lead ECGs to assign patients into two groups: HCM vs. non-HCM. The challenges and issues we address include missing ECG waves in one or more leads and the dimensionality of a large feature-set. We address these by proposing imputation and feature-selection methods. We develop heartbeat-classifiers by employing Random Forests and Support Vector Machines, and propose a method to classify full 12-lead ECGs based on the proportion of heartbeats classified as HCM. 
The results from our experiments show that the classifiers developed using our methods perform well in identifying HCM. Thus the two contributions of this thesis are the utilization of computational and statistical methods for discovering shortcomings in a current screening procedure and the development of methods to identify HCM through computational analysis of 12-lead ECG signals.
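The last step described in the abstract above, assigning a whole recording to HCM or non-HCM from the proportion of heartbeats classified as HCM, can be sketched in a few lines. The per-beat labels are assumed to come from some upstream heartbeat classifier such as the Random Forests or Support Vector Machines mentioned; the 0.5 cut-off and the function name are hypothetical, since the abstract does not give the threshold used.

```python
def classify_recording(beat_predictions, threshold=0.5):
    """Label a recording from its per-heartbeat predictions.

    beat_predictions: list of "HCM" / "non-HCM" strings, one per heartbeat.
    threshold: assumed cut-off on the fraction of HCM beats (hypothetical).
    Returns (label, fraction_of_hcm_beats).
    """
    if not beat_predictions:
        raise ValueError("no heartbeats to classify")
    hcm_fraction = sum(1 for b in beat_predictions if b == "HCM") / len(beat_predictions)
    label = "HCM" if hcm_fraction >= threshold else "non-HCM"
    return label, hcm_fraction
```

Aggregating over many beats means a few misclassified heartbeats do not flip the patient-level label, and the threshold can be moved to trade sensitivity against specificity in a screening setting.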

When do conservation planning methods deliver? Quantifying the consequences of uncertainty

Relevance:

70.00%

Publisher:

Abstract:

The rapid global loss of biodiversity has led to a proliferation of systematic conservation planning methods. In spite of their utility and mathematical sophistication, these methods only provide approximate solutions to real-world problems where there is uncertainty and temporal change. The consequences of errors in these solutions are seldom characterized or addressed. We propose a conceptual structure for exploring the consequences of input uncertainty and oversimplified approximations to real-world processes for any conservation planning tool or strategy. We then present a computational framework based on this structure to quantitatively model species representation and persistence outcomes across a range of uncertainties. These include factors such as land costs, landscape structure, species composition and distribution, and temporal changes in habitat. We demonstrate the utility of the framework using several reserve selection methods including simple rules of thumb and more sophisticated tools such as Marxan and Zonation. We present new results showing how outcomes can be strongly affected by variation in problem characteristics that are seldom compared across multiple studies. These characteristics include number of species prioritized, distribution of species richness and rarity, and uncertainties in the amount and quality of habitat patches. We also demonstrate how the framework allows comparisons between conservation planning strategies and their response to error under a range of conditions. Using the approach presented here will improve conservation outcomes and resource allocation by making it easier to predict and quantify the consequences of many different uncertainties and assumptions simultaneously. Our results show that without more rigorously generalizable results, it is very difficult to predict the amount of error in any conservation plan.
These results imply the need for standard practice to include evaluating the effects of multiple real-world complications on the behavior of any conservation planning method.

A study of load support and other criteria appropriate to the selection of industrial conveyor belts

Relevance:

70.00%

Publisher:

Abstract:

A study of conveying practice demonstrates that belt conveyors provide a versatile and much-used method of transporting bulk materials, but a review of belting manufacturers' design procedures shows that belt design and selection rules are often based on experience with all-cotton belts no longer in common use, and are not completely relevant to modern synthetic constructions. In particular, provision of the property "load support", which was not critical with cotton belts, is shown to determine the outcome of most belt selection exercises and lead to gross over-specification of other design properties in many cases. The results of an original experimental investigation into this property, carried out to determine the belt and conveyor parameters that affect it, show the major role that belt stiffness plays in its provision; the basis for a belt stiffness test relevant to service conditions is given. A proposal for a more rational method of specifying load support data results from the work, but correlation of the test results with service performance is necessary before the absolute load support capability required from a belt for given working conditions can be quantified. A study to attain this correlation is the major proposal for future work resulting from the present investigation, but a full review of the literature on conveyor design and a study of present practice within the belting industry demonstrate other, less critical, factors that could profitably be investigated. It is suggested that the most suitable method of studying these would be a rational data collection system to provide information on various facets of belt service behaviour; a basis for such a system is proposed. In addition to the work above, proposals for simplifying the present belt selection methods are made and a strain transducer suitable for use in future experimental investigations is developed.

A survey of UK selection practices across different organization sizes and industry sectors

Relevance:

70.00%

Publisher:

Abstract:

This paper presents results of a study examining the methods used to select employees in 579 UK organizations representing a range of different organization sizes and industry sectors. Overall, a smaller proportion of organizations in this sample reported using formalized methods (e.g., assessment centres) than informal methods (e.g., unstructured interviews). The curriculum vitae (CV) was the most commonly used selection method, followed by the traditional triad of application form, interviews, and references. Findings also indicated that the use of different selection methods was similar in both large organizations and small-to-medium-sized enterprises. Differences were found across industry sectors, with public and voluntary sectors being more likely to use formalized techniques (e.g., application forms rather than CVs and structured rather than unstructured interviews). The results are discussed in relation to their implications, both in terms of practice and future research.
