12 results for sample subset optimization (SSO)

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

Data in many biological problems are often compounded by an imbalanced class distribution: the positive examples may be largely outnumbered by the negative examples. Many classification algorithms, such as the support vector machine (SVM), are sensitive to data with imbalanced class distribution and produce suboptimal classifications. It is therefore desirable to compensate for the imbalance during model training. In this study, we propose a sample subset optimization technique for classifying biological data with moderately to extremely imbalanced class distributions. By using this optimization technique with an ensemble of SVMs, we build multiple roughly balanced SVM base classifiers, each trained on an optimized sample subset. The experimental results demonstrate that the ensemble of SVMs created by our sample subset optimization technique achieves higher area under the ROC curve (AUC) values than popular sampling approaches such as random over-/under-sampling and SMOTE, and than widely used ensemble approaches such as bagging and boosting.
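As a rough illustration of the general idea, the sketch below builds an ensemble of SVMs, each trained on a roughly balanced subsample, and averages their decision values; here the subsets are drawn at random rather than optimized as in the paper, and the data set and scikit-learn components are illustrative assumptions.

# Sketch: ensemble of SVMs on roughly balanced subsamples (illustrative only;
# the paper optimizes the subsets rather than drawing them at random).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pos = np.where(y_tr == 1)[0]          # minority class
neg = np.where(y_tr == 0)[0]          # majority class
rng = np.random.default_rng(0)

scores = np.zeros(len(X_te))
n_models = 10
for _ in range(n_models):
    # all positives plus an equal-sized random draw of negatives
    sub = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    clf = SVC(kernel="rbf", gamma="scale").fit(X_tr[sub], y_tr[sub])
    scores += clf.decision_function(X_te)   # accumulate decision values

print("ensemble AUC:", roc_auc_score(y_te, scores / n_models))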

Relevance: 30.00%

Abstract:

Mineral potential mapping is the process of combining a set of input maps, each representing a distinct geo-scientific variable, to produce a single map that ranks areas according to their potential to host deposits of a particular type. The maps are combined using a mapping function that must either be provided by an expert (knowledge-driven approach) or induced from sample data (data-driven approach). Current data-driven approaches using multilayer perceptrons (MLPs) to represent the mapping function have several inherent problems: they rely heavily on subjective judgment in selecting training data and are highly sensitive to this selection; they do not utilize the contextual information provided by unlabeled data; and there is no objective interpretation of the values output by the MLP. This paper presents a novel approach that overcomes these three problems.
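To make the setup concrete, here is a minimal, assumed sketch of a data-driven mapping function: each grid cell's values across the input layers form a feature vector, and an off-the-shelf MLP (a stand-in, not the paper's improved method) scores its potential. All data, labels, and layer counts are synthetic.

# Sketch: a data-driven mapping function for mineral potential mapping.
# Each grid cell is a feature vector of geo-scientific layer values; an MLP
# outputs a prospectivity score per cell. Entirely synthetic, for illustration.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_layers, h, w = 5, 60, 60                      # 5 input maps on a 60x60 grid
layers = rng.normal(size=(n_layers, h, w))
cells = layers.reshape(n_layers, -1).T          # (n_cells, n_layers)

labeled = rng.choice(cells.shape[0], size=200, replace=False)   # known sites
y = (cells[labeled, 0] + cells[labeled, 1] > 0).astype(int)     # toy labels

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(cells[labeled], y)

potential_map = mlp.predict_proba(cells)[:, 1].reshape(h, w)    # ranking map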

Relevance: 30.00%

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true for gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise and redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information for each gene provided by multiple filtering algorithms. This information is then used in the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (covering both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system improves sample classification accuracy, generates more compact gene subsets, and converges to a selection result more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and user preferences.
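A small sketch of the fusion idea, under assumed choices of filters (ANOVA F-test and mutual information; the paper's actual filters and mapping strategy may differ): per-gene filter scores are converted to ranks so their scales are comparable, fused, and used as a sampling distribution to bias GA population initialization.

# Sketch: fusing per-gene "goodness" scores from multiple filters into a
# sampling distribution for GA initialization (the broad idea behind MF-GE;
# the actual mapping strategy in the paper differs).
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import f_classif, mutual_info_classif

def fused_gene_probabilities(X, y):
    f_scores, _ = f_classif(X, y)                 # filter 1: ANOVA F-test
    mi_scores = mutual_info_classif(X, y)         # filter 2: mutual information
    fused = rankdata(f_scores) + rankdata(mi_scores)   # rank-level fusion
    return fused / fused.sum()                    # sampling probability per gene

def init_population(p, pop_size, subset_size, rng):
    # Each chromosome is a binary mask; high-scoring genes are more likely on.
    pop = np.zeros((pop_size, len(p)), dtype=int)
    for i in range(pop_size):
        idx = rng.choice(len(p), size=subset_size, replace=False, p=p)
        pop[i, idx] = 1
    return pop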

Relevance: 30.00%

Abstract:

Purpose: To compare tear film osmolarity measurements between in situ and vapor pressure osmometers. Repeatability of in situ measurements and the effect of sample collection techniques on tear film osmolarity were also evaluated.

Methods: Osmolarity was measured in one randomly determined eye of 52 healthy participants using the in situ (TearLab Corporation, San Diego, CA) and the vapor pressure (Vapro 5520; Wescor, Inc., Logan, UT) osmometers. In a subset of 20 participants, tear osmolarity was measured twice on-eye with the in situ osmometer and was additionally determined on a sample of nonstimulated collected tears (3 µL) with both instruments.

Results: Mean (SD) tear film osmolarity with the in situ osmometer was 299.2 (10.3) mOsmol/L compared with 298.4 (10) mmol/kg with the vapor pressure osmometer; the two correlated moderately (r = 0.5, P < 0.05). Limits of agreement between the two instruments were -19.7 to +20.5 mOsmol/L. Using collected tears, measurements with the vapor pressure osmometer were marginally higher (mean [SD], 303.0 [11.0] vs 299.3 [8.0] mOsmol/L; P > 0.05) but correlated well with those using the in situ osmometer (r = 0.9, P < 0.05). The mean (SD) osmolarity of on-eye tears was 5.0 (6.6) mOsmol/L higher than that of collected tears when both measurements were conducted with the in situ osmometer. This was a consistent effect, as the measurements correlated well (r = 0.65, P < 0.05). The in situ osmometer showed good repeatability, with a coefficient of repeatability of 9.4 mOsmol/L (r = 0.8, P < 0.05).

Conclusions: Correlation between the two instruments was better when compared on collected tear samples. Tear film osmolarity measurement is influenced by the sample collection technique with the osmolarity of on-eye tears being higher than that of collected tears. This highlights the importance of measuring tear film osmolarity directly on-eye. The in situ osmometer has good repeatability for conducting this measurement.
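For reference, the agreement statistics quoted above follow the standard Bland-Altman formulation; a minimal sketch of how such limits of agreement and a coefficient of repeatability are computed from paired measurements (generic formulas, not the study's data):

# Sketch: Bland-Altman limits of agreement and coefficient of repeatability
# for paired measurements (standard definitions, illustrative only).
import numpy as np

def limits_of_agreement(a, b):
    d = np.asarray(a) - np.asarray(b)
    mean_d, sd_d = d.mean(), d.std(ddof=1)
    return mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d

def coefficient_of_repeatability(m1, m2):
    # 1.96 x SD of differences between repeated measurements on the same eye
    d = np.asarray(m1) - np.asarray(m2)
    return 1.96 * d.std(ddof=1)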

Relevance: 30.00%

Abstract:

The Intelligent Water Drop (IWD) algorithm is a recent stochastic swarm-based method that is useful for solving combinatorial and function optimization problems. In this paper, we investigate the effectiveness of the selection method in the solution construction phase of the IWD algorithm. In place of the fitness proportionate selection method of the original IWD algorithm, two ranking-based selection methods, namely linear ranking and exponential ranking, are proposed. Both ranking-based selection methods aim to overcome the identified limitations of fitness proportionate selection, enabling the IWD algorithm to escape from local optima and preserve its search diversity. To evaluate the usefulness of the proposed ranking-based selection methods, a series of experiments on three combinatorial optimization problems, i.e., rough set feature subset selection, the multiple knapsack problem, and the travelling salesman problem, is conducted. The results demonstrate that the exponential ranking selection method is able to preserve search diversity and thereby improve the performance of the IWD algorithm.
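A minimal sketch of the two ranking schemes in isolation, with assumed parameter values (c and the selection pressure sp are illustrative): both assign selection probabilities by rank rather than raw fitness, so near-tied fitness values no longer wash out selection pressure the way roulette-wheel selection does.

# Sketch: linear and exponential ranking selection as drop-in replacements
# for fitness proportionate (roulette-wheel) selection.
import numpy as np

def exponential_ranking_probs(fitness, c=0.9):
    n = len(fitness)
    order = np.argsort(np.argsort(fitness))      # rank 0 = worst, n-1 = best
    weights = c ** (n - 1 - order)               # best individual gets c^0 = 1
    return weights / weights.sum()

def linear_ranking_probs(fitness, sp=1.5):       # selection pressure in [1, 2]
    n = len(fitness)
    order = np.argsort(np.argsort(fitness))
    return (2 - sp + 2 * (sp - 1) * order / (n - 1)) / n

rng = np.random.default_rng(0)
fitness = np.array([0.91, 0.90, 0.89, 0.10])     # near-ties dominate roulette
pick = rng.choice(4, p=exponential_ranking_probs(fitness))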

Relevance: 30.00%

Abstract:

Surface modification of precipitated calcium carbonate (calcite) particles in a planetary ball mill, using stearic acid as a modification agent for making dispersions in hydrocarbon oil, was investigated. Processing (milling) parameters such as milling time, ball-to-sample ratio, and the molar ratio of the reactants were varied and analyzed for optimization. The physical properties of the hydrophobically modified calcium carbonate particles were measured; the particle size and morphology of the resulting samples were characterized by transmission electron microscopy and X-ray diffraction. The surface coating thickness was estimated using small-angle X-ray scattering.

Relevance: 30.00%

Abstract:

Industrial producers face the task of optimizing a production process to achieve desired quality, such as target mechanical properties, with the lowest energy consumption. In industrial carbon fiber production, the fibers are processed in bundles (batches) containing several thousand filaments, so energy optimization is a stochastic process, as it involves uncertainty, imprecision, and randomness. This paper presents a stochastic optimization model to reduce energy consumption for a given range of desired mechanical properties. Several processing condition sets are developed, and for each set of conditions, 50 samples of fiber are analyzed for their tensile strength and modulus. The energy consumption during production of the samples is carefully monitored on the processing equipment. Five standard distribution functions are then examined to determine which best describes the distribution of the filaments' mechanical properties, with goodness of fit verified by the Kolmogorov-Smirnov test. To estimate the parameters of the selected distribution (Weibull), the maximum likelihood, least squares, and genetic algorithm methods are compared. Factors including the sample size, the confidence level, and the relative error of the estimated parameters are used in evaluating the tensile strength and modulus properties. The energy consumption and N2 gas cost are modeled by the convex hull method. Finally, to optimize carbon fiber production quality, energy consumption, and total cost, mixed-integer linear programming is utilized. The results show that, using the stochastic optimization models, we are able to predict production quality within a given range and minimize the energy consumption of the industrial process.
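As a small illustration of one of the estimation routes compared above, the following sketch fits a two-parameter Weibull distribution to synthetic filament strengths by maximum likelihood and checks the fit with the Kolmogorov-Smirnov test (scipy-based; the data and parameter values are assumptions, not the paper's):

# Sketch: Weibull MLE fit to filament tensile strengths, with a KS fit check.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
strengths = stats.weibull_min.rvs(c=5.0, scale=3.5, size=50, random_state=rng)

# floc=0 fixes the location parameter, giving the usual 2-parameter form.
shape, loc, scale = stats.weibull_min.fit(strengths, floc=0)
ks_stat, p_value = stats.kstest(strengths, "weibull_min", args=(shape, loc, scale))
print(f"shape={shape:.2f}, scale={scale:.2f}, KS p={p_value:.3f}")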

Relevance: 30.00%

Abstract:

Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification from small numbers of samples. Conceptually, our solution is a hybrid of multi-task and transfer learning: it employs data samples from source tasks, as in transfer learning, but considers all tasks together, as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting its data with data from the other tasks. The degree of augmentation depends on task relatedness and is estimated directly from the data. We apply the model to three diverse real-world data sets (healthcare, handwritten digit, and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model to online multi-task learning, where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model that exploits sharing across tasks at the data level together with joint parameter learning.
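A minimal sketch of the data-level sharing idea, with an assumed and deliberately crude relatedness proxy (cosine similarity of class means; the paper estimates relatedness from the data in a more principled way): source-task samples are added to the target task with weights scaled by the estimated relatedness.

# Sketch: augment a small target task with relatedness-weighted source data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def relatedness(X_t, y_t, X_s, y_s):
    # crude proxy: cosine similarity between the tasks' mean class-1 vectors
    a = X_t[y_t == 1].mean(axis=0)
    b = X_s[y_s == 1].mean(axis=0)
    sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(sim, 0.0)                      # unrelated tasks get weight 0

def fit_augmented(X_t, y_t, X_s, y_s):
    w = relatedness(X_t, y_t, X_s, y_s)
    X = np.vstack([X_t, X_s])
    y = np.concatenate([y_t, y_s])
    weights = np.concatenate([np.ones(len(y_t)), np.full(len(y_s), w)])
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)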

Relevance: 30.00%

Abstract:

It is crucial for a neuron spike sorting algorithm to cluster data from different neurons efficiently. In this study, the search capability of the Genetic Algorithm (GA) is exploited to identify the optimal feature subset for neuron spike sorting with a clustering algorithm. Two objectives drive the optimization process: reducing the number of features and increasing the clustering performance. Specifically, we employ a binary GA with the silhouette evaluation criterion as the fitness function for neuron spike sorting using the Super-Paramagnetic Clustering (SPC) algorithm. The clustering results of SPC with and without the GA-based feature selector are evaluated on benchmark synthetic neuron spike data sets. The outcome indicates the usefulness of the GA in identifying a smaller feature set with improved clustering performance.
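A rough sketch of the approach's skeleton, with KMeans standing in for SPC and an assumed small penalty on subset size; all parameter choices are illustrative, not the paper's:

# Sketch: binary GA feature selection scored by silhouette, with KMeans
# as a stand-in for the Super-Paramagnetic Clustering used in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def fitness(mask, X, k=3):
    if mask.sum() < 2:
        return -1.0
    Xs = X[:, mask.astype(bool)]
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
    # Reward cluster quality, lightly penalize large feature subsets.
    return silhouette_score(Xs, labels) - 0.01 * mask.sum()

def run_ga(X, n_feats, pop=20, gens=30):
    rng = np.random.default_rng(0)
    P = rng.integers(0, 2, size=(pop, n_feats))
    for _ in range(gens):
        f = np.array([fitness(m, X) for m in P])
        parents = P[np.argsort(f)[-pop // 2:]]            # truncation selection
        kids = parents[rng.integers(len(parents), size=pop - len(parents))].copy()
        flip = rng.random(kids.shape) < 1.0 / n_feats     # bit-flip mutation
        kids[flip] ^= 1
        P = np.vstack([parents, kids])
    return P[np.argmax([fitness(m, X) for m in P])]       # best feature mask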

Relevance: 30.00%

Abstract:

The Intelligent Water Drop (IWD) algorithm is a recent stochastic swarm-based method that is useful for solving combinatorial and function optimization problems. In this paper, we propose an IWD ensemble known as the Master-River, Multiple-Creek IWD (MRMC-IWD) model, which serves as an extension of the modified IWD algorithm. The MRMC-IWD model aims to improve the exploration capability of the modified IWD algorithm. It comprises a master river that cooperates with multiple independent creeks to undertake optimization problems based on a divide-and-conquer strategy. A technique to decompose the original problem into a number of sub-problems is first devised. Each sub-problem is then assigned to a creek, while the overall solution is handled by the master river. To strengthen the exploitation capability, a hybrid MRMC-IWD model is introduced. It integrates an iterative-improvement local search method with the MRMC-IWD model, allowing a local search to be conducted and therefore enhancing the quality of the solutions provided by the master river. To evaluate the effectiveness of the proposed models, a series of experiments on two combinatorial problems, i.e., the travelling salesman problem (TSP) and rough set feature subset selection (RSFS), is conducted. The results indicate that the MRMC-IWD model can satisfactorily solve optimization problems using the divide-and-conquer strategy. By incorporating a local search method, the resulting hybrid MRMC-IWD model is able not only to balance exploration and exploitation but also to converge towards optimal solutions. In all seven selected TSPLIB problems, the hybrid MRMC-IWD model achieves good results, with an average deviation of 0.021% from the best known optimal tour lengths. Compared with other state-of-the-art methods, the hybrid MRMC-IWD model produces the best results (i.e., the shortest and most uniform reducts over 20 runs) for all 13 selected RSFS problems.
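To illustrate just the divide-and-conquer skeleton (not the IWD dynamics), here is an assumed sketch in which a master partitions TSP cities into clusters, each creek solves its cluster with a simple nearest-neighbour heuristic in place of an IWD creek, and the master concatenates the partial tours:

# Sketch: master partitions the problem; creeks solve sub-problems; the
# master combines partial solutions. Purely illustrative of the decomposition.
import numpy as np
from sklearn.cluster import KMeans

def nearest_neighbour_tour(points, idx):
    tour, rest = [idx[0]], list(idx[1:])
    while rest:
        last = points[tour[-1]]
        nxt = min(rest, key=lambda j: np.linalg.norm(points[j] - last))
        tour.append(nxt)
        rest.remove(nxt)
    return tour

rng = np.random.default_rng(0)
cities = rng.random((100, 2))
creeks = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(cities)

# Each creek solves its cluster; the master concatenates the partial tours.
tour = [c for k in range(4)
        for c in nearest_neighbour_tour(cities, np.where(creeks == k)[0])]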

Relevance: 30.00%

Abstract:

In this paper, we propose and study a unified mixed-integer programming (MIP) model that simultaneously optimizes fluence weights and multi-leaf collimator (MLC) apertures in the treatment planning optimization of VMAT, Tomotherapy, and CyberKnife. The contribution of our model is threefold: (i) it optimizes the fluence and the MLC apertures simultaneously for a given set of control points; (ii) it can incorporate all volume limits or dose upper bounds for organs at risk (OAR) and dose lower bounds for planning target volumes (PTV) as hard constraints, but it can also relax either of these constraint sets in a Lagrangian fashion while keeping the other set as hard constraints; and (iii) for faster solutions, we propose several heuristic methods based on the MIP model, as well as a meta-heuristic approach. The meta-heuristic is very efficient in practice, being able to generate dose- and machinery-feasible solutions for problem instances of clinical scale, e.g., obtaining feasible treatment plans for cases with 180 control points, 6,750 sample voxels, and 18,000 beamlets in 470 seconds, or cases with 72 control points, 8,000 sample voxels, and 28,800 beamlets in 352 seconds. With discretization and down-sampling of voxels, our method is capable of tackling a treatment field of 8,000 to 64,000 cm3, depending on the ratio of critical structures to unspecified tissues.
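A toy sketch of the fluence-only core of such a model, posed as an LP (the full model adds binary MLC-aperture variables, making it a MIP); the dose-deposition matrices and prescription value are fabricated for illustration:

# Sketch: minimize total OAR dose subject to hard PTV lower bounds,
# with fluence (beamlet weights) as the only decision variables.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_beamlets, n_ptv, n_oar = 30, 40, 25
D_ptv = rng.random((n_ptv, n_beamlets))        # dose per unit fluence, PTV voxels
D_oar = 0.3 * rng.random((n_oar, n_beamlets))  # dose per unit fluence, OAR voxels
prescription = 60.0                            # lower bound on every PTV voxel

# min total OAR dose  s.t.  D_ptv @ x >= prescription,  x >= 0
res = linprog(c=D_oar.sum(axis=0),
              A_ub=-D_ptv, b_ub=-np.full(n_ptv, prescription),
              bounds=(0, None), method="highs")
fluence = res.x                                # optimal beamlet weights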