17 results for Fuzzy K Nearest Neighbor
Abstract:
A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
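As a rough illustration of the objective comparison mentioned above, the sketch below scores a 2-D visualization with a leave-one-out k-nearest-neighbor predictor. The function name, the toy coordinates, and the class labels are assumptions made for this example, not the paper's data or evaluation code.

```python
from collections import Counter
from math import dist


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN majority vote in visualization space."""
    correct = 0
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to the held-out point.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        votes = Counter(labels[j] for j in others[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


# Hypothetical 2-D projections of six molecules from two sublibraries.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = knn_loo_accuracy(points, labels, k=3)
```

A projection that keeps structurally related molecules close together should give a high score; comparing scores across projection methods is one way to rank them.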
Abstract:
This thesis studies survival analysis techniques that deal with censoring in order to produce tools that predict the risk of endovascular aortic aneurysm repair (EVAR) re-intervention. Censoring means that some patients do not continue follow-up, so their outcome class is unknown. Existing methods for dealing with censoring have drawbacks and cannot handle the high censoring of the two EVAR datasets collected. Therefore, this thesis presents a new solution to high censoring by modifying an approach that was incapable of differentiating between risk groups of aortic complications. Feature selection (FS) becomes complicated with censoring. Most survival FS methods depend on Cox's model; however, machine learning classifiers (MLC) are preferred. Few methods have adopted MLC to perform survival FS, and those that have cannot be used with high censoring. This thesis proposes two FS methods that use MLC to evaluate features. Both FS methods use the new solution to deal with censoring. They combine factor analysis with a greedy stepwise FS search that allows eliminated features to re-enter the FS process. The first FS method searches for the best neural network configuration and subset of features. The second approach combines support vector machines, neural networks, and K nearest neighbor classifiers using simple and weighted majority voting to construct a multiple classifier system (MCS) that improves on the performance of the individual classifiers. It presents a new hybrid FS process that uses the MCS as a wrapper method and merges it with the iterated feature ranking filter method to further reduce the features. The proposed techniques outperformed FS methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in terms of the log-rank test's p-values, sensitivity, and concordance. This shows that the proposed techniques are more powerful in correctly predicting the risk of re-intervention. Consequently, they enable doctors to set an appropriate future observation plan for patients.
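The simple and weighted majority voting used to combine classifiers can be sketched as follows. This is a minimal illustration only: the function name, the risk labels, and the weights are hypothetical, and the thesis may derive its weights differently.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights=None):
    """Combine one label per classifier; equal weights give simple voting."""
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are resolved in favor of the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical outputs of an SVM, a neural network, and a kNN classifier.
predictions = ["high", "low", "low"]
simple = weighted_majority_vote(predictions)
weighted = weighted_majority_vote(predictions, weights=[0.9, 0.3, 0.4])
```

With equal weights the two "low" votes win; once the first classifier is trusted more (weight 0.9 against a combined 0.7), the ensemble output switches to "high".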
Abstract:
This paper presents two hybrid genetic algorithms (HGAs) to optimize the component placement operation for collect-and-place machines in printed circuit board (PCB) assembly. The component placement problem is to optimize simultaneously (i) the assignment of components to a movable revolver head or assembly tour, (ii) the sequence of component placements on a stationary PCB in each tour, and (iii) the arrangement of component types to stationary feeders. The objective is to minimize the total traveling time spent by the revolver head in assembling all components on the PCB. The major difference between the HGAs is that the initial solutions are generated randomly in HGA1, whereas the Clarke and Wright saving method, the nearest neighbor heuristic, and the neighborhood frequency heuristic are incorporated into HGA2 for the initialization procedure. A computational study is carried out to compare the algorithms with different population sizes. It is shown that the performance of HGA2 is superior to that of HGA1 in terms of the total assembly time.
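A minimal sketch of the nearest neighbor heuristic named above, assuming placement positions are plain 2-D coordinates and travel cost is Euclidean distance. The function name and the sample coordinates are illustrative; the paper's encoding of tours and feeders is not reproduced here.

```python
from math import dist


def nearest_neighbor_tour(points, start=0):
    """Greedy visiting order: always move to the closest unvisited point."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Break distance ties by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (dist(last, points[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Hypothetical placement positions on a board.
positions = [(0, 0), (5, 0), (1, 0), (2, 0)]
order = nearest_neighbor_tour(positions)
```

An order like this could seed part of an initial population, which is cheaper than random initialization when good starting tours matter.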
Abstract:
The distribution of finished products from depots to customers is a practical and challenging problem in logistics management. Better routing and scheduling decisions can result in a higher level of customer satisfaction because more customers can be served in a shorter time. The distribution problem is generally formulated as the vehicle routing problem (VRP). However, the VRP rests on the rigid assumption that there is only one depot. When, for instance, a logistics company has more than one depot, the VRP is not suitable. To resolve this limitation, this paper focuses on the VRP with multiple depots, or multi-depot VRP (MDVRP). The MDVRP is NP-hard, which means that no efficient algorithm for solving the problem to optimality is known. To deal with the problem efficiently, two hybrid genetic algorithms (HGAs) are developed in this paper. The major difference between the HGAs is that the initial solutions are generated randomly in HGA1, whereas the Clarke and Wright saving method and the nearest neighbor heuristic are incorporated into HGA2 for the initialization procedure. A computational study is carried out to compare the algorithms with different problem sizes. It is shown that the performance of HGA2 is superior to that of HGA1 in terms of the total delivery time.
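The first step of the Clarke and Wright saving method is to rank customer pairs by how much distance is saved when they share a route. The sketch below covers only that ranking step for a single depot, with Euclidean distances and made-up coordinates. The route-merging step and the paper's multi-depot handling are not shown.

```python
from itertools import combinations
from math import dist


def savings_list(depot, customers):
    """Return (saving, i, j) triples sorted from largest to smallest saving."""
    savings = []
    for i, j in combinations(range(len(customers)), 2):
        # Saving from serving i and j on one route instead of two round trips.
        s = (
            dist(depot, customers[i])
            + dist(depot, customers[j])
            - dist(customers[i], customers[j])
        )
        savings.append((s, i, j))
    savings.sort(key=lambda item: item[0], reverse=True)
    return savings


# Hypothetical depot and customer coordinates.
depot = (0, 0)
customers = [(3, 0), (4, 0), (0, 2)]
ranked = savings_list(depot, customers)
```

Pairs near the top of the list are the first candidates for merging into a common route.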
Abstract:
A chip shooter machine for electronic component assembly has a movable feeder carrier, a movable X–Y table carrying a printed circuit board (PCB), and a rotary turret with multiple assembly heads. This paper presents a hybrid genetic algorithm (HGA) to optimize the sequence of component placements and the arrangement of component types to feeders simultaneously for a chip shooter machine, that is, the component scheduling problem. The objective of the problem is to minimize the total assembly time. The GA developed in the paper hybridizes different search heuristics including the nearest-neighbor heuristic, the 2-opt heuristic, and an iterated swap procedure, which is a new improved heuristic. Compared with the results obtained by other researchers, the performance of the HGA is superior in terms of the assembly time.
Scope and purpose: When assembling surface mount components on a PCB, it is necessary to obtain the optimal sequence of component placements and the best arrangement of component types to feeders simultaneously in order to minimize the total assembly time. Since it is very difficult to obtain an optimal solution, a GA hybridized with several search heuristics is developed. The type of machine studied is the chip shooter machine. This paper compares the algorithm with a simple GA and shows that the performance of the algorithm is superior to that of the simple GA in terms of the total assembly time.
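The 2-opt heuristic mentioned above repeatedly reverses a segment of the sequence whenever doing so shortens it. The sketch below assumes a closed tour over 2-D points with Euclidean distances; the function names and coordinates are illustrative, and the paper's cost model (table, turret, and feeder movement) is not reproduced.

```python
from math import dist


def tour_length(points, tour):
    """Length of the closed tour that returns to its starting point."""
    return sum(
        dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Reverse segments until no single reversal shortens the tour."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical corners of a unit square, visited in a crossing order.
corners = [(0, 0), (1, 1), (1, 0), (0, 1)]
improved_tour = two_opt(corners, [0, 1, 2, 3])
```

Here the initial order crosses the square diagonally twice; one reversal removes both crossings and leaves the perimeter tour.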
Abstract:
A chip shooter machine for electronic component assembly has a movable feeder carrier holding components, a movable X-Y table carrying a printed circuit board (PCB), and a rotary turret with multiple assembly heads. This paper presents a hybrid genetic algorithm to optimize the sequence of component placements for a chip shooter machine. The objective of the problem is to minimize the total traveling distance of the X-Y table or the board. The genetic algorithm developed in the paper hybridizes the nearest neighbor heuristic and an iterated swap procedure, which is a new improved heuristic. We have compared the performance of the hybrid genetic algorithm with that of the approach proposed by other researchers and have demonstrated that our algorithm is superior in terms of the distance traveled by the X-Y table or the board.
Abstract:
This paper presents a hybrid genetic algorithm to optimize the sequence of component placements on a printed circuit board and the arrangement of component types to feeders simultaneously for a pick-and-place machine with multiple stationary feeders, a fixed board table and a movable placement head. The objective of the problem is to minimize the total travelling distance, or the travelling time, of the placement head. The genetic algorithm developed in the paper hybridizes different search heuristics including the nearest neighbor heuristic, the 2-opt heuristic, and an iterated swap procedure, which is a new improving heuristic. Compared with the results obtained by other researchers, the performance of the hybrid genetic algorithm is superior in terms of the distance travelled by the placement head.
Abstract:
We have studied the kinetics of the phase-separation process in mixtures of colloid and protein in solution by real-time UV-vis spectroscopy. Complementary small-angle X-ray scattering (SAXS) was employed to determine the structures involved. The colloids used are gold nanoparticles functionalized with protein-resistant oligo(ethylene glycol) (OEG) thiol, HS(CH2)11(OCH2CH2)6OMe (EG6OMe). After mixing with protein solution above a critical concentration, c*, SAXS measurements show that a scattering maximum appears after a short induction time at q = 0.0322 Å⁻¹; its intensity increases with time, but the peak position does not change with time, protein concentration, or salt addition. The peak corresponds to the distance of the nearest neighbor in the aggregates. The upturn of the scattering intensities in the low-q range develops with time, indicating the formation of aggregates. No Bragg peaks corresponding to the formation of colloidal crystallites could be observed before the clusters dropped out of solution. The growth kinetics of the aggregates is followed in detail by real-time UV-vis spectroscopy, using the flocculation parameter defined as the integral of the absorption over the wavelength range 600-800 nm. At low salt addition (<0.5 M), a kinetic crossover from reaction-limited cluster aggregation (RLCA) to diffusion-limited cluster aggregation (DLCA) is observed and interpreted as being due to the effective repulsive interaction barrier between colloids within the depletion potential. Above 0.5 M NaCl, the surface charge of the proteins is screened significantly and the repulsive potential barrier disappears; thus, the growth kinetics can be described by a DLCA model alone.
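The flocculation parameter defined above is an integral of absorbance between 600 and 800 nm. A minimal sketch using the trapezoidal rule follows; it assumes the spectrum is sampled at ascending wavelengths that include both range limits, and the function name and sample values are invented for illustration.

```python
def flocculation_parameter(wavelengths_nm, absorbance, lo=600.0, hi=800.0):
    """Trapezoidal integral of absorbance over the samples within [lo, hi] nm."""
    pairs = [(w, a) for w, a in zip(wavelengths_nm, absorbance) if lo <= w <= hi]
    total = 0.0
    for (w0, a0), (w1, a1) in zip(pairs, pairs[1:]):
        total += 0.5 * (a0 + a1) * (w1 - w0)
    return total


# Hypothetical spectrum; only the 600-800 nm samples contribute.
wavelengths = [500, 600, 700, 800, 900]
spectrum = [0.9, 0.2, 0.3, 0.4, 0.5]
parameter = flocculation_parameter(wavelengths, spectrum)
```

Evaluating this for each spectrum in a time series gives the growth curve from which the aggregation regime is read off.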
Abstract:
Melt-quenched silicate glasses containing calcium, phosphorus and alkali metals have the ability to promote bone regeneration and to fuse to living bone. Of these glasses, 45S5 Bioglass® is the most widely used, being sold in over 35 countries as a bone graft product for medical and dental applications; particulate 45S5 is also incorporated into toothpastes to help remineralize the surface of teeth. Recently it has been suggested that adding titanium dioxide can increase the bioactivity of these materials. This work investigates the structural consequences of incorporating 4 mol% TiO2 into Bioglass® using isotopic substitution (of the Ti) applied to neutron diffraction and X-ray Absorption Near Edge Structure (XANES). We present the first isotopic substitution data applied to melt-quench-derived Bioglass or its derivatives. Results show that titanium is on average surrounded by 5.2(1) nearest neighbor oxygen atoms. This implies an upper limit of 40% four-fold coordinated titanium and shows that the network connectivity is reduced from 2.11 to 1.97 for small quantities of titanium. Titanium XANES micro-fluorescence confirms that the titanium environment is homogeneous on the micron length scale within these glasses. Solid state magic angle spinning (MAS) NMR confirms the network connectivity model proposed. Furthermore, the results show that the intermediate range order containing Na-O, Ca-O, O-P-O and O-Si-O correlations is unaffected by the addition of small quantities of TiO2 into these systems.
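The 40% upper limit can be recovered from the mean coordination number with a short calculation. The derivation below assumes, for illustration only, that titanium occurs in four-, five-, and six-fold coordination with fractions x_4, x_5, and x_6; the abstract does not state which environments were considered.

```latex
\begin{align}
x_4 + x_5 + x_6 &= 1, \\
4x_4 + 5x_5 + 6x_6 &= 5.2 .
\end{align}
Eliminating $x_6$ gives
\begin{equation}
2x_4 + x_5 = 0.8
\quad\Longrightarrow\quad
x_4 = 0.4 - \tfrac{1}{2}x_5 \le 0.4 .
\end{equation}
```

The bound is reached when no five-fold titanium is present, in which case the glass would contain 40% four-fold and 60% six-fold titanium.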
Abstract:
The relation between the fragility of glass-forming systems, a parameter which describes many of their key physical characteristics, and atomic scale structure is investigated by using neutron diffraction to measure the topological and chemical ordering for germania, or GeO2, which is an archetypal strong glass former. We find that the ordering for this and other tetrahedral network-forming glasses at distances greater than the nearest neighbor can be rationalized in terms of an interplay between the relative importance of two length scales. One of these is associated with an intermediate range, the other with an extended range and, with increasing glass fragility, it is the extended range ordering which dominates.
Abstract:
Allergy is an overreaction by the immune system to a previously encountered, ordinarily harmless substance, typically a protein, resulting in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: their presence in food, in commercial products such as washing powder, and in medical therapeutics and diagnostics makes predicting and identifying potential allergens a crucial societal issue. The prediction of allergens has been explored widely using bioinformatics, with many tools developed in the last decade; many of these are freely available online. Here, we report a set of novel models for allergen prediction utilizing amino acid E-descriptors, auto- and cross-covariance transformation, and several machine learning methods for classification, including logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k nearest neighbours (kNN). The best performing method was kNN, with 85.3% accuracy in 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server (http://www.ddg-pharmfac.net/AllerTOP). © Springer-Verlag 2014.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was represented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison with other servers for allergen prediction, AllerTOP outperforms them, with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
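The ACC step that turns a variable-length protein into a fixed-length vector can be sketched as below. This assumes the commonly used definition, in which each term averages the product of descriptor j at position i and descriptor k at position i + lag. The lag range of 1 to 5 is an inference from 45 = 3 × 3 × 5 and is not stated in the abstract; the function name and descriptor values are invented, not real z-scale values.

```python
def acc_transform(z, max_lag=5):
    """Auto- and cross-covariance features for per-residue descriptor tuples.

    z is a list like [(z1, z2, z3), ...] with one tuple per residue.
    Returns d * d * max_lag values, where d is the number of descriptors.
    """
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(z[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Hypothetical descriptors for a six-residue peptide (made-up numbers).
peptide = [(1.0, 0.0, -1.0)] * 6
vector = acc_transform(peptide)
```

Because the output length depends only on the number of descriptors and lags, proteins of different lengths map to vectors of the same size, which is what lets a kNN classifier compare them without alignment.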
Abstract:
In the increasingly customer-oriented marketplace, continuous quality improvement marks the success of the fastest-growing quality organizations. In recent years, attention has focused on intelligent systems, which have shown great promise in supporting quality control. However, only a small number of the systems currently in use are reported to operate effectively, because they are designed to maintain a quality level within a specified process rather than to focus on cooperation within the production workflow. This paper proposes an intelligent system with a newly designed algorithm and a universal process data exchange standard to overcome the challenges posed by demanding customers who seek high-quality, low-cost products. The intelligent quality management system is equipped with a “distributed process mining” feature that gives employees at all levels the ability to understand the relationships between processes, especially when any aspect of a process is about to degrade or fail. An example of generalized fuzzy association rules is applied in the manufacturing sector to demonstrate how the proposed iterative process mining algorithm finds the relationships between distributed process parameters and the presence of quality problems.
Abstract:
This paper develops an integrated approach, combining quality function deployment (QFD), fuzzy set theory, and the analytic hierarchy process (AHP), to evaluate and select the optimal third-party logistics service providers (3PLs). In the approach, multiple evaluating criteria are derived from the requirements of company stakeholders using a series of houses of quality (HOQ). The importance of the evaluating criteria is prioritized with respect to the degree of achieving the stakeholder requirements using fuzzy AHP. Based on the ranked criteria, alternative 3PLs are evaluated and compared with each other, again using fuzzy AHP, to make an optimal selection. The effectiveness of the proposed approach is demonstrated by applying it to a Hong Kong-based enterprise that supplies hard disk components. The proposed integrated approach outperforms the existing approaches because the outsourcing strategy and 3PLs selection are derived from the corporate/business strategy.
Abstract:
Renewable energy project development is highly complex and success is by no means guaranteed. Decisions are often made with approximate or uncertain information, yet the methods currently employed by decision-makers do not necessarily accommodate this. Levelised energy cost (LEC) is one such measure, commonly applied within the energy industry to assess the viability of potential projects and to inform policy. This research proposes a method for accommodating such uncertainty by enhancing the traditional discounting LEC measure with fuzzy set theory. Furthermore, the research develops the fuzzy LEC (F-LEC) methodology to incorporate the cost of financing a project from debt and equity sources. Applied to an example bioenergy project, the research demonstrates the benefit of incorporating fuzziness for decision-making on project viability, optimal capital structure and key variable sensitivity analysis. The proposed method contributes by incorporating uncertain and approximate information into the widely utilised LEC measure and by being applicable to a wide range of energy project viability decisions. © 2013 Elsevier Ltd. All rights reserved.
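To make the idea concrete, the sketch below computes a conventional discounted LEC (discounted costs divided by discounted energy output) and then propagates low, modal, and high estimates through it. This is not the paper's F-LEC method: the function names, the fixed crisp discount rate, the omission of financing costs, and all figures are assumptions made for this example.

```python
def levelised_energy_cost(costs, energy, rate):
    """Discounted lifetime cost divided by discounted lifetime energy output."""
    disc_cost = sum(c / (1 + rate) ** t for t, c in enumerate(costs))
    disc_energy = sum(e / (1 + rate) ** t for t, e in enumerate(energy))
    return disc_cost / disc_energy


def fuzzy_lec(cost_tri, energy_tri, rate):
    """Lower bound, modal value, and upper bound of LEC.

    cost_tri and energy_tri each hold (low, mode, high) yearly series.
    With non-negative values, LEC rises with cost and falls with energy,
    so the bounds pair low cost with high energy and vice versa.
    """
    c_lo, c_mid, c_hi = cost_tri
    e_lo, e_mid, e_hi = energy_tri
    return (
        levelised_energy_cost(c_lo, e_hi, rate),
        levelised_energy_cost(c_mid, e_mid, rate),
        levelised_energy_cost(c_hi, e_lo, rate),
    )


# Hypothetical three-year project: capital cost in year 0, output from year 1.
cost_estimates = ([100, 10, 10], [120, 10, 10], [150, 10, 10])
energy_estimates = ([0, 40, 40], [0, 50, 50], [0, 60, 60])
lec_low, lec_mode, lec_high = fuzzy_lec(cost_estimates, energy_estimates, rate=0.08)
```

The three numbers give the support and the most plausible value of the resulting fuzzy LEC; the membership function between them is generally not a straight line, because dividing triangular fuzzy numbers does not yield a triangular one.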