996 resultados para classifier design
Resumo:
The design of binary morphological operators that are translation-invariant and locally defined by a finite neighborhood window corresponds to the problem of designing Boolean functions. As in any supervised classification problem, morphological operators designed from a training sample also suffer from overfitting. Large neighborhood tends to lead to performance degradation of the designed operator. This work proposes a multilevel design approach to deal with the issue of designing large neighborhood-based operators. The main idea is inspired by stacked generalization (a multilevel classifier design approach) and consists of, at each training level, combining the outcomes of the previous level operators. The final operator is a multilevel operator that ultimately depends on a larger neighborhood than of the individual operators that have been combined. Experimental results show that two-level operators obtained by combining operators designed on subwindows of a large window consistently outperform the single-level operators designed on the full window. They also show that iterating two-level operators is an effective multilevel approach to obtain better results.
Resumo:
Analyzing “nuggety” gold samples commonly produces erratic fire assay results, due to random inclusion or exclusion of coarse gold in analytical samples. Preconcentrating gold samples might allow the nuggets to be concentrated and fire assayed separately. In this investigation synthetic gold samples were made using similar density tungsten powder and silica, and were preconcentrated using two approaches: an air jig and an air classifier. Current analytical gold sampling method is time and labor intensive and our aim is to design a set-up for rapid testing. It was observed that the preliminary air classifier design showed more promise than the air jig in terms of control over mineral recovery and preconcentrating bulk ore sub-samples. Hence the air classifier was modified with the goal of producing 10-30 grams samples aiming to capture all of the high density metallic particles, tungsten in this case. Effects of air velocity and feed rate on the recovery of tungsten from synthetic tungsten-silica mixtures were studied. The air classifier achieved optimal high density metal recovery of 97.7% at an air velocity of 0.72 m/s and feed rate of 160 g/min. Effects of density on classification were investigated by using iron as the dense metal instead of tungsten and the recovery was seen to drop from 96.13% to 20.82%. Preliminary investigations suggest that preconcentration of gold samples is feasible using the laboratory designed air classifier.
Resumo:
Selection of machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, the problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploitable using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that, compared with feed forward, recurrent neural networks will generally perform better on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Resumo:
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1 which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using Support Vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents 7.5% improvement upon TargetP).
Resumo:
Usually, data mining projects that are based on decision trees for classifying test cases will use the probabilities provided by these decision trees for ranking classified test cases. We have a need for a better method for ranking test cases that have already been classified by a binary decision tree because these probabilities are not always accurate and reliable enough. A reason for this is that the probability estimates computed by existing decision tree algorithms are always the same for all the different cases in a particular leaf of the decision tree. This is only one reason why the probability estimates given by decision tree algorithms can not be used as an accurate means of deciding if a test case has been correctly classified. Isabelle Alvarez has proposed a new method that could be used to rank the test cases that were classified by a binary decision tree [Alvarez, 2004]. In this paper we will give the results of a comparison of different ranking methods that are based on the probability estimate, the sensitivity of a particular case or both.
Resumo:
We have discovered a novel approach of intrusion detection system using an intelligent data classifier based on a self organizing map (SOM). We have surveyed all other unsupervised intrusion detection methods, different alternative SOM based techniques and KDD winner IDS methods. This paper provides a robust designed and implemented intelligent data classifier technique based on a single large size (30x30) self organizing map (SOM) having the capability to detect all types of attacks given in the DARPA Archive 1999 the lowest false positive rate being 0.04 % and higher detection rate being 99.73% tested using full KDD data sets and 89.54% comparable detection rate and 0.18% lowest false positive rate tested using corrected data sets.
Resumo:
In this paper, a computer-aided diagnostic (CAD) system for the classification of hepatic lesions from computed tomography (CT) images is presented. Regions of interest (ROIs) taken from nonenhanced CT images of normal liver, hepatic cysts, hemangiomas, and hepatocellular carcinomas have been used as input to the system. The proposed system consists of two modules: the feature extraction and the classification modules. The feature extraction module calculates the average gray level and 48 texture characteristics, which are derived from the spatial gray-level co-occurrence matrices, obtained from the ROIs. The classifier module consists of three sequentially placed feed-forward neural networks (NNs). The first NN classifies into normal or pathological liver regions. The pathological liver regions are characterized by the second NN as cyst or "other disease." The third NN classifies "other disease" into hemangioma or hepatocellular carcinoma. Three feature selection techniques have been applied to each individual NN: the sequential forward selection, the sequential floating forward selection, and a genetic algorithm for feature selection. The comparative study of the above dimensionality reduction methods shows that genetic algorithms result in lower dimension feature vectors and improved classification performance.
Resumo:
A common way to model multiclass classification problems is by means of Error-Correcting Output Codes (ECOCs). Given a multiclass problem, the ECOC technique designs a code word for each class, where each position of the code identifies the membership of the class for a given binary problem. A classification decision is obtained by assigning the label of the class with the closest code. One of the main requirements of the ECOC design is that the base classifier is capable of splitting each subgroup of classes from each binary problem. However, we cannot guarantee that a linear classifier model convex regions. Furthermore, nonlinear classifiers also fail to manage some type of surfaces. In this paper, we present a novel strategy to model multiclass classification problems using subclass information in the ECOC framework. Complex problems are solved by splitting the original set of classes into subclasses and embedding the binary problems in a problem-dependent ECOC design. Experimental results show that the proposed splitting procedure yields a better performance when the class overlap or the distribution of the training objects conceal the decision boundaries for the base classifier. The results are even more significant when one has a sufficiently large training size.
Resumo:
Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Resumo:
Hybrid bioisoster derivatives from N-acylhydrazones and furoxan groups were designed with the objective of obtaining at least a dual mechanism of action: cruzain inhibition and nitric oxide (NO) releasing activity. Fifteen designed compounds were synthesized varying the substitution in N-acylhydrazone and in furoxan group as well. They had its anti-Trypanosoma cruzi activity in amastigotes forms, NO releasing potential and inhibitory cruzain activity evaluated. The two most active compounds (6, 14) both in the parasite amastigotes and in the enzyme contain the nitro group in para position of the aromatic ring. The permeability screening in Caco-2 cell and cytotoxicity assay in human cells were performed for those most active compounds and both showed to be less cytotoxic than the reference drug, benznidazole. Compound 6 was the most promising, since besides activity it showed good permeability and selectivity index, higher than the reference drug. Thereby the compound 6 was considered as a possible candidate for additional studies.
Resumo:
Split-plot design (SPD) and near-infrared chemical imaging were used to study the homogeneity of the drug paracetamol loaded in films and prepared from mixtures of the biocompatible polymers hydroxypropyl methylcellulose, polyvinylpyrrolidone, and polyethyleneglycol. The study was split into two parts: a partial least-squares (PLS) model was developed for a pixel-to-pixel quantification of the drug loaded into films. Afterwards, a SPD was developed to study the influence of the polymeric composition of films and the two process conditions related to their preparation (percentage of the drug in the formulations and curing temperature) on the homogeneity of the drug dispersed in the polymeric matrix. Chemical images of each formulation of the SPD were obtained by pixel-to-pixel predictions of the drug using the PLS model of the first part, and macropixel analyses were performed for each image to obtain the y-responses (homogeneity parameter). The design was modeled using PLS regression, allowing only the most relevant factors to remain in the final model. The interpretation of the SPD was enhanced by utilizing the orthogonal PLS algorithm, where the y-orthogonal variations in the design were separated from the y-correlated variation.
Resumo:
In Brazil, the consumption of extra-virgin olive oil (EVOO) is increasing annually, but there are no experimental studies concerning the phenolic compound contents of commercial EVOO. The aim of this work was to optimise the separation of 17 phenolic compounds already detected in EVOO. A Doehlert matrix experimental design was used, evaluating the effects of pH and electrolyte concentration. Resolution, runtime and migration time relative standard deviation values were evaluated. Derringer's desirability function was used to simultaneously optimise all 37 responses. The 17 peaks were separated in 19min using a fused-silica capillary (50μm internal diameter, 72cm of effective length) with an extended light path and 101.3mmolL(-1) of boric acid electrolyte (pH 9.15, 30kV). The method was validated and applied to 15 EVOO samples found in Brazilian supermarkets.
Resumo:
Herein we describe the synthesis of a focused library of compounds based on the structure of goniothalamin (1) and the evaluation of the potential antitumor activity of the compounds. N-Acylation of aza-goniothalamin (2) restored the in vitro antiproliferative activity of this family of compounds. 1-(E)-But-2-enoyl-6-styryl-5,6-dihydropyridin-2(1H)-one (18) displayed enhanced antiproliferative activity. Both goniothalamin (1) and derivative 18 led to reactive oxygen species generation in PC-3 cells, which was probably a signal for caspase-dependent apoptosis. Treatment with derivative 18 promoted Annexin V/7-aminoactinomycin D double staining, which indicated apoptosis, and also led to G2 /M cell-cycle arrest. In vivo studies in Ehrlich ascitic and solid tumor models confirmed the antitumor activity of goniothalamin (1), without signs of toxicity. However, derivative 18 exhibited an unexpectedly lower in vivo antitumor activity, despite the treatments being administered at the same site of inoculation. Contrary to its in vitro profile, aza-goniothalamin (2) inhibited Ehrlich tumor growth, both on the ascitic and solid forms. Our findings highlight the importance of in vivo studies in the search for new candidates for cancer treatment.
Resumo:
An HPLC-PAD method using a gold working electrode and a triple-potential waveform was developed for the simultaneous determination of streptomycin and dihydrostreptomycin in veterinary drugs. Glucose was used as the internal standard, and the triple-potential waveform was optimized using a factorial and a central composite design. The optimum potentials were as follows: amperometric detection, E1=-0.15V; cleaning potential, E2=+0.85V; and reactivation of the electrode surface, E3=-0.65V. For the separation of the aminoglycosides and the internal standard of glucose, a CarboPac™ PA1 anion exchange column was used together with a mobile phase consisting of a 0.070 mol L(-1) sodium hydroxide solution in the isocratic elution mode with a flow rate of 0.8 mL min(-1). The method was validated and applied to the determination of streptomycin and dihydrostreptomycin in veterinary formulations (injection, suspension and ointment) without any previous sample pretreatment, except for the ointments, for which a liquid-liquid extraction was required before HPLC-PAD analysis. The method showed adequate selectivity, with an accuracy of 98-107% and a precision of less than 3.9%.