930 resultados para Classification algorithms


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a survey of evolutionary algorithms that are designed for decision-tree induction. In this context, most of the paper focuses on approaches that evolve decision trees as an alternate heuristics to the traditional top-down divide-and-conquer approach. Additionally, we present some alternative methods that make use of evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy, which addresses works that evolve decision trees and works that design decision-tree components by the use of evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Diffuse large B-cell lymphoma can be subclassified into at least two molecular subgroups by gene expression profiling: germinal center B-cell like and activated B-cell like diffuse large B-cell lymphoma. Several immunohistological algorithms have been proposed as surrogates to gene expression profiling at the level of protein expression, but their reliability has been an issue of controversy. Furthermore, the proportion of misclassified cases of germinal center B-cell subgroup by immunohistochemistry, in all reported algorithms, is higher compared with germinal center B-cell cases defined by gene expression profiling. We analyzed 424 cases of nodal diffuse large B-cell lymphoma with the panel of markers included in the three previously described algorithms: Hans, Choi, and Tally. To test whether the sensitivity of detecting germinal center B-cell cases could be improved, the germinal center B-cell marker HGAL/GCET2 was also added to all three algorithms. Our results show that the inclusion of HGAL/GCET2 significantly increased the detection of germinal center B-cell cases in all three algorithms (P<0.001). The proportions of germinal center B-cell cases in the original algorithms were 27%, 34%, and 19% for Hans, Choi, and Tally, respectively. In the modified algorithms, with the inclusion of HGAL/GCET2, the frequencies of germinal center B-cell cases were increased to 38%, 48%, and 35%, respectively. Therefore, HGAL/GCET2 protein expression may function as a marker for germinal center B-cell type diffuse large B-cell lymphoma. Consideration should be given to the inclusion of HGAL/GCET2 analysis in algorithms to better predict the cell of origin. These findings bear further validation, from comparison to gene expression profiles and from clinical/therapeutic data. Modern Pathology (2012) 25, 1439-1445; doi: 10.1038/modpathol.2012.119; published online 29 June 2012

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis we made the first steps towards the systematic application of a methodology for automatically building formal models of complex biological systems. Such a methodology could be useful also to design artificial systems possessing desirable properties such as robustness and evolvability. The approach we follow in this thesis is to manipulate formal models by means of adaptive search methods called metaheuristics. In the first part of the thesis we develop state-of-the-art hybrid metaheuristic algorithms to tackle two important problems in genomics, namely, the Haplotype Inference by parsimony and the Founder Sequence Reconstruction Problem. We compare our algorithms with other effective techniques in the literature, we show strength and limitations of our approaches to various problem formulations and, finally, we propose further enhancements that could possibly improve the performance of our algorithms and widen their applicability. In the second part, we concentrate on Boolean network (BN) models of gene regulatory networks (GRNs). We detail our automatic design methodology and apply it to four use cases which correspond to different design criteria and address some limitations of GRN modeling by BNs. Finally, we tackle the Density Classification Problem with the aim of showing the learning capabilities of BNs. Experimental evaluation of this methodology shows its efficacy in producing network that meet our design criteria. Our results, coherently to what has been found in other works, also suggest that networks manipulated by a search process exhibit a mixture of characteristics typical of different dynamical regimes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In questionable cystic fibrosis (CF), mild or monosymptomatic phenotypes frequently cause diagnostic difficulties despite detailed algorithms. CF transmembrane conductance regulator (CFTR)-mediated ion transport can be studied ex vivo in rectal biopsies by intestinal current measurement (ICM).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In clinical diagnostics, it is of outmost importance to correctly identify the source of a metastatic tumor, especially if no apparent primary tumor is present. Tissue-based proteomics might allow correct tumor classification. As a result, we performed MALDI imaging to generate proteomic signatures for different tumors. These signatures were used to classify common cancer types. At first, a cohort comprised of tissue samples from six adenocarcinoma entities located at different organ sites (esophagus, breast, colon, liver, stomach, thyroid gland, n = 171) was classified using two algorithms for a training and test set. For the test set, Support Vector Machine and Random Forest yielded overall accuracies of 82.74 and 81.18%, respectively. Then, colon cancer liver metastasis samples (n = 19) were introduced into the classification. The liver metastasis samples could be discriminated with high accuracy from primary tumors of colon cancer and hepatocellular carcinoma. Additionally, colon cancer liver metastasis samples could be successfully classified by using colon cancer primary tumor samples for the training of the classifier. These findings demonstrate that MALDI imaging-derived proteomic classifiers can discriminate between different tumor types at different organ sites and in the same site.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Eosinophils and their products play an essential role in the pathogenesis of various reactive and neoplastic disorders. Depending on the underlying disease, molecular defect and involved cytokines, hypereosinophilia may develop and may lead to organ damage. In other patients, persistent eosinophilia is accompanied by typical clinical findings, but the causative role and impact of eosinophilia remain uncertain. For patients with eosinophil-mediated organ pathology, early therapeutic intervention with agents reducing eosinophil counts can be effective in limiting or preventing irreversible organ damage. Therefore, it is important to approach eosinophil disorders and related syndromes early by using established criteria, to perform all appropriate staging investigations, and to search for molecular targets of therapy. In this article, we review current concepts in the pathogenesis and evolution of eosinophilia and eosinophil-related organ damage in neoplastic and non-neoplastic conditions. In addition, we discuss classifications of eosinophil disorders and related syndromes as well as diagnostic algorithms and standard treatment for various eosinophil-related disorders.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ABSTRACT: BACKGROUND: Pelvic x-ray is a routine part of the primary survey of polytraumatized patients according to Advanced Trauma Life Support (ATLS) guidelines. However, pelvic CT is the gold standard imaging technique in the diagnosis of pelvic fractures. This study was conducted to confirm the safety of a modified ATLS algorithm omitting pelvic x-ray in hemodynamically stable polytraumatized patients with clinically stable pelvis in favour of later pelvic examination by CT scan. METHODS: We conducted a retrospective analysis of all polytraumatized patients in our emergency room between 01.07.2004 and 31.01.2006. Inclusion criteria were blunt abdominal trauma, initial hemodynamic stability and a stable pelvis on clinical examination. We excluded patients requiring immediate intervention because of hemodynamic instability. RESULTS: We reviewed the records of n = 452 polytraumatized patients, of which n = 91 fulfilled inclusion criteria (56% male, mean age = 45 years). The mechanism of trauma included 43% road traffic accidents, 47% falls. In 68/91 (75%) patients, both a pelvic x-ray and a CT examination were performed; the remainder had only pelvic CT. In 6/68 (9%) patients, pelvic fracture was diagnosed by pelvic x-ray. None of these 6 patients was found having a false positive pelvic x-ray, i.e. there was no fracture on pelvic CT scan. In 3/68 (4%) cases a fracture was missed in the pelvic x-ray, but confirmed on CT (false negative on x-ray). None of the diagnosed fractures needed an immediate therapeutic intervention. 5 (56%) were classified type A fractures, and another 4 (44%) B 2.1 in computed tomography (AO classification). One A 2.1 fracture was found in a clinically stable patient who only received CT scan (1/23). CONCLUSION: While pelvic x-ray is an integral part of ATLS assessment, this retrospective study suggests that in hemodynamically stable patients with clinically stable pevis, its sensitivity is only 67% and it may safely be omitted in favor of a pelvic CT examination if such is planned in adjunct assessment and available. The results support the safety and utility of our modified ATLS algorithm. A randomized controlled trial using the algorithm can safely be conducted to confirm the results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper uses folksonomies and fuzzy clustering algorithms to establish term-relevant related results. This paper will propose a Meta search engine with the ability to search for vaguely associated terms and aggregate them into several meaningful cluster categories. The potential of the fuzzy weblog extraction is illustrated using a simple example and added value and possible future studies are discussed in the conclusion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

PURPOSE: To develop and implement a method for improved cerebellar tissue classification on the MRI of brain by automatically isolating the cerebellum prior to segmentation. MATERIALS AND METHODS: Dual fast spin echo (FSE) and fluid attenuation inversion recovery (FLAIR) images were acquired on 18 normal volunteers on a 3 T Philips scanner. The cerebellum was isolated from the rest of the brain using a symmetric inverse consistent nonlinear registration of individual brain with the parcellated template. The cerebellum was then separated by masking the anatomical image with individual FLAIR images. Tissues in both the cerebellum and rest of the brain were separately classified using hidden Markov random field (HMRF), a parametric method, and then combined to obtain tissue classification of the whole brain. The proposed method for tissue classification on real MR brain images was evaluated subjectively by two experts. The segmentation results on Brainweb images with varying noise and intensity nonuniformity levels were quantitatively compared with the ground truth by computing the Dice similarity indices. RESULTS: The proposed method significantly improved the cerebellar tissue classification on all normal volunteers included in this study without compromising the classification in remaining part of the brain. The average similarity indices for gray matter (GM) and white matter (WM) in the cerebellum are 89.81 (+/-2.34) and 93.04 (+/-2.41), demonstrating excellent performance of the proposed methodology. CONCLUSION: The proposed method significantly improved tissue classification in the cerebellum. The GM was overestimated when segmentation was performed on the whole brain as a single object.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Well-known data mining algorithms rely on inputs in the form of pairwise similarities between objects. For large datasets it is computationally impossible to perform all pairwise comparisons. We therefore propose a novel approach that uses approximate Principal Component Analysis to efficiently identify groups of similar objects. The effectiveness of the approach is demonstrated in the context of binary classification using the supervised normalized cut as a classifier. For large datasets from the UCI repository, the approach significantly improves run times with minimal loss in accuracy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cervical cancer is the leading cause of death and disease from malignant neoplasms among women in developing countries. Even though the Pap smear has significantly decreased the number of deaths from cervical cancer in the past years, it has its limitations. Researchers have developed an automated screening machine which can potentially detect abnormal cases that are overlooked by conventional screening. The goal of quantitative cytology is to classify the patient's tissue sample based on quantitative measurements of the individual cells. It is also much cheaper and potentially can take less time. One of the major challenges of collecting cells with a cytobrush is the possibility of not sampling any existing dysplastic cells on the cervix. Being able to correctly classify patients who have disease without the presence of dysplastic cells could improve the accuracy of quantitative cytology algorithms. Subtle morphologic changes in normal-appearing tissues adjacent to or distant from malignant tumors have been shown to exist, but a comparison of various statistical methods, including many recent advances in the statistical learning field, has not previously been done. The objective of this thesis is to use different classification methods applied to quantitative cytology data for the detection of malignancy associated changes (MACs). In this thesis, Elastic Net is the best algorithm. When we applied the Elastic Net algorithm to the test set, we combined the training set and validation set as "training" set and used 5-fold cross validation to choose the parameter for Elastic Net. It has a sensitivity of 47% at 80% specificity, an AUC 0.52, and a partial AUC 0.10 (95% CI 0.09-0.11).^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automatic blood glucose classification may help specialists to provide a better interpretation of blood glucose data, downloaded directly from patients glucose meter and will contribute in the development of decision support systems for gestational diabetes. This paper presents an automatic blood glucose classifier for gestational diabetes that compares 6 different feature selection methods for two machine learning algorithms: neural networks and decision trees. Three searching algorithms, Greedy, Best First and Genetic, were combined with two different evaluators, CSF and Wrapper, for the feature selection. The study has been made with 6080 blood glucose measurements from 25 patients. Decision trees with a feature set selected with the Wrapper evaluator and the Best first search algorithm obtained the best accuracy: 95.92%.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La familia de algoritmos de Boosting son un tipo de técnicas de clasificación y regresión que han demostrado ser muy eficaces en problemas de Visión Computacional. Tal es el caso de los problemas de detección, de seguimiento o bien de reconocimiento de caras, personas, objetos deformables y acciones. El primer y más popular algoritmo de Boosting, AdaBoost, fue concebido para problemas binarios. Desde entonces, muchas han sido las propuestas que han aparecido con objeto de trasladarlo a otros dominios más generales: multiclase, multilabel, con costes, etc. Nuestro interés se centra en extender AdaBoost al terreno de la clasificación multiclase, considerándolo como un primer paso para posteriores ampliaciones. En la presente tesis proponemos dos algoritmos de Boosting para problemas multiclase basados en nuevas derivaciones del concepto margen. El primero de ellos, PIBoost, está concebido para abordar el problema descomponiéndolo en subproblemas binarios. Por un lado, usamos una codificación vectorial para representar etiquetas y, por otro, utilizamos la función de pérdida exponencial multiclase para evaluar las respuestas. Esta codificación produce un conjunto de valores margen que conllevan un rango de penalizaciones en caso de fallo y recompensas en caso de acierto. La optimización iterativa del modelo genera un proceso de Boosting asimétrico cuyos costes dependen del número de etiquetas separadas por cada clasificador débil. De este modo nuestro algoritmo de Boosting tiene en cuenta el desbalanceo debido a las clases a la hora de construir el clasificador. El resultado es un método bien fundamentado que extiende de manera canónica al AdaBoost original. El segundo algoritmo propuesto, BAdaCost, está concebido para problemas multiclase dotados de una matriz de costes. Motivados por los escasos trabajos dedicados a generalizar AdaBoost al terreno multiclase con costes, hemos propuesto un nuevo concepto de margen que, a su vez, permite derivar una función de pérdida adecuada para evaluar costes. Consideramos nuestro algoritmo como la extensión más canónica de AdaBoost para este tipo de problemas, ya que generaliza a los algoritmos SAMME, Cost-Sensitive AdaBoost y PIBoost. Por otro lado, sugerimos un simple procedimiento para calcular matrices de coste adecuadas para mejorar el rendimiento de Boosting a la hora de abordar problemas estándar y problemas con datos desbalanceados. Una serie de experimentos nos sirven para demostrar la efectividad de ambos métodos frente a otros conocidos algoritmos de Boosting multiclase en sus respectivas áreas. En dichos experimentos se usan bases de datos de referencia en el área de Machine Learning, en primer lugar para minimizar errores y en segundo lugar para minimizar costes. Además, hemos podido aplicar BAdaCost con éxito a un proceso de segmentación, un caso particular de problema con datos desbalanceados. Concluimos justificando el horizonte de futuro que encierra el marco de trabajo que presentamos, tanto por su aplicabilidad como por su flexibilidad teórica. Abstract The family of Boosting algorithms represents a type of classification and regression approach that has shown to be very effective in Computer Vision problems. Such is the case of detection, tracking and recognition of faces, people, deformable objects and actions. The first and most popular algorithm, AdaBoost, was introduced in the context of binary classification. Since then, many works have been proposed to extend it to the more general multi-class, multi-label, costsensitive, etc... domains. Our interest is centered in extending AdaBoost to two problems in the multi-class field, considering it a first step for upcoming generalizations. In this dissertation we propose two Boosting algorithms for multi-class classification based on new generalizations of the concept of margin. The first of them, PIBoost, is conceived to tackle the multi-class problem by solving many binary sub-problems. We use a vectorial codification to represent class labels and a multi-class exponential loss function to evaluate classifier responses. This representation produces a set of margin values that provide a range of penalties for failures and rewards for successes. The stagewise optimization of this model introduces an asymmetric Boosting procedure whose costs depend on the number of classes separated by each weak-learner. In this way the Boosting procedure takes into account class imbalances when building the ensemble. The resulting algorithm is a well grounded method that canonically extends the original AdaBoost. The second algorithm proposed, BAdaCost, is conceived for multi-class problems endowed with a cost matrix. Motivated by the few cost-sensitive extensions of AdaBoost to the multi-class field, we propose a new margin that, in turn, yields a new loss function appropriate for evaluating costs. Since BAdaCost generalizes SAMME, Cost-Sensitive AdaBoost and PIBoost algorithms, we consider our algorithm as a canonical extension of AdaBoost to this kind of problems. We additionally suggest a simple procedure to compute cost matrices that improve the performance of Boosting in standard and unbalanced problems. A set of experiments is carried out to demonstrate the effectiveness of both methods against other relevant Boosting algorithms in their respective areas. In the experiments we resort to benchmark data sets used in the Machine Learning community, firstly for minimizing classification errors and secondly for minimizing costs. In addition, we successfully applied BAdaCost to a segmentation task, a particular problem in presence of imbalanced data. We conclude the thesis justifying the horizon of future improvements encompassed in our framework, due to its applicability and theoretical flexibility.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Stream-mining approach is defined as a set of cutting-edge techniques designed to process streams of data in real time, in order to extract knowledge. In the particular case of classification, stream-mining has to adapt its behaviour to the volatile underlying data distributions, what has been called concept drift. Moreover, it is important to note that concept drift may lead to situations where predictive models become invalid and have therefore to be updated to represent the actual concepts that data poses. In this context, there is a specific type of concept drift, known as recurrent concept drift, where the concepts represented by data have already appeared in the past. In those cases the learning process could be saved or at least minimized by applying a previously trained model. This could be extremely useful in ubiquitous environments that are characterized by the existence of resource constrained devices. To deal with the aforementioned scenario, meta-models can be used in the process of enhancing the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are some real-world situations where a concept reappears, as in the case of intrusion detection systems (IDS), where the same incidents or an adaptation of them usually reappear over time. In these environments the early prediction of drift by means of a better knowledge of past models can help to anticipate to the change, thus improving efficiency of the model regarding the training instances needed. By means of using meta-models as a recurrent drift detection mechanism, the ability to share concepts representations among different data mining processes is open. That kind of exchanges could improve the accuracy of the resultant local model as such model may benefit from patterns similar to the local concept that were observed in other scenarios, but not yet locally. This would also improve the efficiency of training instances used during the classification process, as long as the exchange of models would aid in the application of already trained recurrent models, that have been previously seen by any of the collaborative devices. Which it is to say that the scope of recurrence detection and representation is broaden. In fact the detection, representation and exchange of concept drift patterns would be extremely useful for the law enforcement activities fighting against cyber crime. Being the information exchange one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific scope of critical infrastructures protection it is crucial to count with information exchange mechanisms, both from a strategical and technical scope. The exchange of concept drift detection schemes in cyber security environments would aid in the process of preventing, detecting and effectively responding to threads in cyber space. Furthermore, as a complement of meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. In this context, when reusing a previously trained model a rough comparison between concepts is usually made, applying boolean logic. The introduction of fuzzy logic comparisons between models could lead to a better efficient reuse of previously seen concepts, by applying not just equal models, but also similar ones. This work faces the aforementioned open issues by means of: the MMPRec system, that integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment to share meta-models between different devices; a recurrent drift generator that allows to test the usefulness of recurrent drift systems, as it is the case of MMPRec. Moreover, this thesis presents an experimental validation of the proposed contributions using synthetic and real datasets.