947 resultados para naive bayes classifier


Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this demo the basic text mining technologies by using RapidMining have been reviewed. RapidMining basic characteristics and operators of text mining have been described. Text mining example by using Navie Bayes algorithm and process modeling have been revealed.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A reconstructed 3D object from single-axis limited angle tilt-series cryo-electron tomography contains missing regions and this problem is known as the missing wedge problem. Due to missing regions on the reconstructed images, analyzing their 3D structures is a challenging problem. The existing methods overcome this problem by aligning and averaging several similar shaped objects. These schemes work well if the objects are symmetric and several objects with almost similar shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, polyhedral structures etc., of these bacterial inclusions in presence of missing data. These methods work with deformed and non-symmetric varied shaped objects and do not necessitate multiple objects for handling the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named as polyhedron profile statistic, and (e) the Bayes classifier, linear discriminant analysis and support vector machine based classifiers for supervised incomplete polyhedral shape classification. Finally, the predicted 3D shapes for these bacterial microstructures belong to the Johnson solids family, and these shapes along with their other geometric properties are important for better understanding of their chemical and biological characteristics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mémoire numérisé par la Direction des bibliothèques de l'Université de Montréal.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb's theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb's view that assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb's theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb's view that assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

El análisis de datos actual se enfrenta a problemas derivados de la combinación de datos procedentes de diversas fuentes de información. El valor de la información puede enriquecerse enormemente facilitando la integración de nuevas fuentes de datos y la industria es muy consciente de ello en la actualidad. Sin embargo, no solo el volumen sino también la gran diversidad de los datos constituye un problema previo al análisis. Una buena integración de los datos garantiza unos resultados fiables y por ello merece la pena detenerse en la mejora de procesos de especificación, recolección, limpieza e integración de los datos. Este trabajo está dedicado a la fase de limpieza e integración de datos analizando los procedimientos existentes y proponiendo una solución que se aplica a datos médicos, centrándose así en los proyectos de predicción (con finalidad de prevención) en ciencias de la salud. Además de la implementación de los procesos de limpieza, se desarrollan algoritmos de detección de outliers que permiten mejorar la calidad del conjunto de datos tras su eliminación. El trabajo también incluye la implementación de un proceso de predicción que sirva de ayuda a la toma de decisiones. Concretamente este trabajo realiza un análisis predictivo de los datos de pacientes drogodependientes de la Clínica Nuestra Señora de la Paz, con la finalidad de poder brindar un apoyo en la toma de decisiones del médico a cargo de admitir el internamiento de pacientes en dicha clínica. En la mayoría de los casos el estudio de los datos facilitados requiere un pre-procesado adecuado para que los resultados de los análisis estadísticos tradicionales sean fiables. En tal sentido en este trabajo se implementan varias formas de detectar los outliers: un algoritmo propio (Detección de Outliers con Cadenas No Monótonas), que utiliza las ventajas del algoritmo Knuth-Morris-Pratt para reconocimiento de patrones, y las librerías outliers y Rcmdr de R. La aplicación de procedimientos de cleaning e integración de datos, así como de eliminación de datos atípicos proporciona una base de datos limpia y fiable sobre la que se implementarán procedimientos de predicción de los datos con el algoritmo de clasificación Naive Bayes en R.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Security defects are common in large software systems because of their size and complexity. Although efficient development processes, testing, and maintenance policies are applied to software systems, there are still a large number of vulnerabilities that can remain, despite these measures. Some vulnerabilities stay in a system from one release to the next one because they cannot be easily reproduced through testing. These vulnerabilities endanger the security of the systems. We propose vulnerability classification and prediction frameworks based on vulnerability reproducibility. The frameworks are effective to identify the types and locations of vulnerabilities in the earlier stage, and improve the security of software in the next versions (referred to as releases). We expand an existing concept of software bug classification to vulnerability classification (easily reproducible and hard to reproduce) to develop a classification framework for differentiating between these vulnerabilities based on code fixes and textual reports. We then investigate the potential correlations between the vulnerability categories and the classical software metrics and some other runtime environmental factors of reproducibility to develop a vulnerability prediction framework. The classification and prediction frameworks help developers adopt corresponding mitigation or elimination actions and develop appropriate test cases. Also, the vulnerability prediction framework is of great help for security experts focus their effort on the top-ranked vulnerability-prone files. As a result, the frameworks decrease the number of attacks that exploit security vulnerabilities in the next versions of the software. To build the classification and prediction frameworks, different machine learning techniques (C4.5 Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are employed. The effectiveness of the proposed frameworks is assessed based on collected software security defects of Mozilla Firefox.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are random variables defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data coming from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions. Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy results. Therefore, the investigation of a combining method for this kind of classifier can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. Each subset accuracy is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than naive approach provided some conditions. It was also showed that OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multi-label classification (MLC) is the supervised learning problem where an instance may be associated with multiple labels. Modeling dependencies between labels allows MLC methods to improve their performance at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies. On the one hand, the original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors down the chain. On the other hand, a recent Bayes-optimal method improves the performance, but is computationally intractable in practice. Here we present a novel double-Monte Carlo scheme (M2CC), both for finding a good chain sequence and performing efficient inference. The M2CC algorithm remains tractable for high-dimensional data sets and obtains the best overall accuracy, as shown on several real data sets with input dimension as high as 1449 and up to 103 labels.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The presence of mutations associated with integrase inhibitor (INI) resistance among INI-naive patients may play an important clinical role in the use of those drugs Samples from 76 HIV-1-infected subjects naive to INIs were submitted to direct sequencing. No differences were found between naive (25%) subjects and subjects on HAART (75%). No primary mutation associated with raltegravir or elvitegravir resistance was found. However, 78% of sequences showed at least one accessory mutation associated with resistance. The analysis of the 76 IN sequences showed a high polymorphic level on this region among Brazilian HIV-1-infected subjects, including a high prevalence of aa substitutions related to INI resistance. The impact of these findings remains unclear and further studies are necessary to address these questions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objectives: Adults with major depressive disorder (MDD) are reported to have reduced orbitofrontal cortex (OFC) volumes, which could be related to decreased neuronal density. We conducted a study on medication naive children with MDD to determine whether abnormalities of OFC are present early in the illness course. Methods: Twenty seven medication naive pediatric Diagnostic and Statistical Manual of Mental Disorders, 4(th) edition (DSM-IV) MDD patients (mean age +/- SD = 14.4 +/- 2.2 years; 10 males) and 26 healthy controls (mean age +/- SD = 14.4 +/- 2.4 years; 12 males) underwent a 1.5T magnetic resonance imaging (MRI) with 3D spoiled gradient recalled acquisition. The OFC volumes were compared using analysis of covariance with age, gender, and total brain volume as covariates. Results: There was no significant difference in either total OFC volume or total gray matter OFC volume between MDD patients and healthy controls. Exploratory analysis revealed that patients had unexpectedly larger total right lateral (F = 4.2, df = 1, 48, p = 0.05) and right lateral gray matter (F = 4.6, df = 1, 48, p = 0.04) OFC volumes compared to healthy controls, but this finding was not significant following statistical correction for multiple comparisons. No other OFC subregions showed a significant difference. Conclusions: The lack of OFC volume abnormalities in pediatric MDD patients suggests the abnormalities previously reported for adults may develop later in life as a result of neural cell loss.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective: The striatum, including the putamen and caudate, plays an important role in executive and emotional processing and may be involved in the pathophysiology of mood disorders. Few studies have examined structural abnormalities of the striatum in pediatric major depressive disorder (MDD) patients. We report striatal volume abnormalities in medication-naive pediatric MDD compared to healthy comparison subjects. Method: Twenty seven medication-naive pediatric Diagnostic and Statistical Manual of Mental Disorders, 4(th) edition (DSM-IV) MDD and 26 healthy comparison subjects underwent volumetric magnetic resonance imaging (MRI). The putamen and caudate volumes were traced manually by a blinded rater, and the patient and control groups were compared using analysis of covariance adjusting for age, sex, intelligence quotient, and total brain volumes. Results: MDD patients had significantly smaller right striatum (6.0% smaller) and right caudate volumes (7.4% smaller) compared to the healthy subjects. Left caudate volumes were inversely correlated with severity of depression in MDD subjects. Age was inversely correlated with left and right putamen volumes in MDD patients but not in the healthy subjects. Conclusions: These findings provide fresh evidence for abnormalities in the striatum of medication-naive pediatric MDD patients and suggest the possible involvement of the striatum in the pathophysiology of MDD.