986 resultados para Naïve Bayesian Classification


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The problem of semialgebraic Lipschitz classification of quasihomogeneous polynomials on a Holder triangle is studied. For this problem, the ""moduli"" are described completely in certain combinatorial terms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Chagas disease is still a major public health problem in Latin America. Its causative agent, Trypanosoma cruzi, can be typed into three major groups, T. cruzi I, T. cruzi II and hybrids. These groups each have specific genetic characteristics and epidemiological distributions. Several highly virulent strains are found in the hybrid group; their origin is still a matter of debate. The null hypothesis is that the hybrids are of polyphyletic origin, evolving independently from various hybridization events. The alternative hypothesis is that all extant hybrid strains originated from a single hybridization event. We sequenced both alleles of genes encoding EF-1 alpha, actin and SSU rDNA of 26 T. cruzi strains and DHFR-TS and TR of 12 strains. This information was used for network genealogy analysis and Bayesian phylogenies. We found T. cruzi I and T. cruzi II to be monophyletic and that all hybrids had different combinations of T. cruzi I and T. cruzi II haplotypes plus hybrid-specific haplotypes. Bootstrap values (networks) and posterior probabilities (Bayesian phylogenies) of clades supporting the monophyly of hybrids were far below the 95% confidence interval, indicating that the hybrid group is polyphyletic. We hypothesize that T. cruzi I and T. cruzi II are two different species and that the hybrids are extant representatives of independent events of genome hybridization, which sporadically have sufficient fitness to impact on the epidemiology of Chagas disease.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Stream discharge-concentration relationships are indicators of terrestrial ecosystem function. Throughout the Amazon and Cerrado regions of Brazil rapid changes in land use and land cover may be altering these hydrochemical relationships. The current analysis focuses on factors controlling the discharge-calcium (Ca) concentration relationship since previous research in these regions has demonstrated both positive and negative slopes in linear log(10)discharge-log(10)Ca concentration regressions. The objective of the current study was to evaluate factors controlling stream discharge-Ca concentration relationships including year, season, stream order, vegetation cover, land use, and soil classification. It was hypothesized that land use and soil class are the most critical attributes controlling discharge-Ca concentration relationships. A multilevel, linear regression approach was utilized with data from 28 streams throughout Brazil. These streams come from three distinct regions and varied broadly in watershed size (< 1 to > 10(6) ha) and discharge (10(-5.7)-10(3.2) m(3) s(-1)). Linear regressions of log(10)Ca versus log(10)discharge in 13 streams have a preponderance of negative slopes with only two streams having significant positive slopes. An ANOVA decomposition suggests the effect of discharge on Ca concentration is large but variable. Vegetation cover, which incorporates aspects of land use, explains the largest proportion of the variance in the effect of discharge on Ca followed by season and year. In contrast, stream order, land use, and soil class explain most of the variation in stream Ca concentration. In the current data set, soil class, which is related to lithology, has an important effect on Ca concentration but land use, likely through its effect on runoff concentration and hydrology, has a greater effect on discharge-concentration relationships.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Quality control of toys for avoiding children exposure to potentially toxic elements is of utmost relevance and it is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at authors` laboratory for direct analysis of plastic toys and one of the main difficulties for the determination of Cd. Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models from LIBS spectra were tested for sampling toys that present potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys and by using Partial Least Squares - Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). The classification models and validations were carried out with 40 and 11 test samples, respectively. Best results were obtained when KNN was used, with corrected predictions varying from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective: We carry out a systematic assessment on a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely. Gaussian and exponential radial basis functions) were considered as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value. Conclusions: Overall, the results evidence that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions to be taken, albeit the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Here, I investigate the use of Bayesian updating rules applied to modeling how social agents change their minds in the case of continuous opinion models. Given another agent statement about the continuous value of a variable, we will see that interesting dynamics emerge when an agent assigns a likelihood to that value that is a mixture of a Gaussian and a uniform distribution. This represents the idea that the other agent might have no idea about what is being talked about. The effect of updating only the first moments of the distribution will be studied, and we will see that this generates results similar to those of the bounded confidence models. On also updating the second moment, several different opinions always survive in the long run, as agents become more stubborn with time. However, depending on the probability of error and initial uncertainty, those opinions might be clustered around a central value.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditionally, chronotype classification is based on the Morningness-Eveningness Questionnaire (MEQ). It is implicit in the classification that intermediate individuals get intermediate scores to most of the MEQ questions. However, a small group of individuals has a different pattern of answers. In some questions, they answer as ""morning-types"" and in some others they answer as ""evening-types,"" resulting in an intermediate total score. ""Evening-type"" and ""Morning-type"" answers were set as A(1) and A(4), respectively. Intermediate answers were set as A(2) and A(3). The following algorithm was applied: Bimodality Index = (Sigma A(1) x Sigma A(4))(2) - (Sigma A(2) x Sigma A(3))(2). Neither-types that had positive bimodality scores were classified as bimodal. If our hypothesis is validated by objective data, an update of chronotype classification will be required. (Author correspondence: brunojm@ymail.com)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population ( linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients were proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understanding the process that generated such structure. GMs based in coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GM that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained in public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional measure of LD D`. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to tell about the conservability of the genetic markers and to guide the selection of subset of representative markers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Oropharyngeal dysphagia is characterized by any alteration in swallowing dynamics which may lead to malnutrition and aspiration pneumonia. Early diagnosis is crucial for the prognosis of patients with dysphagia, and the best method for swallowing dynamics assessment is swallowing videofluoroscopy, an exam performed with X-rays. Because it exposes patients to radiation, videofluoroscopy should not be performed frequently nor should it be prolonged. This study presents a non-invasive method for the pre-diagnosis of dysphagia based on the analysis of the swallowing acoustics, where the discrete wavelet transform plays an important role to increase sensitivity and specificity in the identification of dysphagic patients. (C) 2008 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Despite modern weed control practices, weeds continue to be a threat to agricultural production. Considering the variability of weeds, a classification methodology for the risk of infestation in agricultural zones using fuzzy logic is proposed. The inputs for the classification are attributes extracted from estimated maps for weed seed production and weed coverage using kriging and map analysis and from the percentage of surface infested by grass weeds, in order to account for the presence of weed species with a high rate of development and proliferation. The output for the classification predicts the risk of infestation of regions of the field for the next crop. The risk classification methodology described in this paper integrates analysis techniques which may help to reduce costs and improve weed control practices. Results for the risk classification of the infestation in a maize crop field are presented. To illustrate the effectiveness of the proposed system, the risk of infestation over the entire field is checked against the yield loss map estimated by kriging and also with the average yield loss estimated from a hyperbolic model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The properties of recycled aggregate produced from mixed (masonry and concrete) construction and demolition (C&D) waste are highly variable, and this restricts the use of such aggregate in structural concrete production. The development of classification techniques capable of reducing this variability is instrumental for quality control purposes and the production of high quality C&D aggregate. This paper investigates how the classification of C&D mixed coarse aggregate according to porosity influences the mechanical performance of concrete. Concretes using a variety of C&D aggregate porosity classes and different water/cement ratios were produced and the mechanical properties measured. For concretes produced with constant volume fractions of water, cement, natural sand and coarse aggregate from recycled mixed C&D waste, the compressive strength and Young modulus are direct exponential functions of the aggregate porosity. Sink and float technique is a simple laboratory density separation tool that facilitates the separation of cement particles with lower porosity, a difficult task when done only by visual sorting. For this experiment, separation using a 2.2 kg/dmA(3) suspension produced recycled aggregate (porosity less than 17%) which yielded good performance in concrete production. Industrial gravity separators may lead to the production of high quality recycled aggregate from mixed C&D waste for structural concrete applications.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objective To describe onset features, classification and treatment of juvenile dermatomyositis (JDM) and juvenile polymyositis (JPM) from a multicentre registry. Methods Inclusion criteria were onset age lower than 18 years and a diagnosis of any idiopathic inflammatory myopathy (IIM) by attending physician. Bohan & Peter (1975) criteria categorisation was established by a scoring algorithm to define JDM and JPM based oil clinical protocol data. Results Of the 189 cases included, 178 were classified as JDM, 9 as JPM (19.8: 1) and 2 did not fit the criteria; 6.9% had features of chronic arthritis and connective tissue disease overlap. Diagnosis classification agreement occurred in 66.1%. Medial? onset age was 7 years, median follow-up duration was 3.6 years. Malignancy was described in 2 (1.1%) cases. Muscle weakness occurred in 95.8%; heliotrope rash 83.5%; Gottron plaques 83.1%; 92% had at least one abnormal muscle enzyme result. Muscle biopsy performed in 74.6% was abnormal in 91.5% and electromyogram performed in 39.2% resulted abnormal in 93.2%. Logistic regression analysis was done in 66 cases with all parameters assessed and only aldolase resulted significant, as independent variable for definite JDM (OR=5.4, 95%CI 1.2-24.4, p=0.03). Regarding treatment, 97.9% received steroids; 72% had in addition at least one: methotrexate (75.7%), hydroxychloroquine (64.7%), cyclosporine A (20.6%), IV immunoglobulin (20.6%), azathioprine (10.3%) or cyclophosphamide (9.6%). In this series 24.3% developed calcinosis and mortality rate was 4.2%. Conclusion Evaluation of predefined criteria set for a valid diagnosis indicated aldolase as the most important parameter associated with de, methotrexate combination, was the most indicated treatment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicate significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Monte Carlo Markov Chain algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about its experimental results.