58 results for Semi-supervised classification
Abstract:
The PHENIX experiment has measured the suppression of semi-inclusive single high-transverse-momentum π⁰'s in Au+Au collisions at √s_NN = 200 GeV. The present understanding of this suppression is in terms of energy loss of the parent (fragmenting) parton in a dense color-charge medium. We have performed a quantitative comparison between various parton energy-loss models and our experimental data. The statistical and point-to-point uncorrelated uncertainties, as well as the correlated systematic uncertainties, are taken into account in the comparison. We detail this methodology and the resulting constraints on the model parameters, such as the initial color-charge density dN_g/dy, the medium transport coefficient ⟨q̂⟩, or the initial energy-loss parameter ε₀. We find that the high-transverse-momentum π⁰ suppression in Au+Au collisions is measured with sufficient precision to constrain these model-dependent parameters at the ±20-25% (one standard deviation) level. These constraints include only the experimental uncertainties; further studies are needed to compute the corresponding theoretical uncertainties.
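The abstract does not spell out the fit statistic. As a hedged illustration of how a correlated systematic uncertainty can enter such a model-to-data comparison, the sketch below profiles a χ² over a single nuisance parameter ε that shifts all points coherently by ε·σ_b,i and is penalized by ε²: χ²(p) = min over ε of Σᵢ [(yᵢ + ε σ_b,i − μᵢ(p)) / σᵢ]² + ε². This is a generic construction, not necessarily the PHENIX procedure; the data, the flat toy model and the names `profiled_chi2` and `scan` are hypothetical.

```python
"""Toy sketch: profile a chi-square over one correlated systematic shift.

Assumptions (not taken from the abstract): each point i has an uncorrelated
error sigma[i] and a fully correlated systematic error sigma_b[i] that all
points share through one nuisance parameter eps.
"""


def profiled_chi2(y, sigma, sigma_b, mu):
    """Return (chi2, eps) minimizing sum(((y+eps*sb-mu)/s)^2) + eps^2 over eps."""
    a = [sb / s for sb, s in zip(sigma_b, sigma)]
    b = [(yi - mi) / s for yi, mi, s in zip(y, mu, sigma)]
    # Setting d(chi2)/d(eps) = 0 gives the minimum in closed form.
    eps = -sum(ai * bi for ai, bi in zip(a, b)) / (1.0 + sum(ai * ai for ai in a))
    chi2 = sum((bi + eps * ai) ** 2 for ai, bi in zip(a, b)) + eps ** 2
    return chi2, eps


def scan(y, sigma, sigma_b, model, grid):
    """Scan a model parameter p; return best p, min chi2 and the Delta chi2 <= 1 range."""
    chi2s = [profiled_chi2(y, sigma, sigma_b, model(p))[0] for p in grid]
    best = min(range(len(grid)), key=chi2s.__getitem__)
    inside = [p for p, c in zip(grid, chi2s) if c - chi2s[best] <= 1.0]
    return grid[best], chi2s[best], min(inside), max(inside)


# Hypothetical data: R_AA ~ 0.2 at four pT points, and a toy model
# predicting a flat R_AA equal to the parameter p.
y = [0.20, 0.21, 0.19, 0.20]
sigma = [0.01] * 4
sigma_b = [0.02] * 4
grid = [i / 1000 for i in range(1001)]
best, chi2_min, lo, hi = scan(y, sigma, sigma_b, lambda p: [p] * 4, grid)
print(best, round(chi2_min, 3), lo, hi)  # 0.2 2.0 0.18 0.22
```

The one-standard-deviation range is read off where χ² rises by 1 above its minimum, which is the usual convention for a single fitted parameter.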
Abstract:
Online music databases have expanded significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single- and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a complex network representation of the rhythms. A Markov model is built to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed using two multivariate statistical approaches: principal component analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied to identify the category of rhythms: a parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). The results, assessed with the kappa coefficient and through the clusters obtained, corroborate the effectiveness of the proposed method.
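The abstract does not give the event alphabet or the exact network construction. A minimal sketch, assuming each rhythm is a sequence of symbolic note durations, of estimating first-order Markov transition probabilities between consecutive events and flattening them into a fixed-length feature vector that PCA, LDA or a classifier could consume. The duration symbols and the names `transition_matrix` and `feature_vector` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(events):
    """First-order Markov transition probabilities between consecutive events.

    The result can also be read as a weighted directed graph: nodes are
    event symbols and edge weights are transition probabilities.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        counts[current][nxt] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def feature_vector(events, alphabet):
    """Flatten the transition matrix into a fixed-length feature vector."""
    probs = transition_matrix(events)
    return [probs.get(a, {}).get(b, 0.0) for a in alphabet for b in alphabet]


# Hypothetical rhythm: q = quarter note, e = eighth note, h = half note.
rhythm = ["q", "e", "e", "q", "e", "e", "h"]
print(feature_vector(rhythm, ["q", "e", "h"]))
```

Using a fixed alphabet order keeps every rhythm's vector the same length, which the multivariate methods named in the abstract require.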
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computational cost grows, in principle, exponentially with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved with a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations of Galton-Watson-related processes.
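The reduction from the tree-supremum statistic to a flow network is not reproduced in the abstract, so the sketch below shows only the generic max-flow step. It uses Edmonds-Karp, the breadth-first variant of Ford-Fulkerson, whose running time is polynomial in the graph size (a plain depth-first Ford-Fulkerson does not carry that guarantee in general). The toy graph and the name `max_flow` are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with breadth-first augmenting paths.

    capacity is a dict of dicts: capacity[u][v] = capacity of edge u -> v.
    Runs in O(V * E^2) time.
    """
    # Residual graph, including zero-capacity reverse edges.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {}).setdefault(v, 0)
            residual[u][v] += c
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Bottleneck (smallest residual capacity) along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push flow: shrink forward residuals, grow reverse residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


graph = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(max_flow(graph, "s", "t"))  # 5
```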
Abstract:
The problem of semialgebraic Lipschitz classification of quasihomogeneous polynomials on a Hölder triangle is studied. For this problem, the "moduli" are completely described in certain combinatorial terms.
Abstract:
Quality control of toys to avoid children's exposure to potentially toxic elements is of utmost relevance, and it is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated in the authors' laboratory for the direct analysis of plastic toys, and one of the main difficulties in determining Cd, Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models built from LIBS spectra were tested for screening toys that present a potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys using Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). The models were built with 40 samples and validated with 11 test samples. The best results were obtained with KNN, with correct predictions ranging from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.
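KNN, which gave the best results, is simple enough to sketch. This assumes each LIBS spectrum has already been reduced to a short vector of line intensities and that distances are Euclidean; the abstract does not state the preprocessing, the distance metric or the value of k. The labels, data and the names `knn_predict` and `accuracy` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training spectra closest to x (Euclidean)."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose predicted class matches the label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical 3-channel intensity vectors with two classes.
train_x = [(0.9, 0.1, 0.2), (0.8, 0.2, 0.1), (0.85, 0.15, 0.2),
           (0.1, 0.9, 0.8), (0.2, 0.8, 0.9), (0.15, 0.85, 0.7)]
train_y = ["risk", "risk", "risk", "no_risk", "no_risk", "no_risk"]
test_x = [(0.88, 0.12, 0.15), (0.12, 0.88, 0.85)]
test_y = ["risk", "no_risk"]
print(accuracy(train_x, train_y, test_x, test_y))
```

An odd k avoids ties in a two-class vote; in practice k would be tuned on the calibration set.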
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines applied to epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data whose clinical EEG recordings were obtained from five healthy subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values of the kernel parameter (radius) of two well-known kernel functions (namely, the Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of wavelet basis on the quality of the extracted features. Four wavelet basis functions were considered in this study. Then, we report the average accuracy values (estimated by cross-validation) delivered by 252 kernel machine configurations; in particular, 40% and 35% of the best-calibrated standard and least squares SVM models, respectively, reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, through which one can visually inspect their sensitivity to the type of feature and to the kernel function/parameter value.
Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing most consistently. Moreover, the choice of kernel function and parameter value, as well as the choice of feature extractor, are critical decisions, whereas the choice of wavelet family appears to be less relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged for all types of machines, involving regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
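The two kernels differ only in whether the distance is squared. The sketch below uses the common textbook forms, Gaussian exp(−‖x−y‖²/(2σ²)) and exponential exp(−‖x−y‖/(2σ²)) with σ as the radius; the abstract does not write out its parameterization, so these forms are an assumption. The count 6 machines × 2 kernels × 21 feature types = 252 is our reading of how the reported number of configurations decomposes, not something the abstract states explicitly. Function names are hypothetical.

```python
import math


def gaussian_rbf(x, y, sigma):
    """Gaussian RBF: exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def exponential_rbf(x, y, sigma):
    """Exponential RBF: exp(-||x - y|| / (2 sigma^2)); note the unsquared norm."""
    return math.exp(-math.dist(x, y) / (2 * sigma ** 2))


def gram_matrix(points, kernel, sigma):
    """Kernel (Gram) matrix K[i][j] = kernel(points[i], points[j], sigma)."""
    return [[kernel(p, q, sigma) for q in points] for p in points]


# 6 kernel machines x 2 kernel functions x 21 feature types.
n_configs = 6 * 2 * 21
print(n_configs)  # 252
```

Because the exponential kernel decays with the unsquared distance, it falls off more slowly than the Gaussian at large distances for the same σ, which is one reason the two can respond differently to the radius.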
Abstract:
Traditionally, chronotype classification is based on the Morningness-Eveningness Questionnaire (MEQ). It is implicit in the classification that intermediate individuals get intermediate scores on most of the MEQ questions. However, a small group of individuals shows a different pattern of answers: on some questions they answer as "morning-types" and on others as "evening-types", resulting in an intermediate total score. "Evening-type" and "morning-type" answers were labeled A₁ and A₄, respectively, and intermediate answers were labeled A₂ and A₃. The following formula was applied: Bimodality Index = (ΣA₁ × ΣA₄)² − (ΣA₂ × ΣA₃)². Neither-types with positive bimodality scores were classified as bimodal. If our hypothesis is validated by objective data, an update of chronotype classification will be required. (Author correspondence: brunojm@ymail.com)
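The index can be computed directly from a respondent's answers. A minimal sketch, assuming each answer has already been mapped to one of the four categories and that ΣAₖ means the number of answers in category Aₖ (the abstract does not say whether it is a count or a sum of item scores). In the abstract the index is applied only to neither-types; the answer vectors and function names below are hypothetical.

```python
from collections import Counter


def bimodality_index(answers):
    """Bimodality Index = (n1 * n4)^2 - (n2 * n3)^2.

    answers holds one category per MEQ question: 1 = evening-type (A1),
    2 and 3 = intermediate (A2, A3), 4 = morning-type (A4). Each sum is
    read here as the number of answers in that category (assumption).
    """
    n = Counter(answers)
    return (n[1] * n[4]) ** 2 - (n[2] * n[3]) ** 2


def is_bimodal(answers):
    """A neither-type is classified as bimodal when the index is positive."""
    return bimodality_index(answers) > 0


bimodal = [1, 1, 1, 4, 4, 4, 2, 3]       # hypothetical: 3 x A1, 3 x A4
intermediate = [2, 2, 3, 3, 2, 3, 1, 4]  # hypothetical: mostly A2 and A3
print(bimodality_index(bimodal), bimodality_index(intermediate))  # 80 -80
```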
Abstract:
We analyzed the usefulness of a semi-tethered field running test (STR) and the relationships between indices of anaerobic power, anaerobic capacity and running performance in 9 trained male sprinters (22.2 ± 2.9 yrs, 176 ± 1 cm, 68.0 ± 9.4 kg). STR involved an all-out 120 m run while attached to an apparatus that enabled power to be calculated from force and velocity measurements. Subjects also performed a cycle-ergometer Wingate Anaerobic Test (WT) and an all-out 300 m run, and had their maximal accumulated oxygen deficit (MAOD) assessed on a treadmill. Peak and mean powers attained in STR (1720 ± 221 and 1391 ± 201 W) were greater than, but significantly related (r = 0.82; P < 0.01) to, those in the WT (808 ± 130 and 603 ± 87 W). In addition, power measures derived from the STR were more strongly related to running performance than those from the WT (r = 0.81-0.94 vs. 0.68-0.84; P < 0.05). Relationships between MAOD and most power indices were only weak to moderate. These results support the usefulness of STR for specific power assessment in field running and suggest that anaerobic power and capacity are unrelated entities, irrespective of whether they are evaluated using similar or dissimilar exercise modes.
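The STR power indices follow from P = F·v at each sample. A minimal sketch, assuming synchronized force and velocity samples taken at uniform time intervals, so that mean power is the arithmetic mean of the instantaneous values; the abstract does not describe the apparatus's sampling or filtering. The sample values and function names are hypothetical.

```python
def power_profile(force_n, velocity_m_s):
    """Instantaneous power P = F * v in watts for paired samples."""
    return [f * v for f, v in zip(force_n, velocity_m_s)]


def peak_and_mean_power(force_n, velocity_m_s):
    """Peak and mean power, assuming uniformly spaced samples."""
    power = power_profile(force_n, velocity_m_s)
    return max(power), sum(power) / len(power)


# Hypothetical samples during a sprint (force in N, velocity in m/s).
force = [200, 250, 240, 220]
velocity = [4.0, 6.0, 7.0, 7.5]
print(peak_and_mean_power(force, velocity))  # (1680.0, 1407.5)
```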
Abstract:
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical value for weed-crop competitiveness, using as input categorical variables for the total density of weeds and the corresponding proportions of narrow- and broad-leaved weeds. The inferred weed-crop competitiveness values, along with three other categorical variables extracted from estimated maps of weed seed production and weed coverage, are then used as input to a second Bayesian network classifier that infers categorical values for the risk of infestation. Weed biomass and yield-loss data samples are used to learn, in a supervised fashion, the probabilistic relationships among the nodes of the first and second Bayesian classifiers, respectively. For comparison purposes, two types of Bayesian network structure are considered, namely an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for risk inference in a corn-crop field are presented and discussed. (C) 2009 Elsevier Ltd. All rights reserved.
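Of the two structures compared, the naive Bayes classifier is the easier to sketch for categorical inputs. The sketch below assumes simple frequency estimates with Laplace smoothing; the expert-based network structure and the rule-extraction step are not reproduced. Variable levels, training rows and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class priors and per-feature value frequencies for each class."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)      # feature index -> observed values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, y)][v] += 1
            values[j].add(v)
    return priors, counts, values


def predict(model, row, alpha=1.0):
    """Return the class with the highest log-posterior (Laplace smoothing alpha)."""
    priors, counts, values = model
    total = sum(priors.values())
    best, best_score = None, float("-inf")
    for y, ny in priors.items():
        score = math.log(ny / total)
        for j, v in enumerate(row):
            num = counts[(j, y)][v] + alpha
            den = ny + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best


# Hypothetical rows: (total weed density, proportion of narrow-leaved weeds).
rows = [("high", "high"), ("high", "low"), ("low", "low"),
        ("low", "high"), ("high", "high"), ("low", "low")]
labels = ["strong", "strong", "weak", "weak", "strong", "weak"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "high")), predict(model, ("low", "low")))
```

Each conditional table in such a model maps directly to "IF value THEN class" statements, which is one route to the rule translation the abstract mentions.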
Abstract:
Oropharyngeal dysphagia is characterized by any alteration in swallowing dynamics that may lead to malnutrition and aspiration pneumonia. Early diagnosis is crucial for the prognosis of patients with dysphagia, and the best method for assessing swallowing dynamics is swallowing videofluoroscopy, an X-ray examination. Because it exposes patients to radiation, videofluoroscopy should be neither frequent nor prolonged. This study presents a non-invasive method for the pre-diagnosis of dysphagia based on the analysis of swallowing acoustics, in which the discrete wavelet transform plays an important role in increasing sensitivity and specificity in the identification of dysphagic patients. (C) 2008 Elsevier Inc. All rights reserved.
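The abstract does not specify the wavelet family, decomposition depth or features used. As a hedged illustration of a discrete wavelet transform applied to an acoustic signal, the sketch below uses the orthonormal Haar wavelet and per-band energies as hypothetical features; the function names are likewise hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT (signal length must be even)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def band_energies(signal, levels):
    """Detail energy at each level, then the final approximation energy.

    The length must be divisible by 2**levels.
    """
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


x = [4, 2, 5, 5, 1, 3, 0, 0]  # hypothetical short signal
print(band_energies(x, 3))
```

Because the transform is orthonormal, the band energies add up to the signal's total energy, so they describe how that energy is distributed across frequency bands.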
Abstract:
Despite modern weed control practices, weeds continue to be a threat to agricultural production. Considering the variability of weeds, a methodology using fuzzy logic is proposed for classifying the risk of infestation in agricultural zones. The inputs for the classification are attributes extracted from maps of weed seed production and weed coverage estimated by kriging and map analysis, and from the percentage of surface infested by grass weeds, in order to account for the presence of weed species with a high rate of development and proliferation. The classification output predicts the risk of infestation in regions of the field for the next crop. The risk classification methodology described in this paper integrates analysis techniques that may help reduce costs and improve weed control practices. Results for the risk classification of the infestation in a maize crop field are presented. To illustrate the effectiveness of the proposed system, the risk of infestation over the entire field is checked against the yield-loss map estimated by kriging and against the average yield loss estimated from a hyperbolic model.
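The abstract does not give the membership functions or the rule base. A minimal sketch of fuzzy risk classification under stated assumptions: inputs normalized to [0, 1], triangular low/medium/high sets, min for AND, max for OR and for aggregating rules per class, and the class with the highest firing strength as output (instead of a full defuzzification step). All breakpoints, rules and names are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical fuzzy sets on inputs normalized to [0, 1].
LOW, MEDIUM, HIGH = (-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)


def classify_risk(seed, coverage, grass):
    """Return (risk class, firing strengths) for normalized inputs.

    Hypothetical rules:
      seed LOW AND coverage LOW          -> low risk
      seed MEDIUM OR coverage MEDIUM     -> medium risk
      (seed HIGH AND coverage HIGH) OR grass HIGH -> high risk
    """
    strength = {
        "low": min(tri(seed, *LOW), tri(coverage, *LOW)),
        "medium": max(tri(seed, *MEDIUM), tri(coverage, *MEDIUM)),
        "high": max(min(tri(seed, *HIGH), tri(coverage, *HIGH)), tri(grass, *HIGH)),
    }
    return max(strength, key=strength.get), strength


print(classify_risk(0.9, 0.95, 0.2)[0])  # high
```

The grass-weed input can raise the risk on its own, mirroring the abstract's motivation of accounting for fast-proliferating species.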
Abstract:
This paper presents a study of a specific type of beam-to-column connection for precast concrete structures. In addition, an analytical model to determine the strength and stiffness of the connection, based on tests of two prototypes, is proposed. To evaluate how the strength and stiffness of the connection influence the behaviour of the structure, numerical simulations of a typical multi-storey building with semi-rigid connections are also presented and compared with results for pinned and rigid connections. The main conclusions are: (a) the proposed design model can reasonably evaluate the strength of the studied connection; (b) strength is evaluated more accurately than stiffness; (c) for a typical structure, replacing the pinned connections with semi-rigid ones makes it possible to increase the number of storeys from two to four with a smaller horizontal displacement at the top and only a small increase in the column-base bending moment; and (d) although there is significant uncertainty in the connection stiffness, the results show that the displacements at the top of the structure and the column-base moments have low sensitivity to deviations in this parameter.
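The abstract does not reproduce the proposed analytical model. A common way to express the semi-rigidity of a connection with rotational stiffness k is the end-fixity factor γ = 1/(1 + 3EI/(kL)), which runs from 0 (pinned) to 1 (rigid). For a prismatic beam with equal springs at both ends under a uniform load q, slope compatibility gives an end moment of (qL²/12)·3γ/(2 + γ). This sketch is a generic illustration under those assumptions, not the authors' model; the numbers and names are hypothetical.

```python
def fixity_factor(k_rot, ei, span):
    """End-fixity factor gamma = 1 / (1 + 3 EI / (k L)); 0 = pinned, 1 = rigid."""
    if k_rot == 0:
        return 0.0
    return 1.0 / (1.0 + 3.0 * ei / (k_rot * span))


def end_moment_udl(q, span, gamma):
    """Hogging end moment of a beam with equal semi-rigid ends under UDL q.

    Reduces to q L^2 / 12 for rigid ends (gamma = 1) and to 0 for pins.
    """
    return q * span ** 2 / 12.0 * 3.0 * gamma / (2.0 + gamma)


# Hypothetical beam: EI = 50 000 kN m^2, L = 6 m, q = 10 kN/m,
# with a connection stiffness k = 3 EI / L (gives gamma = 0.5).
ei, span, q = 50_000.0, 6.0, 10.0
gamma = fixity_factor(3.0 * ei / span, ei, span)
print(gamma, end_moment_udl(q, span, gamma))  # 0.5 18.0
```

With γ = 0.5 the end moment is 60% of the fully fixed value qL²/12 = 30 kN m, showing how partial fixity shares moment between the supports and midspan.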
Abstract:
In this paper, a framework for detecting human skin in digital images is proposed. The framework is composed of a training phase and a detection phase. A skin class model is learned during the training phase by processing several training images in a hybrid and incremental fuzzy learning scheme. This scheme combines unsupervised and supervised learning: unsupervised, via fuzzy clustering, to obtain clusters of color groups from the training images; and supervised, to select the groups that represent skin color. At the end of the training phase, aggregation operators are used to combine the selected groups into a skin model. In the detection phase, the learned skin model is used to detect human skin efficiently. Experimental results show that the proposed framework detects human skin robustly and accurately.
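The abstract names fuzzy clustering but not the specific algorithm. The sketch below is standard fuzzy c-means (fuzzifier m = 2 by default) on color vectors, assumed here as a representative choice; the incremental scheme, the supervised selection of skin clusters and the aggregation operators are not reproduced. The RGB-like points and names are hypothetical.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means: returns (centers, memberships u[i][j] of point i in cluster j)."""
    n = len(points)
    # Deterministic init: c points spread evenly through the input list.
    centers = [list(points[round(j * (n - 1) / max(c - 1, 1))]) for j in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # Point sits on a center: full membership in that cluster.
                row = [0.0] * c
                row[d.index(0.0)] = 1.0
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
            u.append(row)
        # Center update: weighted mean with weights u_ij^m.
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centers.append([sum(w[i] * points[i][k] for i in range(n)) / total
                                for k in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10), (10, 11, 10)]
centers, u = fuzzy_c_means(pts, 2)
print(sorted(centers))
```

Unlike hard k-means, each color keeps a graded membership in every cluster, which is what later lets a supervised step weigh how "skin-like" a group is.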
Abstract:
The applicability of a meshfree approximation method, namely the element-free Galerkin (EFG) method, to the fully geometrically exact analysis of plates is investigated. Based on a unified nonlinear theory of plates, which allows for arbitrarily large rotations and displacements, a Galerkin approximation via moving least squares (MLS) functions is formulated. A hybrid method of analysis is proposed, in which the solution is obtained by independently approximating the generalized internal displacement fields and the generalized boundary tractions. A consistent linearization procedure is performed, resulting in a semi-definite generalized tangent stiffness matrix that, for hyperelastic materials and conservative loadings, is always symmetric (even for configurations far from the generalized equilibrium trajectory). Besides the total Lagrangian formulation, an updated version is also presented, which enables the treatment of rotations beyond the parameterization limit. To solve the resulting nonlinear problem, an extension of the arc-length method is proposed that includes the generalized domain displacement fields, the generalized boundary tractions and the load parameter in the constraint equation of the hyperellipsoid. Extending the hybrid-displacement formulation, a multi-region decomposition is proposed to handle complex geometries. A criterion for classifying the stability of equilibrium, based on analysis of the bordered Hessian matrix, is suggested. Several numerical examples are presented, illustrating the effectiveness of the method. Unlike standard finite element methods (FEM), the method yields arbitrarily smooth generalized displacement and stress fields. (c) 2007 Elsevier Ltd. All rights reserved.
Abstract:
The properties of recycled aggregate produced from mixed (masonry and concrete) construction and demolition (C&D) waste are highly variable, and this restricts the use of such aggregate in structural concrete production. The development of classification techniques capable of reducing this variability is instrumental for quality control and for the production of high-quality C&D aggregate. This paper investigates how classifying mixed C&D coarse aggregate according to porosity influences the mechanical performance of concrete. Concretes using a variety of C&D aggregate porosity classes and different water/cement ratios were produced, and their mechanical properties were measured. For concretes produced with constant volume fractions of water, cement, natural sand and coarse aggregate from recycled mixed C&D waste, the compressive strength and Young's modulus are direct exponential functions of the aggregate porosity. The sink-and-float technique is a simple laboratory density-separation tool that facilitates the separation of cement particles with lower porosity, a difficult task when done by visual sorting alone. In this experiment, separation using a 2.2 kg/dm³ suspension produced recycled aggregate (porosity below 17%) that performed well in concrete production. Industrial gravity separators may enable the production of high-quality recycled aggregate from mixed C&D waste for structural concrete applications.
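An exponential relation y = a·exp(b·porosity) can be fitted by ordinary least squares on log(y). The sketch below shows that fit together with the density criterion behind sink-and-float separation (particles denser than the suspension sink). The abstract reports neither the fitted coefficients nor the data, so the synthetic values (including the decreasing trend) and the function names are hypothetical.

```python
import math


def fit_exponential(porosity, prop):
    """Least-squares fit of prop = a * exp(b * porosity) on log(prop)."""
    n = len(porosity)
    logs = [math.log(v) for v in prop]
    mx = sum(porosity) / n
    my = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in porosity)
    sxy = sum((x - mx) * (y - my) for x, y in zip(porosity, logs))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def sinks(densities_kg_dm3, medium_kg_dm3=2.2):
    """Particles denser than the suspension sink; the rest float."""
    return [d for d in densities_kg_dm3 if d > medium_kg_dm3]


# Synthetic data following an exact exponential, for illustration only.
porosity = [5.0, 10.0, 15.0, 20.0]
strength = [60.0 * math.exp(-0.04 * p) for p in porosity]
print(fit_exponential(porosity, strength))
print(sinks([2.0, 2.3, 2.5, 2.1]))  # [2.3, 2.5]
```

Fitting on the log scale is simple and closed-form, but it weights relative rather than absolute errors; a nonlinear least-squares fit on the original scale is the alternative when absolute errors matter.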