921 results for Document classification, Naive Bayes classifier, Verb-object pairs
Abstract:
Due to the widespread, multipurpose use of document images and the current availability of a large number of document image repositories, robust information retrieval mechanisms and systems are increasingly in demand. This paper presents an approach to support the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes document image content, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with those created among their respective document images. Using the same document images, we ran further experiments to compare the performance of LinkDI with and without the LSI technique. Experimental results showed that LSI can mitigate the effects of common OCR misrecognition, which reinforces the feasibility of using LinkDI to relate OCR output with high degradation.
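The effect described above, LSI smoothing over OCR misrecognition, can be illustrated with a small sketch. This is not the LinkDI implementation: the toy corpus, the misrecognized token "irnage", the function names, and the choice of k = 2 are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical toy corpus: document 1 is document 0 with an OCR-style
# misrecognition ("irnage" instead of "image").
docs = [
    "document image retrieval".split(),
    "document irnage retrieval".split(),
    "lizard species genus".split(),
    "lizard species scales".split(),
]

def term_document_matrix(docs):
    """Build a raw-count term-by-document matrix (rows: terms, columns: documents)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            matrix[index[tok], j] += 1.0
    return matrix

def lsi_document_vectors(term_doc, k):
    """Project documents onto the top-k latent dimensions via a truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]  # one row per document, k columns

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

term_doc = term_document_matrix(docs)
raw = cosine_similarity_matrix(term_doc.T)                        # plain term space
lsi = cosine_similarity_matrix(lsi_document_vectors(term_doc, 2))  # latent space
```

In this toy setting the clean and the degraded document have a raw cosine similarity of 2/3, because the misrecognized token does not match, while in the two-dimensional latent space their similarity is 1 and their similarity to the unrelated documents stays at 0. Real OCR output is far noisier, so the size of the effect here should not be read as representative.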
Abstract:
The presence of mutations associated with integrase inhibitor (INI) resistance among INI-naive patients may play an important clinical role in the use of those drugs. Samples from 76 HIV-1-infected subjects naive to INIs were submitted to direct sequencing. No differences were found between naive (25%) subjects and subjects on HAART (75%). No primary mutation associated with raltegravir or elvitegravir resistance was found. However, 78% of sequences showed at least one accessory mutation associated with resistance. The analysis of the 76 IN sequences showed a high level of polymorphism in this region among Brazilian HIV-1-infected subjects, including a high prevalence of aa substitutions related to INI resistance. The impact of these findings remains unclear, and further studies are necessary to address these questions.
Abstract:
Objectives: Adults with major depressive disorder (MDD) are reported to have reduced orbitofrontal cortex (OFC) volumes, which could be related to decreased neuronal density. We conducted a study on medication-naive children with MDD to determine whether abnormalities of OFC are present early in the illness course. Methods: Twenty-seven medication-naive pediatric Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) MDD patients (mean age ± SD = 14.4 ± 2.2 years; 10 males) and 26 healthy controls (mean age ± SD = 14.4 ± 2.4 years; 12 males) underwent 1.5 T magnetic resonance imaging (MRI) with 3D spoiled gradient recalled acquisition. The OFC volumes were compared using analysis of covariance with age, gender, and total brain volume as covariates. Results: There was no significant difference in either total OFC volume or total gray matter OFC volume between MDD patients and healthy controls. Exploratory analysis revealed that patients had unexpectedly larger total right lateral (F = 4.2, df = 1, 48, p = 0.05) and right lateral gray matter (F = 4.6, df = 1, 48, p = 0.04) OFC volumes compared to healthy controls, but this finding was not significant following statistical correction for multiple comparisons. No other OFC subregions showed a significant difference. Conclusions: The lack of OFC volume abnormalities in pediatric MDD patients suggests the abnormalities previously reported for adults may develop later in life as a result of neural cell loss.
Abstract:
Objective: The striatum, including the putamen and caudate, plays an important role in executive and emotional processing and may be involved in the pathophysiology of mood disorders. Few studies have examined structural abnormalities of the striatum in pediatric major depressive disorder (MDD) patients. We report striatal volume abnormalities in medication-naive pediatric MDD patients compared to healthy comparison subjects. Method: Twenty-seven medication-naive pediatric Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) MDD patients and 26 healthy comparison subjects underwent volumetric magnetic resonance imaging (MRI). The putamen and caudate volumes were traced manually by a blinded rater, and the patient and control groups were compared using analysis of covariance adjusting for age, sex, intelligence quotient, and total brain volumes. Results: MDD patients had significantly smaller right striatum (6.0% smaller) and right caudate volumes (7.4% smaller) compared to the healthy subjects. Left caudate volumes were inversely correlated with severity of depression in MDD subjects. Age was inversely correlated with left and right putamen volumes in MDD patients but not in the healthy subjects. Conclusions: These findings provide fresh evidence for abnormalities in the striatum of medication-naive pediatric MDD patients and suggest the possible involvement of the striatum in the pathophysiology of MDD.
Abstract:
Aims. In this work, we describe the pipeline for the fast supervised classification of light curves observed by the CoRoT exoplanet CCDs. We present the classification results obtained for the first four measured fields, which represent a one-year in-orbit operation. Methods. The basis of the adopted supervised classification methodology has been described in detail in a previous paper, as is its application to the OGLE database. Here, we present the modifications of the algorithms and of the training set to optimize the performance when applied to the CoRoT data. Results. Classification results are presented for the observed fields IRa01, SRc01, LRc01, and LRa01 of the CoRoT mission. Statistics on the number of variables and the number of objects per class are given and typical light curves of high-probability candidates are shown. We also report on new stellar variability types discovered in the CoRoT data. The full classification results are publicly available.
Abstract:
A new genus and species of microteiid lizard is described based on a series of specimens obtained at Parque Nacional do Caparaó (20°28'S, 41°49'W), southeastern Brazil, along the division line between the States of Minas Gerais and Espírito Santo. The new lizard occurs in isolated high-altitude, open, rocky habitats above the altitudinal limits of the Atlantic forest. It is characterized by the presence of prefrontals, frontoparietals, parietals, interparietal, and occipital scales; ear opening and eyelid distinct; three pairs of genials; absence of collar; lanceolate and mucronate dorsal scales; six regular transverse and longitudinal series of smooth ventrals that are longer than wide, with the lateral ones narrower. Maximum parsimony (MP) and partitioned Bayesian (PBA) phylogenetic analyses based on morphological and molecular characters with all known genera of Gymnophthalminae (except for Scriptosaura) plus Rhachisaurus recovered this new lizard in a clade having Colobodactylus and Heterodactylus as its closest relatives. Both analyses recovered the monophyly of Gymnophthalminae and Gymnophthalmini. The monophyly of the Heterodactylini received moderate support in MP analyses but was not recovered in PBA. To eliminate classification controversy between these results, the present concept of Heterodactylini is restricted to accommodate the new genus, Colobodactylus and Heterodactylus, and a new tribe Iphisiini is proposed to allocate Alexandresaurus, Iphisa, Colobosaura, Acratosaura, and Stenolepis. Current phylogenetic knowledge of Gymnophthalminae suggests that fossoriality and increase of body elongation arose as adaptive responses to avoid extreme surface temperatures, either cold or hot, depending on circumstances.
Abstract:
In symbolic Natural Language Processing (NLP) systems, several linguistic phenomena, for instance the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by employing a rule-based grammar. Another approach to NLP uses the connectionist model, which has the benefits of learning, generalization, and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired by neuroscience, a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor) is proposed, designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation), and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to "predict" thematic (semantic) roles assigned to words in a sentence context, employing a biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.
Sensitivity to noise and ergodicity of an assembly line of cellular automata that classifies density
Abstract:
We investigate the sensitivity of the composite cellular automaton of H. Fuks [Phys. Rev. E 55, R2081 (1997)] to noise and assess the density classification performance of the resulting probabilistic cellular automaton (PCA) numerically. We conclude that the composite PCA performs the density classification task reliably only up to very small levels of noise. In particular, it cannot outperform the noisy Gacs-Kurdyumov-Levin automaton, an imperfect classifier, for any level of noise. While the original composite CA is nonergodic, analyses of relaxation times indicate that its noisy version is an ergodic automaton, with the relaxation times decaying algebraically over an extended range of parameters with an exponent very close (possibly equal) to the mean-field value.
Abstract:
Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.
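As a rough illustration of the Markov modelling step mentioned above, the sketch below estimates transition probabilities between consecutive rhythmic events. The abstract does not state the model order or the event encoding, so the first-order assumption, the event labels, and the function name are all hypothetical.

```python
from collections import defaultdict

def transition_probabilities(events):
    """Estimate first-order Markov transition probabilities from an event sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    # Count how often each event is immediately followed by each other event.
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    probabilities = {}
    for current, row in counts.items():
        total = sum(row.values())
        probabilities[current] = {nxt: n / total for nxt, n in row.items()}
    return probabilities

# Hypothetical sequence of note durations taken from a rhythm track.
rhythm = ["quarter", "eighth", "eighth", "quarter", "eighth", "eighth", "quarter"]
model = transition_probabilities(rhythm)
```

Each row of the resulting table sums to 1, and the rows (or the whole table, flattened) could serve as rhythm features for a downstream classifier. Whether the paper builds its features this way is not stated in the abstract.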
Abstract:
In this work, we study the role of ac Stark effects on the excitation of $nS_{1/2}$ cold Rydberg atoms produced in a rubidium magneto-optical trap. We have observed an atomic population in the $nP_{3/2}$ state after excitation of $nS_{1/2}$ for $29 \le n \le 37$. Such an observation is normally attributed to binary collisions; however, the interaction between Rb $nS_{1/2}$ atoms is repulsive. To explain our results, the dipole-dipole interaction and ac Stark shifts from the excitation laser must be considered. We find that the Rydberg-atom-pair state asymptotically correlating to $nP_{3/2} + (n-1)P_{3/2}$ is excited directly.
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
Abstract:
The problem of semialgebraic Lipschitz classification of quasihomogeneous polynomials on a Hölder triangle is studied. For this problem, the "moduli" are described completely in certain combinatorial terms.
Abstract:
Quality control of toys to avoid children's exposure to potentially toxic elements is of utmost relevance and is a common requirement in national and/or international norms for health and safety reasons. Laser-induced breakdown spectroscopy (LIBS) was recently evaluated at the authors' laboratory for direct analysis of plastic toys, and one of the main difficulties for the determination of Cd, Cr and Pb was the variety of mixtures and types of polymers. As most norms rely on migration (lixiviation) protocols, chemometric classification models from LIBS spectra were tested for sampling toys that present potential risk of Cd, Cr and Pb contamination. The classification models were generated from the emission spectra of 51 polymeric toys by using Partial Least Squares - Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and K-Nearest Neighbor (KNN). The classification models and validations were carried out with 40 and 11 test samples, respectively. Best results were obtained when KNN was used, with correct predictions varying from 95% for Cd to 100% for Cr and Pb. (C) 2011 Elsevier B.V. All rights reserved.
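For readers unfamiliar with KNN, the best-performing classifier reported above, a minimal sketch follows. It does not reproduce the study's models: the two-dimensional feature vectors stand in for LIBS emission spectra, and the class labels, the Euclidean distance, and k = 3 are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance is an assumption; the abstract does not name the metric.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, heavily reduced "spectra": two emission-line intensities per toy.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.85)]
train_y = ["low risk", "low risk", "low risk",
           "high risk", "high risk", "high risk"]

prediction_a = knn_predict(train_x, train_y, (0.20, 0.20))
prediction_b = knn_predict(train_x, train_y, (0.80, 0.80))
```

With an odd k and two classes the vote cannot tie, which is why k = 3 is used here; real spectra would have hundreds of channels and would normally be preprocessed before distances are computed.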
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines applied to the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations, whereby one can visually inspect their levels of sensitivity to the type of feature and to the kernel function/parameter value.
Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value as well as the choice of the feature extractor are critical decisions, although the choice of the wavelet family seems to be less relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile has emerged among all types of machines, involving some regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
Traditionally, chronotype classification is based on the Morningness-Eveningness Questionnaire (MEQ). It is implicit in the classification that intermediate individuals get intermediate scores on most of the MEQ questions. However, a small group of individuals has a different pattern of answers. On some questions they answer as "morning-types" and on others as "evening-types," resulting in an intermediate total score. "Evening-type" and "morning-type" answers were set as $A_1$ and $A_4$, respectively. Intermediate answers were set as $A_2$ and $A_3$. The following algorithm was applied: $\text{Bimodality Index} = (\sum A_1 \times \sum A_4)^2 - (\sum A_2 \times \sum A_3)^2$. Neither-types that had positive bimodality scores were classified as bimodal. If our hypothesis is validated by objective data, an update of chronotype classification will be required. (Author correspondence: brunojm@ymail.com)
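The bimodality index above can be sketched in a few lines. The sketch assumes that each sum in the formula is the number of answers a respondent gave in that category, and that every answer has already been coded 1 (evening-type), 2 or 3 (intermediate), or 4 (morning-type); the abstract states neither detail explicitly, and the function name is hypothetical.

```python
from collections import Counter

def bimodality_index(coded_answers):
    """Return (n1 * n4)^2 - (n2 * n3)^2, where nk counts answers coded k."""
    counts = Counter(coded_answers)
    extreme = counts[1] * counts[4]       # evening-type times morning-type answers
    intermediate = counts[2] * counts[3]  # product of the two intermediate categories
    return extreme ** 2 - intermediate ** 2

# Hypothetical respondent who mixes extreme answers of both kinds.
mixed = [1, 1, 1, 4, 4, 4, 2, 3]
# Hypothetical respondent who mostly gives intermediate answers.
middle = [2, 2, 3, 3, 1, 4]

mixed_score = bimodality_index(mixed)    # (3 * 3)^2 - (1 * 1)^2 = 80
middle_score = bimodality_index(middle)  # (1 * 1)^2 - (2 * 2)^2 = -15
```

Under the rule quoted in the abstract, the first respondent (positive score) would be classified as bimodal if their total MEQ score placed them among the neither-types, while the second would not.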