10 results for Document classification, Naive Bayes classifier, Verb-object pairs
at Universidade Federal do Rio Grande do Norte (UFRN)
Abstract:
Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, the following ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. Aiming to improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. In order to evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured using the mean classification error rate on independent test sets. These means are compared pairwise by a hypothesis test to evaluate whether there is a statistically significant difference between them.
With respect to the results obtained with the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that achieved by the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as meta-classifier. The Voting method, despite its simplicity, proved adequate for solving the problem presented in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, the use of such techniques did improve the classification error for the minority class. In this context, the NCL technique proved to be the most appropriate.
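As an illustration of one of the class-balancing techniques named in this abstract, the following is a minimal Python sketch of Tomek Links undersampling. It is not the authors' implementation: the function names, the Euclidean distance, and the toy data are assumptions made for the example. A Tomek link is a pair of points from different classes that are each other's nearest neighbor; here the majority-class member of each link is dropped.

```python
from math import dist


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbors and carry different class labels."""
    def nearest(i):
        # Index of the closest other point (Euclidean distance, an assumption).
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def undersample_with_tomek(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority_label}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy data: points 4 and 5 sit on the class boundary.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.55, 0.5), (0.6, 0.5)]
labels = ["maj", "maj", "min", "min", "maj", "min"]

new_points, new_labels = undersample_with_tomek(points, labels, "maj")
```

In this toy set only the boundary pair forms a link, so a single majority-class point is removed and every minority-class point is kept.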
Abstract:
Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb’s theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb’s view that neuronal assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition.
Abstract:
Hebb proposed that synapses between neurons that fire synchronously are strengthened, forming cell assemblies and phase sequences. The former, on a shorter scale, are ensembles of synchronized cells that function transiently as a closed processing system; the latter, on a larger scale, correspond to the sequential activation of cell assemblies able to represent percepts and behaviors. Nowadays, the recording of large neuronal populations allows for the detection of multiple cell assemblies. Within Hebb's theory, the next logical step is the analysis of phase sequences. Here we detected phase sequences as consecutive assembly activation patterns, and then analyzed their graph attributes in relation to behavior. We investigated action potentials recorded from the adult rat hippocampus and neocortex before, during and after novel object exploration (experimental periods). Within assembly graphs, each assembly corresponded to a node, and each edge corresponded to the temporal sequence of consecutive node activations. The sum of all assembly activations was proportional to firing rates, but the activity of individual assemblies was not. Assembly repertoire was stable across experimental periods, suggesting that novel experience does not create new assemblies in the adult rat. Assembly graph attributes, on the other hand, varied significantly across behavioral states and experimental periods, and were separable enough to correctly classify experimental periods (Naïve Bayes classifier; maximum AUROCs ranging from 0.55 to 0.99) and behavioral states (waking, slow wave sleep, and rapid eye movement sleep; maximum AUROCs ranging from 0.64 to 0.98). Our findings agree with Hebb's view that assemblies correspond to primitive building blocks of representation, nearly unchanged in the adult, while phase sequences are labile across behavioral states and change after novel experience. The results are compatible with a role for phase sequences in behavior and cognition.
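The assembly-graph construction described in this abstract (one node per assembly, one edge per pair of consecutive activations) can be sketched in Python roughly as follows. This is only an illustration: the assembly identifiers, the choice to weight edges by transition counts, and the out-degree attribute are assumptions, not details given in the abstract.

```python
from collections import Counter


def assembly_graph(activations):
    """Build a weighted directed graph from a temporally ordered
    list of assembly IDs: nodes are assemblies, and each edge counts
    how often one assembly was activated right after another."""
    nodes = set(activations)
    edges = Counter(zip(activations, activations[1:]))
    return nodes, edges


def out_degree(edges):
    """Number of distinct successors of each node (one possible graph attribute)."""
    degree = Counter()
    for src, _dst in edges:
        degree[src] += 1
    return degree


# Hypothetical activation sequence of three assemblies.
sequence = ["A1", "A2", "A3", "A1", "A2", "A1"]
nodes, edges = assembly_graph(sequence)
degrees = out_degree(edges)
```

Attributes such as these degrees could then serve as input features for a classifier, which is the role graph attributes play in the abstract.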
Abstract:
This paper investigates the discourse manifestations of the grammatical relation direct object with respect to the syntactic, semantic and pragmatic properties that underlie this element. The research adopts the theoretical orientation of North American and Brazilian functionalism, inspired by Givón (1995, 2001), Hopper and Thompson (1980), Chafe (1979), Furtado da Cunha, Oliveira, Martelotta (2003), inter alia. From functionalism, the research uses the principles of iconicity, markedness and informativity, and it analyzes the categories of transitivity, grounding and animacy. The research is also anchored in the prototype model (TAYLOR 1995) and the construction grammar model (GOLDBERG 1996, 2002). Both theoretical orientations share the view that language is a malleable living organism subject to socio-cultural context. Grammar is then the result of linguistic patterns that are created, maintained and systematized through language use. According to functional and cognitive linguistics, verbs are stored in the speaker's lexicon in their most frequent syntactic-semantic frames. These frames carry information concerning obligatory and optional arguments and the semantic roles these arguments take in the clause. The analysis focuses on the semantic type of the verbs and its relationship with the argument encoded as a direct object, observing the aspectual nature of the verbs. Direct objects are classified according to their morphology (lexical or pronominal noun phrase), semantic role, informational content and animacy. This study also discusses pedagogical implications regarding how the grammatical concepts touched on in this paper are treated in school textbooks. The empirical data come from Corpus Discurso & Gramática: a língua falada e escrita na cidade do Natal (FURTADO DA CUNHA, 1998). This corpus is composed of texts in the spoken and written modalities.
These modalities are in turn organized according to different types: personal narrative, retold narrative, description of preferred place, procedural description and report on argumentation. The sample totals 40 texts produced by four language consultants of the last graduation date. The paper shows that the same syntactic structures (formed through Subject-Verb-Object) correspond to different semantic-pragmatic structures in relation to specific communicative purposes, whether the verb denotes an event, process or state. Argument structures are not arbitrary but are related to experience, that is, to the way humans conceptualize the world and talk about it.
Abstract:
The goal of research in artificial intelligence is to enable computers to perform functions that humans perform using knowledge and reasoning. This work was developed in the area of machine learning, the branch of artificial intelligence concerned with the design and development of algorithms and techniques that allow computers to learn. The objective of this work is to analyze a feature selection method for ensemble systems. The proposed method follows the filter approach to feature selection: it uses variance and Spearman correlation to rank the features, and reward and punishment strategies to measure the importance of each feature for identifying the classes. For each ensemble, several different configurations were used, varying from hybrid (heterogeneous) to non-hybrid (homogeneous) ensemble structures. They were submitted to five combining methods (voting, sum, weighted sum, Multilayer Perceptron and naïve Bayes), which were applied to six distinct databases (real and artificial). The classifiers applied during the experiments were k-nearest neighbor, Multilayer Perceptron, naïve Bayes and decision tree. Finally, the performance of the ensembles was analyzed comparatively: with no feature selection method, with an original filter-approach feature selection method, and with the proposed method. For this comparison, a statistical test was applied, which demonstrated a significant improvement in the precision of the ensembles.
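The two statistics this abstract names for ranking features, variance and Spearman correlation, can be sketched in plain Python as below. The abstract does not say how the two are combined or how the reward and punishment strategies work, so this sketch only computes each statistic separately; the function names and toy feature columns are assumptions.

```python
from statistics import mean, pvariance


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rank_by_variance(columns):
    """Feature names ordered from highest to lowest population variance."""
    return sorted(columns, key=lambda name: pvariance(columns[name]), reverse=True)


# Hypothetical feature columns.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "f3": [5.0, 5.1, 5.0, 5.1, 5.0],
}
variance_order = rank_by_variance(features)
rho_f1_f2 = spearman(features["f1"], features["f2"])
```

A filter method might, for example, prefer high-variance features and treat pairs with a Spearman correlation close to 1 as redundant, but that combination rule is a guess rather than something stated in the abstract.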
Abstract:
The municipality of Areia Branca is within the mesoregion of West Potiguar and within the microregion of Mossoró, covering an area of 357.58 km². It covers an environmentally fragile area and, together with the municipality of Grossos-RN, houses the estuary of the Apodi-Mossoró River. The municipality of Areia Branca has historically suffered from a lack of planning regarding the use and occupation of land, as some economic activities, attracted by the extremely favorable natural conditions, have exploited its natural resources improperly. The aim of this study is to quantify and analyze the environmental degradation in the municipality. To this end, land use was first characterized using remote sensing, geoprocessing and a geographic information system (GIS), in order to generate data and information at the municipal scale that may serve as input to environmental and land use planning in the region. For this purpose, a Landsat 5 TM sensor image from 2010 was used. The image was processed in SPRING 5.2, where a supervised region-based classification was applied using the Bhattacharya Distance method with a threshold of 30%. The resulting land use map was used to analyze the spatial distribution of the different types of use occurring in the municipality, identifying areas that are being used incorrectly and the main types of environmental degradation. In addition, the methodology proposed by Beltrame (1994), the Physical-Conservationist Diagnosis, was applied with some adaptations to quantify the level of degradation or conservation of the study area. As a result, indexes were obtained for the parameters of the proposed methodology, allowing a quantitative analysis of the degradation potential of each sector. On a scale of 0 to 100, sector A and sector B had a value of 31.20 units of risk of physical deterioration.
Sector C showed a value of 34.64 units of degradation risk and should be considered a priority for conservation actions.
Abstract:
This research aims to understand the social representations of Teaching Work in groups of undergraduate students of Physics and Chemistry at the Federal University of Rio Grande do Norte. To this end, it was based on the theoretical and methodological proposal of the three consensuses (Carvalho, 2012), which explains the socio-genetic mechanisms that constitute the consensual dynamics underlying the organization of a representation. To achieve this goal, the theoretical-epistemological framework of Serge Moscovici (1978, 2003), Jodelet (2011), Wagner (1998, 2011) and Carvalho (2012) was used. The corpus analyzed results from a qualitative and quantitative research developed in three stages. In the first, two (2) questionnaires were administered to fifty (50) students of each undergraduate course: a profile questionnaire and another for collecting free associations with the inducing terms "Give Lesson," "Student" and "Teacher". The second stage applied the Multiple Classification Procedure (Roazzi, 1995) to another thirty (30) undergraduate students of each course, as well as a Document Analysis of the Pedagogical Projects of the Physics and Chemistry courses. The data analysis of the first stage focused on descriptive statistics and on the frequency and average order of the words associated with the inducing terms. The results of the Multiple Classification Procedure were submitted to multidimensional analyses, MSA (Multidimensional Scalogram Analysis) and SSA (Similarity Structure Analysis), and interpreted through the theoretical and methodological proposal of the three consensuses, supported by an analysis of the rhetorical nature of the justifications for the classifications and categorizations of words elicited during the application of the Multiple Classification Procedure. The data revealed that the groups surveyed share the same Social Representation, each with specific consensual dynamics. For these groups, Teaching Work is conceived in three dimensions: the BE-DO-HAVE of teaching.
In the Physics group, the consensus was clearly semantic, expressing a dynamic in which the interpretations of "Teaching Work" coexist peacefully around two conceptions: one, an identity built around "BE" "Teacher" or "BE" "Educator"; the other, how they think about professional development. The type of consensus in the Chemistry group pointed to a hierarchical consensual logic, in which the gradation among the elements of BE-DO-HAVE attested to conflicts and disagreements about the object "Teaching Work", concerning what is valued most in the course of professional development: the attributes of the personal dimension or those of the technical-professional dimension of teaching. The thesis that the socio-genetic mechanisms of the Social Representation of Teaching Work can be explained by this theoretical and methodological proposal was confirmed.
Abstract:
Equipment maintenance is a major cost factor in industrial plants, so the development of fault prediction techniques is very important. Three-phase induction motors are key electrical equipment in industrial applications, mainly because of their low cost and robustness; however, they are not immune to faults such as shorted windings and broken bars. Several methods of signal acquisition, processing and analysis are applied to improve their diagnosis. The more efficient techniques use current sensors and current signature analysis. In this dissertation, starting from these sensors, signal analysis is performed through Park's vector, which provides good visualization capability. Acquiring fault data is an arduous task; therefore, a methodology for database construction is developed. Park's transformation is applied in the stationary reference frame for machine modeling and for solving the machine's differential equations. Fault detection requires a detailed analysis of the variables and their influences, which makes the diagnosis more complex. Pattern recognition tasks allow systems to be generated automatically, based on patterns and concepts in the data that are in most cases undetectable by specialists, thereby supporting decision tasks. Classification algorithms with diverse learning paradigms (k-Nearest Neighbor, Neural Networks, Decision Trees and Naïve Bayes) are used for pattern recognition of machine faults. Multi-classifier systems are used to reduce classification errors. Homogeneous algorithms (Bagging and Boosting) and heterogeneous algorithms (Vote, Stacking and StackingC) were examined. The results show the effectiveness of the constructed model for fault modeling, as well as the feasibility of using multi-classifier algorithms for fault classification.
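To make the Park's vector idea concrete, here is a minimal Python sketch that maps three phase currents to the two Park components in the stationary reference frame, using the power-invariant form commonly seen in the Park's vector literature. The exact scaling used in the dissertation is not given in the abstract, so the coefficients, function names, and the 10 A test amplitude are assumptions.

```python
import math


def park_vector(ia, ib, ic):
    """Map three phase currents to Park's vector components (i_d, i_q)
    in the stationary frame (power-invariant scaling, an assumption)."""
    i_d = math.sqrt(2 / 3) * ia - ib / math.sqrt(6) - ic / math.sqrt(6)
    i_q = (ib - ic) / math.sqrt(2)
    return i_d, i_q


def balanced_currents(amplitude, theta):
    """Ideal balanced three-phase currents at electrical angle theta."""
    return (
        amplitude * math.cos(theta),
        amplitude * math.cos(theta - 2 * math.pi / 3),
        amplitude * math.cos(theta + 2 * math.pi / 3),
    )


# Sweep one electrical cycle of a hypothetical healthy motor (10 A peak).
moduli = [
    math.hypot(*park_vector(*balanced_currents(10.0, 2 * math.pi * k / 100)))
    for k in range(100)
]
```

For ideal balanced currents the modulus stays constant at sqrt(6)/2 times the peak current, so the (i_d, i_q) locus is a circle; deviations from that circle are what make the representation useful for visualizing faults.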
Abstract:
The support vector machine classifier is used for several problems in various areas of knowledge. Basically, the method used in this classifier is to find the hyperplane that maximizes the distance between the groups, in order to improve the generalization of the classifier. In this work, we treated some binary classification problems with data obtained by electroencephalography (EEG) and electromyography (EMG), using Support Vector Machine with some complementary techniques: Principal Component Analysis to identify the active regions of the brain, the periodogram method, obtained by Fourier analysis, to help discriminate between groups, and Simple Moving Average to eliminate some of the noise in the data. Two functions were developed in the software R for the training and classification tasks. Two weighting systems and a summary measure were also proposed to support the decision in the classification of groups. The application of these techniques, weights and the summary measure in the classifier showed quite satisfactory results: the best results were an average rate of 95.31% for visual stimuli data, 100% correct classification for epilepsy data and rates of 91.22% and 96.89% for object motion data for two subjects.
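Two of the preprocessing steps mentioned in this abstract, the Simple Moving Average and the periodogram, can be illustrated with a short Python sketch. The original work used R; this is not a port of those functions, and the window size, the 1/N periodogram normalization, and the synthetic sine signal are assumptions chosen for the example.

```python
import cmath
import math


def simple_moving_average(signal, window):
    """Mean of each run of `window` consecutive samples."""
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]


def periodogram(signal):
    """Squared magnitude of the discrete Fourier transform, divided by N,
    for frequency bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


# Hypothetical signal: a sine with exactly 3 cycles in 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
power = periodogram(signal)
peak_bin = max(range(len(power)), key=power.__getitem__)

smoothed = simple_moving_average([1, 2, 3, 4, 5], 3)
```

The periodogram concentrates the power of the sine in a single frequency bin, which is the kind of feature that can help a classifier discriminate between groups of signals.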