28 results for Document classification
Abstract:
Reinforcement learning is a machine learning technique that, despite its many applications, has arguably yet to reach its full potential. One insufficiently explored possibility is combining reinforcement learning with other methods to solve pattern classification problems. The generalization problems faced by support vector machine ensembles are well documented in the literature. Algorithms such as AdaBoost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The proposed algorithm combines AdaBoost with a reinforcement learning layer that adjusts committee parameters so that imbalances among the committee components do not degrade the generalization performance of the final hypothesis. Ensembles with and without the reinforcement learning layer were compared on benchmark data sets widely known in the area of pattern classification.
Abstract:
Modern wireless systems employ adaptive techniques to provide high throughput while meeting the desired coverage, Quality of Service (QoS) and capacity. One way to further increase data rates is to apply cognitive radio concepts, in which a system exploits unused spectrum in existing licensed bands by sensing the spectrum and opportunistically accessing unused portions. Techniques such as Automatic Modulation Classification (AMC) can be helpful, or even vital, in such scenarios. AMC implementations usually rely on some form of signal pre-processing, which may introduce a high computational cost or make assumptions about the received signal that may not hold (e.g. Gaussianity of the noise). This work proposes a new AMC method that uses a similarity measure from the Information Theoretic Learning (ITL) framework known as the correntropy coefficient. It extracts similarity measurements over a pair of random processes using higher-order statistics, yielding better similarity estimates than, for example, the correlation coefficient. Computer simulations show that the proposed technique achieves a high success rate in classifying digital modulations, even in the presence of additive white Gaussian noise (AWGN).
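As a rough illustration of the similarity measure named in this abstract, the sketch below estimates a correntropy coefficient between two equal-length sample sequences. It assumes the commonly used centered form with a Gaussian kernel; the kernel width `sigma`, the function names and the sample values are illustrative assumptions, not details taken from the work itself.

```python
import math


def gaussian_kernel(u, sigma):
    # Unnormalized Gaussian kernel; the normalizing constant cancels
    # in the coefficient, so it is omitted here.
    return math.exp(-(u * u) / (2.0 * sigma * sigma))


def centered_cross_correntropy(x, y, sigma):
    # Mean kernel value over paired samples minus the mean over all
    # cross pairs (an estimate of the value under independence).
    n = len(x)
    joint = sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / n
    marginal = sum(gaussian_kernel(a - b, sigma) for a in x for b in y) / (n * n)
    return joint - marginal


def correntropy_coefficient(x, y, sigma=1.0):
    # Assumed definition: U(x, y) / sqrt(U(x, x) * U(y, y)).
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    uxy = centered_cross_correntropy(x, y, sigma)
    uxx = centered_cross_correntropy(x, x, sigma)
    uyy = centered_cross_correntropy(y, y, sigma)
    denom = math.sqrt(uxx * uyy)
    if denom == 0.0:
        raise ValueError("coefficient is undefined for a constant sequence")
    return uxy / denom


# Hypothetical sample values, for illustration only.
signal = [0.0, 1.0, 2.0, 3.0]
template = [3.0, 2.0, 1.0, 0.0]
same = correntropy_coefficient(signal, signal)
other = correntropy_coefficient(signal, template)
```

In a classifier built on this idea, a received signal could be compared against one template per modulation and assigned to the most similar one; that decision rule is an assumption here, since the abstract does not spell it out.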
Abstract:
Pattern classification is one of the most prominent subareas of machine learning. Among the various approaches to pattern classification problems, Support Vector Machines (SVMs) receive great emphasis due to their ease of use and good generalization performance. The Least Squares formulation of the SVM (LS-SVM) finds the solution by solving a set of linear equations instead of the quadratic programming problem used in the standard SVM. LS-SVMs have some free parameters that must be chosen correctly to achieve satisfactory results in a given task. Although LS-SVMs perform well, many tools have been developed to improve them, mainly new classification methods and the use of ensembles, that is, combinations of several classifiers. In this work, we propose using an ensemble and a Genetic Algorithm (GA), a search algorithm based on the evolution of species, to enhance LS-SVM classification. To build the ensemble, we randomly select attributes of the original problem, which splits it into smaller problems, each handled by one classifier. We then apply a genetic algorithm to find effective values for the LS-SVM parameters and a weight vector that measures the importance of each machine in the final classification. The final classification is obtained by a linear combination of the decision values of the LS-SVMs with the weight vector. We evaluated the algorithm on several benchmark classification problems and compared the results with those of other classifiers.
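The final combination step described in this abstract can be sketched as a weighted sum of per-classifier decision values followed by a sign. This is a minimal sketch for the binary case only; the decision values and weights below are made-up numbers, the function name is hypothetical, and mapping a score of exactly zero to the positive class is an assumption.

```python
def ensemble_decision(decision_values, weights):
    """Combine per-classifier decision values with a weight vector.

    Returns (label, score), where label is +1 or -1.
    """
    if len(decision_values) != len(weights):
        raise ValueError("one weight is needed per classifier")
    # Linear combination of the individual decision values.
    score = sum(w * f for w, f in zip(weights, decision_values))
    # Assumption: a score of exactly zero is assigned to the positive class.
    label = 1 if score >= 0 else -1
    return label, score


# Hypothetical outputs of three classifiers for one input, and a
# hypothetical weight vector such as a genetic algorithm might return.
label_a, score_a = ensemble_decision([0.8, -0.3, -0.2], [0.5, 0.3, 0.2])
label_b, score_b = ensemble_decision([0.1, -0.9], [0.5, 0.5])
```

In the first call the single confident positive vote outweighs two weak negative votes because of its larger weight, which is the effect the weight vector is meant to capture.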
Abstract:
This work presents an auxiliary method for measuring bone density through the attenuation of electromagnetic waves. To this end, an arrangement of two rectangular microstrip antennas was used, operating at a frequency of 2.49 GHz and fed by a microstrip line on a fiberglass substrate with a permittivity of 4.4 and a height of 0.9 cm. Simulations were performed with samples of silica, bone meal, and silica and gypsum blocks to demonstrate how the attenuation level varies across different combinations. Because they reproduce the anomaly aspects found in human beings well, samples of bovine bone were used. They were subjected to weighing, measurement and microwave radiation. The samples had their masses altered after mischaracterization and the process was repeated. The data obtained were fed to a neural network, which was then trained; the best results reached correct classification of 100% of the samples. We conclude that with a single non-ionizing wave in the 2.49 GHz range it is possible to evaluate the attenuation level in bone tissue, and that a neural network fed with the characteristics obtained in the experiment can classify a sample as having low or high bone density.
Abstract:
The increasing demand for high-performance wireless communication systems has exposed the inefficiency of the current model of fixed radio spectrum allocation. In this context, cognitive radio appears as a more efficient alternative, providing opportunistic spectrum access with the maximum possible bandwidth. To meet these requirements, the transmitter must identify transmission opportunities and the receiver must recognize the parameters defined for the communication signal. Techniques based on cyclostationary analysis can be applied to both spectrum sensing and modulation classification, even in low signal-to-noise ratio (SNR) environments. However, despite its robustness, one of the main disadvantages of cyclostationarity is the high computational cost of calculating its functions. This work proposes efficient architectures for obtaining cyclostationary features to be employed in both spectrum sensing and automatic modulation classification (AMC). In the context of spectrum sensing, a parallelized algorithm for extracting cyclostationary features of communication signals is presented. The performance of this parallelized feature extractor is evaluated using speedup and parallel efficiency metrics. The spectrum sensing architecture is analyzed for several configurations of false alarm probability, SNR level and observation time, for BPSK and QPSK modulations. In the context of AMC, the reduced alpha-profile is proposed as a cyclostationary signature calculated over a reduced set of cyclic frequencies. This signature is validated by a modulation classification architecture based on pattern matching. The AMC architecture is investigated in terms of correct classification rates for AM, BPSK, QPSK, MSK and FSK modulations, considering several scenarios of observation length and SNR level. The numerical performance results obtained in this work show the efficiency of the proposed architectures.
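The two metrics named in this abstract have standard definitions: speedup is the serial run time divided by the parallel run time, and parallel efficiency is the speedup divided by the number of processing units. The sketch below applies them to made-up timings; the function names and the numbers are assumptions for illustration and are not results from the work.

```python
def speedup(t_serial, t_parallel):
    # Ratio of serial run time to parallel run time.
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_workers):
    # Speedup per processing unit; 1.0 corresponds to ideal scaling.
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return speedup(t_serial, t_parallel) / n_workers


# Hypothetical timings: 120 s on one core, 20 s on eight cores.
s = speedup(120.0, 20.0)
e = parallel_efficiency(120.0, 20.0, 8)
```

With these illustrative timings, eight cores give a sixfold speedup at 75% efficiency; the gap from ideal scaling is what the efficiency metric exposes.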
Abstract:
The use of non-human primates in scientific research has contributed significantly to the biomedical area and, in the case of Callithrix jacchus, has provided important evidence on physiological mechanisms that help explain its biology, making the species a valuable experimental model for different pathologies. However, raising non-human primates in captivity for long periods of time is accompanied by behavioral disorders and chronic diseases, as well as progressive weight loss in most of the animals. The Primatology Center of the Universidade Federal do Rio Grande do Norte (UFRN) has housed a colony of C. jacchus for nearly 30 years, and during this period the animals have been weighed systematically to detect possible alterations in their clinical condition. This procedure has generated a volume of data on the weight of animals at different age ranges. These data are of great importance for studying this variable from different perspectives. Accordingly, this paper presents three studies using weight data collected over 15 years (1985-2000) as a way of verifying the health status and development of the animals. The first study produced the first article, which describes the histopathological findings in animals with a probable diagnosis of permanent wasting marmoset syndrome (WMS). All the animals were carriers of trematode parasites (Platynosomum spp) and had obstruction in the hepatobiliary system; it is suggested that this agent is one of the etiological factors of the syndrome. In the second article, the analysis focused on comparing the environmental profile and cortisol levels of animals with normal weight curve evolution and those with WMS. We observed a marked decrease in locomotion, increased use of the lower cage strata and hypocortisolemia. The latter is likely associated with an adaptation of the mechanisms that make up the hypothalamus-hypophysis-adrenal axis, as observed in other mammals under conditions of chronic malnutrition.
Finally, in the third study, the animals with weight alterations were excluded from the sample and, using computational tools (K-means and SOM) in an unsupervised way, we suggest new ontogenetic development classes for C. jacchus. The number of classes was expanded from five to eight: infant I, infant II, infant III, juvenile I, juvenile II, sub-adult, young adult and elderly adult, in order to provide a more suitable classification for more detailed studies that require better control over animal development.
Abstract:
Recently, Brazilian scientific production has increased greatly due to demands for productivity from scientific agencies. However, this sharp increase calls for more qualified production, since it is essential that publications be relevant and original. In the field of Psychology, the assessment of scientific journals by the CAPES/ANPEPP Commission had a strong effect on the scientific community and raised questions about the chosen evaluation method. Considering this impact, the aim of this research is a meta-analysis of the assessment of Psychology journals carried out by CAPES to update the Qualis database. For this research, we consulted editors of Psychology journals (38 questionnaires applied by e-mail), 5 librarians who work with the assessment of scientific journals (semi-structured interviews) and 8 members who acted as referees on the CAPES/ANPEPP Commission (open questions sent by e-mail). The results are presented through three analyses: a general evaluation of the Qualis process (including the constitution of the Assessment Committee), the evaluation criteria used in the process, and the effect of the evaluation on the scientific community (including changes in the editorial scene). Some important points emerged: disagreement among the different actors about the suitability of this evaluation model; recognition that scientific journals have improved, mainly in terms of normalization and diffusion; the finding that the model does not indicate the quality of the journal, i.e., the content of the scientific articles it publishes; and disagreement with the criteria used, which were seen as necessary and useful but in need of discussion and clarification within the scientific community. Despite these points, the evaluation of scientific journals is still the main method of assuring quality in Psychology publications.
Abstract:
In this work we used chemometric tools to classify milk powder samples and to quantify their protein content. We applied NIR diffuse reflectance spectroscopy combined with multivariate techniques. First, we carried out an exploratory analysis of the samples by principal component analysis (PCA), followed by classification using soft independent modeling of class analogy (SIMCA). This made it possible to classify the samples, which were grouped by similarities in their composition. Finally, partial least squares regression (PLS) and principal components regression (PCR) allowed the protein content of the milk powder samples to be quantified and compared with the Kjeldahl reference method. A total of 53 samples of milk powder sold in the metropolitan areas of Natal, Salvador and Rio de Janeiro were acquired for analysis; after data pre-treatment, four models were built and used to classify and quantify the samples. After being assessed and validated, the methods showed good performance, accuracy and reliability, showing that NIR can serve as a non-invasive technique, since it produces no waste and saves time in analyzing the samples.
Abstract:
When crosscutting concerns are identified from the beginning of development, during the activities of requirements engineering, there are many gains in quality, cost and efficiency throughout the software development lifecycle. This early identification supports the evolution of requirements, detects possible flaws in the requirements specification, improves traceability among requirements, provides better software modularity and prevents possible rework. However, despite these advantages, identifying crosscutting concerns during requirements engineering faces several difficulties, such as the lack of systematization and of supporting tools. Furthermore, it is difficult to justify why some concerns are identified as crosscutting and others are not, since this identification is most often made without any methodology to systematize and ground it. In this context, this paper proposes an approach based on Grounded Theory, called GT4CCI, for systematizing and grounding the process of identifying crosscutting concerns in the requirements document during the initial stages of the software development process. Grounded Theory is a renowned methodology for the qualitative analysis of data. With GT4CCI it is possible to better understand, track and document concerns, adding gains in quality, reliability and modularity throughout the software lifecycle.
Abstract:
Machine learning techniques are applied in classification tasks to acquire knowledge from a set of data or information. Some learning methods proposed in the literature are based on semi-supervised learning, which combines a small percentage of labeled data (supervised learning) with a quantity of unlabeled examples (unsupervised learning) during the training phase, thereby reducing the need for a large quantity of labeled instances when only a small set of labeled instances is available for training. A common problem in semi-supervised learning is the random selection of instances: most papers use a random selection technique, which can have a negative impact. Most machine learning methods address single-label problems, that is, problems in which each instance is associated with a single class; however, many domains require data to be assigned to more than one class, which is called multi-label classification. This work presents an experimental analysis of the results obtained using semi-supervised learning in multi-label classification problems, using a reliability parameter as an aid in classifying the data. The use of semi-supervised learning techniques together with multi-label classification methods was thus essential to obtaining the results.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Abstract:
The aim of this study was to investigate the social representation of technological education held by teachers at the Federal Technological Education Network. The survey was conducted from 2007 to 2010, and the respondents were 275 teachers: 135 from the Federal Center for Technological Education (CEFET in Portuguese) in the state of Amazonas, at the headquarters unit in Manaus, and 140 from the CEFET in the state of Rio Grande do Norte, at a unit based in Natal. We adopt the concept of technological education as the top level of professional education, that is, the short-duration undergraduate programs called technological courses. The Federal Technological Education Network gathers hundreds of related institutions, coordinated and supervised by the Office of Vocational and Technological Education of the Ministry of Education. Although many of these institutions offer courses in technological education, no research addressing this subject from the perspective of Social Representations Theory (SRT) was found in the literature. We seek to unravel the teachers' social representation of technological education by adopting the procedural approach of SRT. This is a qualitative approach, focusing on significant aspects of the representative activity and on the mechanisms by which the representation is formed. We therefore search for the socio-genesis of the representation in the articulations between discourses, social institutions and practices. We began the research by applying critical reading and an analytical perspective to the historical and regulatory documents of technological education in Brazil, from the early twentieth century to the present day. We adopted the Procedure for Multiple Classifications (PMC), based on the Free Words Association Technique (FWAT), to access the elements of the representational content. To analyze the data obtained with FWAT and to select the main words and phrases pertinent to the semantic field of technological education, we used the Hamlet II software.
For the analysis of the PMC and Free Classification (FC) data we used SPSS ® (Statistical Package for the Social Sciences) version 17.0 and the method of multidimensional scaling (MDS). The main output of MDS takes the form of a set of scatterplots, or "perceptual maps", whose points are the elements of the representational content. For the FC data analysis we used Multidimensional Scalogram Analysis (MSA), which uses the original data in its raw form and allows categorical data to be interpreted in the map as measures of (dis)similarity. To help interpret the configurations of the FC perceptual maps, we applied Content Analysis to fragments of the discourse of the teachers interviewed. The results confirm our initial hypothesis regarding the presence of a single socio-cognitive plot among the study subjects, which is the basis for a social representation of technological education in line with the historic assumption of a dichotomy between mental and manual labor. In spite of the three merging representational elements of the representational content, the perceptual maps compiled from the MSA statistics corroborate the dichotomy, with the exception of the map relating to the subgroup of teachers belonging to the humanities.
Abstract:
Data classification is a task with high applicability in many areas. Most methods for classification problems found in the literature deal with single-label, or traditional, problems. In recent years, a series of classification tasks has been identified in which samples can be labeled with more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). We have also studied a newer category of learning, called semi-supervised learning, which combines labeled data (supervised learning) and unlabeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Since both multi-label and hierarchical multi-label classification techniques and semi-supervised learning have shown favorable results, this work proposes and applies semi-supervised learning in hierarchical multi-label classification tasks, so as to efficiently exploit the main advantages of the two areas. An experimental analysis of the proposed methods found that using semi-supervised learning in hierarchical multi-label methods gave satisfactory results, since the two approaches produced statistically similar results.