915 resultados para Pattern recognition algorithms
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
In general, pattern recognition techniques require a high computational burden for learning the discriminating functions that are responsible to separate samples from distinct classes. As such, there are several studies that make effort to employ machine learning algorithms in the context of big data classification problems. The research on this area ranges from Graphics Processing Units-based implementations to mathematical optimizations, being the main drawback of the former approaches to be dependent on the graphic video card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, that is designed using a theoretical formulation that relates the minimum spanning tree with the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the approach proposed can be faster than the traditional one in five public datasets, being also as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.
Resumo:
A description is provided of the software algorithms developed for the CMS tracker both for reconstructing charged-particle trajectories in proton-proton interactions and for using the resulting tracks to estimate the positions of the LHC luminous region and individual primary-interaction vertices. Despite the very hostile environment at the LHC, the performance obtained with these algorithms is found to be excellent. For t (t) over bar events under typical 2011 pileup conditions, the average track-reconstruction efficiency for promptly-produced charged particles with transverse momenta of p(T) > 0.9GeV is 94% for pseudorapidities of vertical bar eta vertical bar < 0.9 and 85% for 0.9 < vertical bar eta vertical bar < 2.5. The inefficiency is caused mainly by hadrons that undergo nuclear interactions in the tracker material. For isolated muons, the corresponding efficiencies are essentially 100%. For isolated muons of p(T) = 100GeV emitted at vertical bar eta vertical bar < 1.4, the resolutions are approximately 2.8% in p(T), and respectively, 10 m m and 30 mu m in the transverse and longitudinal impact parameters. The position resolution achieved for reconstructed primary vertices that correspond to interesting pp collisions is 10-12 mu m in each of the three spatial dimensions. The tracking and vertexing software is fast and flexible, and easily adaptable to other functions, such as fast tracking for the trigger, or dedicated tracking for electrons that takes into account bremsstrahlung.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Pós-graduação em Engenharia Elétrica - FEIS
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
This paper compares the effectiveness of the Tsallis entropy over the classic Boltzmann-Gibbs-Shannon entropy for general pattern recognition, and proposes a multi-q approach to improve pattern analysis using entropy. A series of experiments were carried out for the problem of classifying image patterns. Given a dataset of 40 pattern classes, the goal of our image case study is to assess how well the different entropies can be used to determine the class of a newly given image sample. Our experiments show that the Tsallis entropy using the proposed multi-q approach has great advantages over the Boltzmann-Gibbs-Shannon entropy for pattern classification, boosting image recognition rates by a factor of 3. We discuss the reasons behind this success, shedding light on the usefulness of the Tsallis entropy and the multi-q approach. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.
Resumo:
Facial expression recognition is one of the most challenging research areas in the image recognition ¯eld and has been actively studied since the 70's. For instance, smile recognition has been studied due to the fact that it is considered an important facial expression in human communication, it is therefore likely useful for human–machine interaction. Moreover, if a smile can be detected and also its intensity estimated, it will raise the possibility of new applications in the future
Resumo:
Automatically recognizing faces captured under uncontrolled environments has always been a challenging topic in the past decades. In this work, we investigate cohort score normalization that has been widely used in biometric verification as means to improve the robustness of face recognition under challenging environments. In particular, we introduce cohort score normalization into undersampled face recognition problem. Further, we develop an effective cohort normalization method specifically for the unconstrained face pair matching problem. Extensive experiments conducted on several well known face databases demonstrate the effectiveness of cohort normalization on these challenging scenarios. In addition, to give a proper understanding of cohort behavior, we study the impact of the number and quality of cohort samples on the normalization performance. The experimental results show that bigger cohort set size gives more stable and often better results to a point before the performance saturates. And cohort samples with different quality indeed produce different cohort normalization performance. Recognizing faces gone after alterations is another challenging problem for current face recognition algorithms. Face image alterations can be roughly classified into two categories: unintentional (e.g., geometrics transformations introduced by the acquisition devide) and intentional alterations (e.g., plastic surgery). We study the impact of these alterations on face recognition accuracy. Our results show that state-of-the-art algorithms are able to overcome limited digital alterations but are sensitive to more relevant modifications. Further, we develop two useful descriptors for detecting those alterations which can significantly affect the recognition performance. In the end, we propose to use the Structural Similarity (SSIM) quality map to detect and model variations due to plastic surgeries. Extensive experiments conducted on a plastic surgery face database demonstrate the potential of SSIM map for matching face images after surgeries.
Resumo:
In recent years, Deep Learning techniques have shown to perform well on a large variety of problems both in Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite the strong success both in science and business, deep learning has its own limitations. It is often questioned if such techniques are only some kind of brute-force statistical approaches and if they can only work in the context of High Performance Computing with tons of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and if they can scale well in terms of "intelligence". The dissertation is focused on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two, very different, deep learning techniques on the aforementioned task: Convolutional Neural Network (CNN) and Hierarchical Temporal memory (HTM). They stand for two different approaches and points of view within the big hat of deep learning and are the best choices to understand and point out strengths and weaknesses of each of them. CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporation like Google and Facebook for solving face recognition and image auto-tagging problems. HTM, on the other hand, is known as a new emerging paradigm and a new meanly-unsupervised method, that is more biologically inspired. It tries to gain more insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process which are typical of the human brain. In the end, the thesis is supposed to prove that in certain cases, with a lower quantity of data, HTM can outperform CNN.