535 resultados para "Haar classifiers"
Resumo:
Today, due to globalization of the world the size of data set is increasing, it is necessary to discover the knowledge. The discovery of knowledge can be typically in the form of association rules, classification rules, clustering, discovery of frequent episodes and deviation detection. Fast and accurate classifiers for large databases are an important task in data mining. There is growing evidence that integrating classification and association rules mining, classification approaches based on heuristic, greedy search like decision tree induction. Emerging associative classification algorithms have shown good promises on producing accurate classifiers. In this paper we focus on performance of associative classification and present a parallel model for classifier building. For classifier building some parallel-distributed algorithms have been proposed for decision tree induction but so far no such work has been reported for associative classification.
Resumo:
Electrocardiography (ECG) has been recently proposed as biometric trait for identification purposes. Intra-individual variations of ECG might affect identification performance. These variations are mainly due to Heart Rate Variability (HRV). In particular, HRV causes changes in the QT intervals along the ECG waveforms. This work is aimed at analysing the influence of seven QT interval correction methods (based on population models) on the performance of ECG-fiducial-based identification systems. In addition, we have also considered the influence of training set size, classifier, classifier ensemble as well as the number of consecutive heartbeats in a majority voting scheme. The ECG signals used in this study were collected from thirty-nine subjects within the Physionet open access database. Public domain software was used for fiducial points detection. Results suggested that QT correction is indeed required to improve the performance. However, there is no clear choice among the seven explored approaches for QT correction (identification rate between 0.97 and 0.99). MultiLayer Perceptron and Support Vector Machine seemed to have better generalization capabilities, in terms of classification performance, with respect to Decision Tree-based classifiers. No such strong influence of the training-set size and the number of consecutive heartbeats has been observed on the majority voting scheme.
Resumo:
In this report we summarize the state-of-the-art of speech emotion recognition from the signal processing point of view. On the bases of multi-corporal experiments with machine-learning classifiers, the observation is made that existing approaches for supervised machine learning lead to database dependent classifiers which can not be applied for multi-language speech emotion recognition without additional training because they discriminate the emotion classes following the used training language. As there are experimental results showing that Humans can perform language independent categorisation, we made a parallel between machine recognition and the cognitive process and tried to discover the sources of these divergent results. The analysis suggests that the main difference is that the speech perception allows extraction of language independent features although language dependent features are incorporated in all levels of the speech signal and play as a strong discriminative function in human perception. Based on several results in related domains, we have suggested that in addition, the cognitive process of emotion-recognition is based on categorisation, assisted by some hierarchical structure of the emotional categories, existing in the cognitive space of all humans. We propose a strategy for developing language independent machine emotion recognition, related to the identification of language independent speech features and the use of additional information from visual (expression) features.
Resumo:
This paper addresses the task of learning classifiers from streams of labelled data. In this case we can face the problem that the underlying concepts can change over time. The paper studies two mechanisms developed for dealing with changing concepts. Both are based on the time window idea. The first one forgets gradually, by assigning to the examples weight that gradually decreases over time. The second one uses a statistical test to detect changes in concept and then optimizes the size of the time window, aiming to maximise the classification accuracy on the new examples. Both methods are general in nature and can be used with any learning algorithm. The objectives of the conducted experiments were to compare the mechanisms and explore whether they can be combined to achieve a synergetic e ect. Results from experiments with three basic learning algorithms (kNN, ID3 and NBC) using four datasets are reported and discussed.
Resumo:
Bayesian algorithms pose a limit to the performance learning algorithms can achieve. Natural selection should guide the evolution of information processing systems towards those limits. What can we learn from this evolution and what properties do the intermediate stages have? While this question is too general to permit any answer, progress can be made by restricting the class of information processing systems under study. We present analytical and numerical results for the evolution of on-line algorithms for learning from examples for neural network classifiers, which might include or not a hidden layer. The analytical results are obtained by solving a variational problem to determine the learning algorithm that leads to maximum generalization ability. Simulations using evolutionary programming, for programs that implement learning algorithms, confirm and expand the results. The principal result is not just that the evolution is towards a Bayesian limit. Indeed it is essentially reached. In addition we find that evolution is driven by the discovery of useful structures or combinations of variables and operators. In different runs the temporal order of the discovery of such combinations is unique. The main result is that combinations that signal the surprise brought by an example arise always before combinations that serve to gauge the performance of the learning algorithm. This latter structures can be used to implement annealing schedules. The temporal ordering can be understood analytically as well by doing the functional optimization in restricted functional spaces. We also show that there is data suggesting that the appearance of these traits also follows the same temporal ordering in biological systems. © 2006 American Institute of Physics.
Resumo:
The problem of recognition on finite set of events is considered. The generalization ability of classifiers for this problem is studied within the Bayesian approach. The method for non-uniform prior distribution specification on recognition tasks is suggested. It takes into account the assumed degree of intersection between classes. The results of the analysis are applied for pruning of classification trees.
Resumo:
Full text: The idea of producing proteins from recombinant DNA hatched almost half a century ago. In his PhD thesis, Peter Lobban foresaw the prospect of inserting foreign DNA (from any source, including mammalian cells) into the genome of a λ phage in order to detect and recover protein products from Escherichia coli [ 1 and 2]. Only a few years later, in 1977, Herbert Boyer and his colleagues succeeded in the first ever expression of a peptide-coding gene in E. coli — they produced recombinant somatostatin [ 3] followed shortly after by human insulin. The field has advanced enormously since those early days and today recombinant proteins have become indispensable in advancing research and development in all fields of the life sciences. Structural biology, in particular, has benefitted tremendously from recombinant protein biotechnology, and an overwhelming proportion of the entries in the Protein Data Bank (PDB) are based on heterologously expressed proteins. Nonetheless, synthesizing, purifying and stabilizing recombinant proteins can still be thoroughly challenging. For example, the soluble proteome is organized to a large part into multicomponent complexes (in humans often comprising ten or more subunits), posing critical challenges for recombinant production. A third of all proteins in cells are located in the membrane, and pose special challenges that require a more bespoke approach. Recent advances may now mean that even these most recalcitrant of proteins could become tenable structural biology targets on a more routine basis. In this special issue, we examine progress in key areas that suggests this is indeed the case. Our first contribution examines the importance of understanding quality control in the host cell during recombinant protein production, and pays particular attention to the synthesis of recombinant membrane proteins. A major challenge faced by any host cell factory is the balance it must strike between its own requirements for growth and the fact that its cellular machinery has essentially been hijacked by an expression construct. In this context, Bill and von der Haar examine emerging insights into the role of the dependent pathways of translation and protein folding in defining high-yielding recombinant membrane protein production experiments for the common prokaryotic and eukaryotic expression hosts. Rather than acting as isolated entities, many membrane proteins form complexes to carry out their functions. To understand their biological mechanisms, it is essential to study the molecular structure of the intact membrane protein assemblies. Recombinant production of membrane protein complexes is still a formidable, at times insurmountable, challenge. In these cases, extraction from natural sources is the only option to prepare samples for structural and functional studies. Zorman and co-workers, in our second contribution, provide an overview of recent advances in the production of multi-subunit membrane protein complexes and highlight recent achievements in membrane protein structural research brought about by state-of-the-art near-atomic resolution cryo-electron microscopy techniques. E. coli has been the dominant host cell for recombinant protein production. Nonetheless, eukaryotic expression systems, including yeasts, insect cells and mammalian cells, are increasingly gaining prominence in the field. The yeast species Pichia pastoris, is a well-established recombinant expression system for a number of applications, including the production of a range of different membrane proteins. Byrne reviews high-resolution structures that have been determined using this methylotroph as an expression host. Although it is not yet clear why P. pastoris is suited to producing such a wide range of membrane proteins, its ease of use and the availability of diverse tools that can be readily implemented in standard bioscience laboratories mean that it is likely to become an increasingly popular option in structural biology pipelines. The contribution by Columbus concludes the membrane protein section of this volume. In her overview of post-expression strategies, Columbus surveys the four most common biochemical approaches for the structural investigation of membrane proteins. Limited proteolysis has successfully aided structure determination of membrane proteins in many cases. Deglycosylation of membrane proteins following production and purification analysis has also facilitated membrane protein structure analysis. Moreover, chemical modifications, such as lysine methylation and cysteine alkylation, have proven their worth to facilitate crystallization of membrane proteins, as well as NMR investigations of membrane protein conformational sampling. Together these approaches have greatly facilitated the structure determination of more than 40 membrane proteins to date. It may be an advantage to produce a target protein in mammalian cells, especially if authentic post-translational modifications such as glycosylation are required for proper activity. Chinese Hamster Ovary (CHO) cells and Human Embryonic Kidney (HEK) 293 cell lines have emerged as excellent hosts for heterologous production. The generation of stable cell-lines is often an aspiration for synthesizing proteins expressed in mammalian cells, in particular if high volumetric yields are to be achieved. In his report, Buessow surveys recent structures of proteins produced using stable mammalian cells and summarizes both well-established and novel approaches to facilitate stable cell-line generation for structural biology applications. The ambition of many biologists is to observe a protein's structure in the native environment of the cell itself. Until recently, this seemed to be more of a dream than a reality. Advances in nuclear magnetic resonance (NMR) spectroscopy techniques, however, have now made possible the observation of mechanistic events at the molecular level of protein structure. Smith and colleagues, in an exciting contribution, review emerging ‘in-cell NMR’ techniques that demonstrate the potential to monitor biological activities by NMR in real time in native physiological environments. A current drawback of NMR as a structure determination tool derives from size limitations of the molecule under investigation and the structures of large proteins and their complexes are therefore typically intractable by NMR. A solution to this challenge is the use of selective isotope labeling of the target protein, which results in a marked reduction of the complexity of NMR spectra and allows dynamic processes even in very large proteins and even ribosomes to be investigated. Kerfah and co-workers introduce methyl-specific isotopic labeling as a molecular tool-box, and review its applications to the solution NMR analysis of large proteins. Tyagi and Lemke next examine single-molecule FRET and crosslinking following the co-translational incorporation of non-canonical amino acids (ncAAs); the goal here is to move beyond static snap-shots of proteins and their complexes and to observe them as dynamic entities. The encoding of ncAAs through codon-suppression technology allows biomolecules to be investigated with diverse structural biology methods. In their article, Tyagi and Lemke discuss these approaches and speculate on the design of improved host organisms for ‘integrative structural biology research’. Our volume concludes with two contributions that resolve particular bottlenecks in the protein structure determination pipeline. The contribution by Crepin and co-workers introduces the concept of polyproteins in contemporary structural biology. Polyproteins are widespread in nature. They represent long polypeptide chains in which individual smaller proteins with different biological function are covalently linked together. Highly specific proteases then tailor the polyprotein into its constituent proteins. Many viruses use polyproteins as a means of organizing their proteome. The concept of polyproteins has now been exploited successfully to produce hitherto inaccessible recombinant protein complexes. For instance, by means of a self-processing synthetic polyprotein, the influenza polymerase, a high-value drug target that had remained elusive for decades, has been produced, and its high-resolution structure determined. In the contribution by Desmyter and co-workers, a further, often imposing, bottleneck in high-resolution protein structure determination is addressed: The requirement to form stable three-dimensional crystal lattices that diffract incident X-ray radiation to high resolution. Nanobodies have proven to be uniquely useful as crystallization chaperones, to coax challenging targets into suitable crystal lattices. Desmyter and co-workers review the generation of nanobodies by immunization, and highlight the application of this powerful technology to the crystallography of important protein specimens including G protein-coupled receptors (GPCRs). Recombinant protein production has come a long way since Peter Lobban's hypothesis in the late 1960s, with recombinant proteins now a dominant force in structural biology. The contributions in this volume showcase an impressive array of inventive approaches that are being developed and implemented, ever increasing the scope of recombinant technology to facilitate the determination of elusive protein structures. Powerful new methods from synthetic biology are further accelerating progress. Structure determination is now reaching into the living cell with the ultimate goal of observing functional molecular architectures in action in their native physiological environment. We anticipate that even the most challenging protein assemblies will be tackled by recombinant technology in the near future.
Resumo:
Membrane protein structural biology is critically dependent upon the supply of high-quality protein. Over the last few years, the value of crystallising biochemically characterised, recombinant targets that incorporate stabilising mutations has been established. Nonetheless, obtaining sufficient yields of many recombinant membrane proteins is still a major challenge. Solutions are now emerging based on an improved understanding of recombinant host cells; as a 'cell factory' each cell is tasked with managing limited resources to simultaneously balance its own growth demands with those imposed by an expression plasmid. This review examines emerging insights into the role of translation and protein folding in defining high-yielding recombinant membrane protein production in a range of host cells.
Resumo:
When combining remote sensing imagery with statistical classifiers to obtain categorical thematic maps it is not usual to provide data about the spatial distribution of the error and uncertainty of the resulting maps. This paper describes, in the context of GeoViQua FP7 project, feasible approaches for methods based on several steps such as hybrid classifiers. Both for “per pixel” and “per polygon” strategies, the proposal is based on the use of the available ground truth, which is used to properly model the spatial distribution of the errors. Results allow mapping the classification success with a very high level of reliability (R2>0,94), providing users a sound knowledge of the accuracy at every area of the map.
Resumo:
Identification of humans via ECG is being increasingly studied because it can have several advantages over the traditional biometric identification techniques. However, difficulties arise because of the heartrate variability. In this study we analysed the influence of QT interval correction on the performance of an identification system based on temporal and amplitude features of ECG. In particular we tested MLP, Naive Bayes and 3-NN classifiers on the Fantasia database. Results indicate that QT correction can significantly improve the overall system performance. © 2013 IEEE.
Resumo:
his article presents some of the results of the Ph.D. thesis Class Association Rule Mining Using MultiDimensional Numbered Information Spaces by Iliya Mitov (Institute of Mathematics and Informatics, BAS), successfully defended at Hasselt University, Faculty of Science on 15 November 2011 in Belgium
Resumo:
Resource discovery is one of the key services in digitised cultural heritage collections. It requires intelligent mining in heterogeneous digital content as well as capabilities in large scale performance; this explains the recent advances in classification methods. Associative classifiers are convenient data mining tools used in the field of cultural heritage, by applying their possibilities to taking into account the specific combinations of the attribute values. Usually, the associative classifiers prioritize the support over the confidence. The proposed classifier PGN questions this common approach and focuses on confidence first by retaining only 100% confidence rules. The classification tasks in the field of cultural heritage usually deal with data sets with many class labels. This variety is caused by the richness of accumulated culture during the centuries. Comparisons of classifier PGN with other classifiers, such as OneR, JRip and J48, show the competitiveness of PGN in recognizing multi-class datasets on collections of masterpieces from different West and East European Fine Art authors and movements.
Resumo:
User queries over image collections, based on semantic similarity, can be processed in several ways. In this paper, we propose to reuse the rules produced by rule-based classifiers in their recognition models as query pattern definitions for searching image collections.
Resumo:
ACM Computing Classification System (1998): H.2.8, H.3.3.
Resumo:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014