869 resultados para hierarchical classification system
Resumo:
Stream-mining approach is defined as a set of cutting-edge techniques designed to process streams of data in real time, in order to extract knowledge. In the particular case of classification, stream-mining has to adapt its behaviour to the volatile underlying data distributions, what has been called concept drift. Moreover, it is important to note that concept drift may lead to situations where predictive models become invalid and have therefore to be updated to represent the actual concepts that data poses. In this context, there is a specific type of concept drift, known as recurrent concept drift, where the concepts represented by data have already appeared in the past. In those cases the learning process could be saved or at least minimized by applying a previously trained model. This could be extremely useful in ubiquitous environments that are characterized by the existence of resource constrained devices. To deal with the aforementioned scenario, meta-models can be used in the process of enhancing the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are some real-world situations where a concept reappears, as in the case of intrusion detection systems (IDS), where the same incidents or an adaptation of them usually reappear over time. In these environments the early prediction of drift by means of a better knowledge of past models can help to anticipate to the change, thus improving efficiency of the model regarding the training instances needed. By means of using meta-models as a recurrent drift detection mechanism, the ability to share concepts representations among different data mining processes is open. That kind of exchanges could improve the accuracy of the resultant local model as such model may benefit from patterns similar to the local concept that were observed in other scenarios, but not yet locally. This would also improve the efficiency of training instances used during the classification process, as long as the exchange of models would aid in the application of already trained recurrent models, that have been previously seen by any of the collaborative devices. Which it is to say that the scope of recurrence detection and representation is broaden. In fact the detection, representation and exchange of concept drift patterns would be extremely useful for the law enforcement activities fighting against cyber crime. Being the information exchange one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific scope of critical infrastructures protection it is crucial to count with information exchange mechanisms, both from a strategical and technical scope. The exchange of concept drift detection schemes in cyber security environments would aid in the process of preventing, detecting and effectively responding to threads in cyber space. Furthermore, as a complement of meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. In this context, when reusing a previously trained model a rough comparison between concepts is usually made, applying boolean logic. The introduction of fuzzy logic comparisons between models could lead to a better efficient reuse of previously seen concepts, by applying not just equal models, but also similar ones. This work faces the aforementioned open issues by means of: the MMPRec system, that integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment to share meta-models between different devices; a recurrent drift generator that allows to test the usefulness of recurrent drift systems, as it is the case of MMPRec. Moreover, this thesis presents an experimental validation of the proposed contributions using synthetic and real datasets.
Resumo:
KIF (kinesin superfamily) proteins are microtubule-dependent molecular motors that play important roles in intracellular transport and cell division. The extent to which KIFs are involved in various transporting phenomena, as well as their regulation mechanism, are unknown. The identification of 16 new KIFs in this report doubles the existing number of KIFs known in the mouse. Conserved nucleotide sequences in the motor domain were amplified by PCR using cDNAs of mouse nervous tissue, kidney, and small intestine as templates. The new KIFs were studied with respect to their expression patterns in different tissues, chromosomal location, and molecular evolution. Our results suggest that (i) there is no apparent tendency among related subclasses of KIFs of cosegregation in chromosomal mapping, and (ii) according to their tissue distribution patterns, KIFs can be divided into two classes–i.e., ubiquitous and specific tissue-dominant. Further characterization of KIFs may elucidate unknown fundamental phenomena underlying intracellular transport. Finally, we propose a straightforward nomenclature system for the members of the mouse kinesin superfamily.
Resumo:
Planning a goal-directed sequence of behavior is a higher function of the human brain that relies on the integrity of prefrontal cortical areas. In the Tower of London test, a puzzle in which beads sliding on pegs must be moved to match a designated goal configuration, patients with lesioned prefrontal cortex show deficits in planning a goal-directed sequence of moves. We propose a neuronal network model of sequence planning that passes this test and, when lesioned, fails in a way that mimics prefrontal patients’ behavior. Our model comprises a descending planning system with hierarchically organized plan, operation, and gesture levels, and an ascending evaluative system that analyzes the problem and computes internal reward signals that index the correct/erroneous status of the plan. Multiple parallel pathways connecting the evaluative and planning systems amend the plan and adapt it to the current problem. The model illustrates how specialized hierarchically organized neuronal assemblies may collectively emulate central executive or supervisory functions of the human brain.
Resumo:
Sequence-specific transactivation by p53 is essential to its role as a tumor suppressor. A modified tetracycline-inducible system was established to search for transcripts that were activated soon after p53 induction. Among 9,954 unique transcripts identified by serial analysis of gene expression, 34 were increased more than 10-fold; 31 of these had not previously been known to be regulated by p53. The transcription patterns of these genes, as well as previously described p53-regulated genes, were evaluated and classified in a panel of widely studied colorectal cancer cell lines. “Class I” genes were uniformly induced by p53 in all cell lines; “class II” genes were induced in a subset of the lines; and “class III” genes were not induced in any of the lines. These genes were also distinguished by the timing of their induction, their induction by clinically relevant chemotherapeutic agents, the absolute requirement for p53 in this induction, and their inducibility by p73, a p53 homolog. The results revealed substantial heterogeneity in the transcriptional responses to p53, even in cells derived from a single epithelial cell type, and pave the way to a deeper understanding of p53 tumor suppressor action.
Resumo:
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
Resumo:
VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.
Resumo:
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in Oracle 8i object-relational system and available for sequence search and report retrieval at http://pir.georgetow n.edu/iproclass/.
Resumo:
Despite the vast research examining the evolution of Caribbean education systems, little is chronologically tied to the postcolonial theoretical perspectives of specific island-state systems, such as the Jamaican education system and its relationship with the underground shadow education system. This dissertation study sought to address the gaps in the literature by critically positioning postcolonial theories in education to examine the macro- and micro-level impacts of extra lessons on secondary education in Jamaica. The following postcolonial theoretical (PCT) tenets in education were contextualized from a review of the literature: (a) PCT in education uses colonial discourse analysis to critically deconstruct and decolonize imperialistic and colonial representations of knowledge throughout history; (b) PCT in education uses an anti-colonial discursive framework to re-position indigenous knowledge in schools, colleges, and universities to challenge hegemonic knowledge; (c) PCT in education involves the "unlearning" of dominant, normative ideologies, the use of self-reflexivity, and deconstruction; and (d) PCT in education calls for critical pedagogical approaches that reject the banking concept of education and introduces inclusive pedagogy to facilitate "the passage from naïve to critical transitivity" (Freire, 1973, p. 32). Specifically, using a transformative mixed-methods design, grounded and informed by a postcolonial theoretical lens, I quantitatively uncovered and then qualitatively highlighted how if at all extra lessons can improve educational outcomes for students at the secondary level in Jamaica. Accordingly, the quantitative data was used to test the hypotheses that the practice of extra lessons in schools is related to student academic achievement and the practice of critical-inclusive pedagogy in extra lessons is related to academic achievement. The two-level hierarchical linear model analysis revealed that hours spent in extra lessons, average household monthly income, and critical-inclusive pedagogical tents were the best predictors for academic achievement. Alternatively, the holistic multi-case study explored how extra-lessons produces increased academic achievement. The data revealed new ways of knowledge construction and critical pedagogical approaches to galvanize systemic change in secondary education. Furthermore, the data showed that extra lessons can improve educational outcomes for students at the secondary level if the conditions for learning are met. This study sets the stage for new forms of knowledge construction and implications for policy change.
Resumo:
The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges. This is especially true in tasks that combine Opinion Analysis with other challenges, such as the ones related to Question Answering. In this paper, we describe the different approaches we employed in the NTCIR 8 MOAT monolingual English (opinionatedness, relevance, answerness and polarity) and cross-lingual English-Chinese tasks, implemented in our OpAL system. The results obtained when using different settings of the system, as well as the error analysis performed after the competition, offered us some clear insights on the best combination of techniques, that balance between precision and recall. Contrary to our initial intuitions, we have also seen that the inclusion of specialized Natural Language Processing tools dealing with Temporality or Anaphora Resolution lowers the system performance, while the use of topic detection techniques using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory system performance, both for the monolingual setting, as well as in a multilingual one.
Resumo:
In the chemical textile domain experts have to analyse chemical components and substances that might be harmful for their usage in clothing and textiles. Part of this analysis is performed searching opinions and reports people have expressed concerning these products in the Social Web. However, this type of information on the Internet is not as frequent for this domain as for others, so its detection and classification is difficult and time-consuming. Consequently, problems associated to the use of chemical substances in textiles may not be detected early enough, and could lead to health problems, such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, that could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation focusing on the sentiment classification, and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results with an F-score of 65% for polarity classification and 82% for complaint detection.
Resumo:
Prototype Selection (PS) algorithms allow a faster Nearest Neighbor classification by keeping only the most profitable prototypes of the training set. In turn, these schemes typically lower the performance accuracy. In this work a new strategy for multi-label classifications tasks is proposed to solve this accuracy drop without the need of using all the training set. For that, given a new instance, the PS algorithm is used as a fast recommender system which retrieves the most likely classes. Then, the actual classification is performed only considering the prototypes from the initial training set belonging to the suggested classes. Results show that this strategy provides a large set of trade-off solutions which fills the gap between PS-based classification efficiency and conventional kNN accuracy. Furthermore, this scheme is not only able to, at best, reach the performance of conventional kNN with barely a third of distances computed, but it does also outperform the latter in noisy scenarios, proving to be a much more robust approach.
Resumo:
Feature selection is an important and active issue in clustering and classification problems. By choosing an adequate feature subset, a dataset dimensionality reduction is allowed, thus contributing to decreasing the classification computational complexity, and to improving the classifier performance by avoiding redundant or irrelevant features. Although feature selection can be formally defined as an optimisation problem with only one objective, that is, the classification accuracy obtained by using the selected feature subset, in recent years, some multi-objective approaches to this problem have been proposed. These either select features that not only improve the classification accuracy, but also the generalisation capability in case of supervised classifiers, or counterbalance the bias toward lower or higher numbers of features that present some methods used to validate the clustering/classification in case of unsupervised classifiers. The main contribution of this paper is a multi-objective approach for feature selection and its application to an unsupervised clustering procedure based on Growing Hierarchical Self-Organising Maps (GHSOMs) that includes a new method for unit labelling and efficient determination of the winning unit. In the network anomaly detection problem here considered, this multi-objective approach makes it possible not only to differentiate between normal and anomalous traffic but also among different anomalies. The efficiency of our proposals has been evaluated by using the well-known DARPA/NSL-KDD datasets that contain extracted features and labelled attacks from around 2 million connections. The selected feature sets computed in our experiments provide detection rates up to 99.8% with normal traffic and up to 99.6% with anomalous traffic, as well as accuracy values up to 99.12%.
Resumo:
v. 7 (1806)
Resumo:
1-2
Resumo:
2