955 results for Cluster Ensemble Learning


Relevance: 80.00%

Abstract:

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches employ only the sequence context of the target residue for its prediction. In the present study, for each target residue, we used both the spatial context and the sequence context to construct the feature space. Latent Semantic Analysis (LSA) was then applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighbouring binding sites are important for protein-DNA recognition and binding. A comparison with existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.
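
As a rough illustration of the pipeline this abstract describes (context features, LSA-style reduction, then an SVM ensemble), the following is a minimal scikit-learn sketch. It is not the PDNAsite implementation: the spatial/sequence feature construction is assumed to have been done elsewhere, and the data, dimensions, and ensemble size are placeholders.

```python
# Sketch: LSA-style reduction + SVM ensemble, loosely following the recipe above.
# Feature extraction from spatial/sequence context is assumed to be done elsewhere;
# X is a placeholder feature matrix, y the binding-site labels.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))          # hypothetical residue feature vectors
y = rng.integers(0, 2, size=500)         # 1 = DNA-binding site, 0 = non-binding

model = make_pipeline(
    TruncatedSVD(n_components=50, random_state=0),          # LSA-like redundancy removal
    BaggingClassifier(SVC(kernel="rbf"), n_estimators=11,   # SVM ensemble
                      random_state=0),
)
model.fit(X, y)
print(model.predict(X[:5]))
```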

Relevance: 80.00%

Abstract:

Algorithms for handling concept drift are important for various applications, including video analysis and smart grids. In this paper we present a decision tree ensemble classification method for concept drift based on the Random Forest algorithm. A weighted majority voting ensemble aggregation rule is employed, following the ideas of the Accuracy Weighted Ensemble (AWE) method. In our case, the weight of each base learner is computed for each evaluated sample using the base learner's accuracy and the intrinsic proximity measure of the Random Forest. Our algorithm exploits both temporal weighting of samples and ensemble pruning as a forgetting strategy. We present an empirical comparison of our method with the original Random Forest with incorporated replace-the-loser forgetting and other state-of-the-art concept-drift classifiers such as AWE2.
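
A minimal sketch of the accuracy-weighted majority voting idea mentioned above, assuming a binary stream split into an older training chunk and a recent evaluation chunk. The proximity-based weighting, temporal sample weighting, and the pruning/forgetting strategy of the actual method are not reproduced here.

```python
# Sketch: accuracy-weighted majority voting over tree base learners, AWE-style.
# Only the weighting-by-recent-accuracy part is shown; proximity weighting and
# forgetting are omitted.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weighted_vote(trees, weights, X):
    """Return class predictions by accuracy-weighted majority voting (binary case)."""
    votes = np.zeros((len(X), 2))
    for tree, w in zip(trees, weights):
        pred = tree.predict(X).astype(int)
        votes[np.arange(len(X)), pred] += w
    return votes.argmax(axis=1)

rng = np.random.default_rng(1)
X_old, y_old = rng.normal(size=(300, 5)), rng.integers(0, 2, 300)   # older chunk
X_new, y_new = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)   # recent chunk

# Base learners trained on (possibly outdated) chunks of the stream.
trees = [DecisionTreeClassifier(max_depth=3, random_state=i).fit(X_old, y_old)
         for i in range(5)]
# Weight each learner by its accuracy on the most recent chunk; pruning would
# correspond to dropping the lowest-weight trees.
weights = [tree.score(X_new, y_new) for tree in trees]
print(weighted_vote(trees, weights, X_new[:10]))
```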

Relevance: 80.00%

Abstract:

Multiple classifier systems are processes that use a set of models, each obtained by applying a learning process to a given problem. They combine several individual classifiers, each trained on data so as to generate different decision boundaries. The decisions produced by the individual classifiers contain errors, which the multiple classifier system combines so as to reduce the overall error. These systems have gained increasing importance, mainly because they can achieve better performance than any of their constituent models, especially when the correlations between the errors made by the base models are low. Research in this area has grown, making it an important field of investigation. However, for the combined performance to exceed that of each individual classifier, each one must produce a different decision, creating classification diversity. This diversity can be obtained either by using different training datasets for each individual classifier or by using different training parameters for the different classifiers. Even so, using multiple classifiers in real-world applications can be costly and time-consuming. Web development has been growing exponentially, as has the use of databases. Therefore, combining the widespread use of the R language for statistical computing with the growing use of web technologies, a prototype was implemented to make multiple classifiers easier to use: a web application was developed for experimenting with multiple classifier learning, using PHP, R and MySQL. The aim of this application is to allow algorithms to be tested independently of the software in which they were developed, not necessarily written in R. In this dissertation the expression "multiple classifiers" is used because it is the most common term, although it is reductive and other more generic terms exist, such as multiple models and ensemble learning.
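
A minimal sketch of the general principle described in this abstract: several diverse base classifiers are combined by majority vote, and the combination tends to beat each individual model when their errors are weakly correlated. scikit-learn's VotingClassifier is used as a stand-in; the dataset and base learners are illustrative and unrelated to the PHP/R/MySQL prototype.

```python
# Sketch: why diverse base models help. Three different learners are trained on
# the same task and combined by hard majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0))]
for name, clf in base:
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))   # individual accuracies

ensemble = VotingClassifier(estimators=base, voting="hard").fit(X_tr, y_tr)
print("vote", ensemble.score(X_te, y_te))                 # combined accuracy
```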

Relevance: 80.00%

Abstract:

While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can discuss, disseminate, collect or report information about news. However, the massive amount of information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know "what is happening". Therefore, this dissertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream, by extractively summarizing the updates in a timely manner. More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce the quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines. In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embeddings as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement, a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality. Following a Cranfield-like evaluation paradigm, this dissertation includes a series of experiments to evaluate the proposed methods for each step, and for the end-to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational journalism, information retrieval, automatic summarization, and machine learning.
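
A toy sketch of the three-stage cascade described above: relevance, then novelty, then salience filtering over a time-ordered stream. The scoring functions here are deliberately naive placeholders (keyword overlap, Jaccard-based novelty, length-based salience) and are not the dissertation's models.

```python
# Sketch: a relevance -> novelty -> salience cascade over a post stream.
# All three filters are illustrative stand-ins for the methods studied above.
def relevant(post, query_terms, k=2):
    return len(set(post.lower().split()) & query_terms) >= k

def novel(post, kept, threshold=0.5):
    words = set(post.lower().split())
    return all(len(words & set(p.lower().split())) / len(words | set(p.lower().split())) < threshold
               for p in kept)

def salient(post, min_words=6):
    return len(post.split()) >= min_words

def summarize_stream(stream, query_terms):
    kept = []
    for post in stream:                      # posts arrive in time order
        if relevant(post, query_terms) and novel(post, kept) and salient(post):
            kept.append(post)                # emit as a timely update
    return kept

stream = ["earthquake hits the city centre tonight",
          "huge earthquake in the city centre reported tonight",   # near-duplicate
          "lol",
          "rescue teams deployed after the earthquake in the city"]
print(summarize_stream(stream, {"earthquake", "city"}))
```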

Relevance: 80.00%

Abstract:

Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications in various areas so that people can easily access and manage multimedia data. In response to such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
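
A minimal sketch of a sampling-based ensemble of the kind mentioned for imbalanced data: each base learner is trained on all rare-event positives plus an undersample of negatives, and their scores are averaged. Data, base model and thresholds are illustrative; this is not the IF-TMCA pipeline.

```python
# Sketch: undersampling-based ensemble for a rare-event detection task.
# Each base learner sees a balanced subset; predictions are averaged.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
y = (rng.random(2000) < 0.05).astype(int)          # ~5% rare positive events

pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
models = []
for seed in range(10):
    sub_neg = rng.choice(neg, size=len(pos), replace=False)   # balance each subset
    idx = np.concatenate([pos, sub_neg])
    models.append(DecisionTreeClassifier(max_depth=4, random_state=seed)
                  .fit(X[idx], y[idx]))

scores = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
print("predicted positives:", int((scores > 0.5).sum()), "true positives:", int(y.sum()))
```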

Relevance: 40.00%

Abstract:

E-learning frameworks are conceptual tools to organize networks of e-learning services. Most frameworks cover areas that go beyond the scope of e-learning, from course to financial management, and neglect the typical everyday activities of teachers and students at schools, such as the creation, delivery, resolution and evaluation of assignments. This paper presents the Ensemble framework, an e-learning framework exclusively focused on the teaching-learning process through the coordination of pedagogical services. The framework presents an abstract data, integration and evaluation model based on content and communication specifications. These specifications should underpin the implementation of networks in specialized domains with complex evaluations. In this paper we specialize the framework for two domains with complex evaluation: computer programming and computer-aided design (CAD). For each domain we highlight two Ensemble hotspots: data and evaluation procedures. For the former we formally describe the exercise and present possible extensions; for the latter, we describe the automatic evaluation procedures.

Relevance: 40.00%

Abstract:

This article offers a review of the literature on interprofessional education (IPE), a form of education that brings together members of two or more professions in joint training. In such training, participants learn from and about the other professionals. The goal of IPE is to improve collaboration between health professionals and the quality of patient care. IPE is booming worldwide and seems far from a mere fad. This expansion can be explained by several factors: the increasing importance attributed to quality of care and patient safety, changes in care (an aging population and the rise of chronic diseases), and the shortage of health professionals. Expectations for IPE are high, while the evidence supporting its effectiveness is still being built.

Relevance: 40.00%

Abstract:

In this paper, we consider active sampling to label pixels grouped by hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The former is represented as a complete tree to be pruned; the latter is iteratively provided by the user. The proposed active learning algorithm searches for the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to the current pruning's uncertainty, sampling is focused on the most uncertain clusters. This way, large clusters whose class membership is already fixed are no longer queried, and sampling is focused on dividing clusters that show mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
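
A simplified stand-in for the sampling strategy described above: instead of pruning a complete cluster tree, a flat hierarchical clustering is used, and each new query is drawn from the cluster whose sampled labels are most mixed (highest entropy). The data and the labeling oracle are synthetic placeholders, so this only illustrates the "focus sampling on uncertain clusters" idea, not the paper's algorithm.

```python
# Sketch: query the most label-mixed cluster first (entropy-based uncertainty).
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def entropy(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))                 # pixel feature vectors (illustrative)
oracle = rng.integers(0, 3, size=400)         # labels the user would provide on request

clusters = AgglomerativeClustering(n_clusters=10).fit_predict(X)
labeled = {i: oracle[i] for i in rng.choice(400, size=20, replace=False)}

for _ in range(5):                            # five active-learning queries
    scores = []
    for c in range(10):
        labs = [labeled[i] for i in labeled if clusters[i] == c]
        # Clusters with no labels yet get maximal priority so each is explored.
        scores.append(entropy(np.bincount(labs, minlength=3)) if labs else np.inf)
    target = int(np.argmax(scores))
    candidates = [i for i in range(400) if clusters[i] == target and i not in labeled]
    if candidates:
        q = int(rng.choice(candidates))
        labeled[q] = oracle[q]                # ask the user (oracle) for this pixel
print("labeled pixels:", len(labeled))
```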

Relevance: 40.00%

Abstract:

Parkinson's disease (PD) is the second most common neurodegenerative disorder (after Alzheimer's disease) and directly affects up to 5 million people worldwide. The stages of the disease (Hoehn and Yahr) have been predicted by many methods, which can help doctors adjust the dosage accordingly. These methods were developed on a dataset that includes about seventy patients at nine clinics in Sweden. The purpose of this work is to compare an unsupervised technique with supervised neural network techniques in order to verify that the collected datasets are reliable for decision making. The available data were preprocessed before their features were calculated. Wavelet features, complex but informative, were calculated to present the dataset to the network. The dimension of the final feature set was reduced using principal component analysis. For unsupervised learning, k-means gives a result of around 76%, close to that of the supervised techniques. Backpropagation and J4 were used as supervised models to classify the stages of Parkinson's disease, with backpropagation giving a variance percentage of 76-82%. The results of both models were analyzed, showing that the collected data are reliable for predicting the disease stages of Parkinson's disease.
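
A minimal sketch of the comparison described above, assuming the wavelet features have already been computed: PCA reduces the feature set, k-means provides the unsupervised grouping, and an MLP trained by backpropagation provides the supervised counterpart. The data here are synthetic stand-ins, not the Swedish patient dataset.

```python
# Sketch: PCA reduction, then unsupervised (k-means) vs. supervised (MLP) modeling.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))                 # stand-in for wavelet features
y = rng.integers(0, 3, size=300)               # stand-in for disease-stage labels

Z = PCA(n_components=10).fit_transform(X)      # dimensionality reduction
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z_tr)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(Z_tr, y_tr)

print("k-means cluster sizes on test data:", np.bincount(km.predict(Z_te)))
print("MLP (backpropagation) accuracy:", accuracy_score(y_te, mlp.predict(Z_te)))
```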

Relevance: 30.00%

Abstract:

PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using spectral-domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from SD-OCT and SAP: Bagging (BAG), Naive Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), AdaBoost M1 (ADA), Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with those of MLCs using OCT+SAP data. RESULTS: Combining OCT and SAP data, the MLCs' aROCs varied from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC, obtained with RAN (0.946), was significantly larger than that of the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.
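
A minimal sketch of the evaluation design (comparing classifiers by area under the ROC curve) with scikit-learn stand-ins for a few of the listed MLCs. The OCT+SAP measurements are not available here, so a synthetic two-class dataset of similar size is used, and the study's exact classifier implementations are not reproduced.

```python
# Sketch: compare several classifiers by cross-validated aROC on a combined feature set.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for 110 eyes (62 glaucoma + 48 healthy) described above.
X, y = make_classification(n_samples=110, n_features=30, random_state=0)

classifiers = {
    "BAG": BaggingClassifier(random_state=0),
    "NB": GaussianNB(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
    "RAN": RandomForestClassifier(random_state=0),
    "SVML": SVC(kernel="linear", probability=True, random_state=0),
    "SVMG": SVC(kernel="rbf", probability=True, random_state=0),
}
for name, clf in classifiers.items():
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    print(f"{name}: aROC = {roc_auc_score(y, proba):.3f}")
```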

Relevance: 30.00%

Abstract:

3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon, Portugal.

Relevance: 30.00%

Abstract:

This Thesis describes the application of automatic learning methods for (a) the classification of organic and metabolic reactions, and (b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds.

NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave similar results. The use of supervised learning techniques improved the results to 77% of correct assignments when an ensemble of ten FFNNs was used, and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions.

Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs: EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers.

Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NN-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials for use in molecular simulations. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface, which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
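
A minimal sketch of the MOLMAP reaction-descriptor idea described above: a reaction is encoded as the difference between product-side and reactant-side bond-type vectors and classified with a Random Forest. Real MOLMAPs come from a Kohonen SOM over physico-chemical bond properties; here random count vectors stand in for them, purely to show the shape of the computation.

```python
# Sketch: reaction descriptor = product bond-type map minus reactant bond-type map,
# classified by a Random Forest. Descriptors and labels are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_reactions, n_bond_types = 600, 49            # e.g. a 7x7 SOM -> 49 bond types

reactant_maps = rng.poisson(2.0, size=(n_reactions, n_bond_types))
product_maps = rng.poisson(2.0, size=(n_reactions, n_bond_types))
X = product_maps - reactant_maps               # bonds made (+) and broken (-)
y = rng.integers(1, 7, size=n_reactions)       # stand-in for top-level EC class (1-6)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```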

Relevance: 30.00%

Abstract:

As e-learning gradually evolved, many specialized and disparate systems appeared to fulfil the needs of teachers and students, such as repositories of learning objects, authoring tools, intelligent tutors and automatic evaluators. This heterogeneity raises interoperability issues, giving the standardization of content an important role in e-learning. This article presents a survey of current e-learning content aggregation standards, focusing on their internal organization and packaging. This study is part of an effort to choose the most suitable specifications and standards for an e-learning framework called Ensemble, defined as a conceptual tool to organize a network of e-learning systems and services for domains with complex evaluation.

Relevance: 30.00%

Abstract:

Local Tourist Systems (LTS) can be analyzed using an investigative framework derived from work in industrial economics on industrial districts, local productive systems and learning regions. The LTS concept is a useful analytical tool that can capture the diversity and organization of resorts. Resorts can be conceived either as clusters or as industrial districts, whether with a perfect agreement between the productive sphere and the local community or as a mere industrial juxtaposition without any economic or social connection. On the other hand, analyses of tourist clusters have referred almost exclusively to socio-economic criteria; environmental issues have been largely disregarded. Approaches range from the "greening" of products and practices to initiatives focused on an integrated approach linking the environment and tourist development. This paper discusses how to favour, within a tourist destination, the creation of clusters grounded in sustainable tourism. The case studies (the 5 Alentejo Natural Reserves: Estuário do Sado; Lagoas de Santo André e da Sancha; Vale do Guadiana; Sudoeste Alentejano e Costa Vicentina; Serra de S. Mamede) are analyzed in light of how groups of micro-structures can enable sustainable territorial tourist development. The issues of "resources and competences" and "governance" ar

Relevance: 30.00%

Abstract:

Currently, the teaching-learning process in domains such as computer programming is characterized by an extensive curriculum and high student enrolment. This poses a great workload for faculty and teaching assistants responsible for the creation, delivery, and assessment of student exercises. The main goal of this chapter is to foster practice-based learning in complex domains. This objective is attained with an e-learning framework, called Ensemble, as a conceptual tool to organize and facilitate technical interoperability among services. The Ensemble framework is used on a specific domain: computer programming. Content issues are tackled with a standard format to describe programming exercises as learning objects. Communication is achieved by extending existing specifications for interoperation with several systems typically found in an e-learning environment. In order to evaluate the acceptability of the proposed solution, an Ensemble instance was validated in a classroom experiment with encouraging results.