937 results for data integration


Relevance: 30.00%

Abstract:

In this paper, two evolutionary artificial neural network (EANN) models, based on the integration of two supervised adaptive resonance theory (ART)-based artificial neural networks with a hybrid genetic algorithm (HGA), are proposed. The search process of the proposed EANN models is guided by a knowledge base established by ART from the training data samples. The EANN models explore the search space for “coarse” solutions, which are then refined using the local search process of the HGA. The performances of the proposed EANN models are evaluated and compared with those of other classifiers on more than ten benchmark data sets. The applicability of the EANN models to a real medical classification task is also demonstrated. The results of the experimental studies demonstrate the effectiveness and usefulness of the proposed EANN models in undertaking pattern classification problems.
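
As a rough illustration of the coarse-to-fine search described above, the following minimal Python sketch pairs a genetic algorithm with a hill-climbing refinement step. This is a generic memetic scheme under toy assumptions; the ART-derived knowledge base and the specific HGA of the paper are not reproduced, and all function and parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(w):
    # Placeholder objective; in the paper this role is played by the
    # classification performance of the ART-based network.
    return -np.sum((w - 0.5) ** 2)

def local_search(w, step=0.05, iters=20):
    # Simple hill climbing: the "refinement" stage of the hybrid scheme.
    best = w.copy()
    for _ in range(iters):
        cand = best + rng.normal(0, step, size=best.shape)
        if fitness(cand) > fitness(best):
            best = cand
    return best

def evolve(pop_size=30, dim=10, gens=50):
    pop = rng.random((pop_size, dim))  # initial "coarse" solutions
    for _ in range(gens):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]   # keep the best half
        children = parents + rng.normal(0, 0.1, parents.shape)  # mutation
        pop = np.vstack([parents, children])
    # Refine the best coarse solution with local search.
    best = pop[np.argmax([fitness(p) for p in pop])]
    return local_search(best)

print(evolve())
```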

Relevance: 30.00%

Abstract:

Australian Museums Online (AMOL) was the earliest attempt to make Australia’s distributed cultural collections accessible from a single online resource. Despite early successes, significant achievements and the considerable value it offered certain groups, the project ran into operational difficulties and was eventually discontinued. Analysis of the global and local actor-networks through Actor-Network Theory reveals that although the project originated from large, state museums, buy-in was restricted to individuals rather than institutions, and the most significant value was for smaller, regional institutions. Furthermore, although the global networks that governed the project could translate their visions through the local production networks, the network’s underlying weaknesses were never addressed, and over time this destabilised the global networks. This case study offers advice for projects attempting to consolidate data from disparate sources, and highlights the importance of individual actors in championing such projects.

Relevance: 30.00%

Abstract:

The work presented in this paper focuses on fitting a neural mass model to EEG data. Neurophysiology-inspired mathematical models for simulating the brain's electrical activity, as imaged through electroencephalography (EEG), were developed more than three decades ago. At present, highly informative models exist that even describe the functional integration of cortical regions. However, very little work is reported in the literature on fitting such models to actual EEG data. Here, we present a Bayesian approach for parameter estimation of the EEG model via a marginalized Markov Chain Monte Carlo (MCMC) approach.
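
To make the Bayesian fitting idea concrete, here is a minimal random-walk Metropolis sketch for a single model parameter given noisy observations. It uses a deliberately toy sinusoidal "model"; the actual neural mass model and the marginalized MCMC scheme of the paper are far richer and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "EEG" observations: a known signal plus Gaussian noise (sigma = 0.5).
t_grid = np.linspace(0, 10, 200)
true_theta = 2.0
data = true_theta * np.sin(t_grid) + rng.normal(0, 0.5, t_grid.size)

def log_posterior(theta):
    pred = theta * np.sin(t_grid)
    log_lik = -0.5 * np.sum((data - pred) ** 2) / 0.25  # Gaussian likelihood
    log_prior = -0.5 * theta ** 2 / 100.0               # broad Gaussian prior
    return log_lik + log_prior

theta, samples = 0.0, []
for _ in range(5000):
    prop = theta + rng.normal(0, 0.1)  # random-walk proposal
    if np.log(rng.random()) < log_posterior(prop) - log_posterior(theta):
        theta = prop                   # Metropolis accept step
    samples.append(theta)

print("posterior mean:", np.mean(samples[1000:]))  # discard burn-in
```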

Relevance: 30.00%

Abstract:

One of the issues associated with pattern classification using data-based machine learning systems is the “curse of dimensionality”. In this paper, the circle-segments method is proposed as a feature selection method to identify important input features before the entire data set is provided for learning with machine learning systems. Specifically, four machine learning systems are deployed for classification, viz. the Multilayer Perceptron (MLP), Support Vector Machine (SVM), Fuzzy ARTMAP (FAM), and k-Nearest Neighbour (kNN). The integration of the circle-segments method with the machine learning systems is applied to two case studies comprising one benchmark data set and one real data set. Overall, the results after feature selection using the circle-segments method demonstrate improvements in performance even with more than 50% of the input features eliminated from the original data sets.
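
The circle-segments method itself is a visual technique, so the sketch below substitutes a plain univariate F-score ranking as the selection step; it only illustrates the overall workflow of selecting roughly half the features and then comparing several classifiers before and after selection (Fuzzy ARTMAP has no scikit-learn implementation, so it is omitted). Dataset and parameter choices are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Keep fewer than half the inputs, mirroring the >50% reduction above.
X_sel = SelectKBest(f_classif, k=X.shape[1] // 2 - 1).fit_transform(X, y)

for name, clf in [("kNN", KNeighborsClassifier()),
                  ("SVM", SVC()),
                  ("MLP", MLPClassifier(max_iter=1000))]:
    full = cross_val_score(clf, X, y, cv=5).mean()
    sel = cross_val_score(clf, X_sel, y, cv=5).mean()
    print(f"{name}: all features {full:.3f}, selected {sel:.3f}")
```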

Relevance: 30.00%

Abstract:

Previous research has indicated that undergraduate student learning can be enhanced through active involvement in research. Furthermore, creating an academic environment where teaching and research are intimately linked can facilitate the induction of students into a community of learners where new knowledge is created, explored and critiqued. Scaffolding and supporting student learning via engagement in authentic research experiences can help ensure that graduating students have the capacity to generate and investigate important questions that contribute to the development of new knowledge. This paper presents a case study that outlines curriculum design and pedagogical strategies aimed at integrating teaching and research within the first year of an undergraduate course. First-year Food and Nutrition students were asked to take part in a research project in which they completed a series of diet- and food-related questionnaires and then analysed, interpreted and critiqued the resulting data. Students were supported through this learning activity via small-group tutorial support and question-and-answer sessions within the learning management system. An anonymous evaluation of the teaching and learning experience was conducted at the end of the teaching period, and the results indicate that the students welcomed the opportunity to engage in an authentic, research-based learning activity. Students found that the assessment tasks were clearly explained (88% agreement) and felt well supported in approaching this research-based assessment task. Furthermore, qualitative comments indicated that students found the learning environment to be meaningful and relevant. This case study indicates that it is possible to effectively incorporate authentic research experiences within the curriculum of a first-year course. The experiential, inquiry-based learning approach supported the students’ participation in the systematic, rigorous data collection required in a structured research environment and blended these requirements with authentic learning of discipline-specific skills and knowledge.

Relevance: 30.00%

Abstract:

The calculation of the first few moments of elution peaks is necessary to determine: the amount of component in the sample (peak area, or zeroth moment), the retention factor (first moment), and the column efficiency (second moment). Performing these calculations is a time-consuming and tedious task for the analyst, so data analysis is generally completed by the data stations associated with modern chromatographs. However, data acquisition software is a black box that provides no information to chromatographers on how their data are treated. These results are too important to be accepted on blind faith. The location of the peak integration boundaries is most important. In this manuscript, we explore the relationships between the size of the integration area, the relative position of the peak maximum within this area, and the accuracy of the calculated moments. We found that relationships between these parameters do exist and that computers can be programmed with relatively simple routines to automate the extraction of key peak parameters and to select acceptable integration boundaries. It was also found that the most accurate results are obtained when the signal-to-noise ratio (S/N) exceeds 200.
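
The moments in question follow directly from numerical integration of the detector signal over the chosen boundaries. A minimal sketch, assuming a uniformly sampled signal and a synthetic Gaussian peak (the boundary-selection routines of the paper are not reproduced):

```python
import numpy as np

def peak_moments(t, signal, left, right):
    """Zeroth, first, and second central moments of an elution peak
    over the integration window [left, right]; assumes uniform sampling."""
    mask = (t >= left) & (t <= right)
    ts, ys = t[mask], signal[mask]
    dt = ts[1] - ts[0]
    m0 = np.sum(ys) * dt                         # peak area
    m1 = np.sum(ts * ys) * dt / m0               # retention time
    m2 = np.sum((ts - m1) ** 2 * ys) * dt / m0   # peak variance (efficiency)
    return m0, m1, m2

# Synthetic Gaussian peak: retention time 5.0, sigma 0.2.
t = np.linspace(0, 10, 2001)
y = np.exp(-0.5 * ((t - 5.0) / 0.2) ** 2)
print(peak_moments(t, y, 4.0, 6.0))  # boundaries placed at +/- 5 sigma
```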

Relevance: 30.00%

Abstract:

In this paper we demonstrate our signature-based detector for self-propagating worms. We use a set of worm and benign traffic traces from several endpoints to build benign and worm profiles. These profiles were arranged into separate n-ary trees. We also demonstrate our anomaly detector, which was used to deal with tied matches between the worm and benign trees. We analyzed the performance of each detector individually and in combination. Results show that our signature-based detector achieves a very high true-positive rate, whereas the anomaly detector does not. Both detectors, when used independently, suffer from high false-positive rates. However, when the two detectors were integrated, they maintained a high true-positive detection rate while minimizing false positives.
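
The integration logic can be sketched very simply: signature matching decides whenever the worm and benign profiles disagree, and the anomaly detector breaks ties. The sketch below is an assumption-laden stand-in that uses sets of n-grams instead of the paper's n-ary trees, with a toy anomaly score; all names and data are illustrative.

```python
def ngrams(seq, n=3):
    # Profile a trace as the set of its length-n event sequences.
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def classify(trace, worm_profile, benign_profile, anomaly_score, threshold=0.5):
    """Signature matching first; the anomaly detector only breaks ties."""
    grams = ngrams(trace)
    worm_hits = len(grams & worm_profile)
    benign_hits = len(grams & benign_profile)
    if worm_hits != benign_hits:
        return "worm" if worm_hits > benign_hits else "benign"
    # Tied match between the two profiles: defer to the anomaly detector.
    return "worm" if anomaly_score(trace) > threshold else "benign"

worm_profile = ngrams("connect scan connect scan connect".split())
benign_profile = ngrams("connect fetch render fetch render".split())
trace = "connect scan connect scan fetch".split()
print(classify(trace, worm_profile, benign_profile,
               anomaly_score=lambda t: t.count("scan") / len(t)))
```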

Relevance: 30.00%

Abstract:

Healthcare plays an important role in promoting the general health and well-being of people around the world. The difficulty in healthcare data classification arises from the uncertainty and the high-dimensional nature of the medical data collected. This paper proposes an integration of the fuzzy standard additive model (SAM) with a genetic algorithm (GA), called GSAM, to deal with both uncertainty and computational challenges. The GSAM learning process comprises three sequential steps: rule initialization by unsupervised learning using adaptive vector quantization clustering, evolutionary rule optimization by the GA, and parameter tuning by gradient-descent supervised learning. Wavelet transformation is employed to extract discriminative features from high-dimensional datasets. GSAM becomes highly capable when deployed with a small number of wavelet features, as its computational burden is remarkably reduced. The proposed method is evaluated using two frequently used medical datasets from the UCI Machine Learning Repository: Wisconsin breast cancer and Cleveland heart disease. Experiments are organized with five-fold cross-validation, and the performance of the classification techniques is measured by a number of important metrics: accuracy, F-measure, mutual information and area under the receiver operating characteristic curve. Results demonstrate the superiority of GSAM compared to other machine learning methods, including the probabilistic neural network, support vector machine, fuzzy ARTMAP, and adaptive neuro-fuzzy inference system. The proposed approach is thus helpful as a decision support system for medical practitioners in healthcare practice.
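
The evaluation protocol described above (five-fold cross-validation with several metrics) can be reproduced generically. The sketch below uses an SVM as a stand-in classifier, since GSAM itself has no off-the-shelf implementation, and scikit-learn's bundled copy of the Wisconsin breast cancer data; these substitutions are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # Wisconsin breast cancer dataset
clf = SVC(probability=True)                 # stand-in for GSAM

accs, f1s, aucs = [], [], []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf.fit(X[tr], y[tr])
    pred = clf.predict(X[te])
    proba = clf.predict_proba(X[te])[:, 1]
    accs.append(accuracy_score(y[te], pred))
    f1s.append(f1_score(y[te], pred))
    aucs.append(roc_auc_score(y[te], proba))

print(f"accuracy {np.mean(accs):.3f}, F1 {np.mean(f1s):.3f}, AUC {np.mean(aucs):.3f}")
```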

Relevance: 30.00%

Abstract:

Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for detecting these changes from short-read sequence data that does not require a reference genome. Open-source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparison with reference-based results, showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open-source software for fast annotation and visualization of unreferenced genomic regions from short-read data.
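
The evaluation step above reduces to correlating per-gene enrichment values with expression changes. A minimal sketch with made-up numbers (the assembly and alignment stages, which depend on external tools, are not shown):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-gene values: chromatin-modification enrichment
# (e.g., log fold change from the differential-enrichment step) and
# matched gene-expression changes for the same genes.
enrichment = np.array([2.1, 0.3, -1.4, 1.8, -0.2, 0.9])
expression = np.array([1.9, 0.1, -1.1, 2.2, -0.5, 0.7])

rho, p = spearmanr(enrichment, expression)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```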

Relevance: 30.00%

Abstract:

This paper introduces a method to classify EEG signals using features extracted by an integration of the wavelet transform and the nonparametric Wilcoxon test. Orthogonal Haar wavelet coefficients are ranked by the Wilcoxon test statistic, and the most discriminative wavelet coefficients are assembled into a feature set that serves as input to a naïve Bayes classifier. Two benchmark datasets, Ia and Ib, downloaded from the brain–computer interface (BCI) competition II, are employed for the experiments. Classification performance is evaluated using accuracy, mutual information, the Gini coefficient and the F-measure. Widely used classifiers, including the feedforward neural network, support vector machine, k-nearest neighbours, AdaBoost ensemble learning and adaptive neuro-fuzzy inference system, are also implemented for comparison. The proposed combination of Haar wavelet features and the naïve Bayes classifier considerably outperforms the competing classification approaches and exceeds the best performance on the Ia and Ib datasets reported in the BCI competition II. The naïve Bayes classifier also offers low computational cost, which supports the implementation of a potential real-time BCI system.
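
The wavelet-plus-Wilcoxon pipeline is straightforward to sketch. The code below uses synthetic two-class "trials" in place of the BCI competition data, PyWavelets for the Haar decomposition, and the rank-sum statistic to keep the most discriminative coefficients; the number of retained features and all data are assumptions for illustration.

```python
import numpy as np
import pywt
from scipy.stats import ranksums
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two-class EEG trials (rows = trials).
X0 = rng.normal(0.0, 1, (40, 128))
X1 = rng.normal(0.3, 1, (40, 128))

def haar_features(X):
    # Concatenate all Haar wavelet coefficients per trial.
    return np.array([np.concatenate(pywt.wavedec(x, "haar", level=4)) for x in X])

F0, F1 = haar_features(X0), haar_features(X1)

# Rank coefficients by the Wilcoxon rank-sum statistic between classes.
stats = np.array([abs(ranksums(F0[:, j], F1[:, j]).statistic)
                  for j in range(F0.shape[1])])
top = np.argsort(stats)[-10:]  # keep the 10 most discriminative coefficients

X_feat = np.vstack([F0[:, top], F1[:, top]])
y = np.array([0] * len(F0) + [1] * len(F1))
print(GaussianNB().fit(X_feat, y).score(X_feat, y))  # training accuracy only
```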

Relevance: 30.00%

Abstract:

Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate these data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes that may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money otherwise spent on preclinical studies and phase I clinical trials.
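
At its core, this integration is a join between predicted candidate genes and a drug-target table: a drug becomes a repositioning candidate when one of its known targets appears among the predicted genes. A toy sketch with purely illustrative gene and drug names:

```python
# Hypothetical candidate genes predicted for some phenotype.
candidate_genes = {"GENE_A", "GENE_B", "GENE_C"}

# Hypothetical drug-target data, as from a drug database.
drug_targets = {
    "drug_1": {"GENE_A", "GENE_X"},
    "drug_2": {"GENE_Y"},
    "drug_3": {"GENE_B", "GENE_C"},
}

# A drug is a repositioning candidate if it already hits a predicted gene.
repositioning = {drug: targets & candidate_genes
                 for drug, targets in drug_targets.items()
                 if targets & candidate_genes}
print(repositioning)  # drug_1 and drug_3 match
```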

Relevance: 30.00%

Abstract:

This paper introduces a novel approach to gene selection based on a substantial modification of the analytic hierarchy process (AHP). The modified AHP systematically integrates the outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods, namely the t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon test and signal-to-noise ratio, are employed to rank genes, and the ranked genes then serve as inputs to the modified AHP. Additionally, a method that uses the fuzzy standard additive model (FSAM) for cancer classification based on the genes selected by AHP is also proposed. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. A genetic algorithm (GA) is incorporated between the unsupervised and supervised stages to optimize the number of fuzzy rules. The integration of the GA enables FSAM to deal with the high-dimension, low-sample-size nature of microarray data and thus enhances classification efficiency. Experiments are carried out on numerous microarray datasets. Results demonstrate the dominance of AHP-based gene selection over the single ranking methods. Furthermore, the AHP-FSAM combination shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach is therefore useful to medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
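
The aggregation idea can be sketched generically: compute several per-gene scores, rank genes under each criterion, and combine the ranks with criterion weights. The sketch below uses only two of the five criteria (t-test and signal-to-noise ratio) on synthetic data, and a fixed weighted rank sum as a simplified stand-in for the modified AHP; all weights and data are assumptions.

```python
import numpy as np
from scipy.stats import rankdata, ttest_ind

rng = np.random.default_rng(0)

# Synthetic microarray slice: 100 genes x 20 samples, two classes.
X = rng.normal(0, 1, (100, 20))
y = np.array([0] * 10 + [1] * 10)
X[:5, y == 1] += 1.5  # plant 5 informative genes

def t_score(X, y):
    return np.abs(ttest_ind(X[:, y == 0], X[:, y == 1], axis=1).statistic)

def snr_score(X, y):
    m0, m1 = X[:, y == 0].mean(1), X[:, y == 1].mean(1)
    s0, s1 = X[:, y == 0].std(1), X[:, y == 1].std(1)
    return np.abs(m0 - m1) / (s0 + s1)

# Each criterion ranks every gene; the weights stand in for AHP priorities.
ranks = np.vstack([rankdata(-t_score(X, y)), rankdata(-snr_score(X, y))])
weights = np.array([0.6, 0.4])
combined = weights @ ranks  # lower combined rank = more informative gene
print("top genes:", np.argsort(combined)[:5])
```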

Relevance: 30.00%

Abstract:

This paper introduces an automated medical data classification method using the wavelet transformation (WT) and an interval type-2 fuzzy logic system (IT2FLS). Wavelet coefficients, which serve as inputs to the IT2FLS, are a compact form of the original data yet exhibit highly discriminative features. The integration of WT and IT2FLS aims to cope with both the high dimensionality of the data and its uncertainty. IT2FLS utilizes a hybrid learning process comprising unsupervised structure learning by fuzzy c-means (FCM) clustering and supervised parameter tuning by a genetic algorithm. This learning process is computationally expensive, especially when employed with high-dimensional data; the application of WT therefore reduces the computational burden and enhances the performance of IT2FLS. Experiments are implemented with two frequently used medical datasets from the UCI Machine Learning Repository: Wisconsin breast cancer and Cleveland heart disease. A number of important metrics are computed to measure classification performance: accuracy, sensitivity, specificity and area under the receiver operating characteristic curve. Results demonstrate that the wavelet-IT2FLS approach significantly outperforms other machine learning methods, including the probabilistic neural network, support vector machine, fuzzy ARTMAP, and adaptive neuro-fuzzy inference system. The proposed approach is thus useful as a decision support system for clinicians and practitioners in medical practice.
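
Two of the building blocks named above, wavelet compression of the inputs and FCM-based structure learning, can be sketched together. The code below uses a hand-rolled minimal fuzzy c-means on wavelet approximation coefficients of synthetic signals; the interval type-2 machinery and GA tuning of the paper are not reproduced, and all sizes are illustrative.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)

def fcm(X, c=2, m=2.0, iters=50):
    """Minimal fuzzy c-means: returns cluster centres and memberships."""
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = u ** m
        centres = (um.T @ X) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None], axis=2) + 1e-12
        u = 1.0 / d ** (2 / (m - 1))     # standard FCM membership update
        u /= u.sum(axis=1, keepdims=True)
    return centres, u

# Wavelet compression: keep only the level-3 approximation coefficients,
# shrinking each length-64 signal to 8 inputs.
signals = rng.normal(0, 1, (60, 64))
features = np.array([pywt.wavedec(s, "haar", level=3)[0] for s in signals])

centres, memberships = fcm(features)
print(centres.shape, memberships.shape)
```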

Relevance: 30.00%

Abstract:

Defining the geographic extent of suitable fishing grounds at a scale relevant to resource exploitation can be problematic for commercial benthic species. Bathymetric light detection and ranging (LiDAR) systems provide an opportunity to enhance ecosystem-based fisheries management strategies for coastally distributed benthic fisheries. In this study we define, for the first time, the spatial extent of suitable fishing grounds for the blacklip abalone (Haliotis rubra) along 200 linear kilometers of coastal waters, demonstrating the potential for integrating remotely sensed data with commercial catch information. Variables representing seafloor structure, generated from airborne bathymetric LiDAR, were combined with spatially explicit fishing event data to characterize the geographic footprint of the western Victorian abalone fishery in south-east Australia. A MaxEnt modeling approach determined that bathymetry, rugosity and complexity were the three most important predictors of suitable fishing grounds (AUC = 0.89). The suitable fishing grounds predicted by the model showed a good relationship with catch statistics within each sub-zone of the fishery, suggesting that model outputs may be a useful surrogate for potential catch.
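
The general shape of such a suitability model is: presence (fished) cells versus background cells, seafloor predictors per cell, and an AUC to score discrimination. The sketch below substitutes logistic regression for MaxEnt, a common simplified stand-in, on synthetic bathymetry, rugosity and complexity values; everything here is assumed for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic seafloor predictors per cell: bathymetry, rugosity, complexity.
n = 2000
X = rng.normal(0, 1, (n, 3))
# Fishing suitability rises with rugosity and complexity in this toy setup.
p = 1 / (1 + np.exp(-(1.5 * X[:, 1] + 1.0 * X[:, 2] - 0.5 * X[:, 0])))
y = rng.random(n) < p  # 1 = fished cell, 0 = background

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 2))
```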

Relevance: 30.00%

Abstract:

Background: Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards the treatment of CAD.
Methods: Previously, we predicted 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid (www.gentrepid.org), a candidate gene prediction platform with two bioinformatic modules, to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study using five bioinformatic modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower-quality genotype-phenotype data using an independent knowledge base of existing biological data. Here, we used our methodology to integrate drug data from three drug databases, the Therapeutic Target Database, PharmGKB and DrugBank, with the 647 candidate gene predictions from Gentrepid. We utilized known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis as benchmarks to validate Gentrepid predictions for CAD.
Results: Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards the treatment of CAD. The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05).
Conclusions: We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.