856 resultados para sire classification


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In order to gain knowledge from large databases, scalable data mining technologies are needed. Data are captured on a large scale and thus databases are increasing at a fast pace. This leads to the utilisation of parallel computing technologies in order to cope with large amounts of data. In the area of classification rule induction, parallelisation of classification rules has focused on the divide and conquer approach, also known as the Top Down Induction of Decision Trees (TDIDT). An alternative approach to classification rule induction is separate and conquer which has only recently been in the focus of parallelisation. This work introduces and evaluates empirically a framework for the parallel induction of classification rules, generated by members of the Prism family of algorithms. All members of the Prism family of algorithms follow the separate and conquer approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A two-stage linear-in-the-parameter model construction algorithm is proposed aimed at noisy two-class classification problems. The purpose of the first stage is to produce a prefiltered signal that is used as the desired output for the second stage which constructs a sparse linear-in-the-parameter classifier. The prefiltering stage is a two-level process aimed at maximizing a model's generalization capability, in which a new elastic-net model identification algorithm using singular value decomposition is employed at the lower level, and then, two regularization parameters are optimized using a particle-swarm-optimization algorithm at the upper level by minimizing the leave-one-out (LOO) misclassification rate. It is shown that the LOO misclassification rate based on the resultant prefiltered signal can be analytically computed without splitting the data set, and the associated computational cost is minimal due to orthogonality. The second stage of sparse classifier construction is based on orthogonal forward regression with the D-optimality algorithm. Extensive simulations of this approach for noisy data sets illustrate the competitiveness of this approach to classification of noisy data problems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper explores the development of multi-feature classification techniques used to identify tremor-related characteristics in the Parkinsonian patient. Local field potentials were recorded from the subthalamic nucleus and the globus pallidus internus of eight Parkinsonian patients through the implanted electrodes of a Deep brain stimulation (DBS) device prior to device internalization. A range of signal processing techniques were evaluated with respect to their tremor detection capability and used as inputs in a multi-feature neural network classifier to identify the activity of Parkinsonian tremor. The results of this study show that a trained multi-feature neural network is able, under certain conditions, to achieve excellent detection accuracy on patients unseen during training. Overall the tremor detection accuracy was mixed, although an accuracy of over 86% was achieved in four out of the eight patients.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Airborne lidar provides accurate height information of objects on the earth and has been recognized as a reliable and accurate surveying tool in many applications. In particular, lidar data offer vital and significant features for urban land-cover classification, which is an important task in urban land-use studies. In this article, we present an effective approach in which lidar data fused with its co-registered images (i.e. aerial colour images containing red, green and blue (RGB) bands and near-infrared (NIR) images) and other derived features are used effectively for accurate urban land-cover classification. The proposed approach begins with an initial classification performed by the Dempster–Shafer theory of evidence with a specifically designed basic probability assignment function. It outputs two results, i.e. the initial classification and pseudo-training samples, which are selected automatically according to the combined probability masses. Second, a support vector machine (SVM)-based probability estimator is adopted to compute the class conditional probability (CCP) for each pixel from the pseudo-training samples. Finally, a Markov random field (MRF) model is established to combine spatial contextual information into the classification. In this stage, the initial classification result and the CCP are exploited. An efficient belief propagation (EBP) algorithm is developed to search for the global minimum-energy solution for the maximum a posteriori (MAP)-MRF framework in which three techniques are developed to speed up the standard belief propagation (BP) algorithm. Lidar and its co-registered data acquired by Toposys Falcon II are used in performance tests. The experimental results prove that fusing the height data and optical images is particularly suited for urban land-cover classification. There is no training sample needed in the proposed approach, and the computational cost is relatively low. An average classification accuracy of 93.63% is achieved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Koppen climate classification was applied to the output of atmospheric general circulation models and coupled atmosphere-ocean circulation models. The classification was used to validate model control runs of the present climate and to analyse greenhouse gas warming simulations The most prominent results of the global warming con~putationsw ere a retreat of regions of permafrost and the increase of areas with tropical rainy climates and dry climates.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract Background: The analysis of the Auditory Brainstem Response (ABR) is of fundamental importance to the investigation of the auditory system behaviour, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analysing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. Significant differences in inter-examiner results may lead to completely distinct clinical interpretations of the state of the auditory system. In this context, the aim of this research was to evaluate the inter-examiner agreement and variability in the manual classification of ABR. Methods: A total of 160 ABR data samples were collected, for four different stimulus intensity (80dBHL, 60dBHL, 40dBHL and 20dBHL), from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). Four examiners with expertise in the manual classification of ABR components participated in the study. The Bland-Altman statistical method was employed for the assessment of inter-examiner agreement and variability. The mean, standard deviation and error for the bias, which is the difference between examiners’ annotations, were estimated for each pair of examiners. Scatter plots and histograms were employed for data visualization and analysis. Results: In most comparisons the differences between examiner’s annotations were below 0.1 ms, which is clinically acceptable. In four cases, it was found a large error and standard deviation (>0.1 ms) that indicate the presence of outliers and thus, discrepancies between examiners. Conclusions: Our results quantify the inter-examiner agreement and variability of the manual analysis of ABR data, and they also allows for the determination of different patterns of manual ABR analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work proposes a unified neurofuzzy modelling scheme. To begin with, the initial fuzzy base construction method is based on fuzzy clustering utilising a Gaussian mixture model (GMM) combined with the analysis of covariance (ANOVA) decomposition in order to obtain more compact univariate and bivariate membership functions over the subspaces of the input features. The mean and covariance of the Gaussian membership functions are found by the expectation maximisation (EM) algorithm with the merit of revealing the underlying density distribution of system inputs. The resultant set of membership functions forms the basis of the generalised fuzzy model (GFM) inference engine. The model structure and parameters of this neurofuzzy model are identified via the supervised subspace orthogonal least square (OLS) learning. Finally, instead of providing deterministic class label as model output by convention, a logistic regression model is applied to present the classifier’s output, in which the sigmoid type of logistic transfer function scales the outputs of the neurofuzzy model to the class probability. Experimental validation results are presented to demonstrate the effectiveness of the proposed neurofuzzy modelling scheme.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, various types of fault detection methods for fuel cells are compared. For example, those that use a model based approach or a data driven approach or a combination of the two. The potential advantages and drawbacks of each method are discussed and comparisons between methods are made. In particular, classification algorithms are investigated, which separate a data set into classes or clusters based on some prior knowledge or measure of similarity. In particular, the application of classification methods to vectors of reconstructed currents by magnetic tomography or to vectors of magnetic field measurements directly is explored. Bases are simulated using the finite integration technique (FIT) and regularization techniques are employed to overcome ill-posedness. Fisher's linear discriminant is used to illustrate these concepts. Numerical experiments show that the ill-posedness of the magnetic tomography problem is a part of the classification problem on magnetic field measurements as well. This is independent of the particular working mode of the cell but influenced by the type of faulty behavior that is studied. The numerical results demonstrate the ill-posedness by the exponential decay behavior of the singular values for three examples of fault classes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Since their inception, Twitter and related microblogging systems have provided a rich source of information for researchers and have attracted interest in their affordances and use. Since 2009 PubMed has included 123 journal articles on medicine and Twitter, but no overview exists as to how the field uses Twitter in research. // Objective: This paper aims to identify published work relating to Twitter indexed by PubMed, and then to classify it. This classification will provide a framework in which future researchers will be able to position their work, and to provide an understanding of the current reach of research using Twitter in medical disciplines. Limiting the study to papers indexed by PubMed ensures the work provides a reproducible benchmark. // Methods: Papers, indexed by PubMed, on Twitter and related topics were identified and reviewed. The papers were then qualitatively classified based on the paper’s title and abstract to determine their focus. The work that was Twitter focused was studied in detail to determine what data, if any, it was based on, and from this a categorization of the data set size used in the studies was developed. Using open coded content analysis additional important categories were also identified, relating to the primary methodology, domain and aspect. // Results: As of 2012, PubMed comprises more than 21 million citations from biomedical literature, and from these a corpus of 134 potentially Twitter related papers were identified, eleven of which were subsequently found not to be relevant. There were no papers prior to 2009 relating to microblogging, a term first used in 2006. Of the remaining 123 papers which mentioned Twitter, thirty were focussed on Twitter (the others referring to it tangentially). The early Twitter focussed papers introduced the topic and highlighted the potential, not carrying out any form of data analysis. The majority of published papers used analytic techniques to sort through thousands, if not millions, of individual tweets, often depending on automated tools to do so. Our analysis demonstrates that researchers are starting to use knowledge discovery methods and data mining techniques to understand vast quantities of tweets: the study of Twitter is becoming quantitative research. // Conclusions: This work is to the best of our knowledge the first overview study of medical related research based on Twitter and related microblogging. We have used five dimensions to categorise published medical related research on Twitter. This classification provides a framework within which researchers studying development and use of Twitter within medical related research, and those undertaking comparative studies of research relating to Twitter in the area of medicine and beyond, can position and ground their work.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The bewildering complexity of cortical microcircuits at the single cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach based on simultaneous multi-site recordings using micro-electrode-array chips for investigation of the microcircuitary of rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses, and typical spatial distributions of LFP responses to stimuli in supragranular, granular, and infragranular layers, where the last form a particularly distinct class. Population spikes appear to travel with about 33 cm/s from granular to infragranular layers. Responses within barrel related columns have different profiles than those in neighbouring columns to the left or interchangeably to the right. Variations between slices occur, but can be minimized by strictly obeying controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series reflecting the location of sources and sinks independent of the stimulus layer. Although the precise correspondences between single cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach in combination with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions to genetically or pharmacologically altered situations based on real cortical microcircuitry.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Major Depressive Disorder (MDD) has been associated with biased processing and abnormal regulation of negative and positive information, which may result from compromised coordinated activity of prefrontal and subcortical brain regions involved in evaluating emotional information. We tested whether patients with MDD show distributed changes in functional connectivity with a set of independently derived brain networks that have shown high correspondence with different task demands, including stimulus salience and emotional processing. We further explored if connectivity during emotional word processing related to the tendency to engage in positive or negative emotional states. In this study, 25 medication-free MDD patients without current or past comorbidity and matched controls (n=25) performed an emotional word-evaluation task during functional MRI. Using a dual regression approach, individual spatial connectivity maps representing each subject’s connectivity with each standard network were used to evaluate between-group differences and effects of positive and negative emotionality (extraversion and neuroticism, respectively, as measured with the NEO-FFI). Results showed decreased functional connectivity of the medial prefrontal cortex, ventrolateral prefrontal cortex, and ventral striatum with the fronto-opercular salience network in MDD patients compared to controls. In patients, abnormal connectivity was related to extraversion, but not neuroticism. These results confirm the hypothesis of a relative (para)limbic-cortical decoupling that may explain dysregulated affect in MDD. As connectivity of these regions with the salience network was related to extraversion, but not to general depression severity or negative emotionality, dysfunction of this network may be responsible for the failure to sustain engagement in rewarding behavior.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and combined class of grass and ground, yielded 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. The reflectance of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls in between the other two classes as might be expected, as this class has a mixture of the contributions of both vegetation and ground reflectance properties.