865 resultados para ensemble classifiers
Resumo:
Accurate and detailed measurement of an individual's physical activity is a key requirement for helping researchers understand the relationship between physical activity and health. Accelerometers have become the method of choice for measuring physical activity due to their small size, low cost, convenience and their ability to provide objective information about physical activity. However, interpreting accelerometer data once it has been collected can be challenging. In this work, we applied machine learning algorithms to the task of physical activity recognition from triaxial accelerometer data. We employed a simple but effective approach of dividing the accelerometer data into short non-overlapping windows, converting each window into a feature vector, and treating each feature vector as an i.i.d training instance for a supervised learning algorithm. In addition, we improved on this simple approach with a multi-scale ensemble method that did not need to commit to a single window size and was able to leverage the fact that physical activities produced time series with repetitive patterns and discriminative features for physical activity occurred at different temporal scales.
Resumo:
Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.
Resumo:
High-Order Co-Clustering (HOCC) methods have attracted high attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects as well as object affinity information, i.e., intra-type relationships amongst the same types of objects. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created based on two-types of intra-type relationships learnt using p-nearest-neighbour graph and multiple subspaces learning. Moreover, in order to make sure the robustness of clustering process, we introduce a sparse error matrix into matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over the state-of-art HOCC methods for FScore and NMI.
Resumo:
Vanessa Mafe-Keane was invited to participate as choreographer in Iranian singer Shirin Madg 's project, Rebirth: Combined art performance. This project integrated singing, music, visual-art, film, dance and is based on the dissident poetry of female Iranian poet, Forough Farrokhzad. The choreographic dance movement focused on simple, lyrical, flowing classical dance forms that also incorporated everyday gestures and actions performed by two Queensland dancers, Caitlin MacKenzie and Abby Johnson. The choreographic intention was not to attempt to re-create Iranian dance practices instead, to draw inspiration and reference specific movement qualities. This was achieved through the subtle inclusion of spinning movements and focusing attention on the dancers’ arms and upper torso. This fusion became an underlying theme reflected throughout the choreographic component. Additionally, this project presented an opportunity to draw on past experiences and problem-solve ways to construct choreographic work where the dancers and the musical assemble group could be staged side by side. This experience highlighted differing approaches to rehearsal protocols within disciplines, the practicalities of staging different artists, understanding musical cues and the diversity of audience engagement. Performances: BEMAC Multicultural Centre, Brisbane 06 February 2015 and Helensvale Cultural Centre, Gold Coast 07 February 2015
Resumo:
This paper addresses the following predictive business process monitoring problem: Given the execution trace of an ongoing case,and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. Meanwhile, an outcome refers to a label associated to completed cases, like, for example, a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating that a given case led to a customer complaint or not. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.
Resumo:
Isothermal-isobaric ensemble Monte Carlo simulation studies of adamantane have been carried out at different temperatures. Thermodynamic properties and radial distribution functions calculated by employing a simple potential model based on sitesite interactions show good agreement with experiment and suggest that the solid is orientationally disordered at high temperatures.
Resumo:
The significance of treating rainfall as a chaotic system instead of a stochastic system for a better understanding of the underlying dynamics has been taken up by various studies recently. However, an important limitation of all these approaches is the dependence on a single method for identifying the chaotic nature and the parameters involved. Many of these approaches aim at only analyzing the chaotic nature and not its prediction. In the present study, an attempt is made to identify chaos using various techniques and prediction is also done by generating ensembles in order to quantify the uncertainty involved. Daily rainfall data of three regions with contrasting characteristics (mainly in the spatial area covered), Malaprabha, Mahanadi and All-India for the period 1955-2000 are used for the study. Auto-correlation and mutual information methods are used to determine the delay time for the phase space reconstruction. Optimum embedding dimension is determined using correlation dimension, false nearest neighbour algorithm and also nonlinear prediction methods. The low embedding dimensions obtained from these methods indicate the existence of low dimensional chaos in the three rainfall series. Correlation dimension method is done on th phase randomized and first derivative of the data series to check whether the saturation of the dimension is due to the inherent linear correlation structure or due to low dimensional dynamics. Positive Lyapunov exponents obtained prove the exponential divergence of the trajectories and hence the unpredictability. Surrogate data test is also done to further confirm the nonlinear structure of the rainfall series. A range of plausible parameters is used for generating an ensemble of predictions of rainfall for each year separately for the period 1996-2000 using the data till the preceding year. For analyzing the sensitiveness to initial conditions, predictions are done from two different months in a year viz., from the beginning of January and June. The reasonably good predictions obtained indicate the efficiency of the nonlinear prediction method for predicting the rainfall series. Also, the rank probability skill score and the rank histograms show that the ensembles generated are reliable with a good spread and skill. A comparison of results of the three regions indicates that although they are chaotic in nature, the spatial averaging over a large area can increase the dimension and improve the predictability, thus destroying the chaotic nature. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Background: A nucleosome is the fundamental repeating unit of the eukaryotic chromosome. It has been shown that the positioning of a majority of nucleosomes is primarily controlled by factors other than the intrinsic preference of the DNA sequence. One of the key questions in this context is the role, if any, that can be played by the variability of nucleosomal DNA structure. Results: In this study, we have addressed this question by analysing the variability at the dinucleotide and trinucleotide as well as longer length scales in a dataset of nucleosome X-ray crystal structures. We observe that the nucleosome structure displays remarkable local level structural versatility within the B-DNA family. The nucleosomal DNA also incorporates a large number of kinks. Conclusions: Based on our results, we propose that the local and global level versatility of B-DNA structure may be a significant factor modulating the formation of nucleosomes in the vicinity of high-plasticity genes, and in varying the probability of binding by regulatory proteins. Hence, these factors should be incorporated in the prediction algorithms and there may not be a unique `template' for predicting putative nucleosome sequences. In addition, the multimodal distribution of dinucleotide parameters for some steps and the presence of a large number of kinks in the nucleosomal DNA structure indicate that the linear elastic model, used by several algorithms to predict the energetic cost of nucleosome formation, may lead to incorrect results.
Resumo:
An exact numerical calculation of ensemble-averaged length-scale-dependent conductance for the one-dimensional Anderson model is shown to support an earlier conjecture for a conductance minimum. The numerical results can be understood in terms of the Thouless expression for the conductance and the Wigner level-spacing statistics.
Resumo:
In this paper, we propose a novel dexterous technique for fast and accurate recognition of online handwritten Kannada and Tamil characters. Based on the primary classifier output and prior knowledge, the best classifier is chosen from set of three classifiers for second stage classification. Prior knowledge is obtained through analysis of the confusion matrix of primary classifier which helped in identifying the multiple sets of confused characters. Further, studies were carried out to check the performance of secondary classifiers in disambiguating among the confusion sets. Using this technique we have achieved an average accuracy of 92.6% for Kannada characters on the MILE lab dataset and 90.2% for Tamil characters on the HP Labs dataset.
Resumo:
Perfect or even mediocre weather predictions over a long period are almost impossible because of the ultimate growth of a small initial error into a significant one. Even though the sensitivity of initial conditions limits the predictability in chaotic systems, an ensemble of prediction from different possible initial conditions and also a prediction algorithm capable of resolving the fine structure of the chaotic attractor can reduce the prediction uncertainty to some extent. All of the traditional chaotic prediction methods in hydrology are based on single optimum initial condition local models which can model the sudden divergence of the trajectories with different local functions. Conceptually, global models are ineffective in modeling the highly unstable structure of the chaotic attractor. This paper focuses on an ensemble prediction approach by reconstructing the phase space using different combinations of chaotic parameters, i.e., embedding dimension and delay time to quantify the uncertainty in initial conditions. The ensemble approach is implemented through a local learning wavelet network model with a global feed-forward neural network structure for the phase space prediction of chaotic streamflow series. Quantification of uncertainties in future predictions are done by creating an ensemble of predictions with wavelet network using a range of plausible embedding dimensions and delay times. The ensemble approach is proved to be 50% more efficient than the single prediction for both local approximation and wavelet network approaches. The wavelet network approach has proved to be 30%-50% more superior to the local approximation approach. Compared to the traditional local approximation approach with single initial condition, the total predictive uncertainty in the streamflow is reduced when modeled with ensemble wavelet networks for different lead times. Localization property of wavelets, utilizing different dilation and translation parameters, helps in capturing most of the statistical properties of the observed data. The need for taking into account all plausible initial conditions and also bringing together the characteristics of both local and global approaches to model the unstable yet ordered chaotic attractor of a hydrologic series is clearly demonstrated.
Resumo:
The basic characteristic of a chaotic system is its sensitivity to the infinitesimal changes in its initial conditions. A limit to predictability in chaotic system arises mainly due to this sensitivity and also due to the ineffectiveness of the model to reveal the underlying dynamics of the system. In the present study, an attempt is made to quantify these uncertainties involved and thereby improve the predictability by adopting a multivariate nonlinear ensemble prediction. Daily rainfall data of Malaprabha basin, India for the period 1955-2000 is used for the study. It is found to exhibit a low dimensional chaotic nature with the dimension varying from 5 to 7. A multivariate phase space is generated, considering a climate data set of 16 variables. The chaotic nature of each of these variables is confirmed using false nearest neighbor method. The redundancy, if any, of this atmospheric data set is further removed by employing principal component analysis (PCA) method and thereby reducing it to eight principal components (PCs). This multivariate series (rainfall along with eight PCs) is found to exhibit a low dimensional chaotic nature with dimension 10. Nonlinear prediction employing local approximation method is done using univariate series (rainfall alone) and multivariate series for different combinations of embedding dimensions and delay times. The uncertainty in initial conditions is thus addressed by reconstructing the phase space using different combinations of parameters. The ensembles generated from multivariate predictions are found to be better than those from univariate predictions. The uncertainty in predictions is decreased or in other words predictability is increased by adopting multivariate nonlinear ensemble prediction. The restriction on predictability of a chaotic series can thus be altered by quantifying the uncertainty in the initial conditions and also by including other possible variables, which may influence the system. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Wetlands are the most productive and biologically diverse but very fragile ecosystems. They are vulnerable to even small changes in their biotic and abiotic factors. In recent years, there has been concern over the continuous degradation of wetlands due to unplanned developmental activities. This necessitates inventorying, mapping, and monitoring of wetlands to implement sustainable management approaches. The principal objective of this work is to evolve a strategy to identify and monitor wetlands using temporal remote sensing (RS) data. Pattern classifiers were used to extract wetlands automatically from NIR bands of MODIS, Landsat MSS and Landsat TM remote sensing data. MODIS provided data for 2002 to 2007, while for 1973 and 1992 IR Bands of Landsat MSS and TM (79m and 30m spatial resolution) data were used. Principal components of IR bands of MODIS (250 m) were fused with IRS LISS-3 NIR (23.5 m). To extract wetlands, statistical unsupervised learning of IR bands for the respective temporal data was performed using Bayesian approach based on prior probability, mean and covariance. Temporal analysis of wetlands indicates a sharp decline of 58% in Greater Bangalore attributing to intense urbanization processes, evident from a 466% increase in built-up area from 1973 to 2007.