847 results for Classification Methods


Relevance: 30.00%

Abstract:

Frog protection has become increasingly important because of the rapid decline in frog biodiversity. Therefore, it is valuable to develop new methods for studying this biodiversity. In this paper, a novel feature extraction method based on perceptual wavelet packet decomposition is proposed for classifying frog calls in noisy environments. Pre-processing and syllable segmentation are first applied to the frog call. Then, a spectral peak track is extracted from each syllable where possible. Track duration, dominant frequency and oscillation rate are extracted directly from the track. Using the k-means clustering algorithm, the calculated dominant frequencies of all frog species are clustered into k parts, which produces a frequency scale for wavelet packet decomposition. Based on this adaptive frequency scale, wavelet packet decomposition is applied to the frog calls. From the wavelet packet decomposition coefficients, a new feature set named perceptual wavelet packet decomposition sub-band cepstral coefficients is extracted. Finally, a k-nearest neighbour (k-NN) classifier is used for classification. The experimental results show that the proposed features achieve an average classification accuracy of 97.45%, outperforming syllable features (86.87%) and Mel-frequency cepstral coefficients (MFCCs) (90.80%).
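The clustering step above can be sketched in a few lines. This is an illustrative one-dimensional k-means, not the authors' code; the frequency values and the choice of k are invented for the example.

```python
# Sketch: cluster dominant frequencies with 1-D k-means; the sorted
# centroids then serve as a frequency scale for the decomposition.

def kmeans_1d(values, k, iters=100):
    """Simple 1-D k-means; returns sorted centroids."""
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return sorted(centroids)

# Hypothetical dominant frequencies (Hz) of frog syllables:
freqs = [450, 480, 500, 1200, 1250, 2600, 2650, 2700]
scale = kmeans_1d(freqs, k=3)  # centroids -> adaptive frequency scale
```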

Being able to accurately predict the risk of falling is crucial in patients with Parkinson’s disease (PD), because falls can lower quality of life and directly impact survival. Three methods are considered for predicting falls: decision trees (DT), Bayesian networks (BN), and support vector machines (SVM). Data from a 1-year prospective study conducted at IHBI, Australia, on 51 people with PD are used. Data processing is conducted using the rpart and e1071 packages in R for DT and SVM, respectively, and Bayes Server 5.5 for the BN. The results show that BN and SVM produce consistently higher accuracy over the 12-month evaluation time points (average sensitivity and specificity > 92%) than DT (average sensitivity 88%, average specificity 72%). DT is sensitive to imbalanced data and therefore needs adjustment of the misclassification cost. However, DT provides a straightforward, interpretable result and is thus appealing for identifying important items related to falls and for generating fallers’ profiles.
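The metrics used to compare the three classifiers can be computed as below. This is a generic sketch; the labels and predictions are invented, not data from the study.

```python
# Sketch: sensitivity (true positive rate) and specificity (true
# negative rate) from binary faller (1) / non-faller (0) labels.

def sens_spec(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Hypothetical outcomes and model output:
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
se, sp = sens_spec(y_true, y_pred)  # 0.75, 0.75
```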

Objective Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates – an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates. Methods Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable. Conclusion The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates.
This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
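The cascaded architecture can be sketched as follows. The keyword rules below merely stand in for the trained SVM classifiers, and the ICD-10-style codes are illustrative only, not real coding logic.

```python
# Sketch of the two-level cascade: level 1 decides cancer / no cancer,
# level 2 assigns a cancer type only to the positives.

def is_cancer(text):
    # Level 1 stand-in for the binary SVM classifier.
    return any(w in text.lower() for w in ("carcinoma", "cancer", "tumour"))

def cancer_type(text):
    # Level 2 stand-in: map to an ICD-10-style code (toy rules).
    if "lung" in text.lower():
        return "C34"
    if "stomach" in text.lower():
        return "C16"
    return "C80"  # malignant neoplasm, unspecified site

def classify(certificate):
    if not is_cancer(certificate):
        return None
    return cancer_type(certificate)

result = classify("metastatic carcinoma of the lung")   # "C34"
no_hit = classify("acute myocardial infarction")        # None
```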

Background The purpose of this presentation is to outline the relevance of categorising load regime data for assessing the functional output and usage of the prosthesis of lower limb amputees. The objectives are:
• To highlight the need for categorisation of activities of daily living,
• To present a categorisation of the load regime applied on the residuum,
• To present some descriptors of the four types of activity that could be detected,
• To provide an example of the results for a case.
Methods The load applied on the osseointegrated fixation of one transfemoral amputee was recorded using a portable kinetic system for 5 hours. The load applied on the residuum was divided into four types of activity corresponding to inactivity, stationary loading, localized locomotion and directional locomotion, as detailed in previous publications. Results The periods of directional locomotion, localized locomotion, and stationary loading occupied 44%, 34%, and 22% of the recording time and accounted for 51%, 38%, and 12% of the duration of the periods of activity, respectively. The absolute maximum force during directional locomotion, localized locomotion, and stationary loading was 19%, 15%, and 8% of body weight on the anteroposterior axis, 20%, 19%, and 12% on the mediolateral axis, and 121%, 106%, and 99% on the long axis. A total of 2,783 gait cycles were recorded. Discussion Approximately 10% more gait cycles and 50% more of the total impulse were identified than with conventional analyses. The proposed categorisation and apparatus have the potential to complement conventional instruments, particularly for difficult cases.
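The four activity types above could be detected roughly as sketched below. The thresholds and the "moving" flag are entirely invented for illustration; the study's actual detection rules are not given here.

```python
# Hedged sketch: assign a load sample to one of the four activity
# categories named in the abstract, using hypothetical thresholds.

def categorise(load_bw, moving):
    """load_bw: long-axis load as a fraction of body weight;
    moving: whether the prosthesis is displacing (hypothetical flag)."""
    if load_bw < 0.05:
        return "inactivity"
    if not moving:
        return "stationary loading"
    if load_bw < 1.0:
        return "localized locomotion"
    return "directional locomotion"

samples = [(0.01, False), (0.5, False), (0.8, True), (1.2, True)]
labels = [categorise(l, m) for l, m in samples]
```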

This paper presents a chance-constrained programming approach for constructing maximum-margin classifiers that are robust to interval-valued uncertainty in training examples. The methodology ensures that uncertain examples are classified correctly with high probability by employing chance constraints. The main contribution of the paper is to pose the resulting optimization problem as a Second Order Cone Program by using large deviation inequalities due to Bernstein. Apart from the support and mean of the uncertain examples, these Bernstein-based relaxations make no further assumptions about the underlying uncertainty. Classifiers built using the proposed approach are less conservative and yield higher margins, and hence are expected to generalize better than existing methods. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle interval-valued uncertainty than state-of-the-art methods.
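In generic notation (mine, not necessarily the paper's), the chance-constrained margin requirement and its second-order cone relaxation take roughly the following schematic form, where $\mu_i$ and $\Sigma_i$ summarise the uncertain example $X_i$ and $\kappa(\epsilon_i)$ is a constant depending on the tolerated violation probability:

```latex
% Chance constraint: uncertain example X_i is classified correctly
% with probability at least 1 - \epsilon_i:
\Pr\bigl( y_i (w^\top X_i + b) \ge 1 \bigr) \ge 1 - \epsilon_i
% A Bernstein-type bound yields a deterministic second-order cone
% constraint (schematic):
y_i (w^\top \mu_i + b) \ge 1 + \kappa(\epsilon_i)\,\bigl\| \Sigma_i^{1/2} w \bigr\|_2
```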

Laboratory confirmation methods are important in the diagnosis of bovine cysticercosis, as other pathologies can produce morphologically similar lesions, leading to false identifications. We developed a probe-based real-time PCR assay to identify Taenia saginata in suspect cysts encountered at meat inspection and compared it with the traditional method of identification, histology, as well as with a published nested PCR. The assay simultaneously detects T. saginata DNA and a bovine internal control using the cytochrome c oxidase subunit 1 gene of each species, and shows specificity against parasites causing lesions morphologically similar to those of T. saginata. The assay was sufficiently sensitive to detect 1 fg (Ct 35.09 +/- 0.95) of target DNA using serially diluted plasmid DNA in reactions spiked with bovine DNA, as well as in all viable and caseated positive control cysts. A loss in PCR sensitivity was observed with increasing cyst degeneration, as seen in other molecular methods. In comparison with histology, the assay offered greater sensitivity and accuracy, with 10/19 (53%) T. saginata positives detected by real-time PCR and none by histology. When the results were compared with the reference PCR, the assay was less sensitive but offered the advantages of faster turnaround times and reduced contamination risk. Estimates of the assay's repeatability and reproducibility showed that the assay is highly reliable, with reliability coefficients greater than 0.94. Crown Copyright (C) 2013 Published by Elsevier B.V. All rights reserved.
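For readers unfamiliar with how sensitivity figures like "1 fg (Ct 35.09)" relate to a serial dilution, here is a generic qPCR standard-curve calculation. The Ct values below are invented; only the use of serially diluted template is taken from the abstract.

```python
# Sketch: fit Ct vs. log10(template amount) by least squares and derive
# amplification efficiency (1.0 would mean perfect doubling per cycle).
import math

dilutions_fg = [1e4, 1e3, 1e2, 1e1, 1e0]       # 10-fold serial dilution
cts          = [21.8, 25.1, 28.4, 31.7, 35.0]  # hypothetical Ct values

xs = [math.log10(d) for d in dilutions_fg]
n = len(xs)
mx, my = sum(xs) / n, sum(cts) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, cts)) / \
        sum((x - mx) ** 2 for x in xs)
efficiency = 10 ** (-1 / slope) - 1
```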

Environmental changes have put great pressure on biological systems, leading to a rapid decline in biodiversity. To monitor this change and protect biodiversity, animal vocalizations have been widely explored with the aid of acoustic sensors deployed in the field. Consequently, large volumes of acoustic data are collected. Traditional manual methods, which require ecologists to visit sites in person to collect biodiversity data, are both costly and time consuming, so it is essential to develop semi-automated and automated methods to identify species in audio recordings. In this study, a novel feature extraction method based on wavelet packet decomposition is proposed for frog call classification. After syllable segmentation, each syllable of the frog advertisement call is represented by a spectral peak track, from which track duration, dominant frequency and oscillation rate are calculated. Then, a k-means clustering algorithm is applied to the dominant frequencies, and the centroids of the clustering results are used to generate the frequency scale for wavelet packet decomposition (WPD). Next, a new feature set named adaptive frequency scaled wavelet packet decomposition sub-band cepstral coefficients is extracted by performing WPD on the windowed frog calls. Furthermore, the statistics of all feature vectors over each windowed signal are calculated to produce the final feature set. Finally, two well-known classifiers, a k-nearest neighbour classifier and a support vector machine classifier, are used for classification. In our experiments, we use two datasets from Queensland, Australia: 18 frog species from commercial recordings, and 8 frog species from James Cook University field recordings. The weighted classification accuracy with our proposed method is 99.5% for the 18 species and 97.4% for the 8 species, which outperforms all other comparable methods.
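The final "sub-band cepstral coefficients" step can be sketched as a DCT of log sub-band energies, in the spirit of MFCC computation. The sub-band energies below are invented; the real pipeline derives them from the adaptive-scale WPD.

```python
# Simplified sketch: cepstral-style coefficients from log sub-band
# energies via an unnormalised DCT-II.
import math

def subband_cepstral(energies, n_coeffs):
    logs = [math.log(e) for e in energies]
    m = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / m)
                for j in range(m))
            for i in range(n_coeffs)]

# Hypothetical WPD sub-band energies for one windowed frog call:
energies = [2.0, 4.0, 8.0, 4.0, 2.0, 1.0]
features = subband_cepstral(energies, n_coeffs=4)
```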

This thesis consists of an introduction, four research articles and an appendix. The thesis studies relations between two different approaches to the continuum limit of models of two-dimensional statistical mechanics at criticality. The approach of conformal field theory (CFT) can be thought of as the algebraic classification of some basic objects in these models; it has been successfully used by physicists since the 1980s. The other approach, Schramm-Loewner evolutions (SLEs), is a recently introduced set of mathematical methods for studying random curves or interfaces occurring in the continuum limit of the models. The first and second included articles argue, on the basis of statistical mechanics, what a plausible relation between SLEs and conformal field theory would be. The first article studies multiple SLEs: several random curves simultaneously in a domain. The proposed definition is compatible with a natural commutation requirement suggested by Dubédat. The curves of a multiple SLE may form different topological configurations, ``pure geometries''. We conjecture a relation between the topological configurations and the CFT concepts of conformal blocks and operator product expansions. Example applications of multiple SLEs include crossing probabilities for percolation and the Ising model. The second article studies SLE variants that represent models with boundary conditions implemented by primary fields. The most well known of these, SLE(kappa, rho), is shown to be simple in terms of the Coulomb gas formalism of CFT. In the third article the space of local martingales for variants of SLE is shown to carry a representation of the Virasoro algebra. Finding this structure is guided by the relation of SLEs and CFTs in general, but the result is established in a straightforward fashion. This article, too, emphasizes multiple SLEs and proposes a possible way of treating pure geometries in terms of the Coulomb gas.
The fourth article states results of applications of the Virasoro structure to the open questions of SLE reversibility and duality. Proofs of the stated results are provided in the appendix. The objective is an indirect computation of certain polynomial expected values. Provided that these expected values exist, in generic cases they are shown to possess the desired properties, thus giving support for both reversibility and duality.
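As background (standard material, not a result of this thesis): a chordal SLE curve in the upper half-plane $\mathbb{H}$ is encoded by the Loewner equation

```latex
\partial_t g_t(z) \;=\; \frac{2}{g_t(z) - W_t}, \qquad g_0(z) = z,
```

with driving process $W_t = \sqrt{\kappa}\, B_t$ for a standard Brownian motion $B_t$; roughly speaking, multiple SLEs describe several curves at once via interacting driving processes, one per curve.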

In this thesis we present and evaluate two pattern-matching-based methods for answer extraction in textual question answering systems. A textual question answering system seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital form continues to grow, the need for novel methods for pinpointing important knowledge in vast textual databases becomes ever more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns, and a new type of extraction pattern is also developed. The pattern-matching-based approach is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system, and publicly available datasets in English are used as training and evaluation data. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering; the similarity metric used is based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context can be used in the answer extraction module of a question answering system: plain words, part-of-speech tags, punctuation marks and capitalization patterns. Such patterns, and the two new methods for generating them, give average results compared with other systems evaluated on the same dataset. However, most answer extraction methods in the question answering systems tested with that dataset are both hand-crafted and based on a system-specific, fine-grained question classification, whereas the new methods developed in this thesis require no manual creation of answer extraction patterns. 
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one, already provided in the publicly available data.
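The edit-distance similarity metric mentioned above can be sketched as plain Levenshtein distance over token sequences. The token patterns below are invented examples; the thesis builds alignment and clustering on top of this primitive.

```python
# Sketch: Levenshtein edit distance between two token sequences,
# using the standard two-row dynamic-programming formulation.

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical answer-context patterns (tokens, POS-style placeholders):
p1 = ["was", "born", "in", "YEAR", "."]
p2 = ["was", "born", "on", "DATE", "in"]
d = edit_distance(p1, p2)  # 3
```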

Protein Kinase-Like Non-kinases (PKLNKs), which are closely related to protein kinases but lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases, have been analysed. Using various sensitive sequence analysis methods, we recognized 82 PKLNKs from four higher eukaryotic organisms, namely Homo sapiens, Mus musculus, Rattus norvegicus, and Drosophila melanogaster. On the basis of their domain combinations and functions, PKLNKs have been classified into four main categories: (1) ligand-binding PKLNKs, (2) PKLNKs with an extracellular protein-protein interaction domain, (3) PKLNKs involved in dimerization, and (4) PKLNKs with a cytoplasmic protein-protein interaction module. While members of the first two classes have a transmembrane domain tethered to the PKLNK domain, members of the other two classes are cytoplasmic in nature. This classification scheme aims to provide a convenient framework for classifying PKLNKs from other eukaryotes, which would be helpful in deciphering their roles in cellular processes.

Climate change contributes directly or indirectly to changes in species distributions, and there is very high confidence that recent climate warming is already affecting ecosystems. The Arctic has experienced the greatest regional warming in recent decades, and the trend is continuing. However, studies of northern ecosystems are scarce compared with those of more southerly regions, and a better understanding of past and present environmental change is needed to forecast the future. Multivariate methods were used to explore the distributional patterns of chironomids in 50 shallow (≤ 10 m) lakes in relation to 24 variables determined in northern Fennoscandia, at the ecotonal area from the boreal forest in the south to the orohemiarctic zone in the north. The highest taxon richness was noted at middle elevations, around 400 m a.s.l. Significantly lower values were observed in cold lakes situated in the tundra zone. Lake water alkalinity had the strongest positive correlation with taxon richness. Many taxa showed a preference for lakes either in the tundra or in the forested area. The variation in the chironomid abundance data was best correlated with sediment organic content (LOI), lake water total organic carbon content, pH and air temperature, with LOI being the strongest variable. Three major lake groups were separated on the basis of their chironomid assemblages: (i) small and shallow organic-rich lakes, (ii) large and base-rich lakes, and (iii) cold and clear oligotrophic tundra lakes. The environmental variables best discriminating the lake groups were LOI, taxon richness, and Mg. When repeated, this kind of approach could be useful and efficient in monitoring the effects of global change on species ranges. Many species of fast-spreading insects, including chironomids, show a remarkable ability to track environmental changes. Based on this ability, past environmental conditions have been reconstructed using their chitinous remains in lake sediment profiles. 
In order to study the Holocene environmental history of subarctic aquatic systems, and to quantitatively reconstruct past temperatures at or near the treeline, long sediment cores covering the last 10,000 years (the Holocene) were collected from three lakes. Lower temperature values than expected from the presence of pine in the catchment during the mid-Holocene were reconstructed from a lake with great water volume and depth; the lake provided a thermal refuge for profundal, cold-adapted taxa during the warm period. In a shallow lake, the decrease in the reconstructed temperatures during the late Holocene may reflect the indirect response of the midges to climate change through, e.g., pH change. The results from the three lakes indicated that the response of chironomids to climate has been more or less indirect. However, concurrent shifts in the assemblages of chironomids and vegetation in two lakes during the Holocene indicated that the midges and the terrestrial vegetation had responded to the same ultimate cause, most likely Holocene climate change. This was also supported by the similarity of the long-term trends in faunal succession of the chironomid assemblages in several lakes in the area. In northern Finnish Lapland the distribution of chironomids was significantly correlated with physical and limnological factors that are most likely to change as a result of future climate change. The indirect and individualistic response of aquatic systems to past climate change, as reconstructed using the chironomid assemblages, suggests that in the future, the lake ecosystems in the north will not respond in one predictable way to global climate change. Lakes in the north may respond to global climate change in various ways that depend on the initial characteristics of the catchment area and the lake.

The present study deals with the application of cluster analysis, Fuzzy Cluster Analysis (FCA) and Kohonen Artificial Neural Networks (KANN) methods for classification of 159 meteorological stations in India into meteorologically homogeneous groups. Eight parameters, namely latitude, longitude, elevation, average temperature, humidity, wind speed, sunshine hours and solar radiation, are considered as the classification criteria for grouping. The optimal number of groups is determined as 14 based on the Davies-Bouldin index approach. It is observed that the FCA approach performed better than the other two methodologies for the present study.
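The Davies-Bouldin criterion used above to choose the number of groups can be sketched as follows (smaller values indicate better-separated, more compact clusters). The toy 1-D clusters below are invented, not the 159 stations.

```python
# Sketch: Davies-Bouldin index for a given clustering. For each cluster,
# take the worst ratio of (within-cluster scatter sums) to
# (between-centroid distance), then average over clusters.

def davies_bouldin(clusters):
    cents = [sum(c) / len(c) for c in clusters]
    scatters = [sum(abs(x - m) for x in c) / len(c)
                for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatters[i] + scatters[j]) / abs(cents[i] - cents[j])
                     for j in range(k) if j != i)
    return total / k

tight = [[1.0, 1.1], [5.0, 5.1], [9.0, 9.1]]   # compact, well separated
loose = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]  # spread out
db_tight, db_loose = davies_bouldin(tight), davies_bouldin(loose)
```

Scanning this index over candidate values of k and picking the minimum is one common way to fix the number of groups.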


The aim of this study was to evaluate and test methods which could improve local estimates of a general model fitted to a large area. In the first three studies, the intention was to divide the study area into sub-areas that were as homogeneous as possible according to the residuals of the general model; in the fourth study, the localization was based on the local neighbourhood. According to spatial autocorrelation (SA), points closer together in space are more likely to be similar than those that are farther apart. Local indicators of SA (LISAs) test the similarity of data clusters. A LISA was calculated for every observation in the dataset, and, together with the spatial position and residual of the global model, the data were segmented using two different methods: classification and regression trees (CART) and the multiresolution segmentation algorithm (MS) of the eCognition software. The general model was then re-fitted (localized) to the formed sub-areas. In kriging, the SA is modelled with a variogram, and the spatial correlation is a function of the distance (and direction) between the observation and the point of calculation. A general trend is corrected with the residual information of the neighbourhood, whose size is controlled by the number of nearest neighbours; nearness is measured as Euclidean distance. With all methods, the root mean square errors (RMSEs) were lower, but with the methods that segmented the study area, the variation among individual localized RMSEs was wide. Therefore, an element capable of controlling the division or localization should be included in the segmentation-localization process. Kriging, on the other hand, provided stable estimates when the number of neighbours was sufficient (over 30), thus offering the best potential for further studies. Even CART could be combined with kriging or non-parametric methods, such as most similar neighbours (MSN).
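The neighbourhood correction described above can be sketched in stripped-down form: the global model's prediction at a point is adjusted by the mean residual of its k nearest observations. Kriging proper would weight the neighbours through a fitted variogram; the unweighted mean and all data below are illustrative only.

```python
# Sketch: correct a global-model prediction with the mean residual of
# the k nearest observations (Euclidean distance in the plane).

def local_estimate(global_pred, point, obs, k):
    """obs: list of ((x, y), residual) pairs from the global model."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    nearest = sorted(obs, key=lambda o: dist(point, o[0]))[:k]
    correction = sum(r for _, r in nearest) / k
    return global_pred + correction

# Hypothetical observations: three nearby points under-predicted by 2,
# one distant point over-predicted by 5.
obs = [((0, 0), 2.0), ((1, 0), 2.0), ((0, 1), 2.0), ((9, 9), -5.0)]
est = local_estimate(10.0, (0.5, 0.5), obs, k=3)  # 12.0
```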

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems, and they have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and to maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily, without resorting to expensive cross-validation techniques. The use of a sparse GPR model makes the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
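As a toy illustration of the GPR building block (not the paper's sparse, semi-supervised algorithm), the posterior mean at a test point can be computed in closed form for two training points under an RBF kernel; the data and length-scale below are invented.

```python
# Sketch: GP regression posterior mean  m(x*) = k(x*, X) K^{-1} y
# for two 1-D training points, using the explicit 2x2 matrix inverse.
import math

def rbf(a, b, ls=1.0):
    return math.exp(-((a - b) ** 2) / (2 * ls ** 2))

def gp_mean(x_train, y_train, x_star, noise=1e-6):
    (x1, x2), (y1, y2) = x_train, y_train
    k11 = rbf(x1, x1) + noise
    k22 = rbf(x2, x2) + noise
    k12 = rbf(x1, x2)
    det = k11 * k22 - k12 * k12
    # alpha = K^{-1} y via the 2x2 inverse
    a1 = (k22 * y1 - k12 * y2) / det
    a2 = (-k12 * y1 + k11 * y2) / det
    return rbf(x_star, x1) * a1 + rbf(x_star, x2) * a2

# At a training input, the near-noiseless posterior mean recovers y1:
m = gp_mean((0.0, 2.0), (1.0, -1.0), 0.0)
```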