881 results for Landmark-based spectral clustering
Abstract:
The goal of this paper is to study and propose a new noise-reduction technique for the reconstruction of speech signals, particularly for biomedical applications. The proposed method combines Kalman filtering in the time domain with spectral subtraction. Compared with a discrete Kalman filter in the frequency domain, the proposed technique performs better. Performance is evaluated using the segmental signal-to-noise ratio and the Itakura-Saito distance. The results show that the time-domain Kalman filter combined with spectral subtraction is more robust and efficient, improving the Itakura-Saito distance by up to a factor of four. (C) 2007 Elsevier Ltd. All rights reserved.
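The abstract names two evaluation measures without defining them. The following is a minimal NumPy sketch of both, assuming a frame length of 256 samples, per-frame SNR clipped to [-10, 35] dB (a common convention, not stated in the abstract), and the spectral form of the Itakura-Saito divergence between two power spectra; the paper may use an LPC-based variant. The function names are hypothetical.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=256, floor_db=-10.0, ceil_db=35.0, eps=1e-12):
    """Average per-frame SNR (dB) of an enhanced signal against the clean reference.

    Per-frame values are clipped to [floor_db, ceil_db] (an assumed convention)
    so that silent or perfect frames do not dominate the mean.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    snrs = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        e = s - enhanced[k * frame_len:(k + 1) * frame_len]   # residual error in this frame
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        snrs.append(np.clip(snr, floor_db, ceil_db))
    return float(np.mean(snrs))


def itakura_saito(p_ref, p_est, eps=1e-12):
    """Itakura-Saito divergence between two power spectra, averaged over frequency bins."""
    ratio = (np.asarray(p_ref, dtype=float) + eps) / (np.asarray(p_est, dtype=float) + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
print(segmental_snr(x, 0.9 * x))   # about 20 dB: the error is 10% of the signal in every frame
spectrum = np.abs(np.fft.rfft(x)) ** 2
print(itakura_saito(spectrum, 2.0 * spectrum))
```

Higher segmental SNR is better, while a smaller Itakura-Saito value means the estimated spectrum is closer to the reference; the divergence is zero only for identical spectra.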
Abstract:
We develop a test of evolutionary change that incorporates a null hypothesis of homogeneity, which encompasses time invariance in the variance and autocovariance structure of residuals from estimated econometric relationships. The test framework is based on examining whether shifts in spectral decomposition between two frames of data are significant. Rejection of the null hypothesis points not only to weak nonstationarity but also to shifts in the structure of the second-order moments of the limiting distribution of the random process. This would indicate that the second-order properties of any underlying attractor set have changed in a statistically significant way, pointing to the presence of evolutionary change. The test's applicability to a real-world macroeconomic problem is demonstrated by applying it to the Australian Building Society Deposits (ABSD) model.
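The abstract does not give the test statistic. As a heuristic illustration only, and not the authors' test, the sketch below compares the raw periodograms of two equal-length frames of residuals. For Gaussian data with equal spectra, periodogram ordinates at Fourier frequencies are approximately independent exponential variables, so the squared log-ratio averages about pi**2 / 3; a markedly larger value hints at a shift in second-order structure. The function names and the reference constant are hypothetical.

```python
import numpy as np


def periodogram(x):
    """Raw periodogram at the Fourier frequencies strictly between 0 and Nyquist."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) ** 2 / n
    return spec[1:(n - 1) // 2 + 1]   # drop frequency 0 (and Nyquist when n is even)


def log_spectral_shift(frame_a, frame_b):
    """Mean squared log-ratio of two frames' periodograms."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length so their Fourier frequencies match")
    d = np.log(periodogram(frame_a)) - np.log(periodogram(frame_b))
    return float(np.mean(d ** 2))


# Approximate value of the statistic when both frames share the same Gaussian spectrum.
NULL_REFERENCE = np.pi ** 2 / 3

rng = np.random.default_rng(1)
before = rng.standard_normal(2048)
after = 3.0 * rng.standard_normal(2048)   # variance shift between the two frames
print(log_spectral_shift(before, after), NULL_REFERENCE)
```

A formal test would also need the sampling distribution of this statistic (or a bootstrap), which the sketch does not provide.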
Abstract:
Motivation: This paper introduces EMMIX-GENE, software developed specifically for a model-based approach to clustering microarray expression data, in particular the clustering of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for clustering the tissue samples: mixtures of t distributions are fitted to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. Imposing a threshold on the likelihood ratio statistic, in conjunction with a threshold on cluster size, allows a relevant set of genes to be selected. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for clustering tissue samples is demonstrated on two well-known data sets of colon and leukaemia tissues. For both data sets, relevant subsets of genes can be selected that reveal interesting clusterings of the tissues, consistent either with the external classification of the tissues or with background and biological knowledge of these sets.
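The gene-screening step can be sketched as follows. This is a simplified illustration, not EMMIX-GENE itself: it assumes normal rather than t components, uses hypothetical thresholds of 8 for the likelihood ratio statistic and 8 for the minimum cluster size, and omits the mixtures-of-factor-analyzers stage. Each gene (column) is scored by -2 log(lambda) = 2 * (loglik_2 - loglik_1), comparing a two-component fit with a single normal.

```python
import numpy as np


def normal_logpdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)


def fit_two_normals(x, n_iter=200, min_var=1e-6):
    """EM for a 1-D two-component normal mixture; returns (log-likelihood, labels)."""
    x = np.asarray(x, dtype=float)
    low = x <= np.median(x)                      # deterministic start: split at the median
    w = np.array([low.mean(), 1.0 - low.mean()])
    mu = np.array([x[low].mean(), x[~low].mean()])
    var = np.array([x[low].var(), x[~low].var()]) + min_var
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to each component
        log_dens = np.log(w) + np.column_stack([normal_logpdf(x, mu[k], var[k]) for k in range(2)])
        resp = np.exp(log_dens - np.logaddexp(log_dens[:, 0], log_dens[:, 1])[:, None])
        # M-step: weighted updates of mixing weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + min_var
    log_dens = np.log(w) + np.column_stack([normal_logpdf(x, mu[k], var[k]) for k in range(2)])
    loglik = float(np.logaddexp(log_dens[:, 0], log_dens[:, 1]).sum())
    return loglik, log_dens.argmax(axis=1)


def select_genes(data, lrt_threshold=8.0, min_cluster_size=8):
    """Rank genes by -2 log(lambda) for one vs two components and keep those that
    pass both thresholds, largest statistic first (thresholds are hypothetical)."""
    ranked = []
    for j in range(data.shape[1]):
        x = data[:, j]
        loglik_1 = float(normal_logpdf(x, x.mean(), x.var() + 1e-6).sum())
        loglik_2, labels = fit_two_normals(x)
        smallest = int(np.bincount(labels, minlength=2).min())
        ranked.append((2.0 * (loglik_2 - loglik_1), smallest, j))
    ranked.sort(reverse=True)
    return [j for stat, size, j in ranked if stat > lrt_threshold and size >= min_cluster_size]
```

The cluster-size filter matters because an unconstrained two-component fit can chase a handful of outlying tissues and inflate the statistic.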
Abstract:
In microarray studies, clustering techniques are often applied to derive meaningful insights from the data. In the past, hierarchical methods have been the primary clustering tool for this task, and they have mainly been applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study applying EMMIX-GENE to the breast cancer data recently studied in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a nonstandard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes to a more computationally manageable size. The results of this analysis also emphasise the difficulty of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters that both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. Notably, each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
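One ingredient such a criterion needs is the integrated likelihood of an external categorical variable given a partition. The sketch below assumes, hypothetically, a symmetric Dirichlet(1/2) prior on each cluster's category proportions, which makes the integral available in closed form through log-gamma functions. How the paper combines this term with the integrated likelihood of the data is not shown; adding it to an ICL-type score for each candidate number of clusters is one plausible reading, not a confirmed one.

```python
from math import lgamma

import numpy as np


def log_integrated_categorical(z, u, alpha=0.5):
    """log p(u | z): external labels u given cluster labels z, with each cluster's
    category proportions integrated out under a symmetric Dirichlet(alpha) prior."""
    z = np.asarray(z)
    u = np.asarray(u)
    categories = np.unique(u)
    n_cat = len(categories)
    total = 0.0
    for k in np.unique(z):
        in_k = u[z == k]
        counts = [int(np.sum(in_k == c)) for c in categories]
        # Dirichlet-multinomial marginal for the sequence of labels in cluster k
        total += lgamma(n_cat * alpha) - n_cat * lgamma(alpha)
        total += sum(lgamma(c + alpha) for c in counts) - lgamma(len(in_k) + n_cat * alpha)
    return total


u = np.array([0] * 50 + [1] * 50)       # external categorical variable
aligned = u.copy()                      # partition that matches u
mixed = np.array([0, 1] * 50)           # partition unrelated to u
print(log_integrated_categorical(aligned, u), log_integrated_categorical(mixed, u))
```

A partition that lines up with the external variable receives a much higher value, which is how such a term can favour an interpretable number of clusters.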
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
In this paper we exploit the nonlinear property of SiC multilayer devices to design an optical processor for error detection that enables reliable delivery of spectral data of four-wave mixing over unreliable communication channels. The SiC optical processor is realized using a double pin/pin a-SiC:H photodetector with front and back biased optical gating elements. Visible pulsed signals are transmitted together at different bit sequences, and the combined optical signal is analyzed. The data show that the background acts as a selector that picks one or more states by splitting portions of the input multi optical signals across the front and back photodiodes. Boolean operations such as EXOR and three-bit addition are demonstrated optically: when one or all of the inputs are present, the system behaves as an XOR gate representing the SUM; when two or three inputs are on, it acts as an AND gate indicating the presence of the CARRY bit. Additional parity logic operations are performed using four incoming pulsed communication channels that are transmitted and checked for errors together. As a simple example of this approach, we describe an all-optical processor for error detection and then provide an experimental demonstration of this idea. (C) 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
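The logic described above can be checked with a small Boolean model. This sketch covers the truth tables only and says nothing about the a-SiC:H device physics; it also assumes a single even-parity bit over the four channels, an arrangement the abstract does not specify. All function names are hypothetical.

```python
from itertools import product


def optical_adder_logic(a, b, c):
    """Three-bit addition: SUM is on when one or all three inputs are on,
    CARRY is on when two or three inputs are on."""
    n_on = a + b + c
    total_sum = n_on % 2        # XOR of the three inputs
    carry = int(n_on >= 2)      # majority of the three inputs
    return total_sum, carry


def parity_bit(bits):
    """Even-parity bit for a group of channels (XOR of all bits)."""
    p = 0
    for bit in bits:
        p ^= bit
    return p


def has_error(received_bits, sent_parity):
    """Any single flipped channel changes the parity and is therefore detected."""
    return parity_bit(received_bits) != sent_parity


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, optical_adder_logic(a, b, c))
```

A single parity bit detects any odd number of flipped channels but cannot locate the error; correction needs additional redundancy.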
Abstract:
The SiC optical processor for error detection and correction is realized using a double pin/pin a-SiC:H photodetector with front and back biased optical gating elements. The data show that the background acts as a selector that picks one or more states by splitting portions of the input multi optical signals across the front and back photodiodes. Boolean operations such as exclusive OR (EXOR) and three-bit addition are demonstrated optically with a combination of such switching devices: when one or all of the inputs are present, the output is amplified and the system behaves as an XOR gate representing the SUM; when two or three inputs are on, the system acts as an AND gate indicating the presence of the CARRY bit. Additional parity logic operations are performed using the four incoming pulsed communication channels, which are transmitted and checked for errors together. As a simple example of this approach, we describe an all-optical processor for error detection and correction and then provide an experimental demonstration of this fault-tolerant reversible system in emerging nanotechnology.
Abstract:
This study implements several pair trading strategies across three emerging markets, with the objective of comparing the results obtained from the different strategies and assessing whether pair trading benefits from a more volatile environment. The results show that emerging markets do offer higher potential profits. However, the higher excess return is partially offset by higher transaction costs, which are a determining factor in the profitability of pair trading strategies. In addition, a new clustering approach based on Principal Component Analysis was tested as an alternative to the more standard clustering by Industry Groups. The new approach delivers promising results, consistently reducing volatility more than the Industry Group approach, with no significant harm to excess returns.
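The abstract does not detail how the PCA-based clusters were built. One plausible reading, assumed here, is to describe each stock by its loadings on the leading principal components of the return correlation matrix and group the stocks with k-means (a swapped-in choice); candidate pairs would then be searched within each group. Function names and parameters are hypothetical.

```python
import numpy as np


def pca_loadings(returns, n_components=2):
    """Loadings of each asset on the leading principal components of the
    correlation matrix of returns (rows = days, columns = assets)."""
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)              # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(eigvals[top])


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels
```

Grouping on loadings rather than industry codes lets stocks that co-move through the same statistical factors end up together even when their sector labels differ.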
Abstract:
When a pregnant woman is referred to a hospital for obstetric care, many outcomes are possible, depending on her current condition. A better understanding of these conditions could support a more direct medical approach by categorizing the different types of patients, enabling a faster response to risk situations and therefore increasing the quality of services. In this case study, the characteristics of the patients admitted to the maternity care unit of Centro Hospitalar of Porto are identified, and the patients are categorized through clustering techniques. The main goal is to predict the patients' route through maternity care, adapting the services to their conditions and providing the best clinical decisions and cost-effective treatment. The developed models produced promising results, with a best clustering evaluation index of 0.65. The evaluation of the clustering algorithms demonstrated the viability of using clustering-based data mining models to characterize pregnant patients, identifying which conditions can be used as alerts to prevent medical complications.
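The abstract does not name the evaluation index behind the 0.65 figure. For illustration only, the silhouette coefficient is one widely used index on a [-1, 1] scale; the sketch below computes it with NumPy and should not be read as the index the authors used. The function name is hypothetical, and every cluster is assumed to contain at least two points.

```python
import numpy as np


def silhouette_score(points, labels):
    """Mean silhouette coefficient (requires at least 2 clusters, each with 2+ points)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False                                  # exclude the point itself
        a = dist[i, same].mean()                         # mean distance within own cluster
        b = min(dist[i, labels == other].mean()          # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


pts = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])
print(silhouette_score(pts, [0, 0, 1, 1]))   # well separated: close to 1
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned points.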
Abstract:
Lecture Notes in Computer Science, 9273
Abstract:
Magdeburg, University, Faculty of Computer Science, habilitation thesis, 2006
Abstract:
...In this thesis I investigate the "curse of dimensionality" through the concept of distance concentration. I show that this effect can be described in the data model by the pairwise covariance coefficients of the marginal distributions. In addition, I compare 10 prototype-based clustering algorithms using 800,000 clustering results on artificially generated data sets. I explore how and why clustering algorithms are affected by the number of features. Using the clustering results, I also examine how well 5 of the most popular cluster quality measures estimate the actual cluster quality.
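Distance concentration can be observed numerically. This sketch assumes uniform data in the unit hypercube and a random query point, and measures the relative contrast (D_max - D_min) / D_min of Euclidean distances, which shrinks as the number of features grows; it does not reproduce the covariance-based analysis of the thesis. The function name is hypothetical.

```python
import numpy as np


def relative_contrast(n_points=500, dim=2, seed=0):
    """(D_max - D_min) / D_min for Euclidean distances from a random query point
    to n_points i.i.d. uniform points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    query = rng.random(dim)
    d = np.linalg.norm(data - query, axis=1)
    return float((d.max() - d.min()) / d.min())


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim=dim), 3))
```

As the contrast approaches zero, the nearest and farthest prototypes become almost equally distant, which is one reason prototype-based clustering degrades with many features.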
Abstract:
OBJECTIVE: This study assessed the clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. A total of 18,005 subjects (8052 men and 9953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption than non-smokers and ex-smokers. The frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women in the corresponding groups: 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and interventions with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
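The reported odds ratios are adjusted estimates from a model the abstract does not specify. The sketch below only shows the crude odds ratio with a Woolf (log-scale) 95% interval from a 2x2 table, plus the check of whether an interval excludes 1. The table counts in the usage line are hypothetical; the two intervals are taken from the abstract. Function names are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 table: a = exposed with outcome, b = exposed without,
               c = unexposed with outcome, d = unexposed without.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def excludes_one(lo, hi):
    """True when a confidence interval for an odds ratio does not contain 1."""
    return lo > 1 or hi < 1


print(odds_ratio_ci(10, 20, 5, 40))                          # hypothetical counts
print(excludes_one(0.97, 1.33), excludes_one(2.59, 3.64))    # False True
```

Read this way, the interval for male ex-smokers includes 1, whereas the interval for male heavy smokers lies entirely above it.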