960 resultados para Semi-supervised clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Samples of Forsythia suspensa from raw (Laoqiao) and ripe (Qingqiao) fruit were analyzed with the use of HPLC-DAD and the EIS-MS techniques. Seventeen peaks were detected, and of these, twelve were identified. Most were related to the glucopyranoside molecular fragment. Samples collected from three geographical areas (Shanxi, Henan and Shandong Provinces), were discriminated with the use of hierarchical clustering analysis (HCA), discriminant analysis (DA), and principal component analysis (PCA) models, but only PCA was able to provide further information about the relationships between objects and loadings; eight peaks were related to the provinces of sample origin. The supervised classification models-K-nearest neighbor (KNN), least squares support vector machines (LS-SVM), and counter propagation artificial neural network (CP-ANN) methods, indicated successful classification but KNN produced 100% classification rate. Thus, the fruit were discriminated on the basis of their places of origin.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data, however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectra to its source and the GAM was suitable to parameterise the PNSD data. These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Local spatio-temporal features with a Bag-of-visual words model is a popular approach used in human action recognition. Bag-of-features methods suffer from several challenges such as extracting appropriate appearance and motion features from videos, converting extracted features appropriate for classification and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class- specific simplex LDA (css-LDA), to encode the raw features suitable for discriminative SVM based classification. Unsupervised LDA models disconnect topic discovery from the classification task, hence yield poor results compared to the baseline Bag-of-words framework. On the other hand supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within class margins using max-margin techniques and yields a sparse highly discriminative topic structure; while in css-LDA separate class specific topics are learned instead of common set of topics across the entire dataset. In our representation first topics are learned and then each video is represented as a topic proportion vector, i.e. it can be comparable to a histogram of topics. Finally SVM classification is done on the learned topic proportion vector. We demonstrate the efficiency of the above two representation techniques through the experiments carried out in two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline Bag-of-features framework which uses kmeans to construct histogram of words from the feature vectors.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A FitzHugh-Nagumo monodomain model has been used to describe the propagation of the electrical potential in heterogeneous cardiac tissue. In this paper, we consider a two-dimensional fractional FitzHugh-Nagumo monodomain model on an irregular domain. The model consists of a coupled Riesz space fractional nonlinear reaction-diffusion model and an ordinary differential equation, describing the ionic fluxes as a function of the membrane potential. Secondly, we use a decoupling technique and focus on solving the Riesz space fractional nonlinear reaction-diffusion model. A novel spatially second-order accurate semi-implicit alternating direction method (SIADM) for this model on an approximate irregular domain is proposed. Thirdly, stability and convergence of the SIADM are proved. Finally, some numerical examples are given to support our theoretical analysis and these numerical techniques are employed to simulate a two-dimensional fractional Fitzhugh-Nagumo model on both an approximate circular and an approximate irregular domain.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we derive a new nonlinear two-sided space-fractional diffusion equation with variable coefficients from the fractional Fick’s law. A semi-implicit difference method (SIDM) for this equation is proposed. The stability and convergence of the SIDM are discussed. For the implementation, we develop a fast accurate iterative method for the SIDM by decomposing the dense coefficient matrix into a combination of Toeplitz-like matrices. This fast iterative method significantly reduces the storage requirement of O(n2)O(n2) and computational cost of O(n3)O(n3) down to n and O(nlogn)O(nlogn), where n is the number of grid points. The method retains the same accuracy as the underlying SIDM solved with Gaussian elimination. Finally, some numerical results are shown to verify the accuracy and efficiency of the new method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The monoanionic ligand 1,1,3,3 tetracyano-2 ethoxypropenide (tcnoet) is reported with its Cu(II)–bpy complex of formula [Cu2(µ-tcnoet)2(tcnoet)2(bpy)2]. The structure has been determined using X-ray diffraction and features an alternating chain with bridging tcnoet ligands. One ligand acts as a bidentate, dinucleating ligand with one short Cu–N and one medium Cu–N bond, whereas the other tcnoet is largely monodentate, albeit with a very weak interdimer Cu–N bond. Despite the arrangement in dinuclear units, further arranged into linear chains through the non-bridging tcnoet ligand, the compound shows no significant magnetic exchange, as deduced from magnetic susceptibility down to 4 K. Ligand-field, IR and EPR spectra in the solid state and in frozen solution are reported and are consistent with the overall structure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Environmental sensors collect massive amounts of audio data. This thesis investigates computational methods to support human analysts in identifying faunal vocalisations from that audio. A series of experiments was conducted to trial the effectiveness of novel user interfaces. This research examines the rapid scanning of spectrograms, decision support tools for users, and cleaning methods for folksonomies. Together, these investigations demonstrate that providing computational support to human analysts increases their efficiency and accuracy; this allows bioacoustics projects to efficiently utilise their valuable human analysts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a novel technique for conducting robust voice activity detection (VAD) in high-noise recordings. We use Gaussian mixture modeling (GMM) to train two generic models; speech and non-speech. We then score smaller segments of a given (unseen) recording against each of these GMMs to obtain two respective likelihood scores for each segment. These scores are used to compute a dissimilarity measure between pairs of segments and to carry out complete-linkage clustering of the segments into speech and non-speech clusters. We compare the accuracy of our method against state-of-the-art and standardised VAD techniques to demonstrate an absolute improvement of 15% in half-total error rate (HTER) over the best performing baseline system and across the QUT-NOISE-TIMIT database. We then apply our approach to the Audio-Visual Database of American English (AVDBAE) to demonstrate the performance of our algorithm in using visual, audio-visual or a proposed fusion of these features.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genetic correlation (rg) analysis determines how much of the correlation between two measures is due to common genetic influences. In an analysis of 4 Tesla diffusion tensor images (DTI) from 531 healthy young adult twins and their siblings, we generalized the concept of genetic correlation to determine common genetic influences on white matter integrity, measured by fractional anisotropy (FA), at all points of the brain, yielding an NxN genetic correlation matrix rg(x,y) between FA values at all pairs of voxels in the brain. With hierarchical clustering, we identified brain regions with relatively homogeneous genetic determinants, to boost the power to identify causal single nucleotide polymorphisms (SNP). We applied genome-wide association (GWA) to assess associations between 529,497 SNPs and FA in clusters defined by hubs of the clustered genetic correlation matrix. We identified a network of genes, with a scale-free topology, that influences white matter integrity over multiple brain regions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Imaging genetics aims to discover how variants in the human genome influence brain measures derived from images. Genome-wide association scans (GWAS) can screen the genome for common differences in our DNA that relate to brain measures. In small samples, GWAS has low power as individual gene effects are weak and one must also correct for multiple comparisons across the genome and the image. Here we extend recent work on genetic clustering of images, to analyze surface-based models of anatomy using GWAS. We performed spherical harmonic analysis of hippocampal surfaces, automatically extracted from brain MRI scans of 1254 subjects. We clustered hippocampal surface regions with common genetic influences by examining genetic correlations (r(g)) between the normalized deformation values at all pairs of surface points. Using genetic correlations to cluster surface measures, we were able to boost effect sizes for genetic associations, compared to clustering with traditional phenotypic correlations using Pearson's r.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

To understand factors that affect brain connectivity and integrity, it is beneficial to automatically cluster white matter (WM) fibers into anatomically recognizable tracts. Whole brain tractography, based on diffusion-weighted MRI, generates vast sets of fibers throughout the brain; clustering them into consistent and recognizable bundles can be difficult as there are wide individual variations in the trajectory and shape of WM pathways. Here we introduce a novel automated tract clustering algorithm based on label fusion - a concept from traditional intensity-based segmentation. Streamline tractography generates many incorrect fibers, so our top-down approach extracts tracts consistent with known anatomy, by mapping multiple hand-labeled atlases into a new dataset. We fuse clustering results from different atlases, using a mean distance fusion scheme. We reliably extracted the major tracts from 105-gradient high angular resolution diffusion images (HARDI) of 198 young normal twins. To compute population statistics, we use a pointwise correspondence method to match, compare, and average WM tracts across subjects. We illustrate our method in a genetic study of white matter tract heritability in twins.