979 results for clustering techniques


Relevance:

30.00%

Publisher:

Abstract:

Image categorization by means of bag of visual words has received increasing attention from the image processing and vision communities in recent years. In these approaches, each image is represented by invariant points of interest, which are mapped to a Hilbert space representing a visual dictionary that aims to comprise the most discriminative features in a set of images. The main problem of such approaches, however, is finding a compact and representative dictionary. Finding such a dictionary automatically, with no user intervention, is an even more difficult task. In this paper, we propose a method to find such a dictionary automatically by employing a recently developed graph-based clustering algorithm called Optimum-Path Forest, which makes no assumption about the visual dictionary's size and is more efficient and effective than the state-of-the-art techniques used for dictionary generation. © 2012 IEEE.
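The bag-of-visual-words representation described in this abstract can be illustrated with a minimal sketch. It assumes the dictionary is already given and uses plain Euclidean nearest-word assignment; it does not reproduce the Optimum-Path Forest dictionary construction proposed in the paper, and the function name, descriptors, and dictionary values are hypothetical.

```python
from collections import Counter
from math import dist


def bovw_histogram(descriptors, dictionary):
    """Assign each local descriptor to its nearest visual word and
    return the normalized word-frequency histogram for the image."""
    if not descriptors:
        return [0.0] * len(dictionary)
    counts = Counter(
        # Index of the dictionary word closest to descriptor d.
        min(range(len(dictionary)), key=lambda w: dist(d, dictionary[w]))
        for d in descriptors
    )
    total = len(descriptors)
    return [counts[w] / total for w in range(len(dictionary))]


# Hypothetical 2-D descriptors and a two-word dictionary, for illustration only.
dictionary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, 0.1), (0.0, 1.0)]
histogram = bovw_histogram(descriptors, dictionary)
```

Each image is thereby reduced to a fixed-length vector whose length equals the dictionary size, which is why a compact and representative dictionary matters.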

Relevance:

30.00%

Publisher:

Abstract:

Nowadays, organizations face the problem of keeping their information protected, available and trustworthy. In this context, machine learning techniques have been extensively applied to this task. Since manual labeling is very expensive, several works attempt to handle intrusion detection with traditional clustering algorithms. In this paper, we introduce a new pattern recognition technique called Optimum-Path Forest (OPF) clustering to this task. Experiments on three public datasets have shown that the OPF classifier may be a suitable tool for detecting intrusions on computer networks, since it outperformed some state-of-the-art unsupervised techniques. © 2012 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

This paper introduces the Optimum-Path Forest (OPF) classifier for static video summarization, with results comparable to those obtained by some state-of-the-art video summarization techniques. The experiments were conducted using several image descriptors on two public datasets, followed by an analysis of OPF robustness with respect to one ad hoc parameter. Future work will aim to improve OPF effectiveness on each distinct video category.

Relevance:

30.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

30.00%

Publisher:

Abstract:

In this paper we deal with the problem of boosting the Optimum-Path Forest (OPF) clustering approach using evolutionary-based optimization techniques. As the OPF classifier performs an exhaustive search to find the sample neighborhood size that allows it to reach the minimum graph cut as a quality measure, we compared several optimization techniques that can obtain graph cut values close to the ones obtained by brute force. Experiments on two public datasets in the context of unsupervised network intrusion detection have shown that the evolutionary optimization techniques can find suitable values for the neighborhood faster than the exhaustive search. Additionally, we have shown that it is not necessary to employ many agents for such a task, since the neighborhood size is defined by discrete values, which constrains the set of possible solutions to only a few.

Relevance:

30.00%

Publisher:

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
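To illustrate why simplex-aware distances differ from Euclidean ones, the following sketch uses the centered log-ratio (clr) transform and the Aitchison distance, which are standard tools in compositional data analysis. Whether Simcluster uses exactly these operations is an assumption; the function names and tag counts below are hypothetical, and zero counts (which break the logarithm) are not handled here.

```python
from math import log, sqrt


def closure(counts):
    """Rescale raw counts so that the components sum to 1 (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [log(p) for p in composition]  # assumes strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical tag counts for three libraries.
library_a = closure([10, 30, 60])
library_b = closure([100, 300, 600])  # same proportions, deeper sequencing
library_c = closure([60, 30, 10])     # different expression profile

same_profile = aitchison_distance(library_a, library_b)
different_profile = aitchison_distance(library_a, library_c)
```

Libraries that differ only in sequencing depth end up at essentially zero distance, while a genuinely different profile does not, which is the behavior a clustering method for enumeration data needs.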

Relevance:

30.00%

Publisher:

Abstract:

Satellite remote sensing has proved to be an effective support in the timely detection and monitoring of marine oil pollution, which is mainly due to illegal ship discharges. In this context, we have developed a new methodology and technique for optical oil spill detection that uses MODIS L2 and MERIS L1B satellite top-of-atmosphere (TOA) reflectance imagery, for the first time in a highly automated way. The main idea was to combine the wide swaths and short revisit times of optical sensors with SAR observations, which are generally used in oil spill monitoring. This arises from the need to overcome the reduced coverage and long revisit time of SAR over the monitored area. This is now possible given the higher spatial resolution of MODIS and MERIS with respect to older sensors (250-300 m vs. 1 km), which allows the identification of smaller spills deriving from illicit discharges at sea. The procedure to obtain identifiable spills in optical reflectance images involves three steps: removal of oceanic and atmospheric natural variability, in order to enhance oil-water contrast; image clustering, whose purpose is to segment any oil spill present in the image; and, finally, the application of a set of criteria for the elimination of those features which look like spills (look-alikes). The final result is a classification of oil spill candidate regions by means of a score based on the above criteria.

Relevance:

30.00%

Publisher:

Abstract:

As the performance gap between microprocessors and memory continues to increase, main memory accesses result in long latencies, which become a factor limiting system performance. Previous studies show that main memory access streams contain significant locality and that SDRAM devices provide parallelism through multiple banks and channels. This locality and parallelism have not been exploited thoroughly by conventional memory controllers. In this thesis, SDRAM address mapping techniques and memory access reordering mechanisms are studied and applied to memory controller design with the goal of reducing observed main memory access latency. The proposed bit-reversal address mapping attempts to distribute main memory accesses evenly in the SDRAM address space to enable bank parallelism. As memory accesses to unique banks are interleaved, the access latencies are partially hidden and therefore reduced. Taking cache conflict misses into consideration, bit-reversal address mapping is able to direct potential row conflicts to different banks, further improving performance. The proposed burst scheduling is a novel access reordering mechanism, which creates bursts by clustering accesses directed to the same rows of the same banks. Subject to a threshold, reads are allowed to preempt writes, and qualified writes are piggybacked at the end of the bursts. A sophisticated access scheduler selects accesses based on priorities and interleaves accesses to maximize SDRAM data bus utilization. Consequently, burst scheduling reduces the row conflict rate, increasing and exploiting the available row locality. Using a revised SimpleScalar and M5 simulator, both techniques are evaluated and compared with existing academic and industrial solutions. With SPEC CPU2000 benchmarks, bit-reversal reduces the execution time by 14% on average over traditional page interleaving address mapping. Burst scheduling also achieves a 15% reduction in execution time over conventional bank-in-order scheduling. Working constructively together, bit-reversal and burst scheduling successfully achieve a 19% speedup across simulated benchmarks.
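The general idea of a bit-reversal address mapping can be sketched as follows. The abstract does not state which address bits the thesis reverses, so the field position and width used here, and the choice of which bits select the bank, are purely hypothetical; the function names are also illustrative.

```python
def reverse_bits(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def remap_address(address, low_bits, field_bits):
    """Reverse the `field_bits`-wide field located just above the lowest
    `low_bits` bits, leaving all other address bits unchanged."""
    mask = (1 << field_bits) - 1
    field = (address >> low_bits) & mask
    cleared = address & ~(mask << low_bits)
    return cleared | (reverse_bits(field, field_bits) << low_bits)


# Hypothetical layout: 4 offset bits, then a 4-bit field whose top 2 bits
# select the bank. Consecutive field values 0..3 are remapped below.
remapped_fields = [remap_address(page << 4, 4, 4) >> 4 for page in range(4)]
banks = [field >> 2 for field in remapped_fields]
```

Under this assumed layout, four consecutive field values that would otherwise share one bank are spread across four different banks, which is the kind of even distribution the abstract attributes to bit-reversal mapping.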

Relevance:

30.00%

Publisher:

Abstract:

Prevotella nigrescens, Prevotella intermedia and Porphyromonas gingivalis are oral pathogens from the family Bacteroidaceae, regularly isolated from cases of gingivitis and periodontitis. In this study, the phylogenetic variability of these three bacterial species was investigated by means of 16S rRNA (rrs) gene sequence comparisons of a set of epidemiologically and geographically diverse isolates. For each of the three species, the rrs gene sequences of 11 clinical isolates as well as the corresponding type strains were determined. Comparison of all rrs sequences obtained with those of closely related species revealed a clear clustering of species, with little intraspecies variability but a clear difference in the rrs gene with respect to the next related taxon. The results indicate that the three species form stable, homogeneous genetic groups, which favours an rrs-based species identification of these oral pathogens. This is especially useful given the 7% sequence divergence between Prevotella intermedia and Prevotella nigrescens, since phenotypic distinction between the two Prevotella species is inconsistent or involves techniques not applicable in routine identification.

Relevance:

30.00%

Publisher:

Abstract:

OBJECTIVES: In dental research, multiple site observations within patients or taken at various time intervals are commonplace. These clustered observations are not independent; statistical analysis should be amended accordingly. This study aimed to assess whether adjustment for clustering effects during statistical analysis was undertaken in five specialty dental journals. METHODS: Thirty recent consecutive issues of Orthodontics (OJ), Periodontology (PJ), Endodontology (EJ), Maxillofacial (MJ) and Paediatric Dentistry (PDJ) journals were hand searched. Articles requiring adjustment accounting for clustering effects were identified and statistical techniques used were scrutinized. RESULTS: Of 559 studies considered to have inherent clustering effects, adjustment for this was made in the statistical analysis in 223 (39.1%). Studies published in the Periodontology specialty accounted for clustering effects in the statistical analysis more often than articles published in other journals (OJ vs. PJ: OR=0.21, 95% CI: 0.12, 0.37, p<0.001; MJ vs. PJ: OR=0.02, 95% CI: 0.00, 0.07, p<0.001; PDJ vs. PJ: OR=0.14, 95% CI: 0.07, 0.28, p<0.001; EJ vs. PJ: OR=0.11, 95% CI: 0.06, 0.22, p<0.001). A positive correlation was found between increasing prevalence of clustering effects in individual specialty journals and correct statistical handling of clustering (r=0.89). CONCLUSIONS: The majority of studies examined in five dental specialty journals (60.9%) failed to account for clustering effects in statistical analysis where indicated, raising the possibility of inappropriate decreases in p-values and the risk of inappropriate inferences.
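The effect of ignoring clustering can be made concrete with the standard design effect for equal-sized clusters, DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intracluster correlation coefficient. This is a textbook approximation rather than a method taken from the study, and the patient numbers and ICC below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation caused by clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_observations, cluster_size, icc):
    """Number of independent observations that carry the same information."""
    return n_observations / design_effect(cluster_size, icc)


# Hypothetical trial: 20 patients with 6 tooth sites each, ICC = 0.2.
deff = design_effect(cluster_size=6, icc=0.2)
ess = effective_sample_size(n_observations=120, cluster_size=6, icc=0.2)
```

With these assumed values the 120 site-level observations are worth roughly half as many independent ones, so an analysis that treats them as independent understates the standard errors and produces p-values that are too small.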

Relevance:

30.00%

Publisher:

Abstract:

We consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.

Relevance:

30.00%

Publisher:

Abstract:

Randomised controlled trials (RCTs) of psychotherapeutic interventions assume that specific techniques are used in treatments and that these techniques are responsible for changes in the client's symptoms. This assumption also holds true for meta-analyses, where evidence for specific interventions and techniques is compiled. However, it has also been argued that different treatments share important techniques and that an emerging consensus about useful treatment strategies is leading to a greater integration of treatments. This makes assumptions about the effectiveness of specific intervention ingredients questionable if the shared (common) techniques are used in interventions more often than the unique techniques. This study investigated the unique and shared techniques in RCTs of cognitive-behavioural therapy (CBT) and short-term psychodynamic psychotherapy (STPP). Psychotherapeutic techniques were coded from 42 masked treatment descriptions of RCTs in the field of depression (1979-2010). CBT techniques were often used in studies identified as either CBT or STPP. However, STPP techniques were only used in STPP-identified studies. Empirical clustering of treatment descriptions did not confirm the original distinction of CBT versus STPP, but instead showed substantial heterogeneity within both approaches. Extraction of psychotherapeutic techniques from the treatment descriptions is feasible and could be used as a content-based approach to classify treatments in systematic reviews and meta-analyses.

Relevance:

30.00%

Publisher:

Abstract:

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive technique for quantitative assessment of the integrity of the blood-brain barrier and blood-spinal cord barrier (BSCB) in the presence of central nervous system pathologies. However, the results of DCE-MRI show substantial variability. The high variability can be caused by a number of factors, including inaccurate T1 estimation, insufficient temporal resolution and poor contrast-to-noise ratio. The aim of my thesis work is to develop improved methods to reduce the variability of DCE-MRI results. To obtain a fast and accurate T1 map, the Look-Locker acquisition technique was implemented with a novel and truly centric k-space segmentation scheme. In addition, an original multi-step curve fitting procedure was developed to increase the accuracy of T1 estimation. A view sharing acquisition method was implemented to increase temporal resolution, and a novel normalization method was introduced to reduce image artifacts. Finally, a new clustering algorithm was developed to reduce apparent noise in the DCE-MRI data. The performance of these proposed methods was verified by simulations and phantom studies. As part of this work, the proposed techniques were applied to an in vivo DCE-MRI study of experimental spinal cord injury (SCI). These methods have shown robust results and allow quantitative assessment of regions with very low vascular permeability. In conclusion, applications of the improved DCE-MRI acquisition and analysis methods developed in this thesis work can improve the accuracy of the DCE-MRI results.

Relevance:

30.00%

Publisher:

Abstract:

The cognitive wireless sensor network (CWSN) is a new paradigm that integrates cognitive features into traditional wireless sensor networks (WSNs) to mitigate important problems such as spectrum occupancy. Security in cognitive wireless sensor networks is an important problem, since these kinds of networks manage critical applications and data. The specific constraints of WSNs make the problem even more critical, and effective solutions have not yet been implemented. The primary user emulation (PUE) attack is the most studied specific attack deriving from the new cognitive features. This work discusses a new approach, based on anomalous-behavior detection and collaboration, to detect the primary user emulation attack in CWSN scenarios. Two non-parametric algorithms, suitable for low-resource networks like CWSNs, have been used in this work: the cumulative sum and data clustering algorithms. The comparison is based on characteristics such as detection delay, learning time, scalability, resources, and scenario dependency. The algorithms have been tested using a cognitive simulator that provides important results in this area. Both algorithms have been shown to be valid for detecting PUE attacks, reaching a detection rate of 99% and less than 1% of false positives when using collaboration.
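A generic one-sided cumulative sum (CUSUM) detector gives a feel for the first of the two algorithms named in this abstract. This is the textbook form of the statistic, not the authors' implementation; the monitored quantity, the parameter values, and the readings are all hypothetical.

```python
def cusum_alarm(samples, target_mean, slack, threshold):
    """Return the index of the first sample at which the one-sided CUSUM
    statistic exceeds `threshold`, or None if no alarm is raised."""
    statistic = 0.0
    for index, value in enumerate(samples):
        # Accumulate only deviations above target_mean + slack; reset at zero.
        statistic = max(0.0, statistic + (value - target_mean - slack))
        if statistic > threshold:
            return index
    return None


# Hypothetical received-power readings: normal traffic, then an emulated
# primary user starting at index 5.
normal = [1.0, 1.2, 0.9, 1.1, 1.0]
attack = [3.0, 3.2, 2.9]

alarm_index = cusum_alarm(normal + attack, target_mean=1.0, slack=0.5, threshold=3.0)
quiet_index = cusum_alarm(normal, target_mean=1.0, slack=0.5, threshold=3.0)
```

The gap between the onset of the change and the alarm index corresponds to the detection delay mentioned in the abstract; raising the threshold lowers false positives at the cost of a longer delay.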