23 resultados para Clustering analysis

em Aston University Research Archive


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Magnetoencephalography (MEG), a non-invasive technique for characterizing brain electrical activity, is gaining popularity as a tool for assessing group-level differences between experimental conditions. One method for assessing task-condition effects involves beamforming, where a weighted sum of field measurements is used to tune activity on a voxel-by-voxel basis. However, this method has been shown to produce inhomogeneous smoothness differences as a function of signal-to-noise across a volumetric image, which can then produce false positives at the group level. Here we describe a novel method for group-level analysis with MEG beamformer images that utilizes the peak locations within each participant's volumetric image to assess group-level effects. We compared our peak-clustering algorithm with SnPM using simulated data. We found that our method was immune to artefactual group effects that can arise as a result of inhomogeneous smoothness differences across a volumetric image. We also used our peak-clustering algorithm on experimental data and found that regions were identified that corresponded with task-related regions identified in the literature. These findings suggest that our technique is a robust method for group-level analysis with MEG beamformer images.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Principal components analysis (PCA) has been described for over 50 years; however, it is rarely applied to the analysis of epidemiological data. In this study PCA was critically appraised in its ability to reveal relationships between pulsed-field gel electrophoresis (PFGE) profiles of methicillin- resistant Staphylococcus aureus (MRSA) in comparison to the more commonly employed cluster analysis and representation by dendrograms. The PFGE type following SmaI chromosomal digest was determined for 44 multidrug-resistant hospital-acquired methicillin-resistant S. aureus (MR-HA-MRSA) isolates, two multidrug-resistant community-acquired MRSA (MR-CA-MRSA), 50 hospital-acquired MRSA (HA-MRSA) isolates (from the University Hospital Birmingham, NHS Trust, UK) and 34 community-acquired MRSA (CA-MRSA) isolates (from general practitioners in Birmingham, UK). Strain relatedness was determined using Dice band-matching with UPGMA clustering and PCA. The results indicated that PCA revealed relationships between MRSA strains, which were more strongly correlated with known epidemiology, most likely because, unlike cluster analysis, PCA does not have the constraint of generating a hierarchic classification. In addition, PCA provides the opportunity for further analysis to identify key polymorphic bands within complex genotypic profiles, which is not always possible with dendrograms. Here we provide a detailed description of a PCA method for the analysis of PFGE profiles to complement further the epidemiological study of infectious disease. © 2005 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provide any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

When object databases arrived on the scene some ten years ago, they provided database capabilities for previously neglected, complex applications, such as CAD, but were burdened with one inherent teething problem, poor performance. Physical database design is one tool that can provide performance improvements and it is the general area of concern for this thesis. Clustering is one fruitful design technique which can provide improvements in performance. However, clustering in object databases has not been explored in depth and so has not been truly exploited. Further, clustering, although a physical concern, can be determined from the logical model. The object model is richer than previous models, notably the relational model, and so it is anticipated that the opportunities with respect to clustering are greater. This thesis provides a thorough analysis of object clustering strategies with a view to highlighting any links between the object logical and physical model and improving performance. This is achieved by considering all possible types of object logical model construct and the implementation of those constructs in terms of theoretical clusterings strategies to produce actual clustering arrangements. This analysis results in a greater understanding of object clustering strategies, aiding designers in the development process and providing some valuable rules of thumb to support the design process.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The spatial patterns of discrete beta-amyloid (Abeta) deposits in brain tissue from patients with Alzheimer disease (AD) were studied using a statistical method based on linear regression, the results being compared with the more conventional variance/mean (V/M) method. Both methods suggested that Abeta deposits occurred in clusters (400 to <12,800 mu m in diameter) in all but 1 of the 42 tissues examined. In many tissues, a regular periodicity of the Abeta deposit clusters parallel to the tissue boundary was observed. In 23 of 42 (55%) tissues, the two methods revealed essentially the same spatial patterns of Abeta deposits; in 15 of 42 (36%), the regression method indicated the presence of clusters at a scale not revealed by the V/M method; and in 4 of 42 (9%), there was no agreement between the two methods. Perceived advantages of the regression method are that there is a greater probability of detecting clustering at multiple scales, the dimension of larger Abeta clusters can be estimated more accurately, and the spacing between the clusters may be estimated. However, both methods may be useful, with the regression method providing greater resolution and the V/M method providing greater simplicity and ease of interpretation. Estimates of the distance between regularly spaced Abeta clusters were in the range 2,200-11,800 mu m, depending on tissue and cluster size. The regular periodicity of Abeta deposit clusters in many tissues would be consistent with their development in relation to clusters of neurons that give rise to specific neuronal projections.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The spatial pattern of cellular neurofibrillary tangles (NFT) was studied in the supra- and infragranular layers of various cortical regions in cases of Alzheimer's disease (AD). The objective was to test the hypothesis that NFT formation was associated with the cells of origin of specific cortico-cortical projections. The novel feature of the study was that pattern analysis enabled the dimension and spacing of NFT clusters along the cortical ribbon to be estimated. In the majority of brain regions studied, NFT occurred in clusters of neurons which were regularly spaced along the cortical strip. This pattern is consistent with the predicted distribution of the cells of origin of specific cortico-cortico projections. Mean NFT cluster size varied from 250 to > 12800 microns in different cortical tissues suggesting either variation in the size of the cell clusters or a dynamic process in the development of NFT in relation to these cell clusters. The formation of NFT in cell clusters which may give rise to the feed-forward and feed-back cortico-cortical projections suggests a possible route of spread of NFT pathology in AD between cortical regions and from the cortex to subcortical areas.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Discrete, microscopic lesions are developed in the brain in a number of neurodegenerative diseases. These lesions may not be randomly distributed in the tissue but exhibit a spatial pattern, i.e., a departure from randomness towards regularlity or clustering. The spatial pattern of a lesion may reflect its development in relation to other brain lesions or to neuroanatomical structures. Hence, a study of spatial pattern may help to elucidate the pathogenesis of a lesion. A number of statistical methods can be used to study the spatial patterns of brain lesions. They range from simple tests of whether the distribution of a lesion departs from random to more complex methods which can detect clustering and the size, distribution and spacing of clusters. This paper reviews the uses and limitations of these methods as applied to neurodegenerative disorders, and in particular to senile plaque formation in Alzheimer's disease.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To determine the factors influencing the distribution of -amyloid (Abeta) deposits in Alzheimer's disease (AD), the spatial patterns of the diffuse, primitive, and classic A deposits were studied from the superior temporal gyrus (STG) to sector CA4 of the hippocampus in six sporadic cases of the disease. In cortical gyri and in the CA sectors of the hippocampus, the Abeta deposits were distributed either in clusters 200-6400 microm in diameter that were regularly distributed parallel to the tissue boundary or in larger clusters greater than 6400 microm in diameter. In some regions, smaller clusters of Abeta deposits were aggregated into larger 'superclusters'. In many cortical gyri, the density of Abeta deposits was positively correlated with distance below the gyral crest. In the majority of regions, clusters of the diffuse, primitive, and classic deposits were not spatially correlated with each other. In two cases, double immunolabelled to reveal the Abeta deposits and blood vessels, the classic Abeta deposits were clustered around the larger diameter vessels. These results suggest a complex pattern of Abeta deposition in the temporal lobe in sporadic AD. A regular distribution of Abeta deposit clusters may reflect the degeneration of specific cortico-cortical and cortico-hippocampal pathways and the influence of the cerebral blood vessels. Large-scale clustering may reflect the aggregation of deposits in the depths of the sulci and the coalescence of smaller clusters.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with graphical representation of the data and pattern recognition discriminant function. The technique also results in distances frequency distribution that might be useful in detecting clustering in the data or for the estimation of parameters useful in the discrimination between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. lhe thesis offers three distinct contributions for cluster analysis and pattern recognition techniques. The first contribution is the introduction of transformation function in the technique of nonlinear mapping. The second contribution is the us~ of distances frequency distribution instead of distances time-sequence in nonlinear mapping, The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of nonlinear mapping technique. The third chapter describes the first developing step in the technique of nonlinear mapping that is the introduction of "transformation function". The fourth chapter describes the second developing step of the nonlinear mapping technique. This is the use of distances frequency distribution instead of distances time-sequence. The chapter also includes the new generalised and normalised error function formulation. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program. for cluster analysis and pattern recognition by integrating all the new features.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The 21-day experimental gingivitis model, an established noninvasive model of inflammation in response to increasing bacterial accumulation in humans, is designed to enable the study of both the induction and resolution of inflammation. Here, we have analyzed gingival crevicular fluid, an oral fluid comprising a serum transudate and tissue exudates, by LC-MS/MS using Fourier transform ion cyclotron resonance mass spectrometry and iTRAQ isobaric mass tags, to establish meta-proteomic profiles of inflammation-induced changes in proteins in healthy young volunteers. Across the course of experimentally induced gingivitis, we identified 16 bacterial and 186 human proteins. Although abundances of the bacterial proteins identified did not vary temporally, Fusobacterium outer membrane proteins were detected. Fusobacterium species have previously been associated with periodontal health or disease. The human proteins identified spanned a wide range of compartments (both extracellular and intracellular) and functions, including serum proteins, proteins displaying antibacterial properties, and proteins with functions associated with cellular transcription, DNA binding, the cytoskeleton, cell adhesion, and cilia. PolySNAP3 clustering software was used in a multilayered analytical approach. Clusters of proteins that associated with changes to the clinical parameters included neuronal and synapse associated proteins.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. In order to deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of five-layer representation of data and a twophase clustering process is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problemresulted from the sparse term-paragraphmatrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM and thus web document clusters with higher quality can be generated. Extensive experiments show that HRMM, HRMM with tolerancerough-set strategy, and HRMM with ontology all outperform VSM and a representative non VSM-based algorithm, WFP, significantly in terms of the F-Score.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Emerging vehicular comfort applications pose a host of completely new set of requirements such as maintaining end-to-end connectivity, packet routing, and reliable communication for internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS) such as low packet delay while coping with the fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help in spectrum efficiency and reduce data congestion in the network. The vehicular nodes which experience minimal path loss are selected as the cluster heads. The performance of the MPLR clustering algorithm is calculated by rate of change of cluster heads, average number of clusters and average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis for the rate of change of cluster head is derived which validates the MPLR algorithm and is compared with the simulated results. The mathematical and simulated results are in good agreement indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with V2R system with MPLR system performing better. © 2013 IEEE.