856 results for Data Driven Clustering


Relevance: 30.00%

Abstract:

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them become widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user-driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on the spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can provide some control over underlying populations, and we develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.
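The abstract does not give the statistic itself. As a rough illustration of the general idea only, here is a minimal Python sketch of a Kulldorff-style Bernoulli scan, not the authors' framework. It assumes that only locations where an outcome A was observed are analysed, each labelled 1 if a second attribute B (for example a candidate risk factor) is also present, so the A-locations serve as their own denominator in place of an unknown background population. The circular windows, the radius list and the permutation test are illustrative choices.

```python
import math
import random


def _loglik(c, n, p):
    """Bernoulli log-likelihood c*log(p) + (n-c)*log(1-p), with 0*log(0) taken as 0."""
    value = 0.0
    if c > 0:
        value += c * math.log(p)
    if n - c > 0:
        value += (n - c) * math.log(1 - p)
    return value


def bernoulli_llr(c_in, n_in, c_tot, n_tot):
    """Log-likelihood ratio for an elevated proportion inside a window (0 if not elevated)."""
    n_out, c_out = n_tot - n_in, c_tot - c_in
    if n_in == 0 or n_out == 0:
        return 0.0
    p_in, p_out = c_in / n_in, c_out / n_out
    if p_in <= p_out:
        return 0.0
    p_all = c_tot / n_tot
    return (_loglik(c_in, n_in, p_in) + _loglik(c_out, n_out, p_out)
            - _loglik(c_tot, n_tot, p_all))


def scan(points, labels, radii):
    """Best (llr, centre_index, radius) over circular windows centred on each point.

    labels[j] is 1 if attribute B co-occurs with outcome A at points[j], else 0.
    """
    n_tot, c_tot = len(labels), sum(labels)
    best = (0.0, None, None)
    for i, (xi, yi) in enumerate(points):
        for r in radii:
            inside = [j for j, (xj, yj) in enumerate(points)
                      if (xj - xi) ** 2 + (yj - yi) ** 2 <= r * r]
            c_in = sum(labels[j] for j in inside)
            llr = bernoulli_llr(c_in, len(inside), c_tot, n_tot)
            if llr > best[0]:
                best = (llr, i, r)
    return best


def permutation_p_value(points, labels, radii, n_sim=99, seed=0):
    """Monte Carlo p-value: shuffle labels over the fixed A-locations and rescan."""
    rng = random.Random(seed)
    observed = scan(points, labels, radii)[0]
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if scan(points, shuffled, radii)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

Because the labels are permuted over the same set of A-locations, the null distribution never refers to a background population, which is the property the abstract highlights.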

Relevance: 30.00%

Abstract:

The clustering pattern of diffuse, primitive and classic β-amyloid (Aβ) deposits was studied in the upper laminae of the frontal cortex of 9 patients with sporadic Alzheimer's disease (AD). Aβ-stained tissue was counterstained with collagen type IV antiserum to determine whether the clusters of Aβ deposits were related to blood vessels. In all patients, Aβ deposits and blood vessels were clustered and, in many patients, the clusters showed a regular periodicity along the cortex parallel to the pia. The classic Aβ deposit clusters coincided with those of the larger blood vessels in all patients and with clusters of smaller blood vessels in 4 patients. Diffuse deposit clusters were related to blood vessels in 3 patients. Primitive deposit clusters were either unrelated to or negatively correlated with the blood vessels in 6 patients. Hence, Aβ deposit subtypes differ in their relationship to blood vessels. The data suggest a direct and specific role for the larger blood vessels in the formation of amyloid cores in AD. © 1995.

Relevance: 30.00%

Abstract:

The spatial distribution patterns of the diffuse, primitive, and classic beta-amyloid (Abeta) deposits were studied in areas of the medial temporal lobe in 12 cases of Down's syndrome (DS) aged 35 to 67 years. Large clusters of diffuse deposits were present in the youngest patients; cluster size then declined with patient age but increased again in the oldest patients. By contrast, the cluster sizes of the primitive and classic deposits increased with age, reaching a maximum in patients aged 45 to 55 and 60 years respectively, and declined in the oldest patients. In the parahippocampal gyrus (PHG), the primitive deposits were most highly clustered in cases of intermediate age. The data suggest a developmental sequence in DS in which Abeta is deposited initially as large clusters of diffuse deposits that are then gradually replaced by clusters of primitive and classic deposits. The oldest patients were an exception to this sequence in that their pattern of clustering resembled that of the youngest patients.

Relevance: 30.00%

Abstract:

The aims of the project were twofold: 1) to investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) to investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance and maximum likelihood. In addition, the following algorithms were developed during the course of the research: deviant distance, look-up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. The contextual enhancements investigated were mode filters, small area replacement and Wharton's CONAN algorithm. Additionally, methods for noise- and edge-based declassification and contextual reclassification, non-probabilistic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.
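The abstract names the classifiers and contextual filters without describing them. For orientation only, here is a minimal Python sketch of the textbook forms of two of them, a minimum-distance-to-means classifier and a 3x3 mode (majority) filter for cleaning a classified image. These generic forms are assumptions; the thesis's own variants (such as the deviant distance classifier) are not reproduced, and keeping the original label on ties is just one reasonable convention.

```python
import math
from collections import Counter


def train_means(samples):
    """samples: dict class_label -> list of feature vectors; returns per-class mean vectors."""
    return {c: [sum(col) / len(vs) for col in zip(*vs)] for c, vs in samples.items()}


def min_distance_classify(pixel, means):
    """Assign the class whose mean vector is nearest in Euclidean distance."""
    return min(means, key=lambda c: math.dist(pixel, means[c]))


def mode_filter(labels):
    """3x3 majority filter over a 2-D grid of class labels.

    Edge pixels use whatever neighbours exist; on a tie the original label is kept.
    Returns a new grid and leaves the input unchanged.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [labels[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            counts = Counter(window).most_common()
            top_label, top_count = counts[0]
            # Replace only when the most frequent label is a unique winner.
            if len(counts) == 1 or counts[1][1] < top_count:
                out[r][c] = top_label
    return out
```

A typical pipeline classifies every pixel with `min_distance_classify` and then passes the label grid through `mode_filter` to remove isolated misclassified pixels.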

Relevance: 30.00%

Abstract:

University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language they have experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis provides a proposal for such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model which depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model demanded the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and its balance. Having access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good corpus balance in terms of domains enabled detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output to the needs of different types of student. The model proposes a dictionary that is designed as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student. It presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).

Relevance: 30.00%

Abstract:

We develop and study the concept of dataflow process networks, as used for example by Kahn, to suit exact computation over data types related to real numbers, such as continuous functions and geometrical solids. Furthermore, we consider communicating these exact objects among processes using protocols of a query-answer nature, as introduced in our earlier work. This enables processes to provide valid approximations with a given accuracy, focused on a given locality, as demanded by the receiving processes through queries. We define domain-theoretical denotational semantics of our networks in two ways: (1) directly, i.e. by viewing the whole network as a composite process and applying the process semantics introduced in our earlier work; and (2) compositionally, i.e. by a fixed-point construction similar to that used by Kahn, from the denotational semantics of individual processes in the network. The direct semantics closely corresponds to the operational semantics of the network (i.e. it is correct) but is very difficult to study for concrete networks. The compositional semantics enables compositional analysis of concrete networks, assuming it is correct. We prove that the compositional semantics is a safe approximation of the direct semantics. We also provide a method that can be used in many cases to establish that the two semantics fully coincide, i.e. that safety is not achieved through inactivity or meaningless answers. The results are extended to cover recursively defined infinite networks as well as nested finite networks. A robust prototype implementation of our model is available.
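To make the query-answer idea concrete, here is a minimal Python sketch of demand-driven exact real arithmetic. It is a deliberate simplification, not the authors' model or prototype: a "process" is assumed to be a function that answers a query `n` (the requested accuracy) with a rational within 2^-n of the real number it represents, and a downstream process answers its own queries by sending more precise queries upstream. The locality component of the queries, and any real concurrency, are omitted.

```python
import math
from fractions import Fraction


def constant(value):
    """Process for an exactly known rational: every query gets the exact value."""
    q = Fraction(value)
    return lambda n: q


def sqrt2(n):
    """Answer query n with a rational within 2**-n of sqrt(2) (rounded down)."""
    # isqrt(2 * 4**n) == floor(2**n * sqrt(2)), so the error is below 2**-n.
    return Fraction(math.isqrt(2 * 4 ** n), 2 ** n)


def add(x, y):
    """Downstream process: answering query n needs accuracy n + 1 from each input,
    because 2**-(n+1) + 2**-(n+1) == 2**-n."""
    return lambda n: x(n + 1) + y(n + 1)


def scale(k, x):
    """Multiply a process by an integer k, asking upstream for enough extra bits
    that |k| * 2**-(n + extra) < 2**-n."""
    extra = max(abs(k), 1).bit_length()
    return lambda n: k * x(n + extra)


# A small network: sqrt(2) + 3 * (1/2), queried at accuracy 2**-30.
network = add(sqrt2, scale(3, constant(Fraction(1, 2))))
answer = network(30)
```

Each node only does as much work as the query it receives demands, which is the essence of the query-answer protocol described in the abstract.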

Relevance: 30.00%

Abstract:

The origin of hydrodynamic turbulence in rotating shear flows is investigated. The particular emphasis is on flows whose angular velocities decrease but whose specific angular momenta increase with increasing radial coordinate. Such flows are Rayleigh stable, but must be turbulent in order to explain observed data. This mismatch between linear theory and observations/experiments is more severe when any hydromagnetic/magnetohydrodynamic instability and the corresponding turbulence are ruled out. The present work explores the effect of stochastic noise on such hydrodynamic flows. We focus on a small section of such a flow, which is essentially a plane shear flow supplemented by the Coriolis effect. This also mimics a small section of an astrophysical accretion disk. It is found that such stochastically driven flows exhibit large temporal and spatial correlations of perturbation velocities, and hence large energy dissipation, that presumably generate instability. A range of angular velocity profiles (for the steady flow), ranging from constant angular momentum to constant circular velocity, is explored. It is shown that the growth and roughness exponents calculated from the contour (envelope) of the perturbed flows are all identical, revealing a unique universality class for the stochastically forced hydrodynamics of rotating shear flows. This work, to the best of our knowledge, is the first attempt to understand the origin of instability and turbulence in three-dimensional Rayleigh stable rotating shear flows by introducing additive stochastic noise into the underlying linearized governing equations. This has important implications for resolving the turbulence problem in astrophysical hydrodynamic flows such as accretion disks.

Relevance: 30.00%

Abstract:

Magnetoencephalography (MEG), a non-invasive technique for characterizing brain electrical activity, is gaining popularity as a tool for assessing group-level differences between experimental conditions. One method for assessing task-condition effects involves beamforming, where a weighted sum of field measurements is used to tune activity on a voxel-by-voxel basis. However, this method has been shown to produce inhomogeneous smoothness differences as a function of signal-to-noise across a volumetric image, which can then produce false positives at the group level. Here we describe a novel method for group-level analysis with MEG beamformer images that utilizes the peak locations within each participant's volumetric image to assess group-level effects. We compared our peak-clustering algorithm with SnPM using simulated data. We found that our method was immune to artefactual group effects that can arise as a result of inhomogeneous smoothness differences across a volumetric image. We also used our peak-clustering algorithm on experimental data and found that regions were identified that corresponded with task-related regions identified in the literature. These findings suggest that our technique is a robust method for group-level analysis with MEG beamformer images.

Relevance: 30.00%

Abstract:

Objective: Recently, many studies have applied nature-inspired algorithms to complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm; it is based on swarm intelligence and derived from a model of the collective foraging behavior of ants. Taking advantage of ACO traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering algorithm (Ant-C) and an ant-based association rule mining algorithm (Ant-ARM) are proposed for gene expression data analysis. The proposed algorithms exploit natural ant behaviors such as cooperation and adaptation to allow a flexible, robust search for a good candidate solution. Results: Ant-C has been tested on three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared with other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate the optimal number of clusters without incorporating any other algorithm such as K-means or agglomerative hierarchical clustering. For associative classification, while several well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.
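The abstract does not spell out Ant-C's update rules. As background only, here is a minimal Python sketch of the classic ant-based clustering rules of Lumer and Faieta (building on Deneubourg et al.), in which an ant tends to pick up an item that is dissimilar to its surroundings and to drop it where similar items lie. The parameter values `k1`, `k2`, `alpha` and the patch size `s` are illustrative assumptions, and Ant-C itself may work quite differently.

```python
import math


def neighbourhood_similarity(item, neighbours, alpha, s):
    """Lumer-Faieta local density f(i): similarity of `item` to the items in the
    s x s grid patch around the ant, normalised by s*s; alpha scales dissimilarity."""
    total = sum(1.0 - math.dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / (s * s))


def pick_probability(f, k1=0.1):
    """High when the item is isolated or dissimilar to its neighbours (f near 0)."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """High when the ant is standing among similar items (f large)."""
    return 2.0 * f if f < k2 else 1.0
```

In a full simulation, ants move randomly on a grid, evaluate `f` for the item they carry or stand on, and pick up or drop items with these probabilities; clusters then emerge without the number of clusters being fixed in advance.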

Relevance: 30.00%

Abstract:

Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two levels of knowledge granularity (document and term) and ignores the bridging paragraph granularity. This two-level granularity may lead to unsatisfactory clustering results with “false correlation”. To deal with this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with ontology all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-Score.
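The zero-valued similarity problem is easy to reproduce. In the minimal Python sketch below, plain VSM cosine similarity between two documents that share no terms is exactly 0, while a generic tolerance rough set model (each term is treated as tolerant of the terms it co-occurs with in at least `theta` documents) gives the documents overlapping upper approximations. This is the textbook tolerance-rough-set idea applied at the document level for brevity, not HRMM's term-paragraph strategy; the threshold and the toy corpus are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-weight Counters (the VSM measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tolerance_classes(docs, theta):
    """Tolerance class of each term: the term itself plus every term that
    co-occurs with it in at least `theta` documents."""
    co = Counter()
    for d in docs:
        for s, t in combinations(sorted(set(d)), 2):
            co[(s, t)] += 1
    classes = {t: {t} for d in docs for t in d}
    for (s, t), n in co.items():
        if n >= theta:
            classes[s].add(t)
            classes[t].add(s)
    return classes


def upper_approximation(doc, classes):
    """Binary vector of all terms whose tolerance class overlaps the document."""
    terms = set(doc)
    return Counter(t for t, cls in classes.items() if cls & terms)


docs = [["gene", "expression"], ["gene", "expression"], ["gene"], ["expression"]]
classes = tolerance_classes(docs, theta=2)
plain = cosine(Counter(docs[2]), Counter(docs[3]))  # no shared terms
enriched = cosine(upper_approximation(docs[2], classes),
                  upper_approximation(docs[3], classes))
```

Here `plain` is 0.0 and `enriched` is 1.0 (up to rounding), because "gene" and "expression" co-occur often enough to fall into each other's tolerance classes.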

Relevance: 30.00%

Abstract:

Emerging vehicular comfort applications pose a completely new set of requirements, such as maintaining end-to-end connectivity, packet routing, and reliable communication for Internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS), such as low packet delay, while coping with fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help improve spectrum efficiency and reduce data congestion in the network. The vehicular nodes that experience minimal path loss are selected as cluster heads. The performance of the MPLR clustering algorithm is evaluated in terms of the rate of change of cluster heads, the average number of clusters and the average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis of the rate of change of cluster heads is derived, which validates the MPLR algorithm, and is compared with the simulated results. The mathematical and simulated results are in good agreement, indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with the V2R system, with the MPLR system performing better. © 2013 IEEE.
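The abstract does not define the path loss ratio, so the following is only a minimal Python sketch of the general selection rule under stated assumptions: path loss follows a log-distance model with hypothetical parameters (40 dB reference loss at 1 m, exponent 2.7), and each vehicle is scored by its mean path loss to the vehicles within a hypothetical communication range. The vehicle with the lowest score becomes cluster head; MPLR's actual metric may differ.

```python
import math


def path_loss_db(d, pl0=40.0, d0=1.0, gamma=2.7):
    """Log-distance path loss in dB; pl0, d0 and gamma are hypothetical values."""
    return pl0 + 10 * gamma * math.log10(max(d, d0) / d0)


def select_cluster_head(positions, comm_range):
    """Index of the vehicle with the lowest mean path loss to its in-range
    neighbours, or None if no vehicle has a neighbour in range."""
    best, best_loss = None, math.inf
    for i, p in enumerate(positions):
        losses = [path_loss_db(math.dist(p, q))
                  for j, q in enumerate(positions)
                  if j != i and math.dist(p, q) <= comm_range]
        if not losses:
            continue  # isolated vehicle: cannot lead a cluster
        mean_loss = sum(losses) / len(losses)
        if mean_loss < best_loss:  # ties go to the lower index
            best, best_loss = i, mean_loss
    return best


# Five vehicles spaced 50 m apart along a straight motorway lane.
cars = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (150.0, 0.0), (200.0, 0.0)]
head = select_cluster_head(cars, comm_range=120.0)
```

With these assumptions the second vehicle (index 1) wins: its neighbours are at 50, 50 and 100 m, giving a lower mean loss than the centre vehicle, whose four neighbours include two at 100 m.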

Relevance: 30.00%

Abstract:

This paper clarifies the role of alternative optimal solutions in the clustering of multidimensional observations using data envelopment analysis (DEA). The paper shows that alternative optimal solutions corresponding to several units produce groups of different sizes, with different decision making units (DMUs) in each class. This implies that a specific DMU may be grouped into different clusters when the corresponding DEA model has multiple optimal solutions. © 2011 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.
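The distinctive element described here is one pheromone per descriptive attribute rather than one per object. Here is a minimal Python sketch of just that bookkeeping, under assumptions that are not taken from the paper: attributes are numeric, an ant deposits a fixed amount of attribute-a pheromone on the path between two objects whose attribute-a values differ by at most a tolerance `tol`, and all trails evaporate at rate `rho` each round. Ant movement and the combining of attribute values between ants are omitted.

```python
def init_pheromones(n_attributes, n_objects):
    """tau[a][i][j]: pheromone for attribute a on the path between objects i and j."""
    return [[[0.0] * n_objects for _ in range(n_objects)] for _ in range(n_attributes)]


def update_pheromones(objects, tau, rho=0.1, tol=0.1, deposit=1.0):
    """One round of evaporation plus deposit, done separately for each attribute."""
    n = len(objects)
    for a in range(len(tau)):
        for i in range(n):
            for j in range(i + 1, n):
                level = (1.0 - rho) * tau[a][i][j]  # evaporation
                if abs(objects[i][a] - objects[j][a]) <= tol:
                    level += deposit  # objects are similar on attribute a
                tau[a][i][j] = tau[a][j][i] = level  # keep each matrix symmetric
    return tau


def attribute_profile(tau, i, j):
    """Per-attribute trail strengths between objects i and j."""
    return [tau[a][i][j] for a in range(len(tau))]
```

Keeping the trails separate means two objects can be strongly linked on one attribute and unlinked on another, information that a single combined pheromone would average away.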

Relevance: 30.00%

Abstract:

We argue that, for certain constrained domains, elaborate model transformation technologies, implemented from scratch in general-purpose programming languages, are unnecessary for model-driven engineering; instead, lightweight configuration of commercial off-the-shelf productivity tools suffices. In particular, in the CancerGrid project, we have been developing model-driven techniques for the generation of software tools to support clinical trials. A domain metamodel captures the community's best practice in trial design. A scientist authors a trial protocol, modelling their trial by instantiating the metamodel; customized software artifacts to support trial execution are generated automatically from the scientist's model. The metamodel is expressed as an XML Schema, in such a way that it can be instantiated by completing a form to generate a conformant XML document. The same process works at a second level for trial execution: among the artifacts generated from the protocol are models of the data to be collected, and the clinician conducting the trial instantiates such models when reporting observations, again by completing a form to create a conformant XML document representing the data gathered during that observation. Simple standard form management tools are all that is needed. Our approach is applicable to a wide variety of information-modelling domains: not just clinical trials, but also electronic public sector computing, customer relationship management, document workflow, and so on. © 2012 Springer-Verlag.
