901 resultados para Centralize density-based spatial clustering of applications with noise


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study subdivides the Potter Cove, King George Island, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis includes in total 42 different environmental variables, interpolated based on samples taken during Australian summer seasons 2010/2011 and 2011/2012. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with changing settings have been compared and the most reasonable method has been applied. The multivariate mathematical procedures used are regionalized classification via k means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for the identification of the optimum number of clusters have been tested and 4, 7, 10 as well as 12 were identified as reasonable numbers for clustering the Potter Cove. Especially the results of 10 and 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and the ones dominated by river discharge.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning techniques are used for extracting valuable knowledge from data. Nowa¬days, these techniques are becoming even more important due to the evolution in data ac¬quisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied with advances in machine learning techniques to solve new challenges that might arise, on both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification or clustering is one of the most known techniques when data lack of supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of labeling process ignorance or cost, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., to discover clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the most known validation indices behave. The main goal of this work is however to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. For the first algorithm, available data labels are used for searching for subspaces firstly, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of real and current application, different machine learning tech¬niques, including one of the proposals of this work (the most sophisticated one) are applied to a task of one of the most challenging biological problems nowadays, the human brain model¬ing. Specifically, expert neuroscientists do not agree with a neuron classification for the brain cortex, which makes impossible not only any modeling attempt but also the day-to-day work without a common way to name neurons. Therefore, machine learning techniques may help to get an accepted solution to this problem, which can be an important milestone for future research in neuroscience. Resumen Las técnicas de aprendizaje automático se usan para extraer información valiosa de datos. Hoy en día, la importancia de estas técnicas está siendo incluso mayor, debido a que la evolución en la adquisición y almacenamiento de datos está llevando a datos con diferentes características que deben ser explotadas. Por lo tanto, los avances en la recolección de datos deben ir ligados a avances en las técnicas de aprendizaje automático para resolver nuevos retos que pueden aparecer, tanto en aplicaciones académicas como reales. Existen varias técnicas de aprendizaje automático dependiendo de las características de los datos y del propósito. La clasificación no supervisada o clustering es una de las técnicas más conocidas cuando los datos carecen de supervisión (datos sin etiqueta), siendo el objetivo descubrir nuevos grupos (agrupaciones) dependiendo de la similitud de los datos. Por otra parte, la clasificación supervisada necesita datos con supervisión (datos etiquetados) y su objetivo es realizar predicciones sobre las etiquetas de nuevos datos. La presencia de las etiquetas es una característica muy importante que guía no solo el aprendizaje sino también otras tareas relacionadas como la validación. Cuando solo algunos de los datos disponibles están etiquetados, mientras que el resto permanece sin etiqueta (datos parcialmente etiquetados), ni el clustering ni la clasificación supervisada se pueden utilizar. Este escenario, que está llegando a ser común hoy en día debido a la ignorancia o el coste del proceso de etiquetado, es abordado utilizando técnicas de aprendizaje semi-supervisadas. Esta tesis trata la rama del aprendizaje semi-supervisado más cercana al clustering, es decir, descubrir agrupaciones utilizando las etiquetas disponibles como apoyo para guiar y mejorar el proceso de clustering. Otra característica importante de los datos, distinta de la presencia de etiquetas, es la relevancia o no de los atributos de los datos. Los datos se caracterizan por atributos, pero es posible que no todos ellos sean relevantes, o igualmente relevantes, para el proceso de aprendizaje. Una tendencia reciente en clustering, relacionada con la relevancia de los datos y llamada clustering en subespacios, afirma que agrupaciones diferentes pueden estar descritas por subconjuntos de atributos diferentes. Esto difiere de las soluciones tradicionales para el problema de la relevancia de los datos, en las que se busca un único subconjunto de atributos (normalmente el conjunto original de atributos) y se utiliza para realizar el proceso de clustering. La cercanía de este trabajo con el clustering lleva al primer objetivo de la tesis. Como se ha comentado previamente, la validación en clustering es una tarea difícil debido a la ausencia de etiquetas. Aunque existen muchos índices que pueden usarse para evaluar la calidad de las soluciones de clustering, estas validaciones dependen de los algoritmos de clustering utilizados y de las características de los datos. Por lo tanto, en el primer objetivo tres conocidos algoritmos se usan para agrupar datos con valores atípicos y ruido para estudiar de forma crítica cómo se comportan algunos de los índices de validación más conocidos. El objetivo principal de este trabajo sin embargo es combinar clustering semi-supervisado con clustering en subespacios para obtener soluciones de clustering que puedan ser validadas de forma correcta utilizando índices conocidos u opiniones expertas. Se proponen dos algoritmos desde dos puntos de vista diferentes para descubrir agrupaciones caracterizadas por diferentes subespacios. Para el primer algoritmo, las etiquetas disponibles se usan para bus¬car en primer lugar los subespacios antes de buscar las agrupaciones. Este algoritmo asigna cada instancia a un único cluster (hard clustering) y se basa en mapear las etiquetas cono-cidas a subespacios utilizando técnicas de clasificación supervisada. El segundo algoritmo utiliza las etiquetas disponibles para buscar de forma simultánea los subespacios y las agru¬paciones en un proceso iterativo. Este algoritmo asigna cada instancia a cada cluster con una probabilidad de pertenencia (soft clustering) y se basa en integrar las etiquetas conocidas y la búsqueda en subespacios dentro de clustering basado en modelos. Las propuestas son probadas utilizando diferentes bases de datos reales y sintéticas, incluyendo comparaciones con otros métodos cuando resulten apropiadas. Finalmente, a modo de ejemplo de una aplicación real y actual, se aplican diferentes técnicas de aprendizaje automático, incluyendo una de las propuestas de este trabajo (la más sofisticada) a una tarea de uno de los problemas biológicos más desafiantes hoy en día, el modelado del cerebro humano. Específicamente, expertos neurocientíficos no se ponen de acuerdo en una clasificación de neuronas para la corteza cerebral, lo que imposibilita no sólo cualquier intento de modelado sino también el trabajo del día a día al no tener una forma estándar de llamar a las neuronas. Por lo tanto, las técnicas de aprendizaje automático pueden ayudar a conseguir una solución aceptada para este problema, lo cual puede ser un importante hito para investigaciones futuras en neurociencia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A PET imaging system demonstrator based on LYSO crystal arrays coupled to SiPM matrices is under construction at the University and INFN of Pisa. Two SiPM matrices, composed of 8×8 SiPM pixels, and 1,5 mm pitch, have been coupled one to one to a LYSO crystals array and read out by a custom electronics system. front-end ASICs were used to read 8 channels of each matrix. Data from each front-end were multiplexed and sent to a DAQ board for the digital conversion; a motherboard collects the data and communicates with a host computer through a USB port for the storage and off-line data processing. In this paper we show the first preliminary tomographic image of a point-like radioactive source acquired with part of the two detection heads in time coincidence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The crop simulation model AquaCrop, recently developed by FAO can be used for a wide range of purposes. However, in its present form, its use over large areas or for applications that require a large number of simulations runs (e.g., long-term analysis), is not practical without developing software to facilitate such applications. Two tools for managing the inputs and outputs of AquaCrop, named AquaData and AquaGIS, have been developed for this purpose and are presented here. Both software utilities have been programmed in Delphi v. 5 and in addition, AquaGIS requires the Geographic Information System (GIS) programming tool MapObjects. These utilities allow the efficient management of input and output files, along with a GIS module to develop spatial analysis and effect spatial visualization of the results, facilitating knowledge dissemination. A sample of application of the utilities is given here, as an AquaCrop simulation analysis of impact of climate change on wheat yield in Southern Spain, which requires extensive input data preparation and output processing. The use of AquaCrop without the two utilities would have required approximately 1000 h of work, while the utilization of AquaData and AquaGIS reduced that time by more than 99%. Furthermore, the use of GIS, made it possible to perform a spatial analysis of the results, thus providing a new option to extend the use of the AquaCrop model to scales requiring spatial and temporal analyses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The biggest problem when analyzing the brain is that its synaptic connections are extremely complex. Generally, the billions of neurons making up the brain exchange information through two types of highly specialized structures: chemical synapses (the vast majority) and so-called gap junctions (a substrate of one class of electrical synapse). Here we are interested in exploring the three-dimensional spatial distribution of chemical synapses in the cerebral cortex. Recent research has showed that the three-dimensional spatial distribution of synapses in layer III of the neocortex can be modeled by a random sequential adsorption (RSA) point process, i.e., synapses are distributed in space almost randomly, with the only constraint that they cannot overlap. In this study we hypothesize that RSA processes can also explain the distribution of synapses in all cortical layers. We also investigate whether there are differences in both the synaptic density and spatial distribution of synapses between layers. Using combined focused ion beam milling and scanning electron microscopy (FIB/SEM), we obtained three-dimensional samples from the six layers of the rat somatosensory cortex and identified and reconstructed the synaptic junctions. A total volume of tissue of approximately 4500μm3 and around 4000 synapses from three different animals were analyzed. Different samples, layers and/or animals were aggregated and compared using RSA replicated spatial point processes. The results showed no significant differences in the synaptic distribution across the different rats used in the study. We found that RSA processes described the spatial distribution of synapses in all samples of each layer. We also found that the synaptic distribution in layers II to VI conforms to a common underlying RSA process with different densities per layer. Interestingly, the results showed that synapses in layer I had a slightly different spatial distribution from the other layers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Substances containing unpaired electrons have been studied by electron paramagnetic resonance (EPR) for nearly 70 years. With continual development and enhancement of EPR techniques, questions have arisen regarding optimum method selection for a given sample based on its properties. In this work, radiation defects, natural lattice defects, solid organic radicals, radicals in solution, and spin-labeled proteins were analyzed using CW, pulse, and rapid scan EPR to compare methods. Studies of solid BDPA, EOe in quartz, Ns0 in diamond, and a-Si:H, showed that rapid scan could overcome many obstacles presented by other techniques, cementing rapid scan as an effective alternative to CW and pulse methods. Relaxation times of six nitroxide radicals were characterized from 0.25-34 GHz, guiding synthesis of improved nitroxides for in vivo imaging experiments. Processes contributing to T1 of DPPH in polystyrene were found through variable temperature measurements at X- and Q-band, resolving previously-reported discrepancies in relaxation properties and providing new insight into this commonly-used standard. In the history of EPR, the study of proteins is relatively new. Double electron-electron resonance (DEER) has emerged as a powerful technique for the study of amyloid fibrils, a class of protein aggregates implicated in a number of neurodegenerative disorders. Microtubule-associated protein tau forms fibrils linked to Alzheimerfs disease through seeded conversion of monomer. Self-assembly is mediated by the microtubule binding repeats in tau, and there are either three or four repeats present depending on the isoform. DEER was used to show that filaments of 3R and 4R tau are conformationally distinct and that 4R fibrils adopt a heterogeneous mixture of conformations. Populations of 4R fibril conformations, which were independently validated using a model system, can be modulated by introduction of mutations to the primary sequence or by varying fibril growth conditions. These findings provided unprecedented insights into the seed selection of tau monomers and established conformational compatibility as an important driving force in tau fibril propagation. Lastly, DEER acquisition was improved through addition of paramagnetic metal to spin-labeled protein, decreasing collection time, and through use of a novel spin label with increased T2, thereby lengthening the available acquisition window.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E(expectation) and M(maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this article, we present several novel techniques to effectively construct cell density based spatial histograms for range (window) summarizations restricted to the four most important level-two topological relations: contains, contained, overlap, and disjoint. We first present a novel framework to construct a multiscale Euler histogram in 2D space with the guarantee of the exact summarization results for aligned windows in constant time. To minimize the storage space in such a multiscale Euler histogram, an approximate algorithm with the approximate ratio 19/12 is presented, while the problem is shown NP-hard generally. To conform to a limited storage space where a multiscale histogram may be allowed to have only k Euler histograms, an effective algorithm is presented to construct multiscale histograms to achieve high accuracy in approximately summarizing aligned windows. Then, we present a new approximate algorithm to query an Euler histogram that cannot guarantee the exact answers; it runs in constant time. We also investigate the problem of nonaligned windows and the problem of effectively partitioning the data space to support nonaligned window queries. Finally, we extend our techniques to 3D space. Our extensive experiments against both synthetic and real world datasets demonstrate that the approximate multiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost efficiency, and the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for many popular real datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spatial data has now been used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications are low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine. That is, the developer of spatial Web applications needs not to be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Client-side caching of spatial data is an important yet very much under investigated issue. Effective caching of vector spatial data has the potential to greatly improve the performance of spatial applications in the Web and wireless environments. In this paper, we study the problem of semantic spatial caching, focusing on effective organization of spatial data and spatial query trimming to take advantage of cached data. Semantic caching for spatial data is a much more complex problem than semantic caching for aspatial data. Several novel ideas are proposed in this paper for spatial applications. A number of typical spatial application scenarios are used to generate spatial query sequences. An extensive experimental performance study is conducted based on these scenarios using real spatial data. We demonstrate a significant performance improvement using our ideas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Summarizing topological relations is fundamental to many spatial applications including spatial query optimization. In this paper, we present several novel techniques to eectively construct cell density based spatial histograms for range (window) summarizations restricted to the four most important topological relations: contains, contained, overlap, and disjoint. We rst present a novel framework to construct a multiscale histogram composed of multiple Euler histograms with the guarantee of the exact summarization results for aligned windows in constant time. Then we present an approximate algorithm, with the approximate ratio 19/12, to minimize the storage spaces of such multiscale Euler histograms, although the problem is generally NP-hard. To conform to a limited storage space where only k Euler histograms are allowed, an effective algorithm is presented to construct multiscale histograms to achieve high accuracy. Finally, we present a new approximate algorithm to query an Euler histogram that cannot guarantee the exact answers; it runs in constant time. Our extensive experiments against both synthetic and real world datasets demonstrated that the approximate mul- tiscale histogram techniques may improve the accuracy of the existing techniques by several orders of magnitude while retaining the cost effciency, and the exact multiscale histogram technique requires only a storage space linearly proportional to the number of cells for the real datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study tested three hypotheses: (1) that there is clustering of the neuronal cytoplasmic inclusions (NCI), astrocytic plaques (AP) and ballooned neurons (BN) in corticobasal degeneration (CBD), (2) that the clusters of NCI and BN are not spatially correlated, and (3) that the lesions are correlated with disease ‘stage’. In 50% of the regions, clusters of lesions were 400–800 µm in diameter and regularly distributed parallel to the tissue boundary. Clusters of NCI and BN were larger in laminae II/III and V/VI, respectively. In a third of regions, the clusters of BN and NCI were negatively spatially correlated. Cluster size of the BN in the parahippocampal gyrus (PHG) was positively correlated with disease ‘stage’. The data suggest the following: (1) degeneration of the cortico-cortical pathways in CBD, (2) clusters of NCI and BN may affect different anatomical pathways and (3) BN may develop after the NCI in the PHG.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dementia with neurofilament inclusions (DNI) is a new disorder characterized clinically by early-onset dementia and histologically by the presence of intraneural inclusions immunopositive for neurofilament antigens but lacking tau and α-synuclein reactivity. We studied the clustering patterns of the neurofilament inclusions (NI) in regions of the temporal lobe in three cases of DNI to determine whether they have the same spatial patterns as inclusions in the tauopathies and α-synucleinopathies. The NI exhibited a clustered distribution (mean size of clusters 400 μm, range 50-800 μm, SD 687.8) in 24/28 of the areas studied. In 22 of these areas, the clusters exhibited a regular distribution along the tissue parallel to the pia mater or alveus. In 3 cortical areas, there was evidence of a more complex pattern in which the NI clusters were aggregated into larger superclusters. In 6 cortical areas, the size of the clusters approximated to those of the cells of origin of the cortico-cortical pathways but in the remaining areas cluster size was smaller than 400 μm. Despite the unique molecular profile of the NI, their spatial patterns are similar to those shown by filamentous neuronal inclusions in the tauopathies and α-synucleinopathies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The spatial pattern of the prion protein (PrP) deposits was studied in the cerebral cortex and cerebellum in 10 patients with sporadic Creutzfeldt–Jakob disease (CJD). In all patients the PrP deposits were aggregated into clusters and, in 90% of cortical areas and in 50% of cerebellar sections, the clusters exhibited a regular periodicity parallel to the tissue boundary; a spatial pattern also exhibited by ß-amyloid (Aß) deposits in Alzheimer's disease (AD). In the cerebral cortex, the incidence of regular clustering of the PrP deposits was similar in the upper and lower cortical laminae. The sizes of the PrP clusters in the upper and lower cortex were uncorrelated. No significant differences in mean cluster size of the PrP deposits were observed between brain regions. The size, location and distribution of the PrP deposit clusters suggest that PrP deposition occurs in relation to specific anatomical pathways and supports the hypothesis that prion pathology spreads through the brain via such pathways. In addition, the data suggest that there are similarities in the pathogenesis of extracellular protein deposits in prion disease and in AD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Similar pathological processes may be involved in the deposition of extracellular proteins in the brains of patients with Creutzfeldt-Jakob disease (CJD) and Alzheimer's disease (AD). Hence, this study compared the spatial patterns of prion protein (PrP) deposits in the cerebral cortex and hippocampus in cases of sporadic CJD with those of β-amyloid (Aβ) deposits in sporadic AD. PrP and Aβ deposits were aggregated into clusters and, in 90% of brain areas in CJD and 57% in AD, the clusters were regularly distributed parallel to the tissue boundary. In a significant proportion of cortical analyses, the mean diameter of the clusters of PrP and Aβ deposits were similar to those of the cells of origin of the cortico-cortical pathways. Aβ deposits in AD were distributed more frequently in larger-sized clusters than PrP deposits in CJD. In addition, in the hippocampus and dentate gyrus, clustering of Aβ deposits was observed in AD but PrP deposits were rare in these regions in CJD. The size, location and distribution of the extracellular protein deposits within the cortex of both disorders was consistent with the degeneration of the cortico-cortical pathways. Furthermore, spread of the pathology along these pathways may be a pathogenic feature common to CJD and AD. © 2001 Elsevier Science Ireland Ltd.