947 results for spatial clustering algorithms
Abstract:
The appearance of large geolocated communication datasets has recently increased our understanding of how social networks relate to their physical space. However, many recurrently reported properties, such as the spatial clustering of network communities, have not yet been systematically tested at different scales. In this work we analyze the social network structure of over 25 million phone users from three countries at three different scales: country, provinces and cities. We consistently find that this last urban scenario differs significantly from common knowledge about social networks. First, the emergence of a giant component in the network seems to be controlled by whether or not the network spans the entire urban border, almost independently of the population or geographic extension of the city. Second, urban communities are much less geographically clustered than expected. These two findings shed new light on the widely studied searchability in self-organized networks. By exhaustive simulation of decentralized search strategies we conclude that urban networks are searchable not through geographical proximity, as their country-wide counterparts are, but through a homophily-driven community structure.
Abstract:
Alignments of homologous genes typically reveal a great diversity of intron locations, far more than could fit comfortably in a single gene. Thus, a minority of these intron positions could be inherited from a single ancestral gene, but the larger share must be attributed to subsequent events of intron gain or intron “sliding” (movement from one position to another within a gene). Intron sliding has been argued from cases of discordant introns and from putative spatial clustering of intron positions. A list of 32 cases of discordant introns is presented here. Most of these cases are found to be artefactual. The spatial and phylogenetic distributions of intron positions from five published compilations of gene data, comprising 205 intron positions, have been examined systematically for evidence of intron sliding. The results suggest that sliding, if it occurs at all, has contributed little to the diversity of intron positions.
Abstract:
Although much of the brain’s functional organization is genetically predetermined, it appears that some noninnate functions can come to depend on dedicated and segregated neural tissue. In this paper, we describe a series of experiments that have investigated the neural development and organization of one such noninnate function: letter recognition. Functional neuroimaging demonstrates that letter and digit recognition depend on different neural substrates in some literate adults. How could the processing of two stimulus categories that are distinguished solely by cultural conventions become segregated in the brain? One possibility is that correlation-based learning in the brain leads to a spatial organization in cortex that reflects the temporal and spatial clustering of letters with letters in the environment. Simulations confirm that environmental co-occurrence does indeed lead to spatial localization in a neural network that uses correlation-based learning. Furthermore, behavioral studies confirm one critical prediction of this co-occurrence hypothesis, namely, that subjects exposed to a visual environment in which letters and digits occur together rather than separately (postal workers who process letters and digits together in Canadian postal codes) do indeed show less behavioral evidence for segregated letter and digit processing.
Abstract:
The occurrence of rockbursts was quite common during active mining periods in the Champion reef mines of Kolar gold fields, India. Among the major rockbursts, the ‘area-rockbursts’ were unique both in regard to their spatio-temporal distribution and the extent of damage caused to the mine workings. A detailed study of the spatial clustering of 3 major area-rockbursts (ARB) was carried out using a multi-fractal technique involving generalized correlation integral functions. The spatial distribution analysis of all 3 area-rockbursts showed that they are heterogeneous. The degree of heterogeneity (D2 – D∞) in the cases of ARB-I, II and III was found to be 0.52, 0.37 and 0.41 respectively. These differences in fractal structure indicate that the ARBs of the present study were fully controlled by different heterogeneous stress fields associated with different mining and geological conditions. The present study clearly shows the advantages of applying multi-fractal analysis to seismic data in order to characterise, analyse and examine the area-rockbursts and their causative factors in the Kolar gold mines.
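As a rough illustration of the correlation-integral idea mentioned in the abstract above, the sketch below computes the ordinary (second-order) correlation integral C(r), the fraction of point pairs closer than r, and estimates the correlation dimension D2 from the slope of log C(r) against log r between two radii. The function names and the two-radius slope estimate are assumptions made for illustration; the study's generalized (multi-order) analysis is not reproduced here.

```python
import math
from itertools import combinations


def correlation_integral(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    pairs = list(combinations(points, 2))
    close = sum(1 for p, q in pairs if math.dist(p, q) < r)
    return close / len(pairs)


def correlation_dimension(points, r_small, r_large):
    """Two-point slope estimate of D2 from log C(r) versus log r.

    Both radii must be large enough that at least one pair falls inside
    them, otherwise log(0) is undefined.
    """
    c_small = correlation_integral(points, r_small)
    c_large = correlation_integral(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )


# Hypothetical data: points spread evenly along a line should give D2 near 1.
line = [(float(i), 0.0) for i in range(200)]
d2_line = correlation_dimension(line, 5.5, 20.5)
```

In practice the slope would be fitted over a range of radii rather than two values; the two-radius version is kept only to make the definition visible.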
Abstract:
This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
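To make the spatial autocorrelation step in the abstract above more concrete, here is a minimal sketch of global Moran's I, one common autocorrelation measure. The abstract does not say which statistic the paper uses, so the choice of Moran's I, the function name `morans_i` and the toy adjacency weights are all assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij.

    I = (n / sum_ij w_ij) * sum_ij w_ij (x_i - mean)(x_j - mean)
        / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical example: four locations on a line, neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], W)       # similar neighbours, positive I
alternating = morans_i([1, 4, 1, 4], W)  # dissimilar neighbours, negative I
```

A value above zero indicates that nearby locations tend to have similar values of a linguistic variable, which is the kind of regional pattern the method looks for.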
Abstract:
This paper presents a statistical comparison of regional phonetic and lexical variation in American English. Both the phonetic and lexical datasets were first subjected to separate multivariate spatial analyses in order to identify the most common dimensions of spatial clustering in these two datasets. The dimensions of phonetic and lexical variation extracted by these two analyses were then correlated with each other, after being interpolated over a shared set of reference locations, in order to measure the similarity of regional phonetic and lexical variation in American English. This analysis shows that regional phonetic and lexical variation are remarkably similar in Modern American English.
Abstract:
Mainstream gentrification research predominantly examines experiences and motivations of the middle-class gentrifier groups, while overlooking experiences of non-gentrifying groups including the impact of in situ local processes on gentrification itself. In this paper, I discuss gentrification, neighbourhood belonging and spatial distribution of class in Istanbul by examining patterns of belonging both of gentrifiers and non-gentrifying groups in historic neighbourhoods of the Golden Horn/Halic. I use multiple correspondence analysis (MCA), a methodology rarely used in gentrification research, to explore social and symbolic borders between these two groups. I show how gentrification leads to spatial clustering by creating exclusionary practices and eroding social cohesion, and illuminate divisions that are inscribed into the physical space of the neighbourhood.
Abstract:
Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques exist. One group of segmentation algorithms is based on clustering concepts. In this article we investigate several fuzzy c-means based clustering algorithms and their application to medical image segmentation. In particular we evaluate the conventional hard c-means (HCM) and fuzzy c-means (FCM) approaches as well as three computationally more efficient derivatives of fuzzy c-means: fast FCM with random sampling, fast generalised FCM, and a new anisotropic mean shift based FCM. © 2010 by IJTS, ISDER.
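As a rough illustration of the conventional fuzzy c-means (FCM) baseline named in the abstract above, the following sketch alternates the standard center and membership updates. The function name `fuzzy_c_means`, the fuzzifier default m = 2, the fixed iteration count and the random initialisation are assumptions for illustration; it is not the article's implementation and does not cover the faster derivatives it evaluates.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain FCM on a list of equal-length tuples; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_total
                 for d in range(dim)]
            )
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[k][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for k in range(c)
            ]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
            inv_total = sum(inv)
            u[i] = [v / inv_total for v in inv]
    return centers, u


# Hypothetical data: two well-separated groups of 2-D points.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(pts, c=2)
```

For image segmentation the points would be pixel intensities or feature vectors, and each pixel would be assigned to the cluster with its largest membership.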
Abstract:
Modern IT infrastructures are built from large-scale computing systems and administered by IT service providers. Manually maintaining such large computing systems is costly and inefficient. Service providers often seek automatic or semi-automatic methodologies for detecting and resolving system issues to improve their service quality and efficiency. This dissertation investigates several data-driven approaches for assisting service providers in achieving this goal. The detailed problems studied by these approaches can be categorized into three aspects of the service workflow: 1) preprocessing raw textual system logs into structural events; 2) refining monitoring configurations to eliminate false positives and false negatives; 3) improving the efficiency of system diagnosis on detected alerts. Solving these problems usually requires a huge amount of domain knowledge about the particular computing systems. The approaches investigated by this dissertation are developed based on event mining algorithms, which are able to automatically derive part of that knowledge from the historical system logs, events and tickets. In particular, two textual clustering algorithms are developed for converting raw textual logs into system events. For refining the monitoring configuration, a rule-based alert prediction algorithm is proposed for eliminating false alerts (false positives) without losing any real alert, and a textual classification method is applied to identify the missing alerts (false negatives) from manual incident tickets. For system diagnosis, this dissertation presents an efficient algorithm for discovering the temporal dependencies between system events with corresponding time lags, which can help administrators determine the redundancies of deployed monitoring situations and the dependencies of system components.
To improve the efficiency of incident ticket resolution, several KNN-based algorithms that recommend relevant historical tickets with resolutions for incoming tickets are investigated. Finally, this dissertation offers a novel algorithm for searching similar textual event segments over large system logs, which helps administrators locate similar system behaviors in the logs. Extensive empirical evaluation on system logs, events and tickets from real IT infrastructures demonstrates the effectiveness and efficiency of the proposed approaches.
Abstract:
Forest trees, like oaks, rely on high levels of genetic variation to adapt to varying environmental conditions. Thus, genetic variation and its distribution are important for the long-term survival and adaptability of oak populations. Climate change is projected to lead to increased drought and fire events as well as a northward migration of tree species, including oaks. Additionally, decline in oak regeneration has become increasingly concerning since it may lead to decreased gene flow and increased inbreeding levels. This will in turn lead to lowered levels of genetic diversity, negatively affecting the growth and survival of populations. At the same time, populations at the species’ distribution edge, like those in this study, could possess important stores of genetic diversity and adaptive potential, while also being vulnerable to climatic or anthropogenic changes. A survey of the level and distribution of genetic variation and identification of potentially adaptive genes is needed since adaptive genetic variation is essential for their long-term survival. Oaks possess a remarkable characteristic in that they maintain their species identity and specific environmental adaptations despite their propensity to hybridize. Thus, in the face of interspecific gene flow, some areas of the genome remain differentiated due to selection. This characteristic allows the study of local environmental adaptation through genetic variation analyses. Furthermore, using genic markers with known putative functions makes it possible to link those differentiated markers to potential adaptive traits (e.g., flowering time, drought stress tolerance). Demographic processes like gene flow and genetic drift also play an important role in how genes (including adaptive genes) are maintained or spread. These processes are influenced by disturbances, both natural and anthropogenic. 
An examination of how genetic variation is geographically distributed can display how these genetic processes and geographical disturbances influence genetic variation patterns. For example, the spatial clustering of closely related trees could promote inbreeding with associated negative effects (inbreeding depression), if gene flow is limited. In turn this can have negative consequences for a species’ ability to adapt to changing environmental conditions. In contrast, interspecific hybridization may also allow the transfer of genes between species that increase their adaptive potential in a changing environment. I have studied the ecologically divergent, interfertile red oaks, Quercus rubra and Q. ellipsoidalis, to identify genes with potential roles in adaptation to abiotic stress through traits such as drought tolerance and flowering time, and to assess the level and distribution of genetic variation. I found evidence for moderate gene flow between the two species and low interspecific genetic differences at most genetic markers (Lind and Gailing 2013). However, the screening of genic markers with potential roles in phenology and drought tolerance led to the identification of a CONSTANS-like (COL) gene, a candidate gene for flowering time and growth. This marker, located in the coding region of the gene, was highly differentiated between the two species in multiple geographical areas, despite interspecific gene flow, and may play a role in reproductive isolation and adaptive divergence between the two species (Lind-Riehl et al. 2014). Since climate change could result in a northward migration of tree species like oaks, this gene could be important in maintaining species identity despite increased contact zones between species (e.g., increased gene flow). Finally, I examined differences in spatial genetic structure (SGS) and genetic variation between species and populations subjected to different management strategies and natural disturbances.
Diverse management activities combined with various natural disturbances as well as species specific life history traits influenced SGS patterns and inbreeding levels (Lind-Riehl and Gailing submitted).
Abstract:
Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which reduces both computational complexity and space complexity significantly. The proposed stKFCM only requires O(n^2) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n^2 elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time step, thus reducing both the computation time in producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively small n, can provide clustering performance as accurate as that of kernel fuzzy c-means run on the entire data set while achieving a significant speedup.
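The memory argument in the abstract above can be made concrete with a small sketch that processes data in chunks of n points and only ever builds kernel blocks of n × n entries. The Gaussian (RBF) kernel, the function names and the example sizes are assumptions; the abstract does not say which kernel stKFCM uses or exactly which 2n^2 entries it evaluates at each step, so this shows only the chunking idea and the element counts.

```python
import math


def rbf_block(rows, cols, gamma=1.0):
    """Kernel block K[i][j] = exp(-gamma * ||rows[i] - cols[j]||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(r, c)))
         for c in cols]
        for r in rows
    ]


def stream_chunks(data, n):
    """Yield consecutive chunks of at most n points, one per time step."""
    for start in range(0, len(data), n):
        yield data[start:start + n]


def kernel_entries(total_points, chunk_size):
    """Entries in the full N x N matrix versus the 2 * n^2 quoted per step."""
    return total_points * total_points, 2 * chunk_size * chunk_size


# Hypothetical sizes: N = 1,000,000 points processed in chunks of n = 1,000.
full, per_step = kernel_entries(1_000_000, 1_000)

# Each chunk's own n x n block fits in memory even when N x N would not.
data = [(float(i), float(i % 3)) for i in range(10)]
blocks = [rbf_block(chunk, chunk) for chunk in stream_chunks(data, 4)]
```

With these assumed sizes the full matrix would hold 10^12 entries, while each time step touches 2 × 10^6, which is the scale of saving the abstract describes.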
Abstract:
BACKGROUND: African swine fever (ASF) is one of the most complex viral diseases affecting both domestic and wild pigs. It is caused by ASF virus (ASFV), the only DNA virus that can be efficiently transmitted by an arthropod vector, soft ticks of the genus Ornithodoros. These ticks can be part of ASFV-transmission cycles, and in Europe, O. erraticus was shown to be responsible for long-term maintenance of ASFV in Spain and Portugal. In 2014, the disease was reintroduced into the European Union, affecting domestic pigs and, importantly, also the Eurasian wild boar population. In a first attempt to assess the risk of a tick-wild boar transmission cycle in Central Europe that would further complicate eradication of the disease, over 700 pre-existing serum samples from wild boar hunted in four representative German Federal States were investigated for the presence of antibodies directed against salivary antigen of Ornithodoros erraticus ticks using an indirect ELISA format. RESULTS: Out of these samples, 16 reacted with moderate to high optical densities that could be indicative of tick bites in sampled wild boar. However, these samples did not show spatial clustering (they were collected from distant geographical regions) and were of poor quality (hemolysis/impurities). Furthermore, all positive samples came from areas with suboptimal climate for soft ticks. For this reason, false positive reactions are likely. CONCLUSION: The study did not provide stringent evidence for soft tick-wild boar contact in the investigated German Federal States and thus, a relevant involvement in the epidemiology of ASF in German wild boar is unlikely. This fact would facilitate the eradication of ASF in the area, although other complex relations (wild boar biology and interactions with domestic pigs) need to be considered.
Abstract:
Introduction: Cancer is preventable in some cases if exposure to carcinogenic substances in the environment is avoided. In Colombia, Cundinamarca is one of the departments with the largest increases in the mortality rate, and in the municipality of Sibaté residents have expressed concern about the rise of the disease. In global environmental health, georeferencing applied to the study of health phenomena has succeeded in producing valid results. The study proposed using geographic information tools to generate time and space analyses that would make the behavior of cancer in Sibaté visible and support hypotheses of environmental influences on concentrations of cases. Objective: To obtain the incidence and prevalence of cancer cases among residents of Sibaté and to georeference the cases over a 5-year period, based on a review of records. Methodology: Exploratory, descriptive, cross-sectional study of all cancer diagnoses between 2010 and 2014 found in the archives of the municipal Secretaría de Salud. Only people who had permanent residence in the municipality and were diagnosed with cancer between 2010 and 2014 were included. For each case, gender, age, socioeconomic stratum, educational level, occupation and marital status were obtained. The date of diagnosis was used for the time analysis; the residential address, type of cancer and geographic coordinate were used for the space analysis. Geographic coordinates were generated with a Garmin GPS device, and maps were created with points marking the location of the patients' homes. The information was processed with Epi Info 7. Results: 107 cancer cases registered at the Secretaría de Salud of Sibaté were found: 66 women and 41 men. Without division by gender, 30.93% of the population presented cancer of the reproductive system, 18.56% of the digestive system and 17.53% of the integumentary system.
Two large spatial clusters appeared in the territory studied: one in the Pablo Neruda neighborhood with 12 (21.05%) cases and one in the urban center of Sibaté with 38 (66.67%) cases. Conclusion: The study corroborated that geographic analysis with spatio-temporal and exposure variables can be a tool for generating hypotheses about associations between cancer cases and environmental factors.
Abstract:
In rural and isolated areas without cellular coverage, Satellite Communication (SatCom) is the best candidate to complement terrestrial coverage. However, the main challenge for future generations of wireless networks will be to meet the growing demand for new services while dealing with the scarcity of frequency spectrum. As a result, it is critical to investigate more efficient methods of utilizing the limited bandwidth, and resource sharing is likely the only choice. The research community’s focus has recently shifted towards the interference management and exploitation paradigm to meet the increasing data traffic demands. In the Downlink (DL) and Feedspace (FS), LEO satellites with an on-board antenna array can offer service to numerous User Terminals (UTs) (VSAT or handhelds) on the ground in FFR schemes by using cutting-edge digital beamforming techniques. Considering this setup, the adoption of an effective user scheduling approach is critical given the unusually high density of User Terminals on the ground compared with the antennas available on board the satellite. In this context, one possibility is to exploit clustering algorithms for scheduling in LEO MU-MIMO systems, in which several users within the same group are simultaneously served by the satellite via Space Division Multiplexing (SDM), and the different user groups are then served in different time slots via Time Division Multiplexing (TDM). This thesis addresses this problem by formulating user scheduling as an optimization problem and discussing several algorithms to solve it. In particular, focusing on the FS and user service link (i.e., DL) of a single MB-LEO satellite operating below 6 GHz, the user scheduling problem in the Frequency Division Duplex (FDD) mode is addressed. The proposed State-of-the-Art scheduling approaches are based on graph theory.
The proposed solution offers high performance in terms of per-user capacity, Sum-rate capacity, SINR, and Spectral Efficiency.
Abstract:
This thesis project aims to develop an algorithm for obstacle detection and for the interaction between the safety areas of an Automated Guided Vehicle (AGV) and a map derived from a point cloud, within the context of a CAD software. The first part of the project focuses on the implementation of an algorithm for the clipping of general polygons, with which it has been possible to construct the safety area polygons, derive the sweep of these areas along the navigation path by performing a union, and detect the intersections with the lines or polygons representing the obstacles. The second part is about the construction of a map in terms of geometric entities (lines and polygons) starting from a point cloud given by the 3D scan of the environment. The point cloud is processed using filters, clustering algorithms and concave/convex hull derived algorithms in order to extract line and polygon entities representing obstacles. Finally, the last part aims to use the a priori knowledge of possible obstacle detections on a given segment to predict the behavior of the AGV, and to use this prediction to optimize the choice of the vehicle's assigned velocity in that segment, minimizing the travel time.
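One step in the abstract above, turning a cluster of scanned points into a polygon entity, can be sketched with a convex hull. The example below uses Andrew's monotone chain algorithm on 2-D points; the choice of this particular algorithm, the function names and the sample cluster are assumptions, since the abstract only mentions "concave/convex hull derived algorithms" without further detail.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical cluster: four corners of an obstacle plus interior/edge points.
cluster = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
obstacle_polygon = convex_hull(cluster)
```

A convex hull over-approximates non-convex obstacles, which is presumably why the thesis also mentions concave hull variants.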