873 results for agglomerative clustering
Abstract:
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication/interaction and by unusual repetitive and restricted behaviors and interests. ASD often co-occurs in the same families with other neuropsychiatric diseases (NPD), such as intellectual disability, schizophrenia, epilepsy, depression and attention deficit hyperactivity disorder. Genetic factors play an important role in ASD etiology. Multiple copy number variants (CNVs) and single nucleotide variants (SNVs) in candidate genes have been associated with an increased risk of developing ASD. Nevertheless, recent heritability estimates and the high genotypic and phenotypic heterogeneity characteristic of ASD indicate a role for environmental and epigenetic factors, such as long noncoding RNA (lncRNA) and microRNA (miRNA), as modulators of gene expression and subsequent clinical presentation. Both miRNA and lncRNA are functional RNA molecules that are transcribed from DNA but not translated into proteins; instead, they act as powerful regulators of gene expression. While miRNAs are small noncoding RNAs, 22-25 nucleotides in length, that act at the post-transcriptional level of gene expression, lncRNAs are larger molecules (>200 nucleotides in length) that are capped, spliced, and polyadenylated, similar to messenger RNA. Although few lncRNAs have been well characterized to date, there is strong evidence that they are implicated in several levels of gene expression (transcription/post-transcription/post-translation, organization of protein complexes, cell–cell signaling, and recombination), as shown in figure 1.
Abstract:
© 2014 Cises. This work is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).
Abstract:
In this thesis we present a mathematical formulation of the interaction between microorganisms, such as bacteria or amoebae, and chemicals, which are often produced by the organisms themselves. This interaction is called chemotaxis and leads to cellular aggregation. We derive several models to describe chemotaxis. The first is the pioneering Keller-Segel parabolic-parabolic model, which is derived within two different frameworks: a macroscopic perspective and a microscopic perspective, in which we start from a stochastic differential equation and perform a mean-field approximation. This parabolic model may be generalized by introducing a degenerate diffusion parameter, which depends on the density itself via a power law. Then we derive a model for chemotaxis based on Cattaneo's law of heat propagation with finite speed, which is a hyperbolic model. The last model proposed here is a hydrodynamic model, which takes into account the inertia of the system and includes a friction force. In the limit of strong friction, the model reduces to the parabolic model, whereas in the limit of weak friction, we recover a hyperbolic model. Finally, we analyze the instability condition, which is the condition that leads to aggregation, and we describe the different kinds of aggregates we may obtain: the parabolic models lead to clusters or peaks, whereas the hyperbolic models lead to the formation of network patterns or filaments. Moreover, we discuss the analogy between bacterial colonies and self-gravitating systems by comparing the chemotactic collapse and the gravitational collapse (Jeans instability).
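For orientation, the parabolic-parabolic Keller-Segel system mentioned in this abstract is commonly written as below. The notation is an assumption, since the abstract gives no symbols: \rho is the cell density, c the chemical concentration, D and D_c the diffusion coefficients, \chi the chemotactic sensitivity, and a, b the production and degradation rates of the chemical. The thesis may use a different but equivalent form.

```latex
\begin{align}
  \partial_t \rho &= \nabla \cdot \left( D \, \nabla \rho - \chi \, \rho \, \nabla c \right), \\
  \partial_t c    &= D_c \, \Delta c + a \, \rho - b \, c .
\end{align}
```

In this notation, a power-law (degenerate) diffusion of the kind the abstract describes would replace the constant D by a density-dependent coefficient such as D(\rho) \propto \rho^{m-1}; the exponent m is likewise an illustrative symbol, not one taken from the thesis.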
Abstract:
Chronic traumatic encephalopathy (CTE) is a neurodegenerative disorder which may result from repetitive brain injury. A variety of tau-immunoreactive pathologies are present, including neurofibrillary tangles (NFT), neuropil threads (NT), dot-like grains (DLG), astrocytic tangles (AT), and occasional neuritic plaques (NP). In tauopathies, cellular inclusions in the cortex are clustered within specific laminae, the clusters being regularly distributed parallel to the pia mater. To determine whether a similar spatial pattern is present in CTE, clustering of the tau-immunoreactive pathology was studied in the cortex, hippocampus, and dentate gyrus in 11 cases of CTE and 7 cases of Alzheimer’s disease neuropathologic change (ADNC) without CTE. In CTE: (1) all aspects of tau-immunoreactive pathology were clustered and the clusters were frequently regularly distributed parallel to the tissue boundary, (2) clustering was similar in two CTE cases with minimal co-pathology compared with cases with associated ADNC or TDP-43 proteinopathy, (3) in a proportion of cortical gyri, estimated cluster size was similar to that of cell columns of the cortico-cortical pathways, and (4) clusters of the tau-immunoreactive pathology were infrequently spatially correlated with blood vessels. The NFT and NP in ADNC without CTE were less frequently randomly or uniformly distributed and more frequently in defined clusters than in CTE. Hence, the spatial pattern of the tau-immunoreactive pathology observed in CTE is typical of the tauopathies but with some distinct differences compared to ADNC alone. The spread of pathogenic tau along anatomical pathways could be a factor in the pathogenesis of the disease.
Abstract:
The K-means algorithm is one of the most popular clustering algorithms in current use, as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm, which we call MAP-DP (maximum a posteriori Dirichlet process mixtures), is statistically rigorous, as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. It can also efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means, with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing, such as cross-validation, in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.
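The idea of estimating the number of clusters from the data, rather than fixing K in advance, can be illustrated with a much simpler hard-assignment scheme often called DP-means. This is not the MAP-DP algorithm of the abstract, only a related sketch: a point opens a new cluster when its squared distance to every existing centroid exceeds a threshold. The function names, the threshold `lam`, and the toy points are all hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def dp_means(points, lam, max_iter=50):
    """DP-means style clustering: the number of clusters is not fixed in advance.

    A point whose squared distance to its nearest centroid exceeds `lam`
    becomes the seed of a new cluster.
    """
    centroids = [mean(points)]          # start with one cluster at the global mean
    labels = [0] * len(points)
    for _ in range(max_iter):
        previous = list(labels)
        for i, p in enumerate(points):
            d2 = [sq_dist(p, c) for c in centroids]
            k = min(range(len(d2)), key=d2.__getitem__)
            if d2[k] > lam:             # too far from every centroid: open a new cluster
                centroids.append(p)
                k = len(centroids) - 1
            labels[i] = k
        # Recompute centroids, dropping clusters that lost all their points.
        remap, updated = {}, []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                remap[k] = len(updated)
                updated.append(mean(members))
        centroids = updated
        labels = [remap[l] for l in labels]
        if labels == previous:          # assignments stable: stop
            break
    return centroids, labels


# Toy data: three well-separated groups of 2-D points (illustrative only).
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, labels = dp_means(points, lam=9.0)
print(len(centroids), labels)
```

With a small threshold the sketch recovers the three groups without being told K; with a very large threshold everything stays in a single cluster, which shows that the threshold plays the role that K plays in K-means.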
Abstract:
Influenza A virus is an important human pathogen that causes yearly epidemics and occasional pandemics. The ability to replicate within the host cell is a determinant of virulence, amplifying viral numbers for host-to-host transmission. This process requires multiple rounds of entering permissive cells, replication, and virion assembly at the plasma membrane, the site of viral budding and release. The assembly of influenza A virus involves packaging of several viral (and host) proteins and of a segmented genome, composed of 8 distinct RNAs in the form of viral ribonucleoproteins (vRNPs). The selective assembly of the 8-segment core remains one of the most interesting unresolved problems in virology. The recycling endosome regulatory GTPase Rab11 was shown to contribute to the process by transporting vRNPs to the periphery, giving rise to enlarged cytosolic puncta rich in Rab11 and the 8 vRNPs. We recently reported that vRNP hotspots were formed of clustered vesicles harbouring protruding electron-dense structures that resembled vRNPs. Mechanistically, vRNP hotspots were formed as vRNPs outcompeted the cognate effectors of Rab11, the Rab11-Family-Interacting-Proteins (FIPs), for binding, and as a consequence impaired recycling sorting at an unknown step. Here, we speculate on the impact that such impairment might have on host immunity, membrane architecture and viral assembly.
Abstract:
Research regarding the use of social media among travelers has mainly focused on its impact on travelers’ travel planning process, and there is consensus that travel decisions are highly influenced by social media. Yet, little attention has been paid to the differences among travelers regarding their use of social media for travel purposes. Based on the use of travel social media, cluster analysis was employed to identify different segments among travelers. Furthermore, the study profiles the clusters based on demographic and other travel-related characteristics. The findings of this study are important for online marketers seeking to better understand travelers’ use of social media and their characteristics, in order to adapt online marketing strategies to the profile of each segment.
Abstract:
This study examines groups of place names in Germany (grouped by postal code region) for similarities. The measure used is a frequency vector of trigrams in each group. Applying the average linkage algorithm to this measure yields clusters of spatially contiguous areas, even though the method has no knowledge of the position of the clusters relative to one another. Within each cluster, the ten most frequent n-grams are determined in order to show characteristic word particles. The areas delineated by the clusters can be readily explained by historical or linguistic developments. The method used here requires no linguistic, geographical or historical knowledge, yet it groups names unambiguously while taking a large number of word particles into account in a single step. This approach without prior knowledge distinguishes the study from most investigations carried out to date.
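The pipeline described in this abstract (trigram frequency vectors per group, then average linkage) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the Euclidean distance, the padding with spaces, the region labels `R1` to `R4`, and the tiny name lists are all chosen here for the example.

```python
from collections import Counter
from itertools import combinations
import math


def trigram_vector(names):
    """Relative trigram frequencies over a group of place names."""
    counts = Counter()
    for name in names:
        padded = f" {name.lower()} "    # pad so that word starts and ends form trigrams
        counts.update(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}


def distance(u, v):
    """Euclidean distance between two sparse frequency vectors (an assumed metric)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    members have the smallest mean pairwise distance."""
    def mean_dist(pair):
        a, b = pair
        total = sum(distance(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [[label] for label in vectors]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=mean_dist)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical groups of place names; the labels stand in for postal regions.
groups = {
    "R1": ["Neuhausen", "Holzhausen", "Mühlhausen"],
    "R2": ["Waldhausen", "Steinhausen", "Nordhausen"],
    "R3": ["Teterow", "Güstrow", "Malchow"],
    "R4": ["Pankow", "Treptow", "Bützow"],
}
vectors = {label: trigram_vector(names) for label, names in groups.items()}
clusters = average_linkage(vectors, n_clusters=2)
print(clusters)
```

As in the abstract, the procedure sees only character statistics: the "-hausen" groups and the "-ow" groups end up together without any geographic or linguistic input.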
Abstract:
This paper proposes a novel demand response model using a fuzzy subtractive clustering approach. The model supports domestic consumers' decisions on the management of controllable loads, considering consumers’ consumption needs and the appropriate load shape or rescheduling in order to achieve possible economic benefits. The model, based on the fuzzy subtractive clustering method, considers clusters of domestic consumption covering an adequate consumption range. An analysis of different scenarios is presented, considering available electric power and electric energy prices. Simulation results are presented and conclusions about the proposed demand response model are discussed.
Abstract:
The success of regional development policies depends on the homogeneity of the territorial units. This paper proposes a framework for obtaining homogeneous territorial clusters based on a Pareto frontier, considering multiple criteria related to territories’ endogenous resources, economic profile and socio-cultural features. The framework is developed in two phases. First, the criteria correlated with development at the territorial unit level are determined through statistical and econometric methods. Then, a multi-criteria approach is developed to allocate each territorial unit (parish) to a territorial agglomerate, according to the Pareto frontier established.
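The notion of a Pareto frontier over several criteria can be made concrete with a short sketch. It only shows how non-dominated units are identified; the paper's allocation of parishes to agglomerates is not reproduced here. The parish names, the three criteria, the scores, and the "higher is better" convention are all hypothetical.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every criterion
    and strictly better on at least one (higher values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Names of the units that no other unit dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    )


# Hypothetical scores per parish: (endogenous resources, economic profile, socio-cultural features).
scores = {
    "parish_a": (0.9, 0.2, 0.5),
    "parish_b": (0.4, 0.8, 0.6),
    "parish_c": (0.3, 0.7, 0.5),   # worse than parish_b on every criterion
    "parish_d": (0.9, 0.1, 0.4),   # no better than parish_a on any criterion
}
front = pareto_front(scores)
print(front)
```

Units on the frontier represent different trade-offs among the criteria rather than a single best score, which is why a multi-criteria step is still needed afterwards to group the remaining units.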
Abstract:
Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without considering the correlation among them. To overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, suggesting that correlations among attributes cannot be ignored. In fact, the clusters' compactness is 7 to 25 times better using our method. Our framework also proves to be a useful tool to assist meteorologists in understanding climate behavior over a period of time.
Abstract:
In this work we compare Grapholita molesta Busck (Lepidoptera: Tortricidae) populations originating from Brazil, Chile, Spain, Italy and Greece using power spectral density and phylogenetic analysis to detect any similarities between the population macro-level and the molecular micro-level. Log-transformed population data were normalized and AR(p) models were developed to generate, for each case, population time series of equal length. The time-frequency/scale properties of the population data were further analyzed using wavelet analysis to detect any frequency changes in population dynamics and to cluster the populations. Based on the power spectra of each population time series and the hierarchical clustering schemes, populations originating from South America (Brazil and Chile) exhibit similar rhythmic properties and are both more closely related to populations originating from Greece. Populations from Spain, and especially Italy, are more distant in terms of periodic changes in their population dynamics. Moreover, the members within the same cluster share similar spectral information and are therefore presumed to participate in the same temporally regulated population process. In contrast, the phylogenetic approach revealed a less structured pattern that bears indications of panmixia, as the two clusters contain individuals from both Europe and South America. This preliminary outcome will be further assessed by incorporating more individuals and possibly employing a second molecular marker.
Abstract:
Investigations of the large-scale structure of our Universe provide us with extremely powerful tools to shed light on some of the open issues of the currently accepted Standard Cosmological Model. Until recently, constraining the cosmological parameters from cosmic voids was almost infeasible, because the amount of data in void catalogues was not enough to ensure statistically relevant samples. The increasingly wide and deep fields in present and upcoming surveys have made cosmic voids promising probes, despite the fact that there is not yet a unique and generally accepted definition for them. In this Thesis we address the two-point statistics of cosmic voids, in the very first attempt to model its features for cosmological purposes. To this end, we implement an improved version of the void power spectrum presented by Chan et al. (2014). We have been able to build an exceptionally robust method to tackle the void clustering statistics, by proposing a functional form that is entirely based on first principles. We extract our data from a suite of high-resolution N-body simulations both in the LCDM and alternative modified gravity scenarios. To accurately compare the data to the theory, we calibrate the model by accounting for a free parameter in the void radius that enters the theory of void exclusion. We then constrain the cosmological parameters by means of a Bayesian analysis. As long as the modified gravity effects are limited, our model is a reliable method to constrain the main LCDM parameters. By contrast, it cannot be used to model the void clustering in the presence of stronger modifications of gravity. In future work, we will further develop our analysis of the void clustering statistics by testing our model on large and high-resolution simulations and on real data, also addressing the void clustering in the halo distribution. Finally, we also plan to combine these constraints with those of other cosmological probes.