814 resultados para Hierarchical clustering model


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Reaction norm models have been widely used to study genotype by environment interaction (G × E) in animal breeding. The objective of this study was to describe environmental sensitivity across first lactation in Brazilian Holstein cows using a reaction norm approach. A total of 50,168 individual monthly test day (TD) milk yields (10 test days) from 7476 complete first lactations of Holstein cattle were analyzed. The statistical models for all traits (10 TDs and for 305-day milk yield) included the fixed effects of contemporary group, age of cow (linear and quadratic effects), and days in milk (linear effect), except for 305-day milk yield. A hierarchical reaction norm model (HRNM) based on the unknown covariate was used. The present study showed the presence of G × E in milk yield across first lactation of Holstein cows. The variation in the heritability estimates implies differences in the response to selection depending on the environment where the animals of this population are evaluated. In the average environment, the heritabilities for all traits were rather similar, in range from 0.02 to 0.63. The scaling effect of G × E predominated throughout most of lactation. Particularly during the first 2 months of lactation, G × E caused reranking of breeding values. It is therefore important to include the environmental sensitivity of animals according to the phase of lactation in the genetic evaluations of Holstein cattle in tropical environments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The mechanisms responsible for containing activity in systems represented by networks are crucial in various phenomena, for example, in diseases such as epilepsy that affect the neuronal networks and for information dissemination in social networks. The first models to account for contained activity included triggering and inhibition processes, but they cannot be applied to social networks where inhibition is clearly absent. A recent model showed that contained activity can be achieved with no need of inhibition processes provided that the network is subdivided into modules (communities). In this paper, we introduce a new concept inspired in the Hebbian theory, through which containment of activity is achieved by incorporating a dynamics based on a decaying activity in a random walk mechanism preferential to the node activity. Upon selecting the decay coefficient within a proper range, we observed sustained activity in all the networks tested, namely, random, Barabasi-Albert and geographical networks. The generality of this finding was confirmed by showing that modularity is no longer needed if the dynamics based on the integrate-and-fire dynamics incorporated the decay factor. Taken together, these results provide a proof of principle that persistent, restrained network activation might occur in the absence of any particular topological structure. This may be the reason why neuronal activity does not spread out to the entire neuronal network, even when no special topological organization exists.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we propose a random intercept Poisson model in which the random effect is assumed to follow a generalized log-gamma (GLG) distribution. This random effect accommodates (or captures) the overdispersion in the counts and induces within-cluster correlation. We derive the first two moments for the marginal distribution as well as the intraclass correlation. Even though numerical integration methods are, in general, required for deriving the marginal models, we obtain the multivariate negative binomial model from a particular parameter setting of the hierarchical model. An iterative process is derived for obtaining the maximum likelihood estimates for the parameters in the multivariate negative binomial model. Residual analysis is proposed and two applications with real data are given for illustration. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species by a non-aged cachaca used as an extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry to classify cachacas aged in barrels that are composed of different wood species.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Background To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene products networks involved is required. In order to define and understand such molecular networks, some statistical methods are proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow need to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques, therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive model as a solution to these problems. Results We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well known transcription factor targets. Conclusion The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Galaxy clusters occupy a special position in the cosmic hierarchy as they are the largest bound structures in the Universe. There is now general agreement on a hierarchical picture for the formation of cosmic structures, in which galaxy clusters are supposed to form by accretion of matter and merging between smaller units. During merger events, shocks are driven by the gravity of the dark matter in the diffuse barionic component, which is heated up to the observed temperature. Radio and hard-X ray observations have discovered non-thermal components mixed with the thermal Intra Cluster Medium (ICM) and this is of great importance as it calls for a “revision” of the physics of the ICM. The bulk of present information comes from the radio observations which discovered an increasing number of Mpcsized emissions from the ICM, Radio Halos (at the cluster center) and Radio Relics (at the cluster periphery). These sources are due to synchrotron emission from ultra relativistic electrons diffusing through µG turbulent magnetic fields. Radio Halos are the most spectacular evidence of non-thermal components in the ICM and understanding the origin and evolution of these sources represents one of the most challenging goal of the theory of the ICM. Cluster mergers are the most energetic events in the Universe and a fraction of the energy dissipated during these mergers could be channelled into the amplification of the magnetic fields and into the acceleration of high energy particles via shocks and turbulence driven by these mergers. Present observations of Radio Halos (and possibly of hard X-rays) can be best interpreted in terms of the reacceleration scenario in which MHD turbulence injected during these cluster mergers re-accelerates high energy particles in the ICM. The physics involved in this scenario is very complex and model details are difficult to test, however this model clearly predicts some simple properties of Radio Halos (and resulting IC emission in the hard X-ray band) which are almost independent of the details of the adopted physics. In particular in the re-acceleration scenario MHD turbulence is injected and dissipated during cluster mergers and thus Radio Halos (and also the resulting hard X-ray IC emission) should be transient phenomena (with a typical lifetime <» 1 Gyr) associated with dynamically disturbed clusters. The physics of the re-acceleration scenario should produce an unavoidable cut-off in the spectrum of the re-accelerated electrons, which is due to the balance between turbulent acceleration and radiative losses. The energy at which this cut-off occurs, and thus the maximum frequency at which synchrotron radiation is produced, depends essentially on the efficiency of the acceleration mechanism so that observations at high frequencies are expected to catch only the most efficient phenomena while, in principle, low frequency radio surveys may found these phenomena much common in the Universe. These basic properties should leave an important imprint in the statistical properties of Radio Halos (and of non-thermal phenomena in general) which, however, have not been addressed yet by present modellings. The main focus of this PhD thesis is to calculate, for the first time, the expected statistics of Radio Halos in the context of the re-acceleration scenario. In particular, we shall address the following main questions: • Is it possible to model “self-consistently” the evolution of these sources together with that of the parent clusters? • How the occurrence of Radio Halos is expected to change with cluster mass and to evolve with redshift? How the efficiency to catch Radio Halos in galaxy clusters changes with the observing radio frequency? • How many Radio Halos are expected to form in the Universe? At which redshift is expected the bulk of these sources? • Is it possible to reproduce in the re-acceleration scenario the observed occurrence and number of Radio Halos in the Universe and the observed correlations between thermal and non-thermal properties of galaxy clusters? • Is it possible to constrain the magnetic field intensity and profile in galaxy clusters and the energetic of turbulence in the ICM from the comparison between model expectations and observations? Several astrophysical ingredients are necessary to model the evolution and statistical properties of Radio Halos in the context of re-acceleration model and to address the points given above. For these reason we deserve some space in this PhD thesis to review the important aspects of the physics of the ICM which are of interest to catch our goals. In Chapt. 1 we discuss the physics of galaxy clusters, and in particular, the clusters formation process; in Chapt. 2 we review the main observational properties of non-thermal components in the ICM; and in Chapt. 3 we focus on the physics of magnetic field and of particle acceleration in galaxy clusters. As a relevant application, the theory of Alfv´enic particle acceleration is applied in Chapt. 4 where we report the most important results from calculations we have done in the framework of the re-acceleration scenario. In this Chapter we show that a fraction of the energy of fluid turbulence driven in the ICM by the cluster mergers can be channelled into the injection of Alfv´en waves at small scales and that these waves can efficiently re-accelerate particles and trigger Radio Halos and hard X-ray emission. The main part of this PhD work, the calculation of the statistical properties of Radio Halos and non-thermal phenomena as expected in the context of the re-acceleration model and their comparison with observations, is presented in Chapts.5, 6, 7 and 8. In Chapt.5 we present a first approach to semi-analytical calculations of statistical properties of giant Radio Halos. The main goal of this Chapter is to model cluster formation, the injection of turbulence in the ICM and the resulting particle acceleration process. We adopt the semi–analytic extended Press & Schechter (PS) theory to follow the formation of a large synthetic population of galaxy clusters and assume that during a merger a fraction of the PdV work done by the infalling subclusters in passing through the most massive one is injected in the form of magnetosonic waves. Then the processes of stochastic acceleration of the relativistic electrons by these waves and the properties of the ensuing synchrotron (Radio Halos) and inverse Compton (IC, hard X-ray) emission of merging clusters are computed under the assumption of a constant rms average magnetic field strength in emitting volume. The main finding of these calculations is that giant Radio Halos are naturally expected only in the more massive clusters, and that the expected fraction of clusters with Radio Halos is consistent with the observed one. In Chapt. 6 we extend the previous calculations by including a scaling of the magnetic field strength with cluster mass. The inclusion of this scaling allows us to derive the expected correlations between the synchrotron radio power of Radio Halos and the X-ray properties (T, LX) and mass of the hosting clusters. For the first time, we show that these correlations, calculated in the context of the re-acceleration model, are consistent with the observed ones for typical µG strengths of the average B intensity in massive clusters. The calculations presented in this Chapter allow us to derive the evolution of the probability to form Radio Halos as a function of the cluster mass and redshift. The most relevant finding presented in this Chapter is that the luminosity functions of giant Radio Halos at 1.4 GHz are expected to peak around a radio power » 1024 W/Hz and to flatten (or cut-off) at lower radio powers because of the decrease of the electron re-acceleration efficiency in smaller galaxy clusters. In Chapt. 6 we also derive the expected number counts of Radio Halos and compare them with available observations: we claim that » 100 Radio Halos in the Universe can be observed at 1.4 GHz with deep surveys, while more than 1000 Radio Halos are expected to be discovered in the next future by LOFAR at 150 MHz. This is the first (and so far unique) model expectation for the number counts of Radio Halos at lower frequency and allows to design future radio surveys. Based on the results of Chapt. 6, in Chapt.7 we present a work in progress on a “revision” of the occurrence of Radio Halos. We combine past results from the NVSS radio survey (z » 0.05 − 0.2) with our ongoing GMRT Radio Halos Pointed Observations of 50 X-ray luminous galaxy clusters (at z » 0.2−0.4) and discuss the possibility to test our model expectations with the number counts of Radio Halos at z » 0.05 − 0.4. The most relevant limitation in the calculations presented in Chapt. 5 and 6 is the assumption of an “averaged” size of Radio Halos independently of their radio luminosity and of the mass of the parent clusters. This assumption cannot be released in the context of the PS formalism used to describe the formation process of clusters, while a more detailed analysis of the physics of cluster mergers and of the injection process of turbulence in the ICM would require an approach based on numerical (possible MHD) simulations of a very large volume of the Universe which is however well beyond the aim of this PhD thesis. On the other hand, in Chapt.8 we report our discovery of novel correlations between the size (RH) of Radio Halos and their radio power and between RH and the cluster mass within the Radio Halo region, MH. In particular this last “geometrical” MH − RH correlation allows us to “observationally” overcome the limitation of the “average” size of Radio Halos. Thus in this Chapter, by making use of this “geometrical” correlation and of a simplified form of the re-acceleration model based on the results of Chapt. 5 and 6 we are able to discuss expected correlations between the synchrotron power and the thermal cluster quantities relative to the radio emitting region. This is a new powerful tool of investigation and we show that all the observed correlations (PR − RH, PR − MH, PR − T, PR − LX, . . . ) now become well understood in the context of the re-acceleration model. In addition, we find that observationally the size of Radio Halos scales non-linearly with the virial radius of the parent cluster, and this immediately means that the fraction of the cluster volume which is radio emitting increases with cluster mass and thus that the non-thermal component in clusters is not self-similar.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The intensity of regional specialization in specific activities, and conversely, the level of industrial concentration in specific locations, has been used as a complementary evidence for the existence and significance of externalities. Additionally, economists have mainly focused the debate on disentangling the sources of specialization and concentration processes according to three vectors: natural advantages, internal, and external scale economies. The arbitrariness of partitions plays a key role in capturing these effects, while the selection of the partition would have to reflect the actual characteristics of the economy. Thus, the identification of spatial boundaries to measure specialization becomes critical, since most likely the model will be adapted to different scales of distance, and be influenced by different types of externalities or economies of agglomeration, which are based on the mechanisms of interaction with particular requirements of spatial proximity. This work is based on the analysis of the spatial aspect of economic specialization supported by the manufacturing industry case. The main objective is to propose, for discrete and continuous space: i) a measure of global specialization; ii) a local disaggregation of the global measure; and iii) a spatial clustering method for the identification of specialized agglomerations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This PhD Thesis is part of a long-term wide research project, carried out by the "Osservatorio Astronomico di Bologna (INAF-OABO)", that has as primary goal the comprehension and reconstruction of formation mechanism of galaxies and their evolution history. There is now substantial evidence, both from theoretical and observational point of view, in favor of the hypothesis that the halo of our Galaxy has been at least partially, built up by the progressive accretion of small fragments, similar in nature to the present day dwarf galaxies of the Local Group. In this context, the photometric and spectroscopic study of systems which populate the halo of our Galaxy (i.e. dwarf spheroidal galaxy, tidal streams, massive globular cluster, etc) permits to discover, not only the origin and behaviour of these systems, but also the structure of our Galactic halo, combined with its formation history. In fact, the study of the population of these objects and also of their chemical compositions, age, metallicities and velocity dispersion, permit us not only an improvement in the understanding of the mechanisms that govern the Galactic formation, but also a valid indirect test for cosmological model itself. Specifically, in this Thesis we provided a complete characterization of the tidal Stream of the Sagittarius dwarf spheroidal galaxy, that is the most striking example of the process of tidal disruption and accretion of a dwarf satellite in to our Galaxy. Using Red Clump stars, extracted from the catalogue of the Sloan Digital Sky Survey (SDSS) we obtained an estimate of the distance, the depth along the line of sight and of the number density for each detected portion of the Stream (and more in general for each detected structure along our line of sight). Moreover comparing the relative number (i.e. the ratio) of Blue Horizontal Branch stars and Red Clump stars (the two features are tracers of different age/different metallicity populations) in the main body of the galaxy and in the Stream, in order to verify the presence of an age-metallicity gradient along the Stream. We also report the detection of a population of Red Clump stars probably associated with the recently discovered Bootes III stellar system. Finally, we also present the results of a survey of radial velocities over a wide region, extending from r ~ 10' out to r ~ 80' within the massive star cluster Omega Centauri. The survey was performed with FLAMES@VLT, to study the velocity dispersion profile in the outer regions of this stellar system. All the results presented in this Thesis, have already been published in refeered journals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints. Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Bioinformatics, in the last few decades, has played a fundamental role to give sense to the huge amount of data produced. Obtained the complete sequence of a genome, the major problem of knowing as much as possible of its coding regions, is crucial. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As it has been recently pointed out by the Critical Assessment of Function Annotations (CAFA), most accurate methods are those based on the transfer-by-homology approach and the most incisive contribution is given by cross-genome comparisons. In the present thesis it is described a non-hierarchical sequence clustering method for protein automatic large-scale annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 millions protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three dimensional structure (when a template is available). This is possible by the way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate including the prediction of Magnesium binding sites in human proteins, the ABC transporters superfamily classification and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, as a web server for the functional and structural protein sequence annotation, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in Emilia-Romagna region, in Italy, is characterized by abundance of zero values and right-skeweness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatial correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions is investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively). Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with incorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A central design challenge facing network planners is how to select a cost-effective network configuration that can provide uninterrupted service despite edge failures. In this paper, we study the Survivable Network Design (SND) problem, a core model underlying the design of such resilient networks that incorporates complex cost and connectivity trade-offs. Given an undirected graph with specified edge costs and (integer) connectivity requirements between pairs of nodes, the SND problem seeks the minimum cost set of edges that interconnects each node pair with at least as many edge-disjoint paths as the connectivity requirement of the nodes. We develop a hierarchical approach for solving the problem that integrates ideas from decomposition, tabu search, randomization, and optimization. The approach decomposes the SND problem into two subproblems, Backbone design and Access design, and uses an iterative multi-stage method for solving the SND problem in a hierarchical fashion. Since both subproblems are NP-hard, we develop effective optimization-based tabu search strategies that balance intensification and diversification to identify near-optimal solutions. To initiate this method, we develop two heuristic procedures that can yield good starting points. We test the combined approach on large-scale SND instances, and empirically assess the quality of the solutions vis-à-vis optimal values or lower bounds. On average, our hierarchical solution approach generates solutions within 2.7% of optimality even for very large problems (that cannot be solved using exact methods), and our results demonstrate that the performance of the method is robust for a variety of problems with different size and connectivity characteristics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The group analysed some syntactic and phonological phenomena that presuppose the existence of interrelated components within the lexicon, which motivate the assumption that there are some sublexicons within the global lexicon of a speaker. This result is confirmed by experimental findings in neurolinguistics. Hungarian speaking agrammatic aphasics were tested in several ways, the results showing that the sublexicon of closed-class lexical items provides a highly automated complex device for processing surface sentence structure. Analysing Hungarian ellipsis data from a semantic-syntactic aspect, the group established that the lexicon is best conceived of being as split into at least two main sublexicons: the store of semantic-syntactic feature bundles and a separate store of sound forms. On this basis they proposed a format for representing open-class lexical items whose meanings are connected via certain semantic relations. They also proposed a new classification of verbs to account for the contribution of the aspectual reading of the sentence depending on the referential type of the argument, and a new account of the syntactic and semantic behaviour of aspectual prefixes. The partitioned sets of lexical items are sublexicons on phonological grounds. These sublexicons differ in terms of phonotactic grammaticality. The degrees of phonotactic grammaticality are tied up with the problem of psychological reality, of how many degrees of this native speakers are sensitive to. The group developed a hierarchical construction network as an extension of the original General Inheritance Network formalism and this framework was then used as a platform for the implementation of the grammar fragments.