26 results for "Evidence accumulation clustering"
Abstract:
The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects, from the partitions in the ensemble, a set of pairwise observations about the co-occurrence of objects in the same cluster, and uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix, based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper, we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined over the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.
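To make the central object concrete, here is a minimal sketch (not the paper's implementation) of building the co-association matrix from an ensemble of base partitions; the function name and the toy ensemble are illustrative:

```python
import numpy as np

def co_association(partitions):
    """EAC co-association matrix: entry (i, j) is the fraction of base
    partitions that place objects i and j in the same cluster."""
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        # Pairwise co-occurrence indicator for this partition.
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(partitions)

# Three base clusterings of five objects.
ensemble = [[0, 0, 1, 1, 1],
            [0, 0, 0, 1, 1],
            [1, 1, 0, 0, 2]]
C = co_association(ensemble)
print(C[0, 1])  # objects 0 and 1 co-cluster in all 3 partitions -> 1.0
```

Consensus methods such as PEACE then treat the entries of `C` as pairwise co-occurrence observations to be explained by the unknown cluster assignments.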
Abstract:
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
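A minimal sketch of the idea under the squared Euclidean distance, which is one double-convex Bregman divergence: find soft assignments Y, with rows on the probability simplex, whose implied co-occurrence probabilities Y Y^T approximate the observed co-association matrix. The function name and the projected-gradient heuristic are illustrative assumptions, not the paper's optimization algorithm:

```python
import numpy as np

def consensus_assignments(C, k, iters=500, lr=0.02, seed=0):
    """Toy sketch: fit soft assignments Y (n x k, rows on the simplex)
    so that the model co-occurrence probabilities Y @ Y.T approximate
    the observed co-association matrix C under squared error."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    Y = rng.random((n, k))
    Y /= Y.sum(axis=1, keepdims=True)
    for _ in range(iters):
        P = Y @ Y.T                        # model co-occurrence probabilities
        grad = 4.0 * (P - C) @ Y           # gradient of ||C - Y @ Y.T||_F^2
        Y = np.clip(Y - lr * grad, 1e-9, None)
        Y /= Y.sum(axis=1, keepdims=True)  # crude projection back to simplex
    return Y

# Two groups that always co-cluster: objects 0-2 and 3-5.
C = np.zeros((6, 6))
C[:3, :3] = 1.0
C[3:, 3:] = 1.0
Y = consensus_assignments(C, k=2)
```

The paper's contribution is more general: an optimization scheme valid for any double-convex Bregman divergence, of which the squared loss above is only the simplest case.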
Abstract:
Biosignal analysis has become widespread, moving beyond its typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring, as a diagnosis tool in today's medicine, and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insights about the data. Preliminary results are promising, as meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Abstract:
The 27 December 1722 Algarve earthquake destroyed a large area in southern Portugal, generating a local tsunami that inundated the shallow areas of Tavira. It is unclear whether its source was located onshore or offshore and, in any case, what tectonic source was responsible for the event. We analyze available historical information concerning macroseismicity and the tsunami to discuss the most probable location of the source. We also review available seismotectonic knowledge of the offshore region close to the probable epicenter, selecting a set of four candidate sources. We simulate tsunamis produced by these candidate sources assuming that the sea bottom displacement is caused by a compressive dislocation over a rectangular fault, as given by the half-space homogeneous elastic approach, and we use numerical modeling to study wave propagation and run-up. We conclude that the 27 December 1722 Tavira earthquake and tsunami was probably generated offshore, close to 37°01′N, 7°49′W.
Abstract:
This paper studies the evolution of default risk premia for European firms during the years surrounding the recent credit crisis. We employ the information embedded in Credit Default Swaps (CDS) and Moody's KMV EDF default probabilities to analyze the common factors driving these risk premia. The risk premium is characterized in several directions: first, we perform a panel data analysis to capture the relationship between CDS spreads and actual default probabilities; second, we employ the intensity framework of Jarrow et al. (2005) to measure the theoretical effect of the risk premium on expected bond returns; third, we carry out a dynamic panel data analysis to identify the macroeconomic sources of the risk premium; finally, a vector autoregressive model analyzes what proportion of the co-movement is attributable to financial or macro variables. Our estimations report coefficients for the risk premium substantially higher than those previously reported for US firms, as well as time-varying behavior. A dominant factor explains around 60% of the common movements in risk premia. Additionally, empirical evidence suggests a public-to-private risk transfer between sovereign CDS spreads and corporate risk premia.
Abstract:
We investigate shareholder value creation of Spanish listed firms in response to announcements of acquisitions of unlisted companies and compare this experience to the purchase of listed firms over the period 1991–2006. As in foreign markets, acquirers of listed targets earn insignificant average abnormal returns, whereas acquirers of unlisted targets gain significant positive average abnormal returns. When we relate these results to company and transaction characteristics, our findings diverge from those reported in the literature for other foreign markets, as our evidence suggests that the listing status effect is mainly associated with the fact that unlisted firms tend to be smaller and lesser-known, and thus suffer from a lack of competition in the market for corporate control. Consequently, the payment of lower premiums and the possibility of diversifying shareholders' portfolios lead to unlisted firm acquisitions being viewed as value-orientated transactions.
Abstract:
According to the stock market efficiency theory, it is not possible to consistently beat the market. However, technical analysis has become increasingly widespread as a means of achieving abnormal returns. In fact, there is evidence that momentum investing strategies provide abnormal returns in different stock markets (Jegadeesh and Titman, 1993; George and Hwang, 2004; Du, 2009). In this work we study whether the Portuguese stock market, like other markets, allows investors to obtain abnormal returns using a strategy that consists of picking stocks according to their past performance. Our work confirms the results of Soares and Serra (2005) and Pereira (2009), showing that an investor can earn abnormal returns by investing in momentum portfolios. The Portuguese stock market exhibits momentum returns in the short term and reversal in the long term.
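A toy sketch of such a strategy (the function, parameters, and data are hypothetical, not the paper's test design): at each date, rank stocks by their cumulative return over a formation window and hold an equal-weighted portfolio of the recent winners for one period.

```python
import numpy as np

def momentum_returns(returns, formation=6, n_winners=2):
    """Toy momentum strategy: at each date, rank stocks by cumulative
    return over the past `formation` periods and hold an equal-weighted
    portfolio of the `n_winners` best past performers for one period."""
    n_periods, _ = returns.shape
    out = []
    for t in range(formation, n_periods):
        past = (1.0 + returns[t - formation:t]).prod(axis=0) - 1.0
        winners = np.argsort(past)[-n_winners:]   # top past performers
        out.append(returns[t, winners].mean())
    return np.array(out)

# Hypothetical monthly returns for 4 stocks over 12 months.
rng = np.random.default_rng(1)
rets = rng.normal(0.01, 0.05, size=(12, 4))
strategy = momentum_returns(rets)
```

Studies of this kind then test whether the average of such winner-portfolio returns is significantly above a benchmark, and whether the sign flips at longer horizons (reversal).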
Abstract:
Following the theoretical model of Merton (1987), we provide a new perspective on the role of idiosyncratic risk in the asset pricing process. More precisely, we analyze whether the idiosyncratic risk premium depends on the idiosyncratic risk level of an asset as well as on the variation in the market-wide measure of idiosyncratic risk. As expected, we obtain a net positive risk premium for the Spanish stock market over the period 1987-2007. Our results show a positive relation between returns and individual idiosyncratic risk levels, and a negative but weaker relation with the aggregate measure of idiosyncratic risk. These findings have important implications for portfolio and risk management, and help provide a unified and coherent answer to the main and still unsolved question of the idiosyncratic risk puzzle: whether there exists a premium associated with this kind of risk, and what its sign is.
Abstract:
We present a palaeomagnetic study of 38 lava flows and 20 dykes encompassing the past 1.3 Myr on S. Jorge Island (Azores Archipelago, North Atlantic Ocean). The sections sampled in the southeastern and central/western parts of the island record reversed and normal polarities, respectively. They indicate a mean palaeomagnetic pole (81.3°N, 160.7°E, K = 33 and A95 = 3.4°) with a latitude shallower than that expected from the Geocentric Axial Dipole assumption, suggesting an effect of non-dipolar components of the Earth's magnetic field. Virtual Geomagnetic Poles of eight flows and two dykes closely follow the contemporaneous records of the Cobb Mountain Subchron (ODP/DSDP programs) and constrain the age of the transition from reversed to normal polarity at ca. 1.207 +/- 0.017 Ma. Volcano flank instabilities, probably related to dyke emplacement along an NNW-SSE direction, led to southwestward tilting of the lava pile towards the sea. Two spatially and temporally distinct dyke systems have been recognized on the island. The eastern one is dominated by NNW-SSE trending dykes emplaced before the end of the Matuyama Chron, whereas in the central/western parts the eruptive fissures, oriented WNW-ESE, controlled the westward growth of S. Jorge Island during the Brunhes Chron. Both directions are consistent with the present-day regional stress conditions deduced from plate kinematics and tectonomorphology and suggest the emplacement of dykes along pre-existing fractures. The distinct timing and location of each dyke system likely result from a slight shift of the magmatic source.
Abstract:
Hyperhomocysteinemia (HHcy) is a risk factor for vascular disease, but the underlying mechanisms remain incompletely defined. Reduced bioavailability of nitric oxide (NO) is a principal manifestation of underlying endothelial dysfunction, which is an initial event in vascular disease. Inhibition of cellular methylation reactions by S-adenosylhomocysteine (AdoHcy), which accumulates during HHcy, has been suggested to contribute to vascular dysfunction. However, the effect of intracellular AdoHcy accumulation on NO bioavailability has thus far not been fully substantiated by experimental evidence. The present study was carried out to evaluate whether disturbances in cellular methylation status affect NO production by cultured human endothelial cells. Here, we show that a hypomethylating environment, induced by the accumulation of AdoHcy, impairs NO production. Consistent with this finding, we observed decreased eNOS expression and activity but, by contrast, enhanced NOS3 transcription. Taken together, our data support the existence of regulatory post-transcriptional mechanisms, modulated by the cellular methylation potential, that lead to impaired NO production by cultured human endothelial cells. As such, our conclusions may have implications for HHcy-mediated reductions in NO bioavailability and endothelial dysfunction.
Abstract:
Background - The rate and fitness effects of mutations are key to understanding the evolution of every species. Traditionally, these parameters are estimated in mutation accumulation experiments, where replicate lines are propagated in conditions that allow mutations to accumulate randomly without the purging effect of natural selection. These experiments have been performed with many model organisms, but we still lack empirical estimates of the rate and effects of mutation in the protists. Results - We performed a mutation accumulation (MA) experiment in Tetrahymena thermophila, a species that can reproduce sexually and asexually in nature, and measured both the mean decline and the variance increase in fitness of 20 lines. The results obtained with T. thermophila were compared with T. pyriformis, an obligately asexual species. We show that MA lines of T. thermophila go to extinction at a rate of 1.25 clonal extinctions per bottleneck. In contrast, populations of T. pyriformis show a much higher resistance to extinction. Variation in gene copy number is likely to be a key factor in explaining these results, and indeed we show that T. pyriformis has a higher mean copy number per cell than T. thermophila. From fitness measurements during the MA experiment, we infer a rate of mutation to copy number variation of 0.0333 per haploid MAC genome of T. thermophila and a mean effect against copy number variation of 0.16. A strong effect of population size on the rate of fitness decline was also found, consistent with the increased power of natural selection. Conclusions - The rate of clonal extinction measured for T. thermophila is characteristic of a mutational degradation and suggests that this species must undergo sexual reproduction to avoid the deleterious effects detected in the laboratory experiments.
We also suggest that an increase in chromosomal copy number associated with the phenotypic assortment of amitotic divisions can provide an alternative mechanism to escape the deleterious effect of random chromosomal copy number variation in species like T. pyriformis that lack the resetting mechanism of sexual reproduction. Our results are relevant to the understanding of cell line longevity and senescence in ciliates.
Abstract:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute to effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (21.6 cases per 100,000 inhabitants in 2012), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, the metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering of PTB incidence rates and risk factors. Identifying clusters of temporal trends can further inform policy makers about municipalities showing faster or slower improvement in TB control.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) on synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL, since it is faster, which is especially relevant when dealing with large data sets.
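To make the setting concrete, here is a minimal EM sketch for a finite mixture of multinomials, with the number of clusters chosen by comparing candidate models under BIC; this is a stand-in for the paper's MML-based single-algorithm approach, and all names and the synthetic data are illustrative:

```python
import numpy as np

def multinomial_mixture_em(X, k, iters=100, seed=0):
    """EM for a k-component mixture of multinomials on an (n, d) count
    matrix X; returns weights, category probabilities, and the
    log-likelihood (up to the multinomial coefficient, which is
    constant in the parameters)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.dirichlet(np.ones(d), size=k)        # k x d category probs
    for _ in range(iters):
        L = np.log(pi) + X @ np.log(theta).T         # n x k
        L -= L.max(axis=1, keepdims=True)
        r = np.exp(L)
        r /= r.sum(axis=1, keepdims=True)            # E-step: responsibilities
        pi = r.mean(axis=0)                          # M-step: weights
        theta = r.T @ X + 1e-9                       # M-step: category probs
        theta /= theta.sum(axis=1, keepdims=True)
    L = np.log(pi) + X @ np.log(theta).T
    m = L.max(axis=1)
    loglik = (m + np.log(np.exp(L - m[:, None]).sum(axis=1))).sum()
    return pi, theta, loglik

def select_k(X, k_max=4):
    """Choose the number of clusters by BIC (smaller is better)."""
    n, d = X.shape
    scores = {}
    for k in range(1, k_max + 1):
        _, _, ll = multinomial_mixture_em(X, k)
        n_params = (k - 1) + k * (d - 1)
        scores[k] = -2.0 * ll + n_params * np.log(n)
    return min(scores, key=scores.get)

# Synthetic counts from two well-separated multinomial components.
rng = np.random.default_rng(0)
X = np.vstack([rng.multinomial(20, [0.45, 0.45, 0.05, 0.05], size=30),
               rng.multinomial(20, [0.05, 0.05, 0.45, 0.45], size=30)])
best_k = select_k(X)
```

The paper's point is precisely to avoid this fit-then-compare loop: the MML-based algorithm estimates the model and selects the number of clusters in a single run, which is why it is faster than BIC/ICL-style model comparison.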
Abstract:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels to guide the search for relevant features. Most methods proposed for this goal focus on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data from official statistics shows its usefulness.