16 resultados para GALAXIES, CLUSTERING
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
We present a photometric catalogue of compact groups of galaxies (p2MCGs) automatically extracted from the Two-Micron All Sky Survey (2MASS) extended source catalogue. A total of 262 p2MCGs are identified, following the criteria defined by Hickson, of which 230 survive visual inspection (given occasional galaxy fragmentation and blends in the 2MASS parent catalogue). Only one quarter of these 230 groups were previously known compact groups (CGs). Among the 144 p2MCGs that have all their galaxies with known redshifts, 85 (59?per cent) have four or more accordant galaxies. This v2MCG sample of velocity-filtered p2MCGs constitutes the largest sample of CGs (with N = 4) catalogued to date, with both well-defined selection criteria and velocity filtering, and is the first CG sample selected by stellar mass. It is fairly complete up to Kgroup similar to 9 and radial velocity of similar to 6000?km?s-1. We compared the properties of the 78 v2MCGs with median velocities greater than 3000?km?s-1 with the properties of other CG samples, as well as those (mvCGs) extracted from the semi-analytical model (SAM) of Guo et al. run on the high-resolution Millennium-II simulation. This mvCG sample is similar (i.e. with 2/3 of physically dense CGs) to those we had previously extracted on three other SAMs run on the Millennium simulation with 125 times worse spatial and mass resolutions. The space density of v2MCGs within 6000?km?s-1 is 8.0 X 10-5?h3?Mpc-3, i.e. four times that of the Hickson sample [Hickson Compact Group (HCG)] up to the same distance and with the same criteria used in this work, but still 40?per cent less than that of mvCGs. The v2MCG constitutes the first group catalogue to show a statistically large firstsecond ranked galaxy magnitude gap according to TremaineRichstone statistics, as expected if the first ranked group members tend to be the products of galaxy mergers, and as confirmed in the mvCGs. The v2MCG is also the first observed sample to show that first-ranked galaxies tend to be centrally located, again consistent with the predictions obtained from mvCGs. We found no significant correlation of group apparent elongation and velocity dispersion in the quartets among the v2MCGs, and the velocity dispersions of apparently round quartets are not significantly larger than those of chain-like ones, in contrast to what has been previously reported in HCGs. By virtue of its automatic selection with the popular Hickson criteria, its size, its selection on stellar mass, and its statistical signs of mergers and centrally located brightest galaxies, the v2MCG catalogue appears to be the laboratory of choice to study physically dense groups of four or more galaxies of comparable luminosity.
Resumo:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to get rid of a user-defined parameter in a way that the clustering stage can be implemented more accurately while having reduced computational complexity
Resumo:
Observations of cosmic rays arrival directions made with the Pierre Auger Observatory have previously provided evidence of anisotropy at the 99% CL using the correlation of ultra high energy cosmic rays (UHECRs) with objects drawn from the Veron-Cetty Veron catalog. In this paper we report on the use of three catalog independent methods to search for anisotropy. The 2pt-L, 2pt+ and 3pt methods, each giving a different measure of self-clustering in arrival directions, were tested on mock cosmic ray data sets to study the impacts of sample size and magnetic smearing on their results, accounting for both angular and energy resolutions. If the sources of UHECRs follow the same large scale structure as ordinary galaxies in the local Universe and if UHECRs are deflected no more than a few degrees, a study of mock maps suggests that these three method can efficiently respond to the resulting anisotropy with a P-value = 1.0% or smaller with data sets as few as 100 events. using data taken from January 1, 2004 to July 31, 2010 we examined the 20, 30, ... , 110 highest energy events with a corresponding minimum energy threshold of about 49.3 EeV. The minimum P-values found were 13.5% using the 2pt-L method, 1.0% using the 2pt+ method and 1.1% using the 3pt method for the highest 100 energy events. In view of the multiple (correlated) scans performed on the data set, these catalog-independent methods do not yield strong evidence of anisotropy in the highest energy cosmic rays.
Resumo:
There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
Resumo:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied in two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on the greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production, and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element, and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated the use of a NBS-profiling technique was efficient to develop retrotransposon-based markers in sugarcane. The simultaneous maximum-likelihood estimates of linkage and linkage phase based strategies confirmed the suitability of its approach to estimate linkage, and construct the linkage map. Interestingly, using our genetic data it was possible to calculate the number of retrotransposonscIvana_1 (similar to 60) copies in the sugarcane genome, confirming previously reported molecular results. In addition, this research possibly will have indirect implications in crop economics e. g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.
Resumo:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Resumo:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Resumo:
We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (beta-model or Sersic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z is an element of [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level with physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters in two sub-samples of high-and low-substructure levels. We conclude, using Monte Carlo simulations, that the method recuperates very well the true amount of substructure for small angular core radii clusters (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, analysis suggest a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high-and low-substructure level clusters) are different (they present an offset, i. e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.
Resumo:
We revisit the issue of the constancy of the dark matter (DM) and baryonic Newtonian acceleration scales within the DM scale radius by considering a large sample of late-type galaxies. We rely on a Markov Chain Monte Carlo method to estimate the parameters of the halo model and the stellar mass-to-light ratio and then propagate the uncertainties from the rotation curve data to the estimate of the acceleration scales. This procedure allows us to compile a catalogue of 58 objects with estimated values of the B-band absolute magnitude M-B, the virial mass M-vir, and the DM and baryonic Newtonian accelerations (denoted as g(DM)(r(0)) and g(bar)(r(0)), respectively) within the scale radius r(0) which we use to investigate whether it is possible to define a universal acceleration scale. We find a weak but statistically meaningful correlation with M-vir thus making us argue against the universality of the acceleration scales. However, the results somewhat depend on the sample adopted so that a careful analysis of selection effects should be carried out before any definitive conclusion can be drawn.
Resumo:
VISTA Variables in the Via Lactea (VVV) is an ESO variability survey that is performing observations in near-infrared bands (ZY JHK(s)) toward the Galactic bulge and part of the disk with the completeness limits at least 3 mag deeper than Two Micron All Sky Survey. In the present work, we searched in the VVV survey data for background galaxies near the Galactic plane using ZY JHK(s) photometry that covers 1.636 deg(2). We identified 204 new galaxy candidates by analyzing colors, sizes, and visual inspection of multi-band (ZY JHK(s)) images. The galaxy candidate colors were also compared with the predicted ones by star count models considering a more realistic extinction model at the same completeness limits observed by VVV. A comparison of the galaxy candidates with the expected one by Millennium simulations is also presented. Our results increase the number density of known galaxies behind the Milky Way by more than one order of magnitude. A catalog with galaxy properties including ellipticity, Petrosian radii, and ZY JHK(s) magnitudes is provided, as well as comparisons of the results with other surveys of galaxies toward the Galactic plane.
Resumo:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we e identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin (MUC) 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [ MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Resumo:
This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
Resumo:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.