863 results for Distance Matrix
Abstract:
Darwin's paradigm holds that the diversity of present-day organisms has arisen via a process of genetic descent with modification, as on a bifurcating tree. Evidence is accumulating that genes are sometimes transferred not along lineages but rather across lineages. To the extent that this is so, Darwin's paradigm can apply only imperfectly to genomes, potentially complicating or perhaps undermining attempts to reconstruct historical relationships among genomes (i.e., a genome tree). Whether most genes in a genome have arisen via treelike (vertical) descent or by lateral transfer across lineages can be tested if enough complete genome sequences are used. We define a phylogenetically discordant sequence (PDS) as an open reading frame (ORF) that exhibits patterns of similarity relationships statistically distinguishable from those of most other ORFs in the same genome. PDSs represent between 6.0 and 16.8% (mean, 10.8%) of the analyzable ORFs in the genomes of 28 bacteria, eight archaea, and one eukaryote (Saccharomyces cerevisiae). In this study we developed and assessed a distance-based approach, based on mean pairwise sequence similarity, for generating genome trees. Exclusion of PDSs improved bootstrap support for basal nodes but altered few topological features, indicating that there is little systematic bias among PDSs. Many but not all features of the genome tree from which PDSs were excluded are consistent with the 16S rRNA tree.
Abstract:
As the volume of image data and the need to use it in various applications have grown significantly in recent years, efficient and effective retrieval has become a necessity. Unfortunately, existing indexing methods are not applicable to a wide range of problem-oriented fields because of their operating-time limitations and their strong dependency on the traditional descriptors extracted from the image. To meet these higher requirements, a novel distance-based indexing method for region-based image retrieval is proposed and investigated. The method lays the groundwork for considering embedded partitions of images, so that the search can be carried out at different levels of refinement or coarsening and can thereby target the meaningful content of the image.
Abstract:
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related to cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed to help alleviate this trade-off. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges best predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain, and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that a widespread but limited number of regions in the human brain supports high-level cognitive ability differences. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
Abstract:
The D-eigenvalues of a graph G are the eigenvalues of its distance matrix D, and the D-energy ED(G) is the sum of the absolute values of its D-eigenvalues. Two graphs are said to be D-equienergetic if they have the same D-energy. In this note we obtain bounds for the distance spectral radius and D-energy of graphs of diameter 2. Pairs of equiregular D-equienergetic graphs of diameter 2, on p = 3t + 1 vertices are also constructed.
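The definitions in this abstract can be illustrated with a minimal sketch, which is not taken from the paper. It assumes a connected, unweighted graph given as an adjacency dictionary. The function names (`distance_matrix`, `symmetric_eigenvalues`, `d_energy`) are hypothetical, and a plain cyclic Jacobi iteration stands in for a library eigenvalue solver so that only the standard library is needed.

```python
import math
from collections import deque

def distance_matrix(adj):
    """Shortest-path distance matrix of a connected unweighted graph (BFS from each vertex)."""
    n = len(adj)
    D = [[0] * n for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            D[s][v] = d
    return D

def symmetric_eigenvalues(M, sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    A = [[float(x) for x in row] for row in M]
    n = len(A)
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                diff = A[q][q] - A[p][p]
                # Rotation angle that zeroes A[p][q]; kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))

def d_energy(adj):
    """D-energy: sum of the absolute values of the distance-matrix eigenvalues."""
    return sum(abs(ev) for ev in symmetric_eigenvalues(distance_matrix(adj)))

# The 5-cycle is a 2-regular graph of diameter 2.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(d_energy(c5))
```

For the 5-cycle, the largest D-eigenvalue (the distance spectral radius) is 6 and the D-energy is 12, up to floating-point error.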
Abstract:
Culex is the largest genus of Culicini and includes vectors of several arboviruses and filarial worms. Many species of Culex are morphologically similar, which makes their identification difficult, particularly when using female specimens. To aid evolutionary studies and species distinction, molecular techniques are often used. Sequences of the second internal transcribed spacer (ITS2) of ribosomal DNA (rDNA) from 16 species of the genus Culex and one of Lutzia were used to assess their genomic variability and to verify their applicability in the phylogenetic analysis of the group. The distance matrix (uncorrected p-distance) that was obtained revealed intragenomic and intraspecific variation. Because of the intragenomic variability, we selected ITS2 copies for use in distance analyses based on their secondary structures. Neighbour-joining topology was obtained with an uncorrected p-distance. Despite the heterogeneity observed, individuals of the same species were grouped together and correlated with the current, morphology-based classification, thereby showing that ITS2 is an appropriate marker to be used in the taxonomy of Culex.
Abstract:
Sequences of the cytochrome c oxidase subunit I (COI) mitochondrial gene from adults of 22 Culex (Culex) species from Argentina and Brazil were employed to assess species identification and to test the usefulness of COI for barcoding using the best close match (BCM) algorithm. A pairwise Kimura two-parameter distance matrix including the mean intraspecific and interspecific distances for 71 COI barcode sequences was constructed. Of the 12 COI lineages recovered in the Neighbour-joining topology, five confirmed recognised morphological species (Cx. acharistus, Cx. chidesteri, Cx. dolosus, Cx. lygrus and Cx. saltanensis) with intraspecific divergences lower than 1.75%. Cx. bilineatus is formally resurrected from the synonymy of Cx. dolosus. Cx. maxi, Cx. surinamensis and the Coronator group species included were clustered into an unresolved lineage. The intraspecific distance of Cx. pipiens (3%) was almost twice the interspecific distance between it and Cx. quinquefasciatus (1.6%). Regarding the BCM criteria, the COI barcode successfully identified 69% of all species. The rest of the sequences, approximately 10%, 18% and 3%, remained ambiguous, misidentified and unidentified, respectively. The COI barcode does not contain enough information to distinguish Culex (Cux.) species.
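As an illustration of the pairwise Kimura two-parameter (K2P) distance matrix mentioned above, the following is a minimal sketch rather than the authors' pipeline. It uses the standard K2P formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function names are hypothetical, and the sketch assumes that the sequences are already aligned and that sites with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    P = transitions / compared
    Q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

def k2p_matrix(sequences):
    """Symmetric matrix of pairwise K2P distances for a list of aligned sequences."""
    n = len(sequences)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = k2p_distance(sequences[i], sequences[j])
    return D
```

Mean intraspecific and interspecific distances could then be obtained by averaging the entries of this matrix within and between species labels.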
Abstract:
Subcompositional coherence is a fundamental property of Aitchison's approach to compositional data analysis, and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence, that is incoherence, can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods, which might be better suited to cope with problems such as data zeros and outliers, while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix which can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
Abstract:
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.
Abstract:
Due to the low genetic variability reported in commercial plantations of papaya (Carica papaya L.), the objective of this study was to analyze the genetic diversity of 32 genotypes, including cultivars, landraces, inbred lines, and improved germplasm, using the amplified fragment length polymorphism (AFLP) technique. The genetic distance matrix was obtained using the Nei and Li genetic distance, and clustering was performed using the unweighted pair group method with arithmetic mean (UPGMA). Using 11 combinations of EcoRI/MseI primers, 383 polymorphic bands were obtained. On average, 34.8 polymorphic bands were obtained per primer combination. Five clusters were formed. The traditional cultivar 'Sunrise' and the inbred line CMF-L30-08 were the closest genotypes, and the improved germplasm (CMF041) and landrace (CMF233) the most distant. The main papaya cultivars commercially grown in Brazil, as well as four inbred lines and three improved germplasm accessions, were clustered together but were not grouped in the same branch. The genetic distance between the Sunrise and Golden cultivars was 0.329; even though it arose from mutation and selection within the Sunrise variety, Golden retains considerable genetic variability. Additional variability was observed in the inbred lines derived from the papaya breeding program at Embrapa Cassava and Fruits.
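The two steps named in this abstract, a Nei and Li distance computed from band presence/absence profiles followed by UPGMA clustering, can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the band profiles and genotype names below are invented toy data, the function names are hypothetical, and the distance is taken as 1 - 2·n_xy / (n_x + n_y), where n_xy is the number of shared bands.

```python
def nei_li_distance(x, y):
    """Nei and Li distance between two 0/1 band profiles: 1 - 2*shared / (n_x + n_y)."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both profiles have no bands")
    return 1 - 2 * shared / total

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns merges as (cluster_a, cluster_b, distance)."""
    n = len(names)
    clusters = {i: (names[i],) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        merges.append((clusters[a], clusters[b], dab))
        na, nb = len(clusters[a]), len(clusters[b])
        new = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # Size-weighted mean keeps the result equal to the average over all leaf pairs.
            new[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new)
        clusters[next_id] = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        next_id += 1
    return merges

# Toy presence/absence profiles (hypothetical, four bands per genotype).
profiles = {"g1": [1, 1, 1, 0], "g2": [1, 1, 1, 1], "g3": [0, 0, 1, 1]}
names = list(profiles)
dist = [[nei_li_distance(profiles[a], profiles[b]) for b in names] for a in names]
for merge in upgma(names, dist):
    print(merge)
```

Each merge records which clusters were joined and at what average distance, which is the information a UPGMA dendrogram displays.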
Abstract:
A continuous random variable is expanded as a sum of a sequence of uncorrelated random variables. These variables are principal dimensions in continuous scaling on a distance function, as an extension of classic scaling on a distance matrix. For a particular distance, these dimensions are principal components. Then some properties are studied and an inequality is obtained. Diagonal expansions are considered from the same continuous scaling point of view, by means of the chi-square distance. The geometric dimension of a bivariate distribution is defined and illustrated with copulas. It is shown that the dimension can have the power of continuum.
Abstract:
Euclidean distance matrix analysis (EDMA) methods are used to determine whether a significant difference exists between conformational samples of antibody complementarity determining region (CDR) loops, namely the isolated L1 loop and L1 in a three-loop assembly (L1, L3 and H3), obtained from Monte Carlo simulation. After a significant difference is detected, the specific inter-Cα distance which contributes to the difference is identified using EDMA. The estimated and improved mean forms of the conformational samples of the isolated L1 loop and the L1 loop in the three-loop assembly, CDR loops of the antibody binding site, are described using EDMA and distance geometry (DGEOM). To the best of our knowledge, this is the first time that EDMA methods have been used to analyze conformational samples of molecules obtained from Monte Carlo simulations. Therefore, the EDMA methods must be validated using both positive control and negative control tests for the conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The EDMA-I bootstrap null hypothesis tests showed false positive results for the comparison of six samples of the isolated L1 loop and true positive results for the comparison of conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The bootstrap confidence interval tests revealed true negative results for comparisons of six samples of the isolated L1 loop, and false negative results for the conformational comparisons between the isolated L1 loop and L1 in the three-loop assembly. Different conformational sample sizes are further explored by combining the samples of the isolated L1 loop to increase the sample size, or by clustering the sample using a self-organizing map (SOM) to narrow the conformational distribution of the samples being compared. However, neither the bootstrap null hypothesis tests nor the confidence interval tests improved.
These results show that more work is required before EDMA methods can be used reliably as a method for comparison of samples obtained by Monte Carlo simulations.
Abstract:
The statement that pairs of individuals from different populations are often more genetically similar than pairs from the same population is a widespread idea inside and outside the scientific community. Witherspoon et al. ["Genetic similarities within and between human populations," Genetics 176:351-359 (2007)] proposed an index called the dissimilarity fraction (omega) to assess in a quantitative way the validity of this statement for genetic systems. Witherspoon et al. demonstrated that, as the number of loci increases, omega decreases to a point where, when enough sampling is available, the statement is false. In this study, we applied the dissimilarity fraction to Howells's craniometric database to establish whether or not similar results are obtained for cranial morphological traits. Although in genetic studies thousands of loci are available, Howells's database provides no more than 55 metric traits, making the contribution of each variable important. To cope with this limitation, we developed a routine that takes this effect into consideration when calculating omega. Contrary to what was observed for the genetic data, our results show that cranial morphology asymptotically approaches a mean omega of 0.3 and therefore supports the initial statement (that is, that individuals from the same geographic region do not form clear and discrete clusters), further questioning the idea of the existence of discrete biological clusters in the human species. Finally, by assuming that cranial morphology is under an additive polygenetic model, we can say that the population history signal of human craniometric traits presents the same resolution as a neutral genetic system dependent on no more than 20 loci.
Abstract:
The objective of this work was to study the effect of selective thinning on the genetic divergence in progenies of Pinus caribaea var. bahamensis, aiming to identify the most productive and divergent progenies for use in an improvement program. The progeny test, containing 119 progenies and two commercial controls, was planted in March 1990 using an 11 x 11 sextuple, partially balanced square lattice design, arranged in linear plots of six trees at a spacing of 3.0 x 3.0 m. Thirteen years after planting, thinning was carried out (selection for DBH) with a 50% selection intensity based on the multi-effect index, leaving three trees per plot throughout the experiment. The evaluations were made in four situations: A (before thinning); B (thinned trees); C (remaining trees after thinning) and D (one year after thinning). The analyzed traits were height, diameter at breast height (DBH), volume, stem form and wood density. The genetic divergence among the progenies was studied with the aid of canonical variables and Tocher's clustering method, using the generalized Mahalanobis distance matrix (D²) as an estimate of the genetic similarity. The progenies were grouped into four groups in situation A, fourteen in situation B, two in situation C and three in situation D. The selective thinning of trees within progenies changed the genetic divergence among the progenies, genetically homogenizing them, as demonstrated by the generalized Mahalanobis distances, Tocher's clustering and the canonical variables methods. The thinning also led to high uniformity in the relative contribution of the traits to the total genetic divergence. The clustering techniques were efficient in identifying groups of divergent progenies for use in hybridization and groups of little-divergent progenies for use in a backcross program.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)