264 results for compositional heterogeneity


Relevance: 20.00%

Abstract:

Many techniques in information retrieval produce counts from a sample, and it is common to analyse these counts as proportions of the whole - term frequencies are a familiar example. Proportions carry only relative information and are not free to vary independently of one another: for the proportion of one term to increase, one or more others must decrease. These constraints are hallmarks of compositional data. While there has long been discussion in other fields of how such data should be analysed, to our knowledge, Compositional Data Analysis (CoDA) has not been considered in IR. In this work we explore compositional data in IR through the lens of distance measures, and demonstrate that common measures, naïve to compositions, have some undesirable properties which can be avoided with composition-aware measures. As a practical example, these measures are shown to improve clustering. Copyright 2014 ACM.
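To make the contrast between naïve and composition-aware distances concrete, the sketch below compares plain Euclidean distance on proportions with the Aitchison distance (Euclidean distance after a centred log-ratio transform). The Aitchison distance is one standard composition-aware measure; the abstract does not say which measures the authors evaluated, so this is an illustrative assumption, as are the function names and the example term proportions. It also assumes every part is strictly positive, since zeros would need separate handling before taking logarithms.

```python
import math


def closure(counts):
    """Rescale raw counts so the parts sum to 1 (a composition)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centred log-ratio transform; requires strictly positive parts."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed parts."""
    return euclidean(clr(x), clr(y))


# Hypothetical term proportions for two pairs of documents.
rare_before, rare_after = [0.01, 0.49, 0.50], [0.02, 0.49, 0.49]
common_before, common_after = [0.30, 0.35, 0.35], [0.31, 0.35, 0.34]

# Euclidean distance treats both changes as the same size ...
print(euclidean(rare_before, rare_after), euclidean(common_before, common_after))
# ... while the Aitchison distance registers that the rare term doubled.
print(aitchison(rare_before, rare_after), aitchison(common_before, common_after))
```

Because the transform works on ratios, counts that differ only by sample size (for example `[1, 2, 7]` and `[10, 20, 70]`) are at Aitchison distance zero once closed.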

Relevance: 20.00%

Abstract:

Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
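As a rough illustration of the alignment-free idea, the basic D2 statistic counts word matches between two sequences: it is the sum, over all words of length k, of the product of that word's counts in each sequence. The sketch below computes it and turns it into a simple dissimilarity. The cosine-style normalisation, the function names, and the example sequences are assumptions made for illustration; the abstract does not specify which D2 variant or normalisation was used to build the distance matrix.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Basic D2 statistic: sum over shared k-mers of the product of counts."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers that are absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def d2_dissimilarity(seq_a, seq_b, k):
    """Illustrative normalisation (an assumption): 1 - cosine similarity of k-mer counts."""
    norm = math.sqrt(d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k))
    return 1.0 - d2(seq_a, seq_b, k) / norm


# Hypothetical toy sequences; both must be at least k long.
print(d2("ACGTACGT", "ACGTTTGT", 3))
print(d2_dissimilarity("ACGTACGT", "ACGTTTGT", 3))
```

A matrix of such pairwise values could then be passed to any distance-based tree-building method, which is the general workflow the abstract describes.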

Relevance: 20.00%

Abstract:

In this work we discuss the development of a mathematical model to predict the shift in gas composition observed over time from a producing CSG (coal seam gas) well, and investigate the effect that physical properties of the coal seam have on gas production. A detailed (local) one-dimensional, two-scale mathematical model of a coal seam has been developed. The model describes the competitive adsorption and desorption of three gas species (CH4, CO2 and N2) within a microscopic, porous coal matrix structure. The (diffusive) flux of these gases between the coal matrices (microscale) and a cleat network (macroscale) is accounted for in the model. The cleat network is modelled as a one-dimensional, volume averaged, porous domain that extends radially from a central well. Diffusive and advective transport of the gases occurs within the cleat network, which also contains liquid water that can be advectively transported. The water and gas phases are assumed to be immiscible. The driving force for the advection in the gas and liquid phases is taken to be a pressure gradient with capillarity also accounted for. In addition, the relative permeabilities of the water and gas phases are considered as functions of the degree of water saturation.

Relevance: 20.00%

Abstract:

The native Asian oyster, Crassostrea ariakensis, is one of the most common and important Crassostrea species that occur naturally along the coast of East Asia. Molecular species diagnosis is a prerequisite for population genetic analysis of wild oyster populations because oyster species cannot be discriminated reliably using external morphological characters alone due to character ambiguity. To date there have been few phylogeographic studies of natural edible oyster populations in East Asia; this is particularly true of C. ariakensis, the common species in Korea. We therefore assessed the levels and patterns of molecular genetic variation in East Asian wild populations of C. ariakensis from Korea, Japan, and China using DNA sequence analysis of five concatenated mtDNA regions, namely 16S rRNA, cytochrome oxidase I, cytochrome oxidase II, cytochrome oxidase III, and cytochrome b. Two divergent C. ariakensis clades were identified, separating southern China from the remaining sites in the northern region. In addition, hierarchical AMOVA and pairwise ΦST analyses showed that genetic diversity was discontinuous among wild populations of C. ariakensis in East Asia. Biogeographical and historical sea level changes are discussed as potential factors that may have influenced the genetic heterogeneity of wild C. ariakensis stocks across this region.

Relevance: 20.00%

Abstract:

Impulse propagation in biological tissues is known to be modulated by structural heterogeneity. In cardiac muscle, improved understanding of how this heterogeneity influences electrical spread is key to advancing our interpretation of dispersion of repolarization. We propose fractional diffusion models as a novel mathematical description of structurally heterogeneous excitable media, as a means of representing the modulation of the total electric field by the secondary electrical sources associated with tissue inhomogeneities. Our results, analysed against in vivo human recordings and experimental data of different animal species, indicate that structural heterogeneity underlies relevant characteristics of cardiac electrical propagation at tissue level. These include conduction effects on action potential (AP) morphology, the shortening of AP duration along the activation pathway and the progressive modulation by premature beats of spatial patterns of dispersion of repolarization. The proposed approach may also have important implications in other research fields involving excitable complex media.

Relevance: 20.00%

Abstract:

Molecular phylogenetic studies of homologous sequences of nucleotides often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific and time-reversible rate matrices (e.g., the GTR rate matrix) is enough to accurately model the evolution of data over the whole tree. However, an increasing body of data suggests that evolution under these conditions is an exception, rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume it can be modeled accurately using the Γ distribution. As an alternative to these models of evolution, we introduce a family of mixture models that approximate HAS without the assumption of an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of nucleotides with 10 000 sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms were found to be very successful, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were found to be accurate and precise.
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V1), and 11.3% variable sites belonging to a second rate category (V2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V1 and four for V2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V1 evolving slower than those belonging to V2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.

Relevance: 20.00%

Abstract:

This contribution is a long-term study of the evolving use of organization-wide groupware in a service network. We describe the practices related to organization-wide groupware in conjunction with local groupware-related practices, and how they have developed since the organization was established. In discussing these practices we focus on issues such as: 1. tendencies for proliferation and integration, 2. local appropriations of a variety of systems, 3. creative appropriations, including the creation of a unique heterogeneous groupware fabric, 4. the design strategy of multiple parallel experimental use, and 5. the relation between disparate local meanings and successful computer supported cooperative practice. As an overarching theme we explore the explanatory value of the concepts of objectification and appropriation as compared to the concepts of design vs. use.

Relevance: 20.00%

Abstract:

Circulating tumor cells (CTCs) in the blood of cancer patients are recognized as important potential targets for future anticancer therapies. As mediators of metastatic spread, CTCs also show promise as a "liquid biopsy" to aid clinical decision-making. Recent work has revealed potentially important genotypic and phenotypic heterogeneity within CTC populations, even within the same patient. MicroRNAs (miRNAs) are key regulators of gene expression and have emerged as potentially important diagnostic markers and targets for anti-cancer therapy. Here, we describe a robust in situ hybridization (ISH) protocol, incorporating the CellSearch® CTC detection system, enabling clinical investigation of important miRNAs, such as miR-10b, on a cell-by-cell basis. We also use this method to demonstrate heterogeneity of miR-10b in individual CTCs from breast, prostate and colorectal cancer patients.

Relevance: 20.00%

Abstract:

This thesis presents a novel approach to building large-scale agent-based models of networked physical systems using a compositional approach to provide extensibility and flexibility in building the models and simulations. A software framework (MODAM - MODular Agent-based Model) was implemented for this purpose, and validated through simulations. These simulations allow assessment of the impact of technological change on the electricity distribution network looking at the trajectories of electricity consumption at key locations over many years.

Relevance: 20.00%

Abstract:

The detection and replication of schizophrenia risk loci can require substantial sample sizes, which has prompted various collaborative efforts for combining multiple samples. However, pooled samples may comprise sub-samples with substantial population genetic differences, including allele frequency differences. We investigated the impact of population differences via linkage reanalysis of Molecular Genetics of Schizophrenia 1 (MGS1) affected sibling-pair data, comprising two samples of distinct ancestral origin: European (EA: 263 pedigrees) and African-American (AA: 146 pedigrees). To exploit the linkage information contained within these distinct continental samples, we performed separate analyses of the individual samples, allowing for within-sample locus heterogeneity, and the pooled sample, allowing for both within-sample and between-sample heterogeneity. Significance levels, corrected for the multiple tests, were determined empirically. For all suggestive peaks, stronger linkage evidence was obtained in either the EA or AA sample than the combined sample, regardless of how heterogeneity was modeled for the latter. Notably, we report genomewide significant linkage of schizophrenia to 8p23.3 and evidence for a second, independent susceptibility locus, reaching suggestive linkage, 29 cM away on 8p21.3. We also detected suggestive linkage on chromosomes 5p13.3 and 7q36.2. Many regions showed pronounced differences in the extent of linkage between the EA and AA samples. This reanalysis highlights the potential impact of population differences upon linkage evidence in pooled data and demonstrates a useful approach for the analysis of samples drawn from distinct continental groups.

Relevance: 20.00%

Abstract:

As for other complex diseases, linkage analyses of schizophrenia (SZ) have produced evidence for numerous chromosomal regions, with inconsistent results reported across studies. The presence of locus heterogeneity appears likely and may reduce the power of linkage analyses if homogeneity is assumed. In addition, when multiple heterogeneous datasets are pooled, inter-sample variation in the proportion of linked families (alpha) may diminish the power of the pooled sample to detect susceptibility loci, in spite of the larger sample size obtained. We compare the significance of linkage findings obtained using allele-sharing LOD scores (LOD(exp)), which assume homogeneity, and heterogeneity LOD scores (HLOD) in European American and African American NIMH SZ families. We also pool these two samples and evaluate the relative power of the LOD(exp) and two different heterogeneity statistics. One of these (HLOD-P) estimates the heterogeneity parameter alpha only in aggregate data, while the second (HLOD-S) determines alpha separately for each sample. In separate and combined data, we show consistently improved performance of HLOD scores over LOD(exp). Notably, genome-wide significant evidence for linkage is obtained at chromosome 10p in the European American sample using a recessive HLOD score. When the two samples are combined, linkage at the 10p locus also achieves genome-wide significance under HLOD-S, but not HLOD-P. Using HLOD-S, improved evidence for linkage was also obtained for a previously reported region on chromosome 15q. In linkage analyses of complex disease, power may be maximised by routinely modelling locus heterogeneity within individual datasets, even when multiple datasets are combined to form larger samples.
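The difference between a single pooled alpha and a per-sample alpha can be sketched with the standard admixture form of the heterogeneity LOD, HLOD(alpha) = sum over families of log10(alpha * 10^LOD_i + 1 - alpha), maximised over alpha in [0, 1]. The code below is a minimal illustration only: the grid search, the function names, the per-family LOD values, and the reading of HLOD-S as the sum of per-sample maxima are all assumptions, and a real analysis would also maximise over the genetic model and map position.

```python
import math


def hlod_at(alpha, family_lods):
    """Admixture HLOD for a fixed proportion alpha of linked families."""
    return sum(math.log10(alpha * 10 ** z + (1.0 - alpha)) for z in family_lods)


def max_hlod(family_lods, steps=1000):
    """Grid-search alpha in [0, 1]; returns (best HLOD, best alpha)."""
    best_value, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for i in range(1, steps + 1):
        alpha = i / steps
        value = hlod_at(alpha, family_lods)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha


def hlod_pooled(samples):
    """One alpha estimated from all families together (HLOD-P style)."""
    pooled = [z for sample in samples for z in sample]
    return max_hlod(pooled)[0]


def hlod_separate(samples):
    """A separate alpha per sample, maxima summed (HLOD-S style, as read here)."""
    return sum(max_hlod(sample)[0] for sample in samples)


# Hypothetical per-family LOD scores at one locus for two samples.
sample_a = [0.8, 0.6, 0.7, -0.2]    # mostly linked families
sample_b = [-0.5, -0.4, 0.1, -0.3]  # mostly unlinked families

print(hlod_pooled([sample_a, sample_b]))
print(hlod_separate([sample_a, sample_b]))
```

On the same alpha grid the per-sample statistic can never fall below the pooled one, because the pooled alpha is always one of the choices available to each sample. That extra freedom is also why significance has to be assessed with appropriate correction.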

Relevance: 20.00%

Abstract:

This paper seeks to draw attention to the importance of appreciating and using ever-present diversity to achieve increased legitimacy for entrepreneurship education. As such, it aims to draw the reader into a reflective process of discovery as to why entrepreneurship education is important and how such importance can be prolonged. Design/methodology/approach - The paper revisits Gartner's 1985 conceptual framework for understanding the complexity of entrepreneurship. The paper proposes an alternative framework based on the logic of Gartner's framework to advance the understanding of entrepreneurship education. The authors discuss the dimensions of the proposed framework and explain the nature of the dialogic relations contained within. Findings - It is argued that the proposed conceptual framework provides a new way to understand ever-present heterogeneity related to the development and delivery of entrepreneurship education. Practical implications - The paper extends an invitation to the reader to audit their own involvement and proximity to entrepreneurship education. It argues that increased awareness of the value that heterogeneity plays in student learning outcomes and programme branding is directly related to the presence of heterogeneity across the dimensions of the conceptual framework. Originality/value - The paper introduces a simple yet powerful means of understanding what factors contribute to the success or otherwise of developing and delivering entrepreneurship education. The simplicity of the approach suggested provides all entrepreneurship educators with the means to audit all facets of their programme.

Relevance: 20.00%

Abstract:

Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. This article addresses the question of what to do when the total is known and is of interest. Tools used in this case are reviewed and analysed, in particular the relationship between the positive orthant of D-dimensional real space, the product space of the real line times the D-part simplex, and their Euclidean space structures. The first alternative corresponds to analysing the data after taking logarithms of each component, and the second to treating a log-transformed total jointly with a composition describing the distribution of component amounts. Real data about total abundances of phytoplankton in an Australian river motivated the present study and are used for illustration.
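The two alternatives can be written down as two coordinate systems for the same vector of positive amounts. The sketch below is a minimal illustration that takes the total to be the plain sum of the parts. That choice, the function names, and the example abundances are assumptions, since the abstract does not state how the total is defined or give any data values.

```python
import math


def log_components(amounts):
    """First alternative: take the logarithm of each positive component."""
    return [math.log(x) for x in amounts]


def log_total_and_composition(amounts):
    """Second alternative: a log-transformed total plus a composition on the simplex."""
    total = sum(amounts)  # assumption: the total is the plain sum of the parts
    return math.log(total), [x / total for x in amounts]


def reconstruct(log_total, composition):
    """Recover the original amounts from the (log total, composition) pair."""
    total = math.exp(log_total)
    return [total * p for p in composition]


# Hypothetical abundances of three taxa in one water sample.
amounts = [120.0, 30.0, 50.0]

print(log_components(amounts))
log_total, composition = log_total_and_composition(amounts)
print(log_total, composition)
print(reconstruct(log_total, composition))
```

Both representations carry the same information, as the round trip shows. They differ in the geometry they impose: the second separates a change in overall size from a change in the relative distribution of the parts.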