135 results for Probabilistic metrics
Abstract:
Recent technological advances in remote sensing have enabled investigation of the morphodynamics and hydrodynamics of large rivers. However, measuring topography and flow in these very large rivers is time consuming and thus often constrains the spatial resolution and reach-length scales that can be monitored. Similar constraints exist for computational fluid dynamics (CFD) studies of large rivers, requiring maximization of mesh- or grid-cell dimensions and implying a reduction in the representation of bedform-roughness elements that are of the order of a model grid cell or less, even if they are represented in available topographic data. These "subgrid" elements must be parameterized, and this paper applies and considers the impact of roughness-length treatments that include the effect of bed roughness due to "unmeasured" topography. CFD predictions were found to be sensitive to the roughness-length specification. Model optimization was based on acoustic Doppler current profiler measurements and estimates of the water surface slope for a variety of roughness lengths. This proved difficult as the metrics used to assess optimal model performance diverged due to the effects of large bedforms that are not well parameterized in roughness-length treatments. However, the general spatial flow patterns are effectively predicted by the model. Changes in roughness length were shown to have a major impact upon flow routing at the channel scale. The results also indicate an absence of secondary flow circulation cells in the reach studied, and suggest simpler two-dimensional models may have great utility in the investigation of flow within large rivers. Citation: Sandbach, S. D. et al. (2012), Application of a roughness-length representation to parameterize energy loss in 3-D numerical simulations of large rivers, Water Resour. Res., 48, W12501, doi: 10.1029/2011WR011284.
Abstract:
The paper follows on from earlier work [Taroni F and Aitken CGG. Probabilistic reasoning in the law, Part 1: assessment of probabilities and explanation of the value of DNA evidence. Science & Justice 1998; 38: 165-177]. Different explanations of the value of DNA evidence were presented to students from two schools of forensic science and to members of fifteen laboratories all around the world. The responses were divided into two groups: those which came from a school or laboratory identified as Bayesian and those which came from a school or laboratory identified as non-Bayesian. The paper analyses these responses using a likelihood approach. This approach is more consistent with a Bayesian analysis than one based on a frequentist approach, as was reported by Taroni F and Aitken CGG. [Probabilistic reasoning in the law, Part 1: assessment of probabilities and explanation of the value of DNA evidence] in Science & Justice 1998.
Abstract:
Network analysis naturally relies on graph theory and, more particularly, on the use of node and edge metrics to identify the salient properties in graphs. When building visual maps of networks, these metrics are turned into useful visual cues or are used interactively to filter out parts of a graph while querying it, for instance. Over the years, analysts from different application domains have designed metrics to serve specific needs. Network science is an inherently cross-disciplinary field, which leads to the publication of metrics with similar goals; different names and descriptions of their analytics often mask the similarity between two metrics that originated in different fields. Here, we study a set of graph metrics and compare their relative values and behaviors in an effort to survey their potential contributions to the spatial analysis of networks.
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example, for predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including Mac OS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (like predicting motif occurrence in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++, and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
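The decoding step mentioned in this abstract can be illustrated with a minimal Viterbi sketch in Python. This is not MAMOT's implementation or file format. The two-state "background"/"motif" model, its probabilities, and all function and variable names below are hypothetical, chosen only to show how a most probable state path is recovered from a DNA sequence. The sketch assumes a non-empty observation sequence.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a non-empty sequence `obs`."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on the best path at step t + 1
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to the start.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities into log-probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy model: an A-rich "motif" state against a uniform background.
STATES = ("background", "motif")
LOG_START = logs({"background": 0.9, "motif": 0.1})
LOG_TRANS = {
    "background": logs({"background": 0.9, "motif": 0.1}),
    "motif": logs({"background": 0.2, "motif": 0.8}),
}
LOG_EMIT = {
    "background": logs({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}),
    "motif": logs({"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}),
}

if __name__ == "__main__":
    print(viterbi("CGCGAAAAAACG", STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences, which is the usual reason for this design choice.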
Abstract:
Aim: We take a comparative phylogeographical approach to assess whether three species involved in a specialized oil-rewarding pollination system (i.e. Lysimachia vulgaris and two oil-collecting bees within the genus Macropis) show congruent phylogeographical trajectories during post-glacial colonization processes. Our working hypothesis is that within specialized mutualistic interactions, where each species relies on the co-occurrence of the other for survival and/or reproduction, partners are expected to show congruent evolutionary trajectories, because they are likely to have followed parallel migration routes and to have shared glacial refugia. Location: Western Palaearctic. Methods: Our analysis relies on the extensive sampling of 104 Western Palaearctic populations (totalling 434, 159 and 74 specimens of Lysimachia vulgaris, Macropis europaea and Macropis fulvipes, respectively), genotyped with amplified fragment length polymorphism. Based on this, we evaluated the regional genetic diversity (Shannon diversity and allele rarity index) and genetic structure (assessed using STRUCTURE, population networks, isolation-by-distance and spatial autocorrelation metrics) of each species. Finally, we compared the general phylogeographical patterns obtained. Results: Contrary to our expectations, the analyses revealed phylogeographical signals suggesting that the investigated organisms demonstrate independent post-glacial trajectories as well as distinct contemporaneous demographic parameters, despite their mutualistic interaction. Main conclusions: The mutualistic partners investigated here are likely to be experiencing distinct and independent evolutionary dynamics because of their contrasting life-history traits (e.g. dispersal abilities), as well as distinct hubs and migration routes. Such conditions would prevent and/or erase any signature of co-structuring of lineages in space and time.
As a result, the lack of phylogeographical congruence driven by differences in life-history traits might have arisen irrespective of the three species having shared similar Pleistocene glacial refugia.
Abstract:
Boundaries for delta, representing a "quantitatively significant" or "substantively impressive" distinction, have not been established, analogous to the boundary of alpha, usually set at 0.05, for the stochastic or probabilistic component of "statistical significance". To determine what boundaries are being used for the "quantitative" decisions, we reviewed pertinent articles in three general medical journals. For each contrast of two means, contrast of two rates, or correlation coefficient, we noted the investigators' decisions about stochastic significance, stated in P values or confidence intervals, and about quantitative significance, indicated by interpretive comments. The boundaries between impressive and unimpressive distinctions were best formed by a ratio of greater than or equal to 1.2 for the smaller to the larger mean in 546 comparisons; by a standardized increment of greater than or equal to 0.28 and odds ratio of greater than or equal to 2.2 in 392 comparisons of two rates; and by an r value of greater than or equal to 0.32 in 154 correlation coefficients. Additional boundaries were also identified for "substantially" and "highly" significant quantitative distinctions. Although the proposed boundaries should be kept flexible, indexes and boundaries for decisions about "quantitative significance" are particularly useful when a value of delta must be chosen for calculating sample size before the research is done, and when the "statistical significance" of completed research is appraised for its quantitative as well as stochastic components.
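To make the role of delta in sample-size planning concrete, here is a minimal Python sketch using the standard normal-approximation formula for a two-sided comparison of two means, n = 2((z_{1-alpha/2} + z_{power}) * sigma / delta)^2 per group. The formula is textbook material and is not taken from the article. The function name, the common standard deviation `sigma`, and the default power of 0.80 are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for detecting a mean difference `delta`.

    Assumes two equally sized groups, a common standard deviation `sigma`,
    a two-sided test at level `alpha`, and the normal approximation.
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # stochastic boundary
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Halving the quantitatively significant difference roughly quadruples n.
    for delta in (1.0, 0.5, 0.25):
        print(delta, n_per_group(delta, sigma=1.0))
```

Because n scales with 1/delta^2, the choice of boundary for "quantitative significance" dominates the required sample size, which is why an agreed value of delta matters before the research is done.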
Abstract:
BACKGROUND: Data for trends in glycaemia and diabetes prevalence are needed to understand the effects of diet and lifestyle within populations, assess the performance of interventions, and plan health services. No consistent and comparable global analysis of trends has been done. We estimated trends and their uncertainties in mean fasting plasma glucose (FPG) and diabetes prevalence for adults aged 25 years and older in 199 countries and territories. METHODS: We obtained data from health examination surveys and epidemiological studies (370 country-years and 2·7 million participants). We converted systematically between different glycaemic metrics. For each sex, we used a Bayesian hierarchical model to estimate mean FPG and its uncertainty by age, country, and year, accounting for whether a study was nationally, subnationally, or community representative. FINDINGS: In 2008, global age-standardised mean FPG was 5·50 mmol/L (95% uncertainty interval 5·37-5·63) for men and 5·42 mmol/L (5·29-5·54) for women, having risen by 0·07 mmol/L and 0·09 mmol/L per decade, respectively. Age-standardised adult diabetes prevalence was 9·8% (8·6-11·2) in men and 9·2% (8·0-10·5) in women in 2008, up from 8·3% (6·5-10·4) and 7·5% (5·8-9·6) in 1980. The number of people with diabetes increased from 153 (127-182) million in 1980, to 347 (314-382) million in 2008. We recorded almost no change in mean FPG in east and southeast Asia and central and eastern Europe. Oceania had the largest rise, and the highest mean FPG (6·09 mmol/L, 5·73-6·49 for men; 6·08 mmol/L, 5·72-6·46 for women) and diabetes prevalence (15·5%, 11·6-20·1 for men; and 15·9%, 12·1-20·5 for women) in 2008. Mean FPG and diabetes prevalence in 2008 were also high in south Asia, Latin America and the Caribbean, and central Asia, north Africa, and the Middle East. Mean FPG in 2008 was lowest in sub-Saharan Africa, east and southeast Asia, and high-income Asia-Pacific. 
In high-income subregions, western Europe had the smallest rise, 0·07 mmol/L per decade for men and 0·03 mmol/L per decade for women; North America had the largest rise, 0·18 mmol/L per decade for men and 0·14 mmol/L per decade for women. INTERPRETATION: Glycaemia and diabetes are rising globally, driven both by population growth and ageing and by increasing age-specific prevalences. Effective preventive interventions are needed, and health systems should prepare to detect and manage diabetes and its sequelae. FUNDING: Bill & Melinda Gates Foundation and WHO.
Abstract:
Many studies have provided evidence that prey adjust their behaviour to adaptively balance the fitness effects of reproduction and predation risk. Nocturnal terrestrial animals should deal with a range of environmental conditions at the breeding sites during the reproductive season, including a variable amount of natural ambient light. High degrees of illumination are expected to minimize those behaviours that might increase an animal's detection by predators. Therefore, under variable brightness conditions and in different ecosystems, these behaviours are expected to depend on the variation in predation risk. Although moon effects on amphibian biology have been recognized, the direction of this influence is rather controversial, with evidence of both increased and depressed activity under a full moon. We tested in four nocturnal amphibian species (Hyla intermedia, Rana dalmatina, Rana italica, Salamandrina perspicillata) the effects of different (i) light conditions and (ii) habitats (open land vs. dense forest) on the reproductive phenology. Our results showed that the effects of the lunar cycle on the study species are associated with the change in luminosity, and there is no evidence of an endogenous rhythm controlled by biological clocks. The habitat type conditioned the amphibian reproductive strategy in relation to moon phases. Open habitat breeders (e.g., ponds with no canopy cover) strongly avoided conditions with high brightness, whereas forest habitat breeders were apparently unaffected by the different moon phases. Indeed, for all the studied species no effects of the moon phase itself on the considered metrics were found. Rather, the considered amphibian species seem to be conditioned mainly by moonlight irrespective of the moon phase. The two anurans spawning in open habitat apparently adjust their oviposition timing by balancing the fitness effects of reproduction against the risk of being detected by predators.
Abstract:
Background: The 'database search problem', that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods: As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results: This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions: The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions.
The method's graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants additional modes of interaction, concise representation, and coherent communication.
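The three key variables named in this abstract (database size, population size, and rarity of the characteristics) can be combined in a small numerical sketch. The Python code below is not the paper's Bayesian network. It enumerates the source hypotheses directly under simplifying assumptions that are ours: a uniform prior over a population of `population` potential sources, a database of `database` individuals with exactly one match, error-free typing, and a match probability `gamma` for non-sources. It then compares the result with the widely cited closed form 1 / (1 + (N - n) * gamma). All names are hypothetical.

```python
def posterior_by_enumeration(population, database, gamma):
    """Posterior probability that the single matching database member is the source."""
    if not 1 <= database <= population:
        raise ValueError("need 1 <= database <= population")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    prior = 1.0 / population
    # Probability that the other (database - 1) members, as non-sources, do not match.
    others_excluded = (1.0 - gamma) ** (database - 1)
    # Hypothesis A: the matching individual is the source (matches with certainty).
    weight_match = prior * others_excluded
    # Hypothesis B: another database member is the source. That member would
    # have matched, but did not, so the likelihood is zero.
    weight_other_member = (database - 1) * prior * 0.0
    # Hypothesis C: someone outside the database is the source, so the
    # observed match is coincidental (probability gamma).
    weight_outside = (population - database) * prior * gamma * others_excluded
    return weight_match / (weight_match + weight_other_member + weight_outside)


def posterior_closed_form(population, database, gamma):
    """Closed form under the same assumptions: 1 / (1 + (N - n) * gamma)."""
    return 1.0 / (1.0 + (population - database) * gamma)


if __name__ == "__main__":
    for n in (1, 100, 500, 1000):
        print(n, posterior_by_enumeration(1000, n, 0.001))
```

Under these assumptions the posterior grows with the database size, because every excluded database member removes one alternative source. That is the counter-intuitive direction the abstract refers to.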
Abstract:
We investigated procedural learning in 18 children with basal ganglia (BG) lesions or dysfunctions of various aetiologies, using a visuo-motor learning test, the Serial Reaction Time (SRT) task, and a cognitive learning test, the Probabilistic Classification Learning (PCL) task. We compared patients with early (<1 year old, n=9), later onset (>6 years old, n=7) or progressive disorder (idiopathic dystonia, n=2). All patients showed deficits in both visuo-motor and cognitive domains, except those with idiopathic dystonia, who displayed preserved classification learning skills. Impairments seem to be independent of the age of onset of pathology. As far as we know, this study is the first to investigate motor and cognitive procedural learning in children with BG damage. Procedural impairments were documented whatever the aetiology of the BG damage/dysfunction and time of pathology onset, thus supporting the claim of very early skill learning development and lack of plasticity in case of damage.
Abstract:
PURPOSE: All kinds of blood manipulations aim to increase the total hemoglobin mass (tHb-mass). To establish tHb-mass as an effective screening parameter for detecting blood doping, the knowledge of its normal variation over time is necessary. The aim of the present study, therefore, was to determine the intraindividual variance of tHb-mass in elite athletes during a training year emphasizing off, training, and race seasons at sea level. METHODS: tHb-mass and hemoglobin concentration ([Hb]) were determined in 24 endurance athletes five times during a year and were compared with a control group (n = 6). An analysis of covariance was used to test the effects of training phases, age, gender, competition level, body mass, and training volume. Three error models, based on 1) a total percentage error of measurement, 2) the combination of a typical percentage error (TE) of analytical origin with an absolute SD of biological origin, and 3) between-subject and within-subject variance components as obtained by an analysis of variance, were tested. RESULTS: In addition to the expected influence of performance status, the main results were that the effects of training volume (P = 0.20) and training phases (P = 0.81) on tHb-mass were not significant. We found that within-subject variations mainly have an analytical origin (TE approximately 1.4%) and a very small SD (7.5 g) of biological origin. CONCLUSION: tHb-mass shows very low individual oscillations during a training year (<6%), and these oscillations are below the expected changes in tHb-mass due to erythropoietin (EPO) application or blood infusion (approximately 10%). The high stability of tHb-mass over a period of 1 year suggests that it should be included in an athlete's biological passport and analyzed by recently developed probabilistic inference techniques that define subject-based reference ranges.
Abstract:
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach, alternative to geostatistics, for modelling the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both "labelled" observed data and "unlabelled" data, which have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. At the same time, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data-driven algorithm is designed to integrate various kinds of conditioning information and learn dependencies from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, a stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution.
The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
Abstract:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number of contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic using Bayes' theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data are not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N.
Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
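The contrast between the deterministic and the probabilistic procedure can be sketched for a single locus in Python. This is an illustrative sketch under our own simplifying assumptions, not the paper's model: each of N contributors carries two alleles drawn independently from the population frequencies, there is no drop-out or drop-in, and only the set of alleles seen is used (the qualitative interpretation). The allele labels, frequencies, prior, and function names are hypothetical.

```python
from itertools import combinations
from math import ceil


def min_contributors(observed):
    """Deterministic rule: fewest contributors able to 'explain' the alleles at one locus."""
    return ceil(len(observed) / 2)


def prob_allele_set(freqs, observed, n_contrib):
    """P(the 2 * n_contrib alleles drawn show exactly the set `observed`).

    Uses inclusion-exclusion over the subsets of `observed`.
    """
    k = len(observed)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(observed, size):
            total += sign * sum(freqs[a] for a in subset) ** (2 * n_contrib)
    return max(total, 0.0)  # clamp tiny negative rounding errors


def posterior_contributors(freqs, observed, prior):
    """Bayes' theorem: posterior distribution over the number of contributors."""
    joint = {n: p * prob_allele_set(freqs, observed, n) for n, p in prior.items()}
    norm = sum(joint.values())
    if norm <= 0.0:
        raise ValueError("observed alleles are impossible under every N in the prior")
    return {n: value / norm for n, value in joint.items()}


# Hypothetical allele frequencies at one locus, and a uniform prior on N = 1..4.
FREQS = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
PRIOR = {n: 0.25 for n in (1, 2, 3, 4)}

if __name__ == "__main__":
    observed = ("a", "b", "c")
    print("minimum N:", min_contributors(observed))
    print("posterior:", posterior_contributors(FREQS, observed, PRIOR))
```

With several loci and independence between loci assumed, the per-locus probabilities would be multiplied before normalising. The posterior can then be fed into a scoring rule of the kind the abstract describes.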
Abstract:
A ubiquitous assessment of swimming velocity (the main performance metric) is essential for the coach to provide tailored feedback to the trainee. We present a probabilistic framework for the data-driven estimation of the swimming velocity at every cycle using a low-cost wearable inertial measurement unit (IMU). The statistical validation of the method on 15 swimmers shows that an average relative error of 0.1 ± 9.6% and a high correlation with the tethered reference system (r_X,Y = 0.91) are achievable. In addition, a simple tool to analyze the influence of sacrum kinematics on performance is provided.