132 resultados para probabilistic topic models
em Université de Lausanne, Switzerland
Resumo:
Due to the rise of criminal, civil and administrative judicial situations involving people lacking valid identity documents, age estimation of living persons has become an important operational procedure for numerous forensic and medicolegal services worldwide. The chronological age of a given person is generally estimated from the observed degree of maturity of some selected physical attributes by means of statistical methods. However, their application in the forensic framework suffers from some conceptual and practical drawbacks, as recently claimed in the specialised literature. The aim of this paper is therefore to offer an alternative solution for overcoming these limits, by reiterating the utility of a probabilistic Bayesian approach for age estimation. This approach allows one to deal in a transparent way with the uncertainty surrounding the age estimation process and to produce all the relevant information in the form of posterior probability distribution about the chronological age of the person under investigation. Furthermore, this probability distribution can also be used for evaluating in a coherent way the possibility that the examined individual is younger or older than a given legal age threshold having a particular legal interest. The main novelty introduced by this work is the development of a probabilistic graphical model, i.e. a Bayesian network, for dealing with the problem at hand. The use of this kind of probabilistic tool can significantly facilitate the application of the proposed methodology: examples are presented based on data related to the ossification status of the medial clavicular epiphysis. The reliability and the advantages of this probabilistic tool are presented and discussed.
Resumo:
Part I of this series of articles focused on the construction of graphical probabilistic inference procedures, at various levels of detail, for assessing the evidential value of gunshot residue (GSR) particle evidence. The proposed models - in the form of Bayesian networks - address the issues of background presence of GSR particles, analytical performance (i.e., the efficiency of evidence searching and analysis procedures) and contamination. The use and practical implementation of Bayesian networks for case pre-assessment is also discussed. This paper, Part II, concentrates on Bayesian parameter estimation. This topic complements Part I in that it offers means for producing estimates useable for the numerical specification of the proposed probabilistic graphical models. Bayesian estimation procedures are given a primary focus of attention because they allow the scientist to combine (his/her) prior knowledge about the problem of interest with newly acquired experimental data. The present paper also considers further topics such as the sensitivity of the likelihood ratio due to uncertainty in parameters and the study of likelihood ratio values obtained for members of particular populations (e.g., individuals with or without exposure to GSR).
Resumo:
Detailed large-scale information on mammal distribution has often been lacking, hindering conservation efforts. We used the information from the 2009 IUCN Red List of Threatened Species as a baseline for developing habitat suitability models for 5027 out of 5330 known terrestrial mammal species, based on their habitat relationships. We focused on the following environmental variables: land cover, elevation and hydrological features. Models were developed at 300 m resolution and limited to within species' known geographical ranges. A subset of the models was validated using points of known species occurrence. We conducted a global, fine-scale analysis of patterns of species richness. The richness of mammal species estimated by the overlap of their suitable habitat is on average one-third less than that estimated by the overlap of their geographical ranges. The highest absolute difference is found in tropical and subtropical regions in South America, Africa and Southeast Asia that are not covered by dense forest. The proportion of suitable habitat within mammal geographical ranges correlates with the IUCN Red List category to which they have been assigned, decreasing monotonically from Least Concern to Endangered. These results demonstrate the importance of fine-resolution distribution data for the development of global conservation strategies for mammals.
Resumo:
INTRODUCTION: This study sought to increase understanding of women's thoughts and feelings about decision making and the experience of subsequent pregnancy following stillbirth (intrauterine death after 24 weeks' gestation). METHODS: Eleven women were interviewed, 8 of whom were pregnant at the time of the interview. Modified grounded theory was used to guide the research methodology and to analyze the data. RESULTS: A model was developed to illustrate women's experiences of decision making in relation to subsequent pregnancy and of subsequent pregnancy itself. DISCUSSION: The results of the current study have significant implications for women who have experienced stillbirth and the health professionals who work with them. Based on the model, women may find it helpful to discuss their beliefs in relation to healing and health professionals to provide support with this in mind. Women and their partners may also benefit from explanations and support about the potentially conflicting emotions they may experience during this time.
Resumo:
OBJECTIVE: To calculate the variable costs involved with the process of delivering erythropoiesis stimulating agents (ESA) in European dialysis practices. METHODS: A conceptual model was developed to classify the processes and sub-processes followed in the pharmacy (ordering from supplier, receiving/storing/delivering ESA to the dialysis unit), dialysis unit (dose determination, ordering, receipt, registration, storage, administration, registration) and waste disposal unit. Time and material costs were recorded. Labour costs were derived from actual local wages while material costs came from the facilities' accounting records. Activities associated with ESA administration were listed and each activity evaluated to determine if dosing frequency affected the amount of resources required. RESULTS: A total of 21 centres in 8 European countries supplied data for 142 patients (mean) per hospital (range 42-648). Patients received various ESA regimens (thrice-weekly, twice-weekly, once-weekly, once every 2 weeks and once-monthly). Administering ESA every 2 weeks, the mean costs per patient per year for each process and the estimates of the percentage reduction in costs obtainable, respectively, were: pharmacy labour (10.1 euro, 39%); dialysis unit labour (66.0 euro, 65%); dialysis unit materials (4.11 euro, 61%) and waste unit materials (0.43 euro, 49%). LIMITATION: Impact on financial costs was not measured. CONCLUSION: ESA administration has quantifiable labour and material costs which are affected by dosing frequency.
Resumo:
The success of combination antiretroviral therapy is limited by the evolutionary escape dynamics of HIV-1. We used Isotonic Conjunctive Bayesian Networks (I-CBNs), a class of probabilistic graphical models, to describe this process. We employed partial order constraints among viral resistance mutations, which give rise to a limited set of mutational pathways, and we modeled phenotypic drug resistance as monotonically increasing along any escape pathway. Using this model, the individualized genetic barrier (IGB) to each drug is derived as the probability of the virus not acquiring additional mutations that confer resistance. Drug-specific IGBs were combined to obtain the IGB to an entire regimen, which quantifies the virus' genetic potential for developing drug resistance under combination therapy. The IGB was tested as a predictor of therapeutic outcome using between 2,185 and 2,631 treatment change episodes of subtype B infected patients from the Swiss HIV Cohort Study Database, a large observational cohort. Using logistic regression, significant univariate predictors included most of the 18 drugs and single-drug IGBs, the IGB to the entire regimen, the expert rules-based genotypic susceptibility score (GSS), several individual mutations, and the peak viral load before treatment change. In the multivariate analysis, the only genotype-derived variables that remained significantly associated with virological success were GSS and, with 10-fold stronger association, IGB to regimen. When predicting suppression of viral load below 400 cps/ml, IGB outperformed GSS and also improved GSS-containing predictors significantly, but the difference was not significant for suppression below 50 cps/ml. Thus, the IGB to regimen is a novel data-derived predictor of treatment outcome that has potential to improve the interpretation of genotypic drug resistance tests.
Resumo:
Results of 14 randomized controlled trials of acupuncture for chronic pain were pooled in a meta-analysis and analysed in three subgroups according to site of pain; and in two subgroups each according to type to trial, type of treatment, type of control, 'blindness' of participating agents, trial size, and type of journal in which results were published. While few individual trials had statistically significant results, pooled results of many subgroups attained statistical significance in favour of acupuncture. Various potential sources of bias, including problems with blindness, precluded a conclusive finding although most results apparently favoured acupuncture.
Resumo:
Aim Conservation strategies are in need of predictions that capture spatial community composition and structure. Currently, the methods used to generate these predictions generally focus on deterministic processes and omit important stochastic processes and other unexplained variation in model outputs. Here we test a novel approach of community models that accounts for this variation and determine how well it reproduces observed properties of alpine butterfly communities. Location The western Swiss Alps. Methods We propose a new approach to process probabilistic predictions derived from stacked species distribution models (S-SDMs) in order to predict and assess the uncertainty in the predictions of community properties. We test the utility of our novel approach against a traditional threshold-based approach. We used mountain butterfly communities spanning a large elevation gradient as a case study and evaluated the ability of our approach to model species richness and phylogenetic diversity of communities. Results S-SDMs reproduced the observed decrease in phylogenetic diversity and species richness with elevation, syndromes of environmental filtering. The prediction accuracy of community properties vary along environmental gradient: variability in predictions of species richness was higher at low elevation, while it was lower for phylogenetic diversity. Our approach allowed mapping the variability in species richness and phylogenetic diversity projections. Main conclusion Using our probabilistic approach to process species distribution models outputs to reconstruct communities furnishes an improved picture of the range of possible assemblage realisations under similar environmental conditions given stochastic processes and help inform manager of the uncertainty in the modelling results
Resumo:
Difficult tracheal intubation assessment is an important research topic in anesthesia as failed intubations are important causes of mortality in anesthetic practice. The modified Mallampati score is widely used, alone or in conjunction with other criteria, to predict the difficulty of intubation. This work presents an automatic method to assess the modified Mallampati score from an image of a patient with the mouth wide open. For this purpose we propose an active appearance models (AAM) based method and use linear support vector machines (SVM) to select a subset of relevant features obtained using the AAM. This feature selection step proves to be essential as it improves drastically the performance of classification, which is obtained using SVM with RBF kernel and majority voting. We test our method on images of 100 patients undergoing elective surgery and achieve 97.9% accuracy in the leave-one-out crossvalidation test and provide a key element to an automatic difficult intubation assessment system.
Resumo:
Continuing developments in science and technology mean that the amounts of information forensic scientists are able to provide for criminal investigations is ever increasing. The commensurate increase in complexity creates difficulties for scientists and lawyers with regard to evaluation and interpretation, notably with respect to issues of inference and decision. Probability theory, implemented through graphical methods, and specifically Bayesian networks, provides powerful methods to deal with this complexity. Extensions of these methods to elements of decision theory provide further support and assistance to the judicial system. Bayesian Networks for Probabilistic Inference and Decision Analysis in Forensic Science provides a unique and comprehensive introduction to the use of Bayesian decision networks for the evaluation and interpretation of scientific findings in forensic science, and for the support of decision-makers in their scientific and legal tasks. Includes self-contained introductions to probability and decision theory. Develops the characteristics of Bayesian networks, object-oriented Bayesian networks and their extension to decision models. Features implementation of the methodology with reference to commercial and academically available software. Presents standard networks and their extensions that can be easily implemented and that can assist in the reader's own analysis of real cases. Provides a technique for structuring problems and organizing data based on methods and principles of scientific reasoning. Contains a method for the construction of coherent and defensible arguments for the analysis and evaluation of scientific findings and for decisions based on them. Is written in a lucid style, suitable for forensic scientists and lawyers with minimal mathematical background. Includes a foreword by Ian Evett. The clear and accessible style of this second edition makes this book ideal for all forensic scientists, applied statisticians and graduate students wishing to evaluate forensic findings from the perspective of probability and decision analysis. It will also appeal to lawyers and other scientists and professionals interested in the evaluation and interpretation of forensic findings, including decision making based on scientific information.
Resumo:
SUMMARYSpecies distribution models (SDMs) represent nowadays an essential tool in the research fields of ecology and conservation biology. By combining observations of species occurrence or abundance with information on the environmental characteristic of the observation sites, they can provide information on the ecology of species, predict their distributions across the landscape or extrapolate them to other spatial or time frames. The advent of SDMs, supported by geographic information systems (GIS), new developments in statistical models and constantly increasing computational capacities, has revolutionized the way ecologists can comprehend species distributions in their environment. SDMs have brought the tool that allows describing species realized niches across a multivariate environmental space and predict their spatial distribution. Predictions, in the form of probabilistic maps showing the potential distribution of the species, are an irreplaceable mean to inform every single unit of a territory about its biodiversity potential. SDMs and the corresponding spatial predictions can be used to plan conservation actions for particular species, to design field surveys, to assess the risks related to the spread of invasive species, to select reserve locations and design reserve networks, and ultimately, to forecast distributional changes according to scenarios of climate and/or land use change.By assessing the effect of several factors on model performance and on the accuracy of spatial predictions, this thesis aims at improving techniques and data available for distribution modelling and at providing the best possible information to conservation managers to support their decisions and action plans for the conservation of biodiversity in Switzerland and beyond. Several monitoring programs have been put in place from the national to the global scale, and different sources of data now exist and start to be available to researchers who want to model species distribution. However, because of the lack of means, data are often not gathered at an appropriate resolution, are sampled only over limited areas, are not spatially explicit or do not provide a sound biological information. A typical example of this is data on 'habitat' (sensu biota). Even though this is essential information for an effective conservation planning, it often has to be approximated from land use, the closest available information. Moreover, data are often not sampled according to an established sampling design, which can lead to biased samples and consequently to spurious modelling results. Understanding the sources of variability linked to the different phases of the modelling process and their importance is crucial in order to evaluate the final distribution maps that are to be used for conservation purposes.The research presented in this thesis was essentially conducted within the framework of the Landspot Project, a project supported by the Swiss National Science Foundation. The main goal of the project was to assess the possible contribution of pre-modelled 'habitat' units to model the distribution of animal species, in particular butterfly species, across Switzerland. While pursuing this goal, different aspects of data quality, sampling design and modelling process were addressed and improved, and implications for conservation discussed. The main 'habitat' units considered in this thesis are grassland and forest communities of natural and anthropogenic origin as defined in the typology of habitats for Switzerland. These communities are mainly defined at the phytosociological level of the alliance. For the time being, no comprehensive map of such communities is available at the national scale and at fine resolution. As a first step, it was therefore necessary to create distribution models and maps for these communities across Switzerland and thus to gather and collect the necessary data. In order to reach this first objective, several new developments were necessary such as the definition of expert models, the classification of the Swiss territory in environmental domains, the design of an environmentally stratified sampling of the target vegetation units across Switzerland, the development of a database integrating a decision-support system assisting in the classification of the relevés, and the downscaling of the land use/cover data from 100 m to 25 m resolution.The main contributions of this thesis to the discipline of species distribution modelling (SDM) are assembled in four main scientific papers. In the first, published in Journal of Riogeography different issues related to the modelling process itself are investigated. First is assessed the effect of five different stepwise selection methods on model performance, stability and parsimony, using data of the forest inventory of State of Vaud. In the same paper are also assessed: the effect of weighting absences to ensure a prevalence of 0.5 prior to model calibration; the effect of limiting absences beyond the environmental envelope defined by presences; four different methods for incorporating spatial autocorrelation; and finally, the effect of integrating predictor interactions. Results allowed to specifically enhance the GRASP tool (Generalized Regression Analysis and Spatial Predictions) that now incorporates new selection methods and the possibility of dealing with interactions among predictors as well as spatial autocorrelation. The contribution of different sources of remotely sensed information to species distribution models was also assessed. The second paper (to be submitted) explores the combined effects of sample size and data post-stratification on the accuracy of models using data on grassland distribution across Switzerland collected within the framework of the Landspot project and supplemented with other important vegetation databases. For the stratification of the data, different spatial frameworks were compared. In particular, environmental stratification by Swiss Environmental Domains was compared to geographical stratification either by biogeographic regions or political states (cantons). The third paper (to be submitted) assesses the contribution of pre- modelled vegetation communities to the modelling of fauna. It is a two-steps approach that combines the disciplines of community ecology and spatial ecology and integrates their corresponding concepts of habitat. First are modelled vegetation communities per se and then these 'habitat' units are used in order to model animal species habitat. A case study is presented with grassland communities and butterfly species. Different ways of integrating vegetation information in the models of butterfly distribution were also evaluated. Finally, a glimpse to climate change is given in the fourth paper, recently published in Ecological Modelling. This paper proposes a conceptual framework for analysing range shifts, namely a catalogue of the possible patterns of change in the distribution of a species along elevational or other environmental gradients and an improved quantitative methodology to identify and objectively describe these patterns. The methodology was developed using data from the Swiss national common breeding bird survey and the article presents results concerning the observed shifts in the elevational distribution of breeding birds in Switzerland.The overall objective of this thesis is to improve species distribution models as potential inputs for different conservation tools (e.g. red lists, ecological networks, risk assessment of the spread of invasive species, vulnerability assessment in the context of climate change). While no conservation issues or tools are directly tested in this thesis, the importance of the proposed improvements made in species distribution modelling is discussed in the context of the selection of reserve networks.RESUMELes modèles de distribution d'espèces (SDMs) représentent aujourd'hui un outil essentiel dans les domaines de recherche de l'écologie et de la biologie de la conservation. En combinant les observations de la présence des espèces ou de leur abondance avec des informations sur les caractéristiques environnementales des sites d'observation, ces modèles peuvent fournir des informations sur l'écologie des espèces, prédire leur distribution à travers le paysage ou l'extrapoler dans l'espace et le temps. Le déploiement des SDMs, soutenu par les systèmes d'information géographique (SIG), les nouveaux développements dans les modèles statistiques, ainsi que la constante augmentation des capacités de calcul, a révolutionné la façon dont les écologistes peuvent comprendre la distribution des espèces dans leur environnement. Les SDMs ont apporté l'outil qui permet de décrire la niche réalisée des espèces dans un espace environnemental multivarié et prédire leur distribution spatiale. Les prédictions, sous forme de carte probabilistes montrant la distribution potentielle de l'espèce, sont un moyen irremplaçable d'informer chaque unité du territoire de sa biodiversité potentielle. Les SDMs et les prédictions spatiales correspondantes peuvent être utilisés pour planifier des mesures de conservation pour des espèces particulières, pour concevoir des plans d'échantillonnage, pour évaluer les risques liés à la propagation d'espèces envahissantes, pour choisir l'emplacement de réserves et les mettre en réseau, et finalement, pour prévoir les changements de répartition en fonction de scénarios de changement climatique et/ou d'utilisation du sol. En évaluant l'effet de plusieurs facteurs sur la performance des modèles et sur la précision des prédictions spatiales, cette thèse vise à améliorer les techniques et les données disponibles pour la modélisation de la distribution des espèces et à fournir la meilleure information possible aux gestionnaires pour appuyer leurs décisions et leurs plans d'action pour la conservation de la biodiversité en Suisse et au-delà. Plusieurs programmes de surveillance ont été mis en place de l'échelle nationale à l'échelle globale, et différentes sources de données sont désormais disponibles pour les chercheurs qui veulent modéliser la distribution des espèces. Toutefois, en raison du manque de moyens, les données sont souvent collectées à une résolution inappropriée, sont échantillonnées sur des zones limitées, ne sont pas spatialement explicites ou ne fournissent pas une information écologique suffisante. Un exemple typique est fourni par les données sur 'l'habitat' (sensu biota). Même s'il s'agit d'une information essentielle pour des mesures de conservation efficaces, elle est souvent approximée par l'utilisation du sol, l'information qui s'en approche le plus. En outre, les données ne sont souvent pas échantillonnées selon un plan d'échantillonnage établi, ce qui biaise les échantillons et par conséquent les résultats de la modélisation. Comprendre les sources de variabilité liées aux différentes phases du processus de modélisation s'avère crucial afin d'évaluer l'utilisation des cartes de distribution prédites à des fins de conservation.La recherche présentée dans cette thèse a été essentiellement menée dans le cadre du projet Landspot, un projet soutenu par le Fond National Suisse pour la Recherche. L'objectif principal de ce projet était d'évaluer la contribution d'unités 'd'habitat' pré-modélisées pour modéliser la répartition des espèces animales, notamment de papillons, à travers la Suisse. Tout en poursuivant cet objectif, différents aspects touchant à la qualité des données, au plan d'échantillonnage et au processus de modélisation sont abordés et améliorés, et leurs implications pour la conservation des espèces discutées. Les principaux 'habitats' considérés dans cette thèse sont des communautés de prairie et de forêt d'origine naturelle et anthropique telles que définies dans la typologie des habitats de Suisse. Ces communautés sont principalement définies au niveau phytosociologique de l'alliance. Pour l'instant aucune carte de la distribution de ces communautés n'est disponible à l'échelle nationale et à résolution fine. Dans un premier temps, il a donc été nécessaire de créer des modèles de distribution de ces communautés à travers la Suisse et par conséquent de recueillir les données nécessaires. Afin d'atteindre ce premier objectif, plusieurs nouveaux développements ont été nécessaires, tels que la définition de modèles experts, la classification du territoire suisse en domaines environnementaux, la conception d'un échantillonnage environnementalement stratifié des unités de végétation cibles dans toute la Suisse, la création d'une base de données intégrant un système d'aide à la décision pour la classification des relevés, et le « downscaling » des données de couverture du sol de 100 m à 25 m de résolution. Les principales contributions de cette thèse à la discipline de la modélisation de la distribution d'espèces (SDM) sont rassemblées dans quatre articles scientifiques. Dans le premier article, publié dans le Journal of Biogeography, différentes questions liées au processus de modélisation sont étudiées en utilisant les données de l'inventaire forestier de l'Etat de Vaud. Tout d'abord sont évalués les effets de cinq méthodes de sélection pas-à-pas sur la performance, la stabilité et la parcimonie des modèles. Dans le même article sont également évalués: l'effet de la pondération des absences afin d'assurer une prévalence de 0.5 lors de la calibration du modèle; l'effet de limiter les absences au-delà de l'enveloppe définie par les présences; quatre méthodes différentes pour l'intégration de l'autocorrélation spatiale; et enfin, l'effet de l'intégration d'interactions entre facteurs. Les résultats présentés dans cet article ont permis d'améliorer l'outil GRASP qui intègre désonnais de nouvelles méthodes de sélection et la possibilité de traiter les interactions entre variables explicatives, ainsi que l'autocorrélation spatiale. La contribution de différentes sources de données issues de la télédétection a également été évaluée. Le deuxième article (en voie de soumission) explore les effets combinés de la taille de l'échantillon et de la post-stratification sur le la précision des modèles. Les données utilisées ici sont celles concernant la répartition des prairies de Suisse recueillies dans le cadre du projet Landspot et complétées par d'autres sources. Pour la stratification des données, différents cadres spatiaux ont été comparés. En particulier, la stratification environnementale par les domaines environnementaux de Suisse a été comparée à la stratification géographique par les régions biogéographiques ou par les cantons. Le troisième article (en voie de soumission) évalue la contribution de communautés végétales pré-modélisées à la modélisation de la faune. C'est une approche en deux étapes qui combine les disciplines de l'écologie des communautés et de l'écologie spatiale en intégrant leurs concepts de 'habitat' respectifs. Les communautés végétales sont modélisées d'abord, puis ces unités de 'habitat' sont utilisées pour modéliser les espèces animales. Une étude de cas est présentée avec des communautés prairiales et des espèces de papillons. Différentes façons d'intégrer l'information sur la végétation dans les modèles de répartition des papillons sont évaluées. Enfin, un clin d'oeil aux changements climatiques dans le dernier article, publié dans Ecological Modelling. Cet article propose un cadre conceptuel pour l'analyse des changements dans la distribution des espèces qui comprend notamment un catalogue des différentes formes possibles de changement le long d'un gradient d'élévation ou autre gradient environnemental, et une méthode quantitative améliorée pour identifier et décrire ces déplacements. Cette méthodologie a été développée en utilisant des données issues du monitoring des oiseaux nicheurs répandus et l'article présente les résultats concernant les déplacements observés dans la distribution altitudinale des oiseaux nicheurs en Suisse.L'objectif général de cette thèse est d'améliorer les modèles de distribution des espèces en tant que source d'information possible pour les différents outils de conservation (par exemple, listes rouges, réseaux écologiques, évaluation des risques de propagation d'espèces envahissantes, évaluation de la vulnérabilité des espèces dans le contexte de changement climatique). Bien que ces questions de conservation ne soient pas directement testées dans cette thèse, l'importance des améliorations proposées pour la modélisation de la distribution des espèces est discutée à la fin de ce travail dans le contexte de la sélection de réseaux de réserves.
Resumo:
Background The 'database search problem', that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method's graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants with additional modes of interaction, concise representation, and coherent communication.
Resumo:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoiding such complication. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic using Bayes' theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data are not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
Resumo:
Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.