134 results for Statistical Distributions.
Abstract:
This study presents classification criteria for two classes of cannabis seedlings. As the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine the chemotype of cannabis plants from seized material in order to ascertain whether or not a plantation is legal. In this study, the classification analysis is based on data obtained from the relative proportions of three major leaf compounds measured by gas chromatography coupled with mass spectrometry (GC-MS). The aim is to discriminate between drug-type (illegal) and fiber-type (legal) cannabis at an early stage of growth. A Bayesian procedure is proposed: a Bayes factor is computed, and classification is performed on the basis of the decision maker's specifications (i.e. prior probability distributions on cannabis type and the consequences of classification, measured by losses). Classification rates are computed with two statistical models and the results are compared. A sensitivity analysis is then performed to assess the robustness of the classification criteria.
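The decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' models: it assumes, hypothetically, that each of the three compound proportions follows an independent normal distribution within each cannabis type, with made-up means and standard deviations. Drug type is declared when the posterior odds (Bayes factor times prior odds) exceed the ratio of the two misclassification losses, which is the standard Bayes decision rule for choosing between two actions.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def log_bayes_factor(x, drug_params, fiber_params):
    """log BF of 'drug type' versus 'fiber type' for one seedling.

    x: relative proportions of the three compounds.
    *_params: one (mean, sd) pair per compound; the independent-normal
    model is a hypothetical simplification.
    """
    ll_drug = sum(normal_logpdf(xi, m, s) for xi, (m, s) in zip(x, drug_params))
    ll_fiber = sum(normal_logpdf(xi, m, s) for xi, (m, s) in zip(x, fiber_params))
    return ll_drug - ll_fiber


def classify(x, drug_params, fiber_params, prior_drug,
             loss_false_drug, loss_false_fiber):
    """Return the Bayes decision, 'drug' or 'fiber'.

    loss_false_drug: loss of calling a fiber-type plant drug type.
    loss_false_fiber: loss of calling a drug-type plant fiber type.
    Working in log space avoids overflow for very large Bayes factors.
    """
    log_post_odds = (log_bayes_factor(x, drug_params, fiber_params)
                     + math.log(prior_drug / (1.0 - prior_drug)))
    log_threshold = math.log(loss_false_drug / loss_false_fiber)
    return "drug" if log_post_odds > log_threshold else "fiber"


# Hypothetical (mean, sd) of the three compound proportions per type.
DRUG = [(0.70, 0.10), (0.10, 0.05), (0.20, 0.05)]
FIBER = [(0.20, 0.10), (0.60, 0.10), (0.20, 0.05)]

print(classify([0.65, 0.12, 0.23], DRUG, FIBER, 0.5, 1.0, 1.0))  # drug
```

Raising `loss_false_drug` relative to `loss_false_fiber`, or lowering `prior_drug`, raises the evidence required to declare drug type. These are the inputs a sensitivity analysis would vary.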
Abstract:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways in which such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'underfit' models, which have insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'overfit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on the study objective, the attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing underfitting against overfitting and, consequently, how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinion that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM-building approaches best advances our knowledge of current and future species ranges.
Abstract:
This paper presents reflections on statistical considerations in illicit drug profiling, and more specifically on the calculation of thresholds for determining whether or not seizures are linked. The specific case of heroin and cocaine profiling is presented, with the necessary details on the selected target profiling variables (major alkaloids) and the analytical method used. A statistical approach to comparing illicit drug seizures is also presented, introducing different scenarios that use different data pre-treatments or variable transformations. The main aim is to demonstrate the influence of data pre-treatment on the statistical outputs. A thorough study of the evolution of the true positive (TP) and false positive (FP) rates in heroin and cocaine comparisons is then proposed to investigate this specific topic and to demonstrate that no universal approach is available and that the calculations have to be re-evaluated for each new specific application.
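The way a threshold translates into TP and FP rates, and the reason pre-treatment matters, can be sketched as below. Everything here is an assumption made for illustration. Profiles are lists of peak areas for the target alkaloids. Similarity is the Pearson correlation between two profiles. The `log` pre-treatment is one hypothetical choice among many. Pairs of seizures known to be linked or unlinked supply the two score populations.

```python
import math


def pretreat(profile, method):
    """Hypothetical pre-treatments of an alkaloid profile (peak areas)."""
    if method == "none":
        return list(profile)
    if method == "log":
        # log10 compresses the dominant peaks so minor alkaloids weigh more
        return [math.log10(v + 1.0) for v in profile]
    raise ValueError(f"unknown method: {method}")


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def similarity(p, q, method):
    """Score two seizures after applying the same pre-treatment to both."""
    return pearson(pretreat(p, method), pretreat(q, method))


def rates(linked_scores, unlinked_scores, threshold):
    """TP rate: linked pairs scoring >= threshold (declared linked).
    FP rate: unlinked pairs scoring >= threshold (wrongly declared linked)."""
    tp = sum(s >= threshold for s in linked_scores) / len(linked_scores)
    fp = sum(s >= threshold for s in unlinked_scores) / len(unlinked_scores)
    return tp, fp
```

Sweeping `threshold` over a grid and collecting the resulting (FP, TP) pairs gives the trade-off for one pre-treatment. Repeating the sweep with another `method` shows how that trade-off moves. Pre-treatments that only rescale a whole profile, such as dividing it by its total, leave the Pearson correlation unchanged. The effect of a pre-treatment therefore depends on the similarity measure it is paired with.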
Abstract:
Aim: We test for congruence between allele-based range boundaries (break zones) in silicicolous alpine plants and species-based break zones in the silicicolous flora of the European Alps. We also ask whether such break zones coincide with areas of large elevational variation.
Location: The European Alps.
Methods: On a regular grid laid across the entire Alps, we determined areas of allele- and species-based break zones by using the respective clustering algorithms, identifying discontinuities in cluster distributions (breaks) and quantifying integrated break densities (break zones). Discontinuities were identified based on the intraspecific genetic variation of 12 species and on floristic distribution data for 239 species, respectively. Coincidence between the two types of break zones was tested using Spearman's correlation. Break zone densities were also regressed on topographical complexity to test for the effect of elevational variation.
Results: We found that two main break zones in the distribution of alleles and species were significantly correlated. Furthermore, we show that these break zones lie in topographically complex regions, characterized by massive elevational ranges owing to high mountains and deep glacial valleys. We detected a third break zone in the distribution of species in the eastern Alps, which is not correlated with topographic complexity and is also not evident from allelic distribution patterns. Species with the potential for long-distance dispersal tended to show larger distribution ranges than short-distance dispersers.
Main conclusions: We suggest that the history of Pleistocene glaciations is the main driver of the congruence between allele-based and species-based distribution patterns, because occurrences of both species and alleles were subject to the same processes (such as extinction, migration and drift) that shaped the distributions of species and genetic lineages. Large elevational ranges have had a profound effect as a dispersal barrier for alleles during post-glacial immigration. Because plant species, unlike alleles, cannot spread via pollen but only via seed, and thus disperse less effectively, we conclude that species break zones are maintained over longer time spans and reflect more ancient patterns than allele break zones.
Conny Thiel-Egenter and Nadir Alvarez contributed equally to this paper and are considered joint first authors.
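Spearman's correlation is the Pearson correlation computed on ranks. The following is a minimal sketch. It assumes that break-zone densities have already been computed for the same grid cells for both alleles and species; the values below are invented. Tied values receive the average of their ranks, and no p-value is computed here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Invented break-zone densities for six grid cells.
allele_density = [0.1, 0.4, 0.9, 0.8, 0.2, 0.0]
species_density = [0.0, 0.5, 0.7, 0.9, 0.1, 0.1]
print(round(spearman(allele_density, species_density), 3))
```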
Abstract:
The aim of this study is to investigate the influence of unusual writing positions on a person's signature, in comparison with a standard writing position. Ten writers were asked to sign six times in each of four different writing positions, including the standard one. To take day-to-day variation into account, the same process was repeated over 12 sessions, giving a total of 288 signatures per subject. The signatures were collected simultaneously in off-line and on-line acquisition modes, using an interactive tablet and a ballpoint pen. Unidimensional variables (height-to-width ratio; time with or without in-air displacement) and time-dependent variables (pressure; X and Y coordinates; altitude and azimuth angles) were extracted from each signature. For the unidimensional variables, the position effect was assessed through ANOVA and Dunnett contrast tests. For the time-dependent variables, the signatures were compared using dynamic time warping, and the position effect was evaluated through classification by linear discriminant analysis. Both types of variables provided similar results: no general tendency regarding the position factor could be identified. The influence of the position factor varies according to the subject as well as the variable studied. The effect of the session factor was shown to mask the effect that could be ascribed to the writing position factor: the day-to-day variation has a greater effect than the position factor on the studied signature variables. The results of this study suggest guidelines for best practice in signature comparison and demonstrate the importance of a signature collection procedure covering an adequate number of sampling sessions, with a sufficient number of samples per session.
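Dynamic time warping aligns two time series of different lengths before measuring the distance between them, which suits signatures written at different speeds. The sketch below is the textbook formulation, using absolute difference as the local cost. The study does not state which DTW variant, step pattern or normalisation it used, so these choices are assumptions, and the pressure values are invented.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j];
    each step may advance in a, in b, or in both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Invented pen-pressure traces: the second is a slower copy of the first.
reference = [0.0, 0.3, 0.8, 0.6, 0.1]
slower = [0.0, 0.3, 0.3, 0.8, 0.8, 0.6, 0.1]
print(dtw_distance(reference, slower))  # 0.0
```

Because repeated samples can be matched to the same point, a signature that is merely slower scores a distance of zero. A point-by-point comparison would not give this result.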
Abstract:
In the field of fingerprints, the rise of computer tools has made it possible to create powerful automated search algorithms. These algorithms make it possible, among other things, to compare a fingermark with a fingerprint database and therefore to establish a link between the mark and a known source. The size of these databases is increasing as the capacity of these systems and of data storage grows and as international collaboration between police services increases. The current challenge for the field of fingerprint identification lies in the growth of these databases, which makes it possible to find impressions that are very similar but come from distinct fingers. At the same time, however, these data and systems allow a description of the variability between different impressions of the same finger and between impressions of different fingers. This statistical description of the within- and between-finger variabilities, computed on the basis of minutiae and their relative positions, can then be used in a statistical approach to interpretation. The computation of a likelihood ratio can then be based on representative data. This computation simultaneously involves the comparison between the mark and the print of the case, the within-variability of the suspect's finger and the between-variability of the mark with respect to a database. These data therefore allow an evaluation that may be more detailed than one obtained by applying rules established long before the advent of these large databases, or by relying on the specialist's experience. The goal of the present thesis is to evaluate likelihood ratios computed from the scores of an automated fingerprint identification system (AFIS) when the source of the tested and compared marks is known. These ratios must support the hypothesis that is known to be true. Moreover, they should support this hypothesis more and more strongly as information is added in the form of additional minutiae.
For the modeling of within- and between-variability, the necessary data were defined and acquired for one finger of a first donor and two fingers of a second donor. The database used for between-variability includes approximately 600,000 inked prints. The minimal number of observations necessary for a robust estimation was determined for the two distributions used. Factors that influence these distributions were also analyzed: the number of minutiae included in the configuration and the configuration as such for both distributions, the finger number and the general pattern for between-variability, and the orientation of the minutiae for within-variability. Of these factors, the orientation of minutiae is the only one for which no influence was shown in the present study. The results show that the likelihood ratios resulting from the use of AFIS scores can be used for evaluation. Relatively low rates of likelihood ratios supporting the hypothesis known to be false were obtained. The maximum rate of likelihood ratios supporting the hypothesis that the two impressions were left by the same finger, when the impressions in fact came from different fingers, was 5.2%, for a configuration of 6 minutiae. When a 7th and then an 8th minutia are added, this rate drops to 3.2% and then to 0.8%. In parallel, for these same configurations, the likelihood ratios obtained when the two impressions come from the same finger are on average of the order of 100, 1000 and 10000 for 6, 7 and 8 minutiae, respectively. These likelihood ratios can therefore be an important aid to decision making. Both positive trends linked to the addition of minutiae were observed systematically within the framework of the study: a drop in the rates of likelihood ratios that can lead to an erroneous decision, and an increase in the value of the likelihood ratio.
Approximations based on 3 scores for within-variability and on 10 scores for between-variability were found and showed satisfactory results.
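The score-based likelihood ratio described above divides the density of the observed AFIS score under the within-finger distribution by its density under the between-finger distribution. The sketch below illustrates this under stated assumptions. Both densities are estimated with a Gaussian kernel density estimate of arbitrary bandwidth, so the thesis's own modelling choices are not reproduced. All scores are invented. The sketch also computes the rate of misleading likelihood ratios, which is the kind of quantity behind the 5.2%, 3.2% and 0.8% figures.

```python
import math


def kde(samples, bandwidth):
    """Gaussian kernel density estimate, returned as a function of x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return density


def score_lr(score, within_scores, between_scores, bandwidth=5.0):
    """LR = f(score | same finger) / f(score | different fingers)."""
    f_within = kde(within_scores, bandwidth)
    f_between = kde(between_scores, bandwidth)
    return f_within(score) / f_between(score)


def misleading_rate(lrs, same_source):
    """Share of LRs supporting the wrong hypothesis: LR < 1 for same-source
    comparisons, LR > 1 for different-source comparisons."""
    if same_source:
        wrong = sum(lr < 1 for lr in lrs)
    else:
        wrong = sum(lr > 1 for lr in lrs)
    return wrong / len(lrs)


# Invented AFIS scores.
within = [48, 52, 55, 60, 63]      # mark vs. prints of the same finger
between = [8, 10, 12, 15, 19, 22]  # mark vs. database prints of other fingers
print(score_lr(57, within, between) > 1, score_lr(11, within, between) < 1)  # True True
```

With a kernel estimate, the between-finger density can underflow far out in the tail. A practical implementation would therefore need a parametric fit or explicit tail handling.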
Abstract:
Aim: To explore the respective power of climate and topography to predict the distribution of reptiles in Switzerland, hence at a mesoscale level. A more detailed knowledge of these relationships, combined with maps of potential distribution derived from the models, is a valuable contribution to the design of conservation strategies.
Location: All of Switzerland.
Methods: Generalized linear models are used to derive predictive habitat distribution models from eco-geographical predictors in a geographical information system, using species data from a field survey conducted between 1980 and 1999.
Results: The maximum amount of deviance explained is 65% for climatic models and 50% for topographical models. Low values were obtained with both sets of predictors for three species that are widely distributed in all parts of the country (Anguis fragilis, Coronella austriaca and Natrix natrix). This result suggests that including other important predictors, such as resources, should improve the models in further studies. With respect to topographical predictors, low values were also obtained for two species for which we anticipated a strong response to aspect and slope, Podarcis muralis and Vipera aspis.
Main conclusions: Overall, both models and maps derived from climatic predictors more closely match the actual reptile distributions than those based on topography. These results suggest that the distributional limits of reptile species with a restricted range in Switzerland are largely set by climatic, predominantly temperature-related, factors.
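Percentages of deviance explained are commonly computed as D^2 = 1 - (residual deviance / null deviance). The sketch below applies this to a presence/absence (binomial) GLM. It assumes that this is the definition used and that fitted probabilities are already available from the model. The GLM fit itself is omitted, and the numbers are invented.

```python
import math


def binomial_deviance(y, p):
    """Deviance of 0/1 observations y under fitted probabilities p (0 < p < 1).

    For binary data the saturated model has log-likelihood 0, so the
    deviance is -2 times the model log-likelihood.
    """
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                      for yi, pi in zip(y, p))


def deviance_explained(y, p_fitted):
    """D^2: share of the null-model deviance removed by the fitted model.

    Requires both presences and absences, so the null prevalence is in (0, 1).
    """
    prevalence = sum(y) / len(y)  # the null model predicts constant prevalence
    null = binomial_deviance(y, [prevalence] * len(y))
    residual = binomial_deviance(y, p_fitted)
    return 1.0 - residual / null


# Invented presences/absences and fitted probabilities for six grid cells.
presence = [1, 1, 1, 0, 0, 0]
fitted = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
print(round(deviance_explained(presence, fitted), 3))
```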
Abstract:
Protected areas are valuable in conserving tropical biodiversity, but an insufficient understanding of species diversity and distributions makes it difficult to evaluate their effectiveness. This is especially true on Borneo, a species-rich island shared by three countries, and is particularly concerning for bats, a poorly known component of mammal diversity that may be highly susceptible to landscape changes. We reviewed the diversity, distributions and conservation status of 54 bat species to determine the representation of these taxa in Borneo's protected areas, and whether these reserves complement each other in terms of bat diversity. Lower- and upper-bound estimates of bat species composition were characterised in 23 protected areas and within the proposed boundaries of the Heart of Borneo conservation area. Based on actual inventories, species representation was highly irregular. Even if some reserves were included in the Heart of Borneo, the protected area network would still exhibit low complementarity. When species presence was inferred from distributions, composition between most reserves was similar, and complementarity was much higher. Predicting species richness using abundance information suggested that bat species representation in reserves may lie between these two extremes. We recommend that researchers sample biodiversity across the island more thoroughly and address the conservation threats faced in Borneo both within and outside protected areas. While the Heart of Borneo Initiative is commendable, it should not divert attention from other conservation areas.
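The abstract does not specify its complementarity measure. One common pairwise index, assumed here, is the proportion of species in the union of two reserves' lists that occur in only one of them. The index is 0 for identical lists and 1 when no species are shared. Computing it once on inventory-based lists and again on distribution-based lists would give two contrasting estimates of the kind discussed above. The reserve names and species codes below are placeholders.

```python
from itertools import combinations


def complementarity(species_a, species_b):
    """Share of the species found in either reserve that occur in only one."""
    a, b = set(species_a), set(species_b)
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)  # symmetric difference over union


def mean_pairwise_complementarity(reserves):
    """Average complementarity over all pairs of reserves (dict name -> species)."""
    pairs = list(combinations(reserves.values(), 2))
    return sum(complementarity(a, b) for a, b in pairs) / len(pairs)


# Placeholder species lists for three hypothetical reserves.
inventories = {
    "reserve_1": {"sp_A", "sp_B", "sp_C"},
    "reserve_2": {"sp_B", "sp_C", "sp_D"},
    "reserve_3": {"sp_E"},
}
print(round(mean_pairwise_complementarity(inventories), 3))
```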
Abstract:
We extend pseudo maximum likelihood (PML) theory to account for information on the conditional moments up to order four, without assuming a parametric model, in order to avoid the risk of misspecifying the conditional distribution. The key statistical tool is the quartic exponential family, which allows us to generalize the PML2 and QGPML1 methods proposed in Gourieroux et al. (1984) to PML4 and QGPML2 methods, respectively. An asymptotic theory is developed. The key numerical tool that we use is the Gauss-Freud integration scheme, which solves a computational problem that has previously been raised in several fields. Simulation exercises demonstrate the feasibility and robustness of the methods. [Authors]
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can be estimated through repeated measurements of the pollutant concentration in air, through expert judgment, or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance. The weighted outputs yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
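Once the model yields a geometric mean (GM) and geometric standard deviation (GSD) for a worker's exposure profile, an exceedance strategy reduces to a lognormal tail probability. The fraction of exposures above the occupational exposure limit (OEL) is 1 - Phi((ln OEL - ln GM) / ln GSD). The sketch below assumes a lognormal profile. For the posterior step, it also assumes a list of (GM, GSD) draws from the Bayesian model. The 5% exceedance limit and all numbers are illustrative choices, not values from the report.

```python
import math


def exceedance_fraction(gm, gsd, oel):
    """Fraction of a lognormal exposure profile above the OEL.

    z is the standardised log-distance of the OEL from the GM;
    0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z) for the standard normal.
    """
    z = (math.log(oel) - math.log(gm)) / math.log(gsd)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def prob_overexposed(draws, oel, limit=0.05):
    """Share of posterior (gm, gsd) draws whose exceedance fraction is above limit."""
    over = sum(exceedance_fraction(gm, gsd, oel) > limit for gm, gsd in draws)
    return over / len(draws)


# Invented posterior draws of (GM, GSD) in mg/m3, with an OEL of 1 mg/m3.
draws = [(0.2, 2.5), (0.3, 2.0), (0.6, 3.0), (0.1, 1.8)]
print(prob_overexposed(draws, oel=1.0))
```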
Abstract:
The distribution of socio-economic features in urban space is an important source of information for land-use and transportation planning. Metropolization has changed the spatial distribution of professions and has given rise to different spatial patterns that urban planners must know in order to plan a sustainable city. Such distributions can be discovered by statistical and machine learning algorithms through different methods. In this paper, an unsupervised classification method and a cluster detection method are discussed and applied to analyze the socio-economic structure of Switzerland. The unsupervised classification method is based on Ward's classification and self-organizing maps. It is used to classify the municipalities of the country and makes it possible to reduce high-dimensional input information in order to interpret the socio-economic landscape. The cluster detection method, the spatial scan statistic, is used more specifically to detect hot spots of certain types of service activities. The method is applied to distribution services in the agglomeration of Lausanne. The results show the emergence of new centralities, which can be analyzed in both transportation and social terms.
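The spatial scan statistic searches over circular zones for the one whose count is least compatible with a uniform rate. The sketch below makes several assumptions. It uses Kulldorff's Poisson version, in which cases are compared with an at-risk population. Circles are centred on each site and grown by adding nearest neighbours, up to half the total population. The abstract does not state which model was used. Here, "cases" could be establishments of the service type of interest and "population" all establishments, which is a hypothetical mapping. The site data are invented, and the Monte Carlo replications needed to assess significance are omitted.

```python
import math


def poisson_llr(cases, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for one zone.

    Only zones with more cases than expected (hot spots) are scored.
    """
    if cases <= expected:
        return 0.0
    llr = cases * math.log(cases / expected)
    if total > cases:  # the outside-zone term vanishes when all cases are inside
        llr += (total - cases) * math.log((total - cases) / (total - expected))
    return llr


def best_circle(sites, max_share=0.5):
    """Most likely cluster among circles centred on each site.

    sites: list of (x, y, cases, population). Each circle grows by adding the
    nearest sites until it would exceed max_share of the total population.
    Returns (llr, member_sites).
    """
    total_cases = sum(s[2] for s in sites)
    total_pop = sum(s[3] for s in sites)
    best_llr, best_members = 0.0, []
    for cx, cy, _, _ in sites:
        ordered = sorted(sites, key=lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2)
        cases, pop, members = 0, 0, []
        for site in ordered:
            if pop + site[3] > max_share * total_pop:
                break
            cases += site[2]
            pop += site[3]
            members.append(site)
            expected = total_cases * pop / total_pop
            llr = poisson_llr(cases, expected, total_cases)
            if llr > best_llr:
                best_llr, best_members = llr, list(members)
    return best_llr, best_members


# Invented sites: (x, y, establishments of the target type, all establishments).
sites = [(0, 0, 10, 100), (1, 0, 9, 100), (10, 0, 1, 100),
         (11, 0, 1, 100), (12, 0, 1, 100), (13, 0, 1, 100)]
llr, cluster = best_circle(sites)
print([(x, y) for x, y, _, _ in cluster])  # [(0, 0), (1, 0)]
```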
Abstract:
Boundaries for delta, representing a "quantitatively significant" or "substantively impressive" distinction, have not been established in the way that the boundary for alpha, usually set at 0.05, has been for the stochastic or probabilistic component of "statistical significance". To determine what boundaries are being used for "quantitative" decisions, we reviewed pertinent articles in three general medical journals. For each contrast of two means, contrast of two rates, or correlation coefficient, we noted the investigators' decisions about stochastic significance, stated in P values or confidence intervals, and about quantitative significance, indicated by interpretive comments. The boundaries between impressive and unimpressive distinctions were best formed by a ratio of greater than or equal to 1.2 for the smaller to the larger mean in 546 comparisons; by a standardized increment of greater than or equal to 0.28 and an odds ratio of greater than or equal to 2.2 in 392 comparisons of two rates; and by an r value of greater than or equal to 0.32 in 154 correlation coefficients. Additional boundaries were also identified for "substantially" and "highly" significant quantitative distinctions. The proposed boundaries should be kept flexible. Nevertheless, indexes and boundaries for decisions about "quantitative significance" are particularly useful in two situations: when a value of delta must be chosen for calculating sample size before the research is done, and when the "statistical significance" of completed research is appraised for its quantitative as well as stochastic components.
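Applying these boundaries to a two-rate contrast or a correlation can be sketched as follows. The standardized increment is assumed here to be the absolute difference in rates divided by the pooled binomial standard deviation sqrt(P(1 - P)). The abstract does not define this index. It also does not say whether the rate boundaries apply jointly, so each index is flagged separately. Rates must lie strictly between 0 and 1.

```python
import math


def standardized_increment(p_a, p_b, n_a, n_b):
    """|p_a - p_b| divided by the pooled binomial SD (assumed definition)."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    return abs(p_a - p_b) / math.sqrt(pooled * (1 - pooled))


def odds_ratio(p_a, p_b):
    """Odds ratio of the larger rate relative to the smaller (always >= 1)."""
    hi, lo = max(p_a, p_b), min(p_a, p_b)
    return (hi / (1 - hi)) / (lo / (1 - lo))


def rate_flags(p_a, p_b, n_a, n_b):
    """Check each rate index against the boundary reported in the review."""
    return {
        "standardized_increment": standardized_increment(p_a, p_b, n_a, n_b) >= 0.28,
        "odds_ratio": odds_ratio(p_a, p_b) >= 2.2,
    }


def correlation_flag(r):
    """Boundary for correlation coefficients; sign ignored (an assumption)."""
    return abs(r) >= 0.32


print(rate_flags(0.30, 0.10, 100, 100))  # {'standardized_increment': True, 'odds_ratio': True}
```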
Abstract:
Laser desorption ionisation mass spectrometry (LDI-MS) has proven to be an excellent analytical method for the forensic analysis of inks on questioned documents. The ink can be analysed directly on its substrate (paper), which makes the analysis fast because sample preparation is kept to a minimum and, more importantly, damage to the document is minimised. LDI-MS has also previously been reported to provide high discriminating power in the statistical comparison of ink samples and has the potential to be introduced as part of routine ink analysis. This paper examines the methodology further and statistically evaluates the reproducibility of black gel pen ink LDI-MS spectra and the influence of paper on them, by comparing spectra of three different black gel pen inks on three different paper substrates. Although generally minimal, the influences of sample homogeneity and paper type were found to be sample dependent. This should be taken into account to avoid the risk of false differentiation of black gel pen ink samples. Other statistical approaches, such as principal component analysis (PCA), proved to be a good alternative to correlation coefficients for the comparison of whole mass spectra.
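The correlation-coefficient comparison mentioned above can be sketched as follows. The sketch assumes that each spectrum has already been binned onto a common m/z grid, so that spectra are equal-length intensity vectors. Sample names and intensities are invented, and the PCA alternative is not sketched.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equal-length intensity vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def pairwise_correlations(spectra):
    """Correlation for every pair of spectra (dict name -> intensities)."""
    return {(p, q): pearson(spectra[p], spectra[q])
            for p, q in combinations(sorted(spectra), 2)}


# Invented binned spectra: the same ink on two papers, and a second ink.
spectra = {
    "ink1_paperA": [0, 5, 1, 8],
    "ink1_paperB": [0, 10, 2, 16],
    "ink2_paperA": [8, 1, 5, 0],
}
for pair, r in pairwise_correlations(spectra).items():
    print(pair, round(r, 3))
```

Comparing replicate spectra of one ink across papers with spectra of different inks shows whether paper alone can lower the correlation enough to risk a false differentiation.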