995 results for Statistical Concepts


Relevance: 30.00%

Abstract:

As a thorough aggregation of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in the literature, the primary aim of this paper is to offer an essentially non-technical introduction to how interested readers may use these analytical approaches, with the help of Bayesian networks, for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocks when constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).
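
As a minimal, illustrative sketch of the kind of basic, context-independent network fragment discussed above, the following Python snippet updates the probability of a binary hypothesis H after observing evidence E via Bayes' theorem; the function name and all numbers are invented for illustration and are not taken from the paper.

```python
# Minimal two-node network fragment: hypothesis H -> evidence E.
# All probabilities below are illustrative placeholders.

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) for a binary hypothesis H and observed evidence E."""
    joint_h = prior_h * p_e_given_h
    joint_not_h = (1.0 - prior_h) * p_e_given_not_h
    return joint_h / (joint_h + joint_not_h)

# Evidence 100 times more probable under H than under not-H:
post = posterior(0.5, 0.99, 0.0099)
print(round(post, 4))  # → 0.9901
```

Larger networks chain such fragments together, with a conditional probability table on each node playing the role of the local probability assessments discussed in the paper.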

Relevance: 30.00%

Abstract:

Intraclass correlation (ICC) is an established tool to assess inter-rater reliability. In a seminal paper published in 1979, Shrout and Fleiss considered three statistical models for inter-rater reliability data with a balanced design. In their first two models, an infinite population of raters was considered, whereas in their third model, the raters in the sample were considered to be the whole population of raters. In the present paper, we show that the two distinct estimates of ICC developed for the first two models can both be applied to the third model and we discuss their different interpretations in this context.
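
The two estimates under discussion can be sketched from the standard two-way ANOVA mean squares in the Shrout-Fleiss notation. The code below is a minimal illustration of that standard formulation; the function name and the toy ratings are ours, not from the paper.

```python
def icc_2_1_and_3_1(data):
    """ICC(2,1) and ICC(3,1) (Shrout-Fleiss notation) from a table of
    ratings: one row per target, one column per rater."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)      # targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)      # raters
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    bms = ss_rows / (n - 1)                                     # between-targets MS
    jms = ss_cols / (k - 1)                                     # between-raters MS
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual MS
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31

# A rater who scores consistently 1 point higher leaves consistency
# (ICC(3,1)) perfect but lowers absolute agreement (ICC(2,1)):
icc21, icc31 = icc_2_1_and_3_1([[1, 2], [2, 3], [3, 4]])
print(round(icc21, 3), round(icc31, 3))  # → 0.667 1.0
```

The gap between the two numbers on the same data is exactly the interpretive difference the paper examines.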

Relevance: 30.00%

Abstract:

Many people regard the concept of hypothesis testing as fundamental to inferential statistics. Various schools of thought, in particular frequentist and Bayesian, have promoted radically different solutions for taking a decision about the plausibility of competing hypotheses. Comprehensive philosophical comparisons of their advantages and drawbacks are widely available and continue to fuel extensive debates in the literature. More recently, controversial discussion was initiated by the editorial decision of a scientific journal [1] to refuse any paper submitted for publication that contains null-hypothesis testing procedures. Since the large majority of papers published in forensic journals propose the evaluation of statistical evidence based on so-called p-values, it is of interest to bring the discussion of this journal's decision to the forensic science community. This paper aims to provide forensic science researchers with a primer on the main concepts and their implications for making informed methodological choices.
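
For concreteness, here is a small, self-contained illustration of the quantity at the centre of this debate, an exact two-sided p-value for a binomial null hypothesis; the example and function name are ours and do not come from the paper.

```python
from math import comb

def binom_p_two_sided(k, n, p0=0.5):
    """Exact two-sided p-value under H0: success probability = p0,
    summing the probabilities of all outcomes no more likely than k."""
    pmf = [comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i) for i in range(n + 1)]
    obs = pmf[k]
    return sum(p for p in pmf if p <= obs + 1e-12)

# 60 heads in 100 tosses of a supposedly fair coin:
p = binom_p_two_sided(60, 100)
print(round(p, 3))  # just above the conventional 0.05 threshold
```

Whether such a number should license a decision about the hypotheses is precisely what the frequentist-Bayesian controversy discussed above is about.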

Relevance: 30.00%

Abstract:

Reliability analysis is a well-established branch of statistics that deals with the statistical study of different aspects of the lifetimes of a system of components. As pointed out earlier, the major part of the theory and applications connected with reliability analysis has been developed in terms of the distribution function. In the opening chapters of the thesis, we have described some attractive features of quantile functions and the relevance of their use in reliability analysis. Motivated by the works of Parzen (1979), Freimer et al. (1988) and Gilchrist (2000), who indicated the scope of quantile functions in reliability analysis, and following the systematic study in this connection by Nair and Sankaran (2009), in the present work we extend their ideas to develop the theoretical framework necessary for lifetime data analysis. Chapter 1 gives the relevance and scope of the study and a brief outline of the work we have carried out. Chapter 2 is devoted to the presentation of various concepts and brief reviews of them, which are useful for the discussions in the subsequent chapters. In the introduction to Chapter 4, we point out the role of ageing concepts in reliability analysis and in identifying life distributions. In Chapter 6, we study the first two L-moments of residual life and their relevance in various applications of reliability analysis; we show that the first L-moment of the residual function is equivalent to the vitality function, which has been widely discussed in the literature. In Chapter 7, we define the percentile residual life in reversed time (RPRL) and derive its relationship with the reversed hazard rate (RHR). We discuss the characterization problem of RPRL and demonstrate with an example that the RPRL at a given percentile does not determine the distribution uniquely.
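
As a small worked sketch of the quantile-based viewpoint (our own illustration, not taken from the thesis), the percentile residual life can be written directly in terms of a quantile function Q; for the exponential distribution, memorylessness makes it independent of the age at which it is evaluated.

```python
import math

def q_exp(u, lam=1.0):
    """Quantile function of the exponential distribution."""
    return -math.log(1.0 - u) / lam

def percentile_residual_life(q, alpha, u):
    """alpha-percentile residual life, expressed through the quantile
    function q: the remaining life, at the age q(u), that is exceeded
    with probability 1 - alpha."""
    return q(1.0 - (1.0 - alpha) * (1.0 - u)) - q(u)

# Memoryless check: for the exponential the result is ln(2) at any age.
r1 = percentile_residual_life(q_exp, 0.5, 0.1)
r2 = percentile_residual_life(q_exp, 0.5, 0.9)
print(round(r1, 6), round(r2, 6))  # → 0.693147 0.693147
```

This constancy is the reverse side of the characterization issue mentioned above: many different summaries of residual life can coincide without pinning down the underlying distribution.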

Relevance: 30.00%

Abstract:

Includes bibliography

Relevance: 30.00%

Abstract:

Analyses of ecological data should account for the uncertainty in the process(es) that generated the data. However, accounting for these uncertainties is a difficult task, since ecology is known for its complexity. Measurement and/or process errors are often the only sources of uncertainty modeled when addressing complex ecological problems, yet analyses should also account for uncertainty in sampling design, in model specification, in parameters governing the specified model, and in initial and boundary conditions. Only then can we be confident in the scientific inferences and forecasts made from an analysis. Probability and statistics provide a framework that accounts for multiple sources of uncertainty. Given the complexities of ecological studies, the hierarchical statistical model is an invaluable tool. This approach is not new in ecology, and there are many examples (both Bayesian and non-Bayesian) in the literature illustrating the benefits of this approach. In this article, we provide a baseline for concepts, notation, and methods, from which discussion on hierarchical statistical modeling in ecology can proceed. We have also planted some seeds for discussion and tried to show where the practical difficulties lie. Our thesis is that hierarchical statistical modeling is a powerful way of approaching ecological analysis in the presence of inevitable but quantifiable uncertainties, even if practical issues sometimes require pragmatic compromises.
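
A minimal simulation (ours, not from the article) of a two-level hierarchy illustrates how the uncertainty sources combine: site-level means are drawn from a hyper-distribution, and observations add measurement error on top.

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical hierarchy: site means theta_j ~ N(mu, tau),
# observations y_ij ~ N(theta_j, sigma).
mu, tau, sigma = 10.0, 2.0, 1.0
n_sites, n_obs = 200, 5

data = []
for _ in range(n_sites):
    theta = random.gauss(mu, tau)  # process-level draw
    data.append([random.gauss(theta, sigma) for _ in range(n_obs)])  # measurement

# The empirical variance of site means reflects both levels:
# Var(mean of y_j) ~ tau^2 + sigma^2 / n_obs = 4.2 here.
site_means = [sum(y) / n_obs for y in data]
m = sum(site_means) / n_sites
var_means = sum((x - m) ** 2 for x in site_means) / (n_sites - 1)
print(round(var_means, 2))
```

Treating all observations as exchangeable would misattribute this variability, which is one of the points the article makes about unmodeled sources of uncertainty.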

Relevance: 30.00%

Abstract:

While the use of statistical physics methods to analyze large corpora has been useful for unveiling many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements: those obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts in which the text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include the assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript, which could be helpful in the effort to decipher it. Because we were able to identify statistical measurements that depend more on syntax than on semantics, the framework may also serve for text analysis in language-dependent applications.
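
As a toy illustration of the intermittency idea (our own sketch, not the authors' code), a word's burstiness can be measured as the coefficient of variation of the gaps between its successive occurrences in the token stream.

```python
def intermittency(tokens, word):
    """Coefficient of variation (sigma / mean) of the gaps between
    successive occurrences of `word`: 0 for perfectly regular words,
    larger values for bursty ones. None if the word is too rare."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return var ** 0.5 / mean

evenly = ["x", "a", "x", "a", "x", "a", "x", "a", "x", "a"]
bursty = ["x", "x", "x", "a", "a", "a", "a", "a", "a", "x"]
print(intermittency(evenly, "x"), round(intermittency(bursty, "x"), 3))  # → 0.0 0.943
```

Because such a measure can be computed for any token stream, it needs no knowledge of the alphabet, which is what makes it usable on an undeciphered text.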

Relevance: 30.00%

Abstract:

Relatively little is known about past cold-season temperature variability in high-Alpine regions because of a lack of natural cold-season temperature proxies as well as under-representation of high-altitude sites in meteorological, early-instrumental and documentary data sources. Recent studies have shown that chrysophyte stomatocysts, or simply cysts (sub-fossil algal remains of Chrysophyceae and Synurophyceae), are among the very few natural proxies that can be used to reconstruct cold-season temperatures. This study presents a quantitative, high-resolution (5-year), cold-season (Oct–May) temperature reconstruction based on sub-fossil chrysophyte stomatocysts in the annually laminated (varved) sediments of high-Alpine Lake Silvaplana, SE Switzerland (1,789 m a.s.l.), since AD 1500. We first explore the method used to translate an ecologically meaningful variable based on a biological proxy into a simple climate variable. A transfer function was applied to reconstruct the ‘date of spring mixing’ from cyst assemblages. Next, statistical regression models were tested to convert the reconstructed ‘dates of spring mixing’ into cold-season surface air temperatures with associated errors. The strengths and weaknesses of this approach are thoroughly tested. One much-debated, basic assumption for reconstructions (‘stationarity’), which states that only the environmental variable of interest has influenced cyst assemblages and the influence of confounding variables is negligible over time, is addressed in detail. Our inferences show that past cold-season air-temperature fluctuations were substantial and larger than those of other temperature reconstructions for Europe and the Alpine region. Interestingly, in this study, recent cold-season temperatures only just exceed those of previous, multi-decadal warm phases since AD 1500. These findings highlight the importance of local studies to assess natural climate variability at high altitudes.
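
The regression step can be pictured with a deliberately simple sketch: an ordinary least-squares fit mapping a reconstructed 'date of spring mixing' to a cold-season temperature, with the residual error quantifying the associated uncertainty. The numbers below are invented placeholders, not data from the study.

```python
def ols(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, rmse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rmse = (sum(r * r for r in resid) / n) ** 0.5
    return a, b, rmse

# Hypothetical calibration pairs: day-of-year of spring mixing vs.
# observed Oct-May mean air temperature (deg C).
mixing_day = [110, 118, 125, 131, 140]
cold_temp = [1.8, 1.2, 0.6, 0.1, -0.7]
a, b, rmse = ols(mixing_day, cold_temp)
print(round(b, 3), round(rmse, 3))  # later mixing pairs with colder seasons
```

The 'stationarity' assumption discussed above amounts to trusting that a relationship fitted like this one holds unchanged over the whole reconstruction period.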

Relevance: 30.00%

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases together with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects, or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained. In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of data analysis methods derived from computer science, predominantly machine learning approaches, was proposed and explored in this study. The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational outcomes based on literature-derived databases, and to compare, using cross-validation and data-splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride and trichloroethylene were used. When compared to regression estimations, the results showed better accuracy of decision tree/ensemble techniques for the categorical case, while neural networks were better for the estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy.
Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied together with other methodologies that combine expert inputs with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to estimate exposures from literature-based exposure databases more accurately might represent a starting point toward independence from expert judgment.
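
The cross-validation and data-splitting protocol mentioned above can be sketched in a few lines; this generic fold-splitting helper is our own illustration, not code from the dissertation.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle n sample indices and deal them into k disjoint folds;
    each fold serves once as the held-out test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(10, 5)
# Every sample appears in exactly one fold:
assert sorted(i for fold in folds for i in fold) == list(range(10))
```

Comparing models on held-out folds rather than on the training data is what allows accuracy claims such as those above to generalize beyond the fitted databases.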

Relevance: 30.00%

Abstract:

Measures of the genetic variability of agro-ecosystems are essential to sustain science-based actions and policies aimed at protecting the ecosystem services they provide. To build the genetic variability datum it is necessary to deal with a large number of variables of different types. Molecular marker data are high-dimensional by nature, and additional types of information, such as morphological and physiological traits, are frequently obtained. Genetic variability studies are therefore usually associated with the measurement of several traits on each entity. Multivariate methods aim at finding proximities between entities characterized by multiple traits by summarizing the information in a few synthetic variables. In this work we discuss and illustrate several multivariate methods used for different purposes in building the datum of genetic variability. We include methods applied in studies exploring the spatial structure of genetic variability and the association of genetic data with other sources of information. Multivariate techniques allow the pursuit of the genetic variability datum as a unifying notion that merges concepts of the type, abundance and distribution of variability at the gene level.
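
The idea of summarizing many traits in a few synthetic variables can be sketched with the leading principal component, computed here by power iteration on the covariance matrix; the function and toy data are our illustration, not the article's methods.

```python
def leading_pc(data, iters=200):
    """Unit-length leading eigenvector of the sample covariance matrix,
    found by power iteration: the direction of maximal variance."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(xi[a] * xi[b] for xi in x) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Toy entities measured on two traits; variance is dominated by trait 1.
traits = [[1, 0], [2, 0], [3, 0], [4, 0.1], [5, -0.1]]
v = leading_pc(traits)
print([round(c, 3) for c in v])  # nearly aligned with the first trait axis
```

Projecting entities onto one or two such directions is what ordination-style multivariate summaries of genetic variability amount to.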

Relevance: 30.00%

Abstract:

Ontologies and taxonomies are widely used to organize concepts, providing the basis for activities such as indexing and serving as background knowledge for NLP tasks. As such, translation of these resources would prove useful for adapting these systems to new languages. However, we show that the nature of these resources is significantly different from the "free-text" paradigm used to train most statistical machine translation systems. In particular, we see significant differences in the linguistic nature of these resources, and such resources carry rich additional semantics. We demonstrate that, as a result of these linguistic differences, standard SMT methods, in particular evaluation metrics, can produce poor performance. We then look to the task of leveraging these semantics for translation, which we approach in three ways: by adapting the translation system to the domain of the resource; by examining whether semantics can help to predict the syntactic structure used in translation; and by evaluating whether we can use existing translated taxonomies to disambiguate translations. We present some early results from these experiments, which shed light on the degree of success we may have with each approach.

Relevance: 30.00%

Abstract:

Aims This paper presents the recommendations, developed from a 3-year consultation process, for a program of research to underpin the development of diagnostic concepts and criteria in the Substance Use Disorders section of the Diagnostic and Statistical Manual of Mental Disorders (DSM) and potentially the relevant section of the next revision of the International Classification of Diseases (ICD). Methods A preliminary list of research topics was developed at the DSM-V Launch Conference in 2004. This led to the presentation of articles on these topics at a specific Substance Use Disorders Conference in February 2005, at the end of which a preliminary list of research questions was developed. This was further refined through an iterative process involving conference participants over the following year. Results Research questions have been placed into four categories: (1) questions that could be addressed immediately through secondary analyses of existing data sets; (2) items likely to require position papers to propose criteria or more focused questions with a view to subsequent analyses of existing data sets; (3) issues that could be proposed for literature reviews, but with a lower probability that these might progress to a data-analytic phase; and (4) suggestions or comments that might not require immediate action, but that could be considered by the DSM-V and ICD-11 revision committees as part of their deliberations. Conclusions A broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.

Relevance: 30.00%

Abstract:

Aims This paper describes the background to the establishment of the Substance Use Disorders Workgroup, which was charged with developing the research agenda for the next edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM). It summarizes 18 articles that were commissioned to inform that process. Methods A preliminary list of research topics, developed at the DSM-V Launch Conference in 2004, led to the identification of subjects that received formal presentations and detailed discussion at the Substance Use Disorders Conference in February 2005. Results The 18 articles presented in this supplement examine: (1) categorical versus dimensional diagnoses; (2) the neurobiological basis of substance use disorders; (3) social and cultural perspectives; (4) the crosswalk between DSM-IV and the International Classification of Diseases, Tenth Revision (ICD-10); (5) comorbidity of substance use disorders and mental health disorders; (6) subtypes of disorders; (7) issues in adolescence; (8) substance-specific criteria; (9) the place of non-substance addictive disorders; and (10) the available research resources. Conclusions In the final paper a broadly based research agenda for the development of diagnostic concepts and criteria for substance use disorders is presented.