998 resultados para 230204 Applied Statistics


Relevância:

100.00% 100.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This dissertation examines knowledge and industrial knowledge creation processes. It looks at the way knowledge is created in industrial processes based on data, which is transformed into information and finally into knowledge. In the context of this dissertation the main tool for industrial knowledge creation are different statistical methods. This dissertation strives to define industrial statistics. This is done using an expert opinion survey, which was sent to a number of industrial statisticians. The survey was conducted to create a definition for this field of applied statistics and to demonstrate the wide applicability of statistical methods to industrial problems. In this part of the dissertation, traditional methods of industrial statistics are introduced. As industrial statistics are the main tool for knowledge creation, the basics of statistical decision making and statistical modeling are also included. The widely known Data Information Knowledge Wisdom (DIKW) hierarchy serves as a theoretical background for this dissertation. The way that data is transformed into information, information into knowledge and knowledge finally into wisdom is used as a theoretical frame of reference. Some scholars have, however, criticized the DIKW model. Based on these different perceptions of the knowledge creation process, a new knowledge creation process, based on statistical methods is proposed. In the context of this dissertation, the data is a source of knowledge in industrial processes. Because of this, the mathematical categorization of data into continuous and discrete types is explained. Different methods for gathering data from processes are clarified as well. There are two methods for data gathering in this dissertation: survey methods and measurements. The enclosed publications provide an example of the wide applicability of statistical methods in industry. In these publications data is gathered using surveys and measurements. Enclosed publications have been chosen so that in each publication, different statistical methods are employed in analyzing of data. There are some similarities between the analysis methods used in the publications, but mainly different methods are used. Based on this dissertation the use of statistical methods for industrial knowledge creation is strongly recommended. With statistical methods it is possible to handle large datasets and different types of statistical analysis results can easily be transformed into knowledge.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over-or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.