993 results for Statistical distributions


Relevance: 40.00%

Abstract:

Nuclear morphometry (NM) uses image analysis to measure features of the cell nucleus, which are classified as bulk properties, shape or form, and DNA distribution. Studies have used these measurements as diagnostic and prognostic indicators of disease, with inconclusive results. The distributional properties of these variables have not been systematically investigated, although much of the medical data exhibit non-normal distributions. Because measurements are made on several hundred cells per patient, summary measures that reflect the underlying distribution are needed.

Distributional characteristics of 34 NM variables from prostate cancer cells were investigated using graphical and analytical techniques. The number of cells per sample ranged from 52 to 458. A small sample of patients with benign prostatic hyperplasia (BPH), representing non-cancer cells, was used for general comparison with the cancer cells.

Data transformations such as log, square root and 1/x did not yield normality as measured by the Shapiro-Wilk test. A modulus transformation, used for distributions with abnormal kurtosis, also did not produce normality.

Kernel density histograms of the 34 variables exhibited non-normality, and 18 variables also exhibited bimodality. A bimodality coefficient was calculated, and 3 variables (DNA concentration, shape and elongation) showed the strongest evidence of bimodality and were studied further.

Two analytical approaches were used to obtain a summary measure of each variable for each patient: cluster analysis to determine significant clusters, and mixture model analysis using a two-component Gaussian mixture with equal variances. The mixture component parameters were used to bootstrap the log-likelihood ratio to determine the significant number of components, 1 or 2. These summary measures were used as predictors of disease severity in several proportional odds logistic regression models. The disease severity scale had 5 levels and was constructed from 3 components, extracapsular penetration (ECP), lymph node involvement (LN+) and seminal vesicle involvement (SV+), which serve as surrogate measures of prognosis. The summary measures were not strong predictors of disease severity. The mixture model results gave some indication of changes in the mean levels and proportions of the components at the lower severity levels.
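The abstract does not give formulas, so the following is only a minimal sketch of three of the steps it names, under stated assumptions: the bimodality coefficient uses the SAS-style definition BC = (G² + 1) / (K + 3(n − 1)² / ((n − 2)(n − 3))) with bias-corrected skewness G and excess kurtosis K; the two-component model is fitted by EM with one shared variance; and "bootstrapping the log-likelihood ratio" is read as a parametric bootstrap under the fitted one-component model. All function names are hypothetical.

```python
import math
import random
import statistics


def bimodality_coefficient(x):
    """Sample bimodality coefficient (SAS-style formula; an assumption here).

    Uses bias-corrected skewness G and excess kurtosis K. A uniform
    distribution gives 5/9 (about 0.556); larger values hint at bimodality.
    """
    n = len(x)
    m = statistics.fmean(x)
    m2, m3, m4 = (sum((v - m) ** k for v in x) / n for k in (2, 3, 4))
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
    G = skew * math.sqrt(n * (n - 1)) / (n - 2)
    K = ((n + 1) * kurt + 6) * (n - 1) / ((n - 2) * (n - 3))
    return (G ** 2 + 1) / (K + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def _pdf(v, mu, var):
    return math.exp(-(v - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def _mix_loglik(x, p, mu1, mu2, var):
    return sum(math.log(max(p * _pdf(v, mu1, var) + (1 - p) * _pdf(v, mu2, var), 1e-300))
               for v in x)


def one_gaussian_loglik(x):
    """Maximised log-likelihood of a single Gaussian (MLE variance)."""
    return -0.5 * len(x) * (math.log(2 * math.pi * statistics.pvariance(x)) + 1)


def fit_two_gaussians(x, iters=100, tol=1e-6):
    """EM for a two-component Gaussian mixture sharing one variance."""
    n, xs = len(x), sorted(x)
    p, mu1, mu2, var = 0.5, xs[n // 4], xs[3 * n // 4], statistics.pvariance(x)
    ll_old = -math.inf
    for _ in range(iters):
        # E-step: posterior probability that each value comes from component 1
        r = []
        for v in x:
            a = p * _pdf(v, mu1, var)
            s = a + (1 - p) * _pdf(v, mu2, var)
            r.append(a / s if s > 0 else 0.5)
        n1 = sum(r)
        if min(n1, n - n1) < 1e-6:        # one component has collapsed
            break
        # M-step: mixing weight, component means, pooled variance
        p = n1 / n
        mu1 = sum(ri * v for ri, v in zip(r, x)) / n1
        mu2 = sum((1 - ri) * v for ri, v in zip(r, x)) / (n - n1)
        var = sum(ri * (v - mu1) ** 2 + (1 - ri) * (v - mu2) ** 2
                  for ri, v in zip(r, x)) / n
        ll = _mix_loglik(x, p, mu1, mu2, var)
        if ll - ll_old < tol:
            break
        ll_old = ll
    return p, mu1, mu2, var, _mix_loglik(x, p, mu1, mu2, var)


def bootstrap_lrt(x, n_boot=99, seed=0):
    """Parametric-bootstrap p-value for 1 versus 2 components."""
    rng = random.Random(seed)
    observed = 2 * (fit_two_gaussians(x)[-1] - one_gaussian_loglik(x))
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    exceed = 0
    for _ in range(n_boot):
        y = [rng.gauss(mu, sd) for _ in x]    # simulate under the fitted 1-component model
        if 2 * (fit_two_gaussians(y)[-1] - one_gaussian_loglik(y)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    cells = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(6, 1) for _ in range(150)]
    print(bimodality_coefficient(cells), fit_two_gaussians(cells)[:4])
```

The bootstrap p-value is (1 + number of simulated statistics at least as large as the observed one) / (B + 1), so with B = 99 the smallest attainable value is 0.01.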

Relevance: 40.00%

Abstract:

The distribution of optimal local alignment scores of random sequences plays a vital role in evaluating the statistical significance of sequence alignments. These scores can be well described by an extreme-value distribution. The distribution’s parameters depend upon the scoring system employed and the random letter frequencies; in general they cannot be derived analytically, but must be estimated by curve fitting. For obtaining accurate parameter estimates, a form of the recently described ‘island’ method has several advantages. We describe this method in detail, and use it to investigate the functional dependence of these parameters on finite-length edge effects.
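In the usual Karlin-Altschul form of this extreme-value distribution, the parameters are λ and K, and the probability that the optimal local score of sequences of lengths m and n reaches S is approximately 1 − exp(−K m n e^(−λS)). The sketch below is hedged in two ways. First, its λ estimator assumes that island peak scores at or above a cutoff c have a geometric tail, which gives the closed-form maximum-likelihood estimate λ = ln(1 + 1/(mean − c)). Second, it omits both the estimation of K (which in the island method comes, roughly, from the number of islands per unit of search space) and the finite-length edge corrections the abstract studies. Function names and the synthetic scores are hypothetical.

```python
import math
import random


def island_lambda(peak_scores, cutoff):
    """Estimate lambda from integer island peak scores at or above `cutoff`.

    Assumes the excess S - cutoff is geometric, P(S = s) ~ exp(-lambda * s);
    its maximum-likelihood estimate is lambda = ln(1 + 1 / (mean(S) - cutoff)).
    """
    tail = [s for s in peak_scores if s >= cutoff]
    excess = sum(tail) / len(tail) - cutoff
    return math.log(1 + 1 / excess)


def karlin_altschul_pvalue(score, m, n, lam, k):
    """P(optimal local score >= score) for lengths m, n under the Gumbel law."""
    e_value = k * m * n * math.exp(-lam * score)
    return -math.expm1(-e_value)          # 1 - exp(-E), accurate for small E


if __name__ == "__main__":
    rng = random.Random(42)
    lam, cutoff = 0.3, 20
    # synthetic peak scores: cutoff + geometric excess with ratio exp(-lam)
    scores = [cutoff + math.floor(math.log(1 - rng.random()) / -lam) for _ in range(20000)]
    print(island_lambda(scores, cutoff))
    print(karlin_altschul_pvalue(50, 300, 300, lam, 0.1))
```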

Relevance: 40.00%

Abstract:

Research has suggested that understanding in well-structured settings often does not transfer to the everyday, less-structured problems encountered outside of school. Little is known, beyond anecdotal evidence, about how teachers' consideration of distributions as evidence in well-structured settings compares with their use in ill-structured problem contexts. A qualitative study of preservice secondary teachers examined their use of distributions as evidence in four tasks of varying complexity and ill-structuredness. Results suggest that teachers' incorporation of distributions in well-structured settings does not imply that they will be incorporated in less structured problems (and vice-versa). Implications for research and teaching are discussed.

Relevance: 40.00%

Abstract:

This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary between Gaussian distributions in multiple dimensions. Real-world data used for implementation and feasibility studies were provided by Beckman-Coulter. Although the data used are flow cytometric in nature, the mathematics is derived in general form and applies to other types of data, provided their statistical behavior approximates Gaussian distributions.

Because this new figure of merit relies heavily on the statistical nature of the data, a new filtering technique is introduced to account for the accumulation process involved in histogram data. When data are accumulated into a frequency histogram, they are inherently smoothed in a linear fashion, since an averaging effect takes place as the histogram is generated. The new filtering scheme addresses data accumulated in the unevenly resolved channels of the frequency histogram.

The qualitative interpretation of flow cytometric data is currently a time-consuming and imprecise way to evaluate histogram data. This method offers a broader range of capabilities for analyzing histograms, since the figure of merit derived in this dissertation integrates both a measure of similarity and the percentage of overlap between the distributions under analysis.
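The abstract does not define its figure of merit, so the sketch below only illustrates why a Fisher distance and an overlap percentage can be linked at all. For two univariate Gaussians with the same standard deviation σ, both the closed-form Fisher-Rao distance and the overlapping area depend only on the standardized separation d = |μ1 − μ2| / σ, so either one determines the other. Python's `statistics.NormalDist.overlap` (Python 3.8+) gives the overlap directly. The helper names are hypothetical, and the multivariate and unequal-variance cases the dissertation addresses are not covered.

```python
import math
from statistics import NormalDist


def fisher_rao_distance(a: NormalDist, b: NormalDist) -> float:
    """Closed-form Fisher-Rao distance between two univariate Gaussians."""
    dm2 = (a.mean - b.mean) ** 2
    num = dm2 + 2 * (a.stdev - b.stdev) ** 2
    den = dm2 + 2 * (a.stdev + b.stdev) ** 2
    return 2 * math.sqrt(2) * math.atanh(math.sqrt(num / den))


def overlap_from_fisher(d_f: float) -> float:
    """Overlap of two equal-variance Gaussians, recovered from their Fisher-Rao distance."""
    t = math.tanh(d_f / (2 * math.sqrt(2)))
    d = 2 * math.sqrt(2) * t / math.sqrt(1 - t * t)   # standardized separation |dmu| / sigma
    return 2 * NormalDist().cdf(-d / 2)               # overlapping area = 2 * Phi(-d / 2)


if __name__ == "__main__":
    a, b = NormalDist(0, 1), NormalDist(2, 1)
    d_f = fisher_rao_distance(a, b)
    print(d_f, overlap_from_fisher(d_f), a.overlap(b))
```

With unequal variances the overlap also depends on the variance ratio, so a single distance no longer determines it. This is one reason a dedicated figure of merit is needed in the general case.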

Relevance: 30.00%

Abstract:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; in principle, its computational cost grows exponentially with the maximal number of nodes of the candidate trees. We show how to transform this problem into a max-flow problem on a related graph, which can be solved with a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees from different families are significantly different. We emphasize that this is a novel mathematical approach to validating the automatic clustering of sequences in any context. We also study the performance of the test via simulations of Galton-Watson-related processes.
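The abstract does not describe how the supremum over trees is encoded as a flow network, so that reduction is not reproduced here. What can be shown is the solver it names: the Ford-Fulkerson method, here in its Edmonds-Karp form (augmenting paths found by breadth-first search), which runs in O(VE²) time. The function name and the toy network are illustrative and are not derived from context trees.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    `capacity` maps u -> {v: c}. Returns the value of a maximum source-sink flow.
    """
    # residual graph: forward capacities plus reverse edges starting at 0
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a path with spare residual capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # bottleneck capacity along the path found
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push flow: decrease forward, increase reverse residuals
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


if __name__ == "__main__":
    network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
    print(max_flow(network, "s", "t"))
```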

Relevance: 30.00%

Abstract:

This article considers alternative methods for calculating the fair premium rate of crop insurance contracts based on county yields. The premium rate was calculated using parametric and nonparametric approaches to estimating the conditional agricultural yield density. These methods were applied to county yield data for soybean, corn and wheat in the State of Paraná from 1990 through 2002, provided by the Brazilian Institute of Geography and Statistics (IBGE). We propose methodological alternatives for pricing crop insurance contracts that yield more accurate premium rates when data are limited.
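The sketch below uses the usual actuarial definition, which is an assumption here and not taken from the article: the fair premium rate is the expected indemnity E[max(g − Y, 0)] divided by the liability g, where g = coverage × expected yield. It compares the empirical distribution, a fitted normal (parametric), and a Gaussian kernel density (nonparametric). For normal kernels the expected shortfall has a closed form. Yields are assumed to be already detrended, and the coverage level, bandwidth rule, made-up data and function names are illustrative.

```python
import statistics
from statistics import NormalDist

_STD = NormalDist()


def expected_shortfall_normal(g, mu, sigma):
    """E[max(g - Y, 0)] for Y ~ N(mu, sigma^2), in closed form."""
    z = (g - mu) / sigma
    return (g - mu) * _STD.cdf(z) + sigma * _STD.pdf(z)


def premium_rates(yields, coverage=0.7, bandwidth=None):
    """Fair premium rate = expected indemnity / liability, estimated three ways.

    Assumes `yields` are already detrended and liability = coverage * mean yield.
    """
    n = len(yields)
    mu, sd = statistics.fmean(yields), statistics.stdev(yields)
    g = coverage * mu                                       # guaranteed yield
    # nonparametric (empirical): average shortfall below the guarantee
    empirical = sum(max(g - y, 0.0) for y in yields) / n
    # parametric: a single normal fitted by mean and standard deviation
    normal = expected_shortfall_normal(g, mu, sd)
    # nonparametric (kernel): Gaussian kernels, each with a closed-form shortfall
    h = bandwidth or 1.06 * sd * n ** -0.2                  # normal-reference rule of thumb
    kernel = sum(expected_shortfall_normal(g, y, h) for y in yields) / n
    return {"empirical": empirical / g, "normal": normal / g, "kernel": kernel / g}


if __name__ == "__main__":
    detrended = [2.9, 3.1, 2.4, 3.3, 3.0, 2.7, 3.2, 2.2, 3.4, 3.1, 2.8, 3.0, 2.6]  # made-up, t/ha
    print(premium_rates(detrended, coverage=0.7))
```

With few years of data, the empirical rate is often zero whenever no year falls below the guarantee. Smoothing with a parametric or kernel density avoids that degenerate estimate.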

Relevance: 30.00%

Abstract:

The present paper proposes an approach to obtaining the activation energy distribution for the chemisorption of oxygen onto carbon surfaces, while simultaneously allowing for the dependence of the pre-exponential factor of the rate constant on activation energy. Prior studies in this area have considered this factor to be uniform, thereby biasing the estimated distributions. The results show that the derived activation energy distribution is not sensitive to the chemisorption mechanism because of the step-function-like property of the coverage. The activation energy distribution is essentially uniform for some carbons and has two or possibly more discrete stages, suggestive of at least two types of sites, each with its own uniform distribution. The pre-exponential factors of the reactions are determined directly from the experimental data and are found not to be constant, as assumed in earlier work, but correlated with the activation energy. This correlation empirically follows an exponential function, supporting some earlier statistical and experimental work. The activation energy distribution obtained in the present paper permits improved correlation of chemisorption data in comparison with earlier studies. (C) 2000 Elsevier Science Ltd. All rights reserved.
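One common way to exploit a step-function-like coverage is a condensation-type approximation. It is used here as an assumption, not as the paper's procedure. Under isothermal conditions at temperature T, sites with activation energy below a threshold E* are taken as occupied at time t and the rest as empty, where A(E*) t exp(−E*/RT) = 1. If ln A = ln A₀ + bE (an exponential A-E relation, consistent with the abstract's finding), the threshold is E* = RT(ln A₀ + ln t)/(1 − bRT), and the distribution follows from f(E*) ≈ dθ/dE*. Parameter values and function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def threshold_energy(t, temp, ln_a0, b):
    """E* solving A(E) * t * exp(-E / (R T)) = 1 with ln A(E) = ln_a0 + b * E."""
    rt = R * temp
    if b * rt >= 1:
        raise ValueError("requires b * R * T < 1")
    return rt * (ln_a0 + math.log(t)) / (1 - b * rt)


def energy_distribution(times, coverages, temp, ln_a0, b):
    """Finite-difference estimate f(E*) ~ d(theta)/dE* from an isothermal coverage curve.

    Returns (E_midpoint, density) pairs; coverages are fractional (0..1).
    """
    e = [threshold_energy(t, temp, ln_a0, b) for t in times]
    return [((e[i] + e[i + 1]) / 2, (coverages[i + 1] - coverages[i]) / (e[i + 1] - e[i]))
            for i in range(len(e) - 1)]


if __name__ == "__main__":
    # synthetic check: a uniform site-energy distribution on [70, 90] kJ/mol
    temp, ln_a0, b, e1, e2 = 300.0, 20.0, 1e-4, 70e3, 90e3
    times = [10 ** (k / 10) for k in range(41)]          # 1 s to 1e4 s
    theta = [min(max((threshold_energy(t, temp, ln_a0, b) - e1) / (e2 - e1), 0.0), 1.0)
             for t in times]
    for e_mid, f in energy_distribution(times, theta, temp, ln_a0, b)[::8]:
        print(f"{e_mid / 1e3:7.1f} kJ/mol  f = {f:.2e} per J/mol")
```

Because only the threshold E*(t) enters, the recovered distribution does not depend on the detailed rate law below or above the threshold. Setting b = 0 reproduces the constant pre-exponential factor assumed in earlier work.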

Relevance: 30.00%

Abstract:

In this paper we study the possible microscopic origin of heavy-tailed probability density distributions for the price variation of financial instruments. We extend the standard log-normal process to include another random component in the so-called stochastic volatility models. We study these models under an assumption, akin to the Born-Oppenheimer approximation, in which the volatility has already relaxed to its equilibrium distribution and acts as a background to the evolution of the price process. In this approximation, we show that all stochastic volatility models should exhibit a scaling relation in the time lag of zero-drift modified log-returns. We verify that the Dow Jones Industrial Average index indeed follows this scaling. We then focus on two popular stochastic volatility models, the Heston and Hull-White models. In particular, we show that in the Hull-White model the resulting probability distribution of log-returns in this approximation corresponds to the Tsallis (Student's t) distribution. The Tsallis parameters are given in terms of the microscopic stochastic volatility model. Finally, we show that the log-returns of 30 years of Dow Jones index data are well fitted by a Tsallis distribution, and we obtain the relevant parameters. (c) 2007 Elsevier B.V. All rights reserved.
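The correspondence the abstract mentions is the standard identity between the Tsallis q-Gaussian, p(x) ∝ [1 + (q − 1)βx²]^(−1/(q−1)), and a scaled Student's t distribution with ν = (3 − q)/(q − 1) degrees of freedom and scale 1/√(β(3 − q)). The sketch checks that identity numerically. How q and β depend on the Hull-White parameters is given in the paper, not in the abstract, and is not reproduced. Function names are hypothetical.

```python
import math


def student_t_pdf(x, nu, scale=1.0):
    """Density of a scaled Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / (nu * scale * scale)))


def q_gaussian_kernel(x, q, beta):
    """Unnormalised Tsallis q-Gaussian [1 + (q - 1) beta x^2]^(-1 / (q - 1)), for 1 < q < 3."""
    return (1 + (q - 1) * beta * x * x) ** (-1 / (q - 1))


def tsallis_to_student(q, beta):
    """Map Tsallis (q, beta) to Student's t (nu, scale): same kernel, same shape."""
    nu = (3 - q) / (q - 1)
    scale = 1 / math.sqrt(beta * (3 - q))
    return nu, scale


if __name__ == "__main__":
    nu, scale = tsallis_to_student(1.4, 2.0)
    print(nu, scale)
    # the ratio below is the q-Gaussian normalising constant, identical at every x
    print([student_t_pdf(x, nu, scale) / q_gaussian_kernel(x, 1.4, 2.0) for x in (0.0, 1.0, 3.0)])
```

The variance of the fitted distribution is finite only for ν > 2, that is, q < 5/3, which is worth checking when interpreting fitted Tsallis parameters for returns.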

Relevance: 30.00%

Abstract:

When examining a rock mass, joint sets and their orientations can play a significant role in how the rock mass will behave. To identify the joint sets present in the rock mass, the orientations of individual fracture planes can be measured on exposed rock faces and the resulting data examined for heterogeneity. In this article, the expectation-maximization algorithm is used to fit mixtures of Kent component distributions to the fracture data to aid in the identification of joint sets. An additional uniform component is also included in the model to accommodate the noise present in the data.

Relevance: 30.00%

Abstract:

Resonance phenomena associated with the unimolecular dissociation H2S → SH + H have been investigated quantum mechanically by the Lanczos homogeneous filter diagonalization method, using a newly developed potential energy surface (J. Chem. Phys. 2001, 114, 320). Resonance energies, widths (rates), and product state distributions have been obtained. Both the dissociation rates and the product state distributions of SH show strong fluctuations, indicating that the dissociation of H2S is essentially irregular. Statistical analysis of the neighboring level spacing and width distributions also confirms this behavior. The dissociation rates and product state distributions are compared with the predictions of quantum phase space theory.
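The abstract does not say which spacing statistics were used. A common check, assumed here, compares the nearest-neighbour spacing distribution of the resonance energies with two references. The Poisson law e^(−s) corresponds to regular dynamics. The Wigner surmise P(s) = (πs/2) exp(−πs²/4) corresponds to irregular dynamics with level repulsion. The sketch unfolds the spectrum by dividing by the mean spacing, which is adequate only when the level density is roughly constant over the window, and it does not treat the width distribution. Function names are hypothetical.

```python
import math
import random


def nearest_neighbour_spacings(levels):
    """Spacings between sorted levels, scaled to unit mean (crude unfolding)."""
    e = sorted(levels)
    s = [b - a for a, b in zip(e, e[1:])]
    mean = sum(s) / len(s)
    return [x / mean for x in s]


def wigner_pdf(s):
    """Wigner surmise (GOE), the reference for irregular (chaotic) spectra."""
    return math.pi * s / 2 * math.exp(-math.pi * s * s / 4)


def poisson_pdf(s):
    """Exponential spacing law, the reference for regular (integrable) spectra."""
    return math.exp(-s)


def small_spacing_fraction(spacings, cut=0.25):
    """Share of spacings below `cut`: about 0.22 for Poisson, about 0.05 for Wigner."""
    return sum(1 for x in spacings if x < cut) / len(spacings)


if __name__ == "__main__":
    rng = random.Random(7)
    uncorrelated = nearest_neighbour_spacings([rng.random() for _ in range(5000)])
    print(small_spacing_fraction(uncorrelated))  # uncorrelated levels: near the Poisson value
```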

Relevance: 30.00%

Abstract:

Intensity-Modulated Radiotherapy (IMRT) is a technique introduced to shape the dose distribution to the tumour more precisely, allowing dose escalation in the target volume while simultaneously decreasing the dose to the organs at risk, which reduces treatment toxicity. This technique is widely used for prostate and head and neck (H&N) tumours. Given the complexity of the technique and the high doses involved, quality control programmes for IMRT are necessary to ensure safe and secure delivery of the treatment. The purpose of this study was to evaluate statistically the quality control measurements made for the IMRT plans of prostate and H&N patients before treatment begins, analysing their variations, the percentage of rejected and repeated measurements, the means, standard deviations and proportion relations.
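The abstract lists the statistics but neither the QA metric nor the acceptance criterion. This minimal sketch assumes that each plan's QA result is a planned-versus-measured dose pair and uses a hypothetical ±3% tolerance. For each site it computes the mean and standard deviation of the percentage dose difference, together with the rejection and repeat rates. The field names, the tolerance and the sample values are illustrative only.

```python
import statistics
from dataclasses import dataclass


@dataclass
class QAMeasurement:
    site: str             # e.g. "prostate" or "H&N"
    planned_gy: float     # dose predicted by the treatment planning system
    measured_gy: float    # dose measured during pre-treatment QA
    repeated: bool = False


def summarise(measurements, tolerance_pct=3.0):
    """Per-site mean/SD of the % dose difference plus rejection and repeat rates."""
    out = {}
    for site in sorted({m.site for m in measurements}):
        ms = [m for m in measurements if m.site == site]
        diffs = [100 * (m.measured_gy - m.planned_gy) / m.planned_gy for m in ms]
        out[site] = {
            "n": len(ms),
            "mean_pct": statistics.fmean(diffs),
            "sd_pct": statistics.stdev(diffs) if len(diffs) > 1 else 0.0,
            "rejected_pct": 100 * sum(abs(d) > tolerance_pct for d in diffs) / len(ms),
            "repeated_pct": 100 * sum(m.repeated for m in ms) / len(ms),
        }
    return out


if __name__ == "__main__":
    data = [
        QAMeasurement("prostate", 2.00, 2.02),
        QAMeasurement("prostate", 2.00, 2.10, repeated=True),
        QAMeasurement("H&N", 2.00, 1.98),
        QAMeasurement("H&N", 2.00, 2.00),
    ]
    for site, stats in summarise(data).items():
        print(site, stats)
```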

Relevance: 30.00%

Abstract:

Modern real-time systems, which are more flexible and adaptive, demand approaches to timeliness evaluation based on probabilistic measures of meeting deadlines. In this context, simulation can be an adequate way to understand and analyze the timing behaviour of actual systems. However, the outputs obtained must be treated with care; otherwise the results may lack credibility. It is particularly important to consider that we are more interested in values from the tail of a probability distribution (near-worst-case probabilities) than in confidence on mean values. We approach this subject by considering the random nature of simulation output data. We start by discussing well-known approaches for estimating distributions from simulation output and the confidence that can be attached to their mean values. This forms the basis for discussing whether such approaches can be applied to derive confidence on the tail of distributions, where the worst case is expected to be.
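Assuming independent replications, so that deadline misses are binomial (observations within a single run are usually autocorrelated), two standard tools address confidence in the tail. The first is the Wilson score interval for a miss probability, which stays informative even when no misses are observed. The second is a distribution-free order-statistic bound for a high quantile, which makes explicit how many runs are needed before a near-worst-case quantile can be bounded at all. Neither tool is claimed to be the method discussed in the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(misses, runs, conf=0.95):
    """Wilson score interval for a deadline-miss probability from independent runs."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = misses / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def quantile_upper_bound(samples, q, conf=0.95):
    """Distribution-free upper confidence bound for the q-quantile of response times.

    Returns the smallest order statistic X_(k) with P(X_(k) >= x_q) >= conf,
    using P(X_(k) >= x_q) = P(Binomial(n, q) <= k - 1); None if n is too small.
    """
    xs = sorted(samples)
    n = len(xs)
    cdf = 0.0
    for j in range(n):                    # cdf accumulates P(Binomial(n, q) <= j)
        cdf += math.exp(math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                        + j * math.log(q) + (n - j) * math.log1p(-q))
        if cdf >= conf:
            return xs[j]                  # X_(j+1) in 1-based notation
    return None


if __name__ == "__main__":
    print(wilson_interval(0, 1000))                    # no misses observed in 1000 runs
    print(quantile_upper_bound(range(1000), 0.999))    # too few runs for the 99.9% quantile
    print(quantile_upper_bound(range(1000), 0.99))
```

With 1000 runs, no order statistic bounds the 99.9% quantile at 95% confidence, because even the sample maximum exceeds it only with probability 1 − 0.999^1000 ≈ 0.63.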

Relevance: 30.00%

Abstract:

Power laws, also known as Pareto-like or Zipf-like laws, are commonly used to explain a variety of distinct real-world phenomena, often described merely by the signals they produce. In this paper, we study twelve cases: worldwide technological accidents, the annual revenue of America's largest private companies, the number of inhabitants in America's largest cities, the magnitude of earthquakes with a minimum moment magnitude of 4, the total area burned in forest fires in Portugal, the net worth of the richest people in America, the frequency of occurrence of words in the novel Ulysses by James Joyce, the total number of deaths in worldwide terrorist attacks, the number of linking root domains of the top internet domains, the number of linking root domains of the top internet pages, the total number of human victims of tornadoes in the U.S., and the number of inhabitants in the 60 most populated countries. The results demonstrate the emergence of statistical characteristics very close to power-law behavior. Furthermore, the parametric characterization reveals complex relationships present at a higher level of description.
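The abstract does not name its fitting procedure. One widely used option, swapped in here purely as an illustration, is the continuous power-law maximum-likelihood estimator popularised by Clauset, Shalizi and Newman. Over the tail x ≥ x_min it gives α̂ = 1 + n / Σ ln(x_i / x_min), with standard error (α̂ − 1)/√n. The sketch takes x_min as given (choosing it, for example by minimising a Kolmogorov-Smirnov distance, is not shown). It also does not handle discrete data such as word counts, which need the discrete variant. Function names are hypothetical.

```python
import math
import random


def powerlaw_alpha_mle(data, xmin):
    """Continuous power-law exponent MLE and its standard error for the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)
    return alpha, (alpha - 1) / math.sqrt(n)


def sample_powerlaw(n, alpha, xmin, rng):
    """Inverse-transform draws from p(x) ~ x^-alpha, x >= xmin."""
    return [xmin * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(5)
    print(powerlaw_alpha_mle(sample_powerlaw(20000, 2.5, 1.0, rng), 1.0))
```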

Relevance: 30.00%

Abstract:

Presented at the 23rd International Conference on Real-Time Networks and Systems (RTNS 2015), Main Track, Lille, France, 4 to 6 November 2015.

Relevance: 30.00%

Abstract:

This paper develops a methodology to estimate entire population distributions from bin-aggregated sample data. We do this by estimating the parameters of mixtures of distributions that allow for maximal parametric flexibility. The statistical approach we develop enables comparisons of the full distributions of height data from potential army conscripts across France's 88 departments for most of the nineteenth century. These comparisons are made by testing for differences in means and for stochastic dominance. Corrections for possible measurement errors are also devised by taking advantage of the richness of the data sets. Our methodology is of interest to researchers working with historical as well as contemporary bin-aggregated or histogram-type data; such aggregation is still widespread, since much publicly available information is released in that form, often because of political sensitivity and/or confidentiality concerns.
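The core idea of estimating a distribution from bin counts can be sketched as maximising the multinomial likelihood Σ n_j ln P(bin j). For simplicity, this minimal sketch fits a single normal, not the flexible mixtures the paper uses. A mixture would replace each bin probability with a weighted sum of component probabilities. A simple compass (pattern) search stands in for a proper optimiser. The data in the usage lines are made up, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def binned_neg_loglik(params, edges, counts):
    """Multinomial negative log-likelihood of bin counts under N(mu, sigma)."""
    mu, log_sigma = params
    dist = NormalDist(mu, math.exp(log_sigma))   # log-parametrised so sigma stays > 0
    cdf = [dist.cdf(e) for e in edges]
    return -sum(c * math.log(max(hi - lo, 1e-300)) for c, lo, hi in zip(counts, cdf, cdf[1:]))


def compass_search(f, x0, step=1.0, tol=1e-7):
    """Derivative-free minimiser: try +/- step on each coordinate, halve the step when stuck."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step /= 2
    return x, fx


def fit_binned_normal(edges, counts):
    """Fit one normal to histogram counts; edges has len(counts) + 1 entries (may include +/-inf)."""
    finite = [e for e in edges if math.isfinite(e)]
    x0 = [(finite[0] + finite[-1]) / 2, math.log((finite[-1] - finite[0]) / 4)]
    (mu, log_sigma), _ = compass_search(lambda p: binned_neg_loglik(p, edges, counts), x0)
    return mu, math.exp(log_sigma)


if __name__ == "__main__":
    edges = [-math.inf, 155, 160, 165, 170, 175, math.inf]   # made-up height classes, cm
    counts = [40, 210, 480, 620, 290, 60]                    # made-up conscript counts
    print(fit_binned_normal(edges, counts))
```

Open-ended first and last bins (±∞ edges) are handled naturally by the likelihood, which matters for historical tables that lump all heights below or above a threshold together.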