993 resultados para Statistical distributions
Resumo:
Density functional theory for adsorption in carbons is adapted here to incorporate a random distribution of pore wall thickness in the solid, and it is shown that the mean pore wall thickness is intimately related to the pore size distribution characteristics. For typical carbons the pore walls are estimated to comprise only about two graphene layers, and application of the modified density functional theory approach shows that the commonly used assumption of infinitely thick walls can severely affect the results for adsorption in small pores under both supercritical and subcritical conditions. Under supercritical conditions the Henry's law coefficient is overpredicted by as much as a factor of 2, while under subcritical conditions pore wall heterogeneity appears to modify transitions in small pores into a sequence of smaller ones corresponding to pores with different wall thicknesses. The results suggest the need to improve current pore size distrubution analysis methods to allow for pore wall heterogeneity. The density functional theory is further extended here to allow for interpore adsorbate interactions, and it appears that these interaction are negligible for small molecules such as nitrogen but significant for more strongly interacting heavier molecules such as butane, for which the traditional independent pore model may not be adequate.
Resumo:
For Markov processes on the positive integers with the origin as an absorbing state, Ferrari, Kesten, Martinez and Picco studied the existence of quasi-stationary and limiting conditional distributions by characterizing quasi-stationary distributions as fixed points of a transformation Phi on the space of probability distributions on {1, 2,.. }. In the case of a birth-death process, the components of Phi(nu) can be written down explicitly for any given distribution nu. Using this explicit representation, we will show that Phi preserves likelihood ratio ordering between distributions. A conjecture of Kryscio and Lefevre concerning the quasi-stationary distribution of the SIS logistic epidemic follows as a corollary.
Resumo:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
Resumo:
The effect of number of samples and selection of data for analysis on the calculation of surface motor unit potential (SMUP) size in the statistical method of motor unit number estimates (MUNE) was determined in 10 normal subjects and 10 with amyotrophic lateral sclerosis (ALS). We recorded 500 sequential compound muscle action potentials (CMAPs) at three different stable stimulus intensities (10–50% of maximal CMAP). Estimated mean SMUP sizes were calculated using Poisson statistical assumptions from the variance of 500 sequential CMAP obtained at each stimulus intensity. The results with the 500 data points were compared with smaller subsets from the same data set. The results using a range of 50–80% of the 500 data points were compared with the full 500. The effect of restricting analysis to data between 5–20% of the CMAP and to standard deviation limits was also assessed. No differences in mean SMUP size were found with stimulus intensity or use of different ranges of data. Consistency was improved with a greater sample number. Data within 5% of CMAP size gave both increased consistency and reduced mean SMUP size in many subjects, but excluded valid responses present at that stimulus intensity. These changes were more prominent in ALS patients in whom the presence of isolated SMUP responses was a striking difference from normal subjects. Noise, spurious data, and large SMUP limited the Poisson assumptions. When these factors are considered, consistent statistical MUNE can be calculated from a continuous sequence of data points. A 2 to 2.5 SD or 10% window are reasonable methods of limiting data for analysis. Muscle Nerve 27: 320–331, 2003
Resumo:
A growing number of predicting corporate failure models has emerged since 60s. Economic and social consequences of business failure can be dramatic, thus it is not surprise that the issue has been of growing interest in academic research as well as in business context. The main purpose of this study is to compare the predictive ability of five developed models based on three statistical techniques (Discriminant Analysis, Logit and Probit) and two models based on Artificial Intelligence (Neural Networks and Rough Sets). The five models were employed to a dataset of 420 non-bankrupt firms and 125 bankrupt firms belonging to the textile and clothing industry, over the period 2003–09. Results show that all the models performed well, with an overall correct classification level higher than 90%, and a type II error always less than 2%. The type I error increases as we move away from the year prior to failure. Our models contribute to the discussion of corporate financial distress causes. Moreover it can be used to assist decisions of creditors, investors and auditors. Additionally, this research can be of great contribution to devisers of national economic policies that aim to reduce industrial unemployment.
Resumo:
A growing number of predicting corporate failure models has emerged since 60s. Economic and social consequences of business failure can be dramatic, thus it is not surprise that the issue has been of growing interest in academic research as well as in business context. The main purpose of this study is to compare the predictive ability of five developed models based on three statistical techniques (Discriminant Analysis, Logit and Probit) and two models based on Artificial Intelligence (Neural Networks and Rough Sets). The five models were employed to a dataset of 420 non-bankrupt firms and 125 bankrupt firms belonging to the textile and clothing industry, over the period 2003–09. Results show that all the models performed well, with an overall correct classification level higher than 90%, and a type II error always less than 2%. The type I error increases as we move away from the year prior to failure. Our models contribute to the discussion of corporate financial distress causes. Moreover it can be used to assist decisions of creditors, investors and auditors. Additionally, this research can be of great contribution to devisers of national economic policies that aim to reduce industrial unemployment.
Resumo:
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
Resumo:
Wyner - Ziv (WZ) video coding is a particular case of distributed video coding (DVC), the recent video coding paradigm based on the Slepian - Wolf and Wyner - Ziv theorems which exploits the source temporal correlation at the decoder and not at the encoder as in predictive video coding. Although some progress has been made in the last years, WZ video coding is still far from the compression performance of predictive video coding, especially for high and complex motion contents. The WZ video codec adopted in this study is based on a transform domain WZ video coding architecture with feedback channel-driven rate control, whose modules have been improved with some recent coding tools. This study proposes a novel motion learning approach to successively improve the rate-distortion (RD) performance of the WZ video codec as the decoding proceeds, making use of the already decoded transform bands to improve the decoding process for the remaining transform bands. The results obtained reveal gains up to 2.3 dB in the RD curves against the performance for the same codec without the proposed motion learning approach for high motion sequences and long group of pictures (GOP) sizes.
Resumo:
In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. In these methods, it is implicitly considered that frames carry equally relevant information loads and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques to partition this space. These partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized and unsupervised partition of the space, used in conjunction with a Markov Model classifier lead to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create less hubs.
Resumo:
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kind of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsamples mixtures. Consequently, there can be genes differentially expressed on sample subgroups which are missed if usual statistical approaches are used. In this paper we propose a new graphical tool which not only identifies genes with up and down regulations, but also genes with differential expression in different subclasses, that are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
Resumo:
Copyright © 2014 The Authors. Oikos © 2014 Nordic Society Oikos.
Resumo:
Tese de Doutoramento, Ciências do Mar (Ecologia Marinha)
Resumo:
Tese de Doutoramento, Ciências do Mar (Biologia Marinha)
Resumo:
The species abundance distribution (SAD) has been a central focus of community ecology for over fifty years, and is currently the subject of widespread renewed interest. The gambin model has recently been proposed as a model that provides a superior fit to commonly preferred SAD models. It has also been argued that the model's single parameter (α) presents a potentially informative ecological diversity metric, because it summarises the shape of the SAD in a single number. Despite this potential, few empirical tests of the model have been undertaken, perhaps because the necessary methods and software for fitting the model have not existed. Here, we derive a maximum likelihood method to fit the model, and use it to undertake a comprehensive comparative analysis of the fit of the gambin model. The functions and computational code to fit the model are incorporated in a newly developed free-to-download R package (gambin). We test the gambin model using a variety of datasets and compare the fit of the gambin model to fits obtained using the Poisson lognormal, logseries and zero-sum multinomial distributions. We found that gambin almost universally provided a better fit to the data and that the fit was consistent for a variety of sample grain sizes. We demonstrate how α can be used to differentiate intelligibly between community structures of Azorean arthropods sampled in different land use types. We conclude that gambin presents a flexible model capable of fitting a wide variety of observed SAD data, while providing a useful index of SAD form in its single fitted parameter. As such, gambin has wide potential applicability in the study of SADs, and ecology more generally.