964 resultados para statistical framework


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Frequent episode discovery is one of the methods used for temporal pattern discovery in sequential data. An episode is a partially ordered set of nodes with each node associated with an event type. For more than a decade, algorithms existed for episode discovery only when the associated partial order is total (serial episode) or trivial (parallel episode). Recently, the literature has seen algorithms for discovering episodes with general partial orders. In frequent pattern mining, the threshold beyond which a pattern is inferred to be interesting is typically user-defined and arbitrary. One way of addressing this issue in the pattern mining literature has been based on the framework of statistical hypothesis testing. This paper presents a method of assessing statistical significance of episode patterns with general partial orders. A method is proposed to calculate thresholds, on the non-overlapped frequency, beyond which an episode pattern would be inferred to be statistically significant. The method is first explained for the case of injective episodes with general partial orders. An injective episode is one where event-types are not allowed to repeat. Later it is pointed out how the method can be extended to the class of all episodes. The significance threshold calculations for general partial order episodes proposed here also generalize the existing significance results for serial episodes. Through simulations studies, the usefulness of these statistical thresholds in pruning uninteresting patterns is illustrated. (C) 2014 Elsevier Inc. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Imaging flow cytometry is an emerging technology that combines the statistical power of flow cytometry with spatial and quantitative morphology of digital microscopy. It allows high-throughput imaging of cells with good spatial resolution, while they are in flow. This paper proposes a general framework for the processing/classification of cells imaged using imaging flow cytometer. Each cell is localized by finding an accurate cell contour. Then, features reflecting cell size, circularity and complexity are extracted for the classification using SVM. Unlike the conventional iterative, semi-automatic segmentation algorithms such as active contour, we propose a noniterative, fully automatic graph-based cell localization. In order to evaluate the performance of the proposed framework, we have successfully classified unstained label-free leukaemia cell-lines MOLT, K562 and HL60 from video streams captured using custom fabricated cost-effective microfluidics-based imaging flow cytometer. The proposed system is a significant development in the direction of building a cost-effective cell analysis platform that would facilitate affordable mass screening camps looking cellular morphology for disease diagnosis. Lay description In this article, we propose a novel framework for processing the raw data generated using microfluidics based imaging flow cytometers. Microfluidics microscopy or microfluidics based imaging flow cytometry (mIFC) is a recent microscopy paradigm, that combines the statistical power of flow cytometry with spatial and quantitative morphology of digital microscopy, which allows us imaging cells while they are in flow. In comparison to the conventional slide-based imaging systems, mIFC is a nascent technology enabling high throughput imaging of cells and is yet to take the form of a clinical diagnostic tool. The proposed framework process the raw data generated by the mIFC systems. The framework incorporates several steps: beginning from pre-processing of the raw video frames to enhance the contents of the cell, localising the cell by a novel, fully automatic, non-iterative graph based algorithm, extraction of different quantitative morphological parameters and subsequent classification of cells. In order to evaluate the performance of the proposed framework, we have successfully classified unstained label-free leukaemia cell-lines MOLT, K562 and HL60 from video streams captured using cost-effective microfluidics based imaging flow cytometer. The cell lines of HL60, K562 and MOLT were obtained from ATCC (American Type Culture Collection) and are separately cultured in the lab. Thus, each culture contains cells from its own category alone and thereby provides the ground truth. Each cell is localised by finding a closed cell contour by defining a directed, weighted graph from the Canny edge images of the cell such that the closed contour lies along the shortest weighted path surrounding the centroid of the cell from a starting point on a good curve segment to an immediate endpoint. Once the cell is localised, morphological features reflecting size, shape and complexity of the cells are extracted and used to develop a support vector machine based classification system. We could classify the cell-lines with good accuracy and the results were quite consistent across different cross validation experiments. We hope that imaging flow cytometers equipped with the proposed framework for image processing would enable cost-effective, automated and reliable disease screening in over-loaded facilities, which cannot afford to hire skilled personnel in large numbers. Such platforms would potentially facilitate screening camps in low income group countries; thereby transforming the current health care paradigms by enabling rapid, automated diagnosis for diseases like cancer.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The rapid evolution of nanotechnology appeals for the understanding of global response of nanoscale systems based on atomic interactions, hence necessitates novel, sophisticated, and physically based approaches to bridge the gaps between various length and time scales. In this paper, we propose a group of statistical thermodynamics methods for the simulations of nanoscale systems under quasi-static loading at finite temperature, that is, molecular statistical thermodynamics (MST) method, cluster statistical thermodynamics (CST) method, and the hybrid molecular/cluster statistical thermodynamics (HMCST) method. These methods, by treating atoms as oscillators and particles simultaneously, as well as clusters, comprise different spatial and temporal scales in a unified framework. One appealing feature of these methods is their "seamlessness" or consistency in the same underlying atomistic model in all regions consisting of atoms and clusters, and hence can avoid the ghost force in the simulation. On the other hand, compared with conventional MD simulations, their high computational efficiency appears very attractive, as manifested by the simulations of uniaxial compression and nanoindenation. (C) 2008 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

English: We describe an age-structured statistical catch-at-length analysis (A-SCALA) based on the MULTIFAN-CL model of Fournier et al. (1998). The analysis is applied independently to both the yellowfin and the bigeye tuna populations of the eastern Pacific Ocean (EPO). We model the populations from 1975 to 1999, based on quarterly time steps. Only a single stock for each species is assumed for each analysis, but multiple fisheries that are spatially separate are modeled to allow for spatial differences in catchability and selectivity. The analysis allows for error in the effort-fishing mortality relationship, temporal trends in catchability, temporal variation in recruitment, relationships between the environment and recruitment and between the environment and catchability, and differences in selectivity and catchability among fisheries. The model is fit to total catch data and proportional catch-at-length data conditioned on effort. The A-SCALA method is a statistical approach, and therefore recognizes that the data collected from the fishery do not perfectly represent the population. Also, there is uncertainty in our knowledge about the dynamics of the system and uncertainty about how the observed data relate to the real population. The use of likelihood functions allow us to model the uncertainty in the data collected from the population, and the inclusion of estimable process error allows us to model the uncertainties in the dynamics of the system. The statistical approach allows for the calculation of confidence intervals and the testing of hypotheses. We use a Bayesian version of the maximum likelihood framework that includes distributional constraints on temporal variation in recruitment, the effort-fishing mortality relationship, and catchability. Curvature penalties for selectivity parameters and penalties on extreme fishing mortality rates are also included in the objective function. The mode of the joint posterior distribution is used as an estimate of the model parameters. Confidence intervals are calculated using the normal approximation method. It should be noted that the estimation method includes constraints and priors and therefore the confidence intervals are different from traditionally calculated confidence intervals. Management reference points are calculated, and forward projections are carried out to provide advice for making management decisions for the yellowfin and bigeye populations. Spanish: Describimos un análisis estadístico de captura a talla estructurado por edad, A-SCALA (del inglés age-structured statistical catch-at-length analysis), basado en el modelo MULTIFAN- CL de Fournier et al. (1998). Se aplica el análisis independientemente a las poblaciones de atunes aleta amarilla y patudo del Océano Pacífico oriental (OPO). Modelamos las poblaciones de 1975 a 1999, en pasos trimestrales. Se supone solamente una sola población para cada especie para cada análisis, pero se modelan pesquerías múltiples espacialmente separadas para tomar en cuenta diferencias espaciales en la capturabilidad y selectividad. El análisis toma en cuenta error en la relación esfuerzo-mortalidad por pesca, tendencias temporales en la capturabilidad, variación temporal en el reclutamiento, relaciones entre el medio ambiente y el reclutamiento y entre el medio ambiente y la capturabilidad, y diferencias en selectividad y capturabilidad entre pesquerías. Se ajusta el modelo a datos de captura total y a datos de captura a talla proporcional condicionados sobre esfuerzo. El método A-SCALA es un enfoque estadístico, y reconoce por lo tanto que los datos obtenidos de la pesca no representan la población perfectamente. Además, hay incertidumbre en nuestros conocimientos de la dinámica del sistema e incertidumbre sobre la relación entre los datos observados y la población real. El uso de funciones de verosimilitud nos permite modelar la incertidumbre en los datos obtenidos de la población, y la inclusión de un error de proceso estimable nos permite modelar las incertidumbres en la dinámica del sistema. El enfoque estadístico permite calcular intervalos de confianza y comprobar hipótesis. Usamos una versión bayesiana del marco de verosimilitud máxima que incluye constreñimientos distribucionales sobre la variación temporal en el reclutamiento, la relación esfuerzo-mortalidad por pesca, y la capturabilidad. Se incluyen también en la función objetivo penalidades por curvatura para los parámetros de selectividad y penalidades por tasas extremas de mortalidad por pesca. Se usa la moda de la distribución posterior conjunta como estimación de los parámetros del modelo. Se calculan los intervalos de confianza usando el método de aproximación normal. Cabe destacar que el método de estimación incluye constreñimientos y distribuciones previas y por lo tanto los intervalos de confianza son diferentes de los intervalos de confianza calculados de forma tradicional. Se calculan puntos de referencia para el ordenamiento, y se realizan proyecciones a futuro para asesorar la toma de decisiones para el ordenamiento de las poblaciones de aleta amarilla y patudo.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a method to integrate environmental time series into stock assessment models and to test the significance of correlations between population processes and the environmental time series. Parameters that relate the environmental time series to population processes are included in the stock assessment model, and likelihood ratio tests are used to determine if the parameters improve the fit to the data significantly. Two approaches are considered to integrate the environmental relationship. In the environmental model, the population dynamics process (e.g. recruitment) is proportional to the environmental variable, whereas in the environmental model with process error it is proportional to the environmental variable, but the model allows an additional temporal variation (process error) constrained by a log-normal distribution. The methods are tested by using simulation analysis and compared to the traditional method of correlating model estimates with environmental variables outside the estimation procedure. In the traditional method, the estimates of recruitment were provided by a model that allowed the recruitment only to have a temporal variation constrained by a log-normal distribution. We illustrate the methods by applying them to test the statistical significance of the correlation between sea-surface temperature (SST) and recruitment to the snapper (Pagrus auratus) stock in the Hauraki Gulf–Bay of Plenty, New Zealand. Simulation analyses indicated that the integrated approach with additional process error is superior to the traditional method of correlating model estimates with environmental variables outside the estimation procedure. The results suggest that, for the snapper stock, recruitment is positively correlated with SST at the time of spawning.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language. © 2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Atlases and statistical models play important roles in the personalization and simulation of cardiac physiology. For the study of the heart, however, the construction of comprehensive atlases and spatio-temporal models is faced with a number of challenges, in particular the need to handle large and highly variable image datasets, the multi-region nature of the heart, and the presence of complex as well as small cardiovascular structures. In this paper, we present a detailed atlas and spatio-temporal statistical model of the human heart based on a large population of 3D+time multi-slice computed tomography sequences, and the framework for its construction. It uses spatial normalization based on nonrigid image registration to synthesize a population mean image and establish the spatial relationships between the mean and the subjects in the population. Temporal image registration is then applied to resolve each subject-specific cardiac motion and the resulting transformations are used to warp a surface mesh representation of the atlas to fit the images of the remaining cardiac phases in each subject. Subsequently, we demonstrate the construction of a spatio-temporal statistical model of shape such that the inter-subject and dynamic sources of variation are suitably separated. The framework is applied to a 3D+time data set of 138 subjects. The data is drawn from a variety of pathologies, which benefits its generalization to new subjects and physiological studies. The obtained level of detail and the extendability of the atlas present an advantage over most cardiac models published previously. © 1982-2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems. © 1963-2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We report an empirical study of n-gram posterior probability confidence measures for statistical machine translation (SMT). We first describe an efficient and practical algorithm for rapidly computing n-gram posterior probabilities from large translation word lattices. These probabilities are shown to be a good predictor of whether or not the n-gram is found in human reference translations, motivating their use as a confidence measure for SMT. Comprehensive n-gram precision and word coverage measurements are presented for a variety of different language pairs, domains and conditions. We analyze the effect on reference precision of using single or multiple references, and compare the precision of posteriors computed from k-best lists to those computed over the full evidence space of the lattice. We also demonstrate improved confidence by combining multiple lattices in a multi-source translation framework. © 2012 The Author(s).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The advent of virtualization and cloud computing technologies necessitates the development of effective mechanisms for the estimation and reservation of resources needed by content providers to deliver large numbers of video-on-demand (VOD) streams through the cloud. Unfortunately, capacity planning for the QoS-constrained delivery of a large number of VOD streams is inherently difficult as VBR encoding schemes exhibit significant bandwidth variability. In this paper, we present a novel resource management scheme to make such allocation decisions using a mixture of per-stream reservations and an aggregate reservation, shared across all streams to accommodate peak demands. The shared reservation provides capacity slack that enables statistical multiplexing of peak rates, while assuring analytically bounded frame-drop probabilities, which can be adjusted by trading off buffer space (and consequently delay) and bandwidth. Our two-tiered bandwidth allocation scheme enables the delivery of any set of streams with less bandwidth (or equivalently with higher link utilization) than state-of-the-art deterministic smoothing approaches. The algorithm underlying our proposed frame-work uses three per-stream parameters and is linear in the number of servers, making it particularly well suited for use in an on-line setting. We present results from extensive trace-driven simulations, which confirm the efficiency of our scheme especially for small buffer sizes and delay bounds, and which underscore the significant realizable bandwidth savings, typically yielding losses that are an order of magnitude or more below our analytically derived bounds.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a statistical model replaces the standard sparsity model of classical compressive sensing. We propose within this framework optimal task-specific sensing protocols specifically and jointly designed for classification and reconstruction. A two-step adaptive sensing paradigm is developed, where online sensing is applied to detect the signal class in the first step, followed by a reconstruction step adapted to the detected class and the observed samples. The approach is based on information theory, here tailored for Gaussian mixture models (GMMs), where an information-theoretic objective relationship between the sensed signals and a representation of the specific task of interest is maximized. Experimental results using synthetic signals, Landsat satellite attributes, and natural images of different sizes and with different noise levels show the improvements achieved using the proposed framework when compared to more standard sensing protocols. The underlying formulation can be applied beyond GMMs, at the price of higher mathematical and computational complexity. © 1991-2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The increasing complexity of new manufacturing processes and the continuously growing range of fabrication options mean that critical decisions about the insertion of new technologies must be made as early as possible in the design process. Mitigating the technology risks under limited knowledge is a key factor and major requirement to secure a successful development of the new technologies. In order to address this challenge, a risk mitigation methodology that incorporates both qualitative and quantitative analysis is required. This paper outlines the methodology being developed under a major UK grand challenge project - 3D-Mintegration. The main focus is on identifying the risks through identification of the product key characteristics using a product breakdown approach. The assessment of the identified risks uses quantification and prioritisation techniques to evaluate and rank the risks. Traditional statistical process control based on process capability and six sigma concepts are applied to measure the process capability as a result of the risks that have been identified. This paper also details a numerical approach that can be used to undertake risk analysis. This methodology is based on computational framework where modelling and statistical techniques are integrated. Also, an example of modeling and simulation technique is given using focused ion beam which is among the investigated in the project manufacturing processes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ocean biogeochemistry (OBGC) models span a wide variety of complexities, including highly simplified nutrient-restoring schemes, nutrient–phytoplankton–zooplankton–detritus (NPZD) models that crudely represent the marine biota, models that represent a broader trophic structure by grouping organisms as plankton functional types (PFTs) based on their biogeochemical role (dynamic green ocean models) and ecosystem models that group organisms by ecological function and trait. OBGC models are now integral components of Earth system models (ESMs), but they compete for computing resources with higher resolution dynamical setups and with other components such as atmospheric chemistry and terrestrial vegetation schemes. As such, the choice of OBGC in ESMs needs to balance model complexity and realism alongside relative computing cost. Here we present an intercomparison of six OBGC models that were candidates for implementation within the next UK Earth system model (UKESM1). The models cover a large range of biological complexity (from 7 to 57 tracers) but all include representations of at least the nitrogen, carbon, alkalinity and oxygen cycles. Each OBGC model was coupled to the ocean general circulation model Nucleus for European Modelling of the Ocean (NEMO) and results from physically identical hindcast simulations were compared. Model skill was evaluated for biogeochemical metrics of global-scale bulk properties using conventional statistical techniques. The computing cost of each model was also measured in standardised tests run at two resource levels. No model is shown to consistently outperform all other models across all metrics. Nonetheless, the simpler models are broadly closer to observations across a number of fields and thus offer a high-efficiency option for ESMs that prioritise high-resolution climate dynamics. However, simpler models provide limited insight into more complex marine biogeochemical processes and ecosystem pathways, and a parallel approach of low-resolution climate dynamics and high-complexity biogeochemistry is desirable in order to provide additional insights into biogeochemistry–climate interactions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recently, Bayesian statistical software has been developed for age-depth modeling (wiggle-match dating) of sequences of densely spaced radiocarbon dates from peat cores. The method is described in non-statistical terms, and is compared with an alternative method of chronological ordering of 14C dates. Case studies include the dating of the start of agriculture in the northeastern part of the Netherlands, and of a possible Hekla-3 tephra layer in the same country. We discuss future enhancements in Bayesian age modeling.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advice not to use GSEA, GSEArot and GAGE for such data sets.