989 resultados para Statistical methodologies
Resumo:
Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study strategies to improve the convergence of a powerful statistical technique based on an Expectation-Maximization iterative algorithm. First we analyze modeling approaches to generating starting points. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we study the convergence characteristics of our EM algorithm and compare it against a recently proposed Weighted Least Squares approach.
Resumo:
Accurate knowledge of traffic demands in a communication network enables or enhances a variety of traffic engineering and network management tasks of paramount importance for operational networks. Directly measuring a complete set of these demands is prohibitively expensive because of the huge amounts of data that must be collected and the performance impact that such measurements would impose on the regular behavior of the network. As a consequence, we must rely on statistical techniques to produce estimates of actual traffic demands from partial information. The performance of such techniques is however limited due to their reliance on limited information and the high amount of computations they incur, which limits their convergence behavior. In this paper we study a two-step approach for inferring network traffic demands. First we elaborate and evaluate a modeling approach for generating good starting points to be fed to iterative statistical inference techniques. We call these starting points informed priors since they are obtained using actual network information such as packet traces and SNMP link counts. Second we provide a very fast variant of the EM algorithm which extends its computation range, increasing its accuracy and decreasing its dependence on the quality of the starting point. Finally, we evaluate and compare alternative mechanisms for generating starting points and the convergence characteristics of our EM algorithm against a recently proposed Weighted Least Squares approach.
Resumo:
We demonstrate that if two probability distributions D and E of sufficiently small min-entropy have statistical difference ε, then the direct-product distributions D^l and E^l have statistical difference at least roughly ε\s√l, provided that l is sufficiently small, smaller than roughly ε^{4/3}. Previously known bounds did not work for few repetitions l, requiring l>ε^2.
Resumo:
Quality of Service (QoS) guarantees are required by an increasing number of applications to ensure a minimal level of fidelity in the delivery of application data units through the network. Application-level QoS does not necessarily follow from any transport-level QoS guarantees regarding the delivery of the individual cells (e.g. ATM cells) which comprise the application's data units. The distinction between application-level and transport-level QoS guarantees is due primarily to the fragmentation that occurs when transmitting large application data units (e.g. IP packets, or video frames) using much smaller network cells, whereby the partial delivery of a data unit is useless; and, bandwidth spent to partially transmit the data unit is wasted. The data units transmitted by an application may vary in size while being constant in rate, which results in a variable bit rate (VBR) data flow. That data flow requires QoS guarantees. Statistical multiplexing is inadequate, because no guarantees can be made and no firewall property exists between different data flows. In this paper, we present a novel resource management paradigm for the maintenance of application-level QoS for VBR flows. Our paradigm is based on Statistical Rate Monotonic Scheduling (SRMS), in which (1) each application generates its variable-size data units at a fixed rate, (2) the partial delivery of data units is of no value to the application, and (3) the QoS guarantee extended to the application is the probability that an arbitrary data unit will be successfully transmitted through the network to/from the application.
Resumo:
Statistical Rate Monotonic Scheduling (SRMS) is a generalization of the classical RMS results of Liu and Layland [LL73] for periodic tasks with highly variable execution times and statistical QoS requirements. The main tenet of SRMS is that the variability in task resource requirements could be smoothed through aggregation to yield guaranteed QoS. This aggregation is done over time for a given task and across multiple tasks for a given period of time. Similar to RMS, SRMS has two components: a feasibility test and a scheduling algorithm. SRMS feasibility test ensures that it is possible for a given periodic task set to share a given resource without violating any of the statistical QoS constraints imposed on each task in the set. The SRMS scheduling algorithm consists of two parts: a job admission controller and a scheduler. The SRMS scheduler is a simple, preemptive, fixed-priority scheduler. The SRMS job admission controller manages the QoS delivered to the various tasks through admit/reject and priority assignment decisions. In particular, it ensures the important property of task isolation, whereby tasks do not infringe on each other. In this paper we present the design and implementation of SRMS within the KURT Linux Operating System [HSPN98, SPH 98, Sri98]. KURT Linux supports conventional tasks as well as real-time tasks. It provides a mechanism for transitioning from normal Linux scheduling to a mixed scheduling of conventional and real-time tasks, and to a focused mode where only real-time tasks are scheduled. We overview the technical issues that we had to overcome in order to integrate SRMS into KURT Linux and present the API we have developed for scheduling periodic real-time tasks using SRMS.
Resumo:
Statistical properties offast-slow Ellias-Grossberg oscillators are studied in response to deterministic and noisy inputs. Oscillatory responses remain stable in noise due to the slow inhibitory variable, which establishes an adaptation level that centers the oscillatory responses of the fast excitatory variable to deterministic and noisy inputs. Competitive interactions between oscillators improve the stability in noise. Although individual oscillation amplitudes decrease with input amplitude, the average to'tal activity increases with input amplitude, thereby suggesting that oscillator output is evaluated by a slow process at downstream network sites.
Resumo:
A method to solve the stationary state probability is presented for the first-order bang-bang phase-locked loop (BBPLL) with nonzero loop delay. This is based on a delayed Markov chain model and a state How diagram for tracing the state history due to the loop delay. As a result, an eigenequation is obtained, and its closed form solutions are derived for some cases. After obtaining the state probability, statistical characteristics such as mean gain of the binary phase detector and timing error variance are calculated and demonstrated.
Resumo:
A comparison study was carried out between a wireless sensor node with a bare die flip-chip mounted and its reference board with a BGA packaged transceiver chip. The main focus is the return loss (S parameter S11) at the antenna connector, which was highly depended on the impedance mismatch. Modeling including the different interconnect technologies, substrate properties and passive components, was performed to simulate the system in Ansoft Designer software. Statistical methods, such as the use of standard derivation and regression, were applied to the RF performance analysis, to see the impacts of the different parameters on the return loss. Extreme value search, following on the previous analysis, can provide the parameters' values for the minimum return loss. Measurements fit the analysis and simulation well and showed a great improvement of the return loss from -5dB to -25dB for the target wireless sensor node.
Resumo:
Electron microscopy (EM) has advanced in an exponential way since the first transmission electron microscope (TEM) was built in the 1930’s. The urge to ‘see’ things is an essential part of human nature (talk of ‘seeing is believing’) and apart from scanning tunnel microscopes which give information about the surface, EM is the only imaging technology capable of really visualising atomic structures in depth down to single atoms. With the development of nanotechnology the demand to image and analyse small things has become even greater and electron microscopes have found their way from highly delicate and sophisticated research grade instruments to key-turn and even bench-top instruments for everyday use in every materials research lab on the planet. The semiconductor industry is as dependent on the use of EM as life sciences and pharmaceutical industry. With this generalisation of use for imaging, the need to deploy advanced uses of EM has become more and more apparent. The combination of several coinciding beams (electron, ion and even light) to create DualBeam or TripleBeam instruments for instance enhances the usefulness from pure imaging to manipulating on the nanoscale. And when it comes to the analytic power of EM with the many ways the highly energetic electrons and ions interact with the matter in the specimen there is a plethora of niches which evolved during the last two decades, specialising in every kind of analysis that can be thought of and combined with EM. In the course of this study the emphasis was placed on the application of these advanced analytical EM techniques in the context of multiscale and multimodal microscopy – multiscale meaning across length scales from micrometres or larger to nanometres, multimodal meaning numerous techniques applied to the same sample volume in a correlative manner. In order to demonstrate the breadth and potential of the multiscale and multimodal concept an integration of it was attempted in two areas: I) Biocompatible materials using polycrystalline stainless steel and II) Semiconductors using thin multiferroic films. I) The motivation to use stainless steel (316L medical grade) comes from the potential modulation of endothelial cell growth which can have a big impact on the improvement of cardio-vascular stents – which are mainly made of 316L – through nano-texturing of the stent surface by focused ion beam (FIB) lithography. Patterning with FIB has never been reported before in connection with stents and cell growth and in order to gain a better understanding of the beam-substrate interaction during patterning a correlative microscopy approach was used to illuminate the patterning process from many possible angles. Electron backscattering diffraction (EBSD) was used to analyse the crystallographic structure, FIB was used for the patterning and simultaneously visualising the crystal structure as part of the monitoring process, scanning electron microscopy (SEM) and atomic force microscopy (AFM) were employed to analyse the topography and the final step being 3D visualisation through serial FIB/SEM sectioning. II) The motivation for the use of thin multiferroic films stems from the ever-growing demand for increased data storage at lesser and lesser energy consumption. The Aurivillius phase material used in this study has a high potential in this area. Yet it is necessary to show clearly that the film is really multiferroic and no second phase inclusions are present even at very low concentrations – ~0.1vol% could already be problematic. Thus, in this study a technique was developed to analyse ultra-low density inclusions in thin multiferroic films down to concentrations of 0.01%. The goal achieved was a complete structural and compositional analysis of the films which required identification of second phase inclusions (through elemental analysis EDX(Energy Dispersive X-ray)), localise them (employing 72 hour EDX mapping in the SEM), isolate them for the TEM (using FIB) and give an upper confidence limit of 99.5% to the influence of the inclusions on the magnetic behaviour of the main phase (statistical analysis).
Resumo:
BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
Resumo:
This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components.We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplementalmaterials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.
Resumo:
BACKGROUND: Dropouts and missing data are nearly-ubiquitous in obesity randomized controlled trails, threatening validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles of pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion. 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, drop-out rates, study duration, and statistical method used to handle missing data from all articles and resolved disagreements by consensus. In the meta-analysis, drop-out rates were substantial with the survival (non-dropout) rates being approximated by an exponential decay curve (e(-lambdat)) where lambda was estimated to be .0088 (95% bootstrap confidence interval: .0076 to .0100) and t represents time in weeks. The estimated drop-out rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of raw randomized controlled trial data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for predictions of dropout rates useful for future study planning. Our raw data analyses suggests that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
Resumo:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
Resumo:
A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a statistical model replaces the standard sparsity model of classical compressive sensing. We propose within this framework optimal task-specific sensing protocols specifically and jointly designed for classification and reconstruction. A two-step adaptive sensing paradigm is developed, where online sensing is applied to detect the signal class in the first step, followed by a reconstruction step adapted to the detected class and the observed samples. The approach is based on information theory, here tailored for Gaussian mixture models (GMMs), where an information-theoretic objective relationship between the sensed signals and a representation of the specific task of interest is maximized. Experimental results using synthetic signals, Landsat satellite attributes, and natural images of different sizes and with different noise levels show the improvements achieved using the proposed framework when compared to more standard sensing protocols. The underlying formulation can be applied beyond GMMs, at the price of higher mathematical and computational complexity. © 1991-2012 IEEE.