Biblioteca Digital

951 resultados para statistical distribution

Conducting statistical tests of hypotheses : five common misconceptions found in transportation research

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.

Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states—perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriate model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed—and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not an underlying dual state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros

A statistical framework for natural feature representation

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a robust stochastic framework for the incorporation of visual observations into conventional estimation, data fusion, navigation and control algorithms. The representation combines Isomap, a non-linear dimensionality reduction algorithm, with expectation maximization, a statistical learning scheme. The joint probability distribution of this representation is computed offline based on existing training data. The training phase of the algorithm results in a nonlinear and non-Gaussian likelihood model of natural features conditioned on the underlying visual states. This generative model can be used online to instantiate likelihoods corresponding to observed visual features in real-time. The instantiated likelihoods are expressed as a Gaussian mixture model and are conveniently integrated within existing non-linear filtering algorithms. Example applications based on real visual data from heterogenous, unstructured environments demonstrate the versatility of the generative models.

An approach to statistical lip modelling for speaker identification via chromatic feature extraction

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a novel technique for the tracking of moving lips for the purpose of speaker identification. In our system, a model of the lip contour is formed directly from chromatic information in the lip region. Iterative refinement of contour point estimates is not required. Colour features are extracted from the lips via concatenated profiles taken around the lip contour. Reduction of order in lip features is obtained via principal component analysis (PCA) followed by linear discriminant analysis (LDA). Statistical speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments performed on the M2VTS¹ database, show encouraging results

Statistical eye model for normal eyes

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Purpose. To create a binocular statistical eye model based on previously measured ocular biometric data. Methods. Thirty-nine parameters were determined for a group of 127 healthy subjects (37 male, 90 female; 96.8% Caucasian) with an average age of 39.9 ± 12.2 years and spherical equivalent refraction of −0.98 ± 1.77 D. These parameters described the biometry of both eyes and the subjects' age. Missing parameters were complemented by data from a previously published study. After confirmation of the Gaussian shape of their distributions, these parameters were used to calculate their mean and covariance matrices. These matrices were then used to calculate a multivariate Gaussian distribution. From this, an amount of random biometric data could be generated, which were then randomly selected to create a realistic population of random eyes. Results. All parameters had Gaussian distributions, with the exception of the parameters that describe total refraction (i.e., three parameters per eye). After these non-Gaussian parameters were omitted from the model, the generated data were found to be statistically indistinguishable from the original data for the remaining 33 parameters (TOST [two one-sided t tests]; P < 0.01). Parameters derived from the generated data were also significantly indistinguishable from those calculated with the original data (P > 0.05). The only exception to this was the lens refractive index, for which the generated data had a significantly larger SD. Conclusions. A statistical eye model can describe the biometric variations found in a population and is a useful addition to the classic eye models.

Three-dimensional geological modelling and multivariate statistical analysis of water chemistry data to analyse and visualise aquifer structure and groundwater composition in the Wairau Plain, Marlborough District, New Zealand

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques(e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.

Statistical modelling of wind effects on signal propagation for wireless sensor networks

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A wireless sensor network system must have the ability to tolerate harsh environmental conditions and reduce communication failures. In a typical outdoor situation, the presence of wind can introduce movement in the foliage. This motion of vegetation structures causes large and rapid signal fading in the communication link and must be accounted for when deploying a wireless sensor network system in such conditions. This thesis examines the fading characteristics experienced by wireless sensor nodes due to the effect of varying wind speed in a foliage obstructed transmission path. It presents extensive measurement campaigns at two locations with the approach of a typical wireless sensor networks configuration. The significance of this research lies in the varied approaches of its different experiments, involving a variety of vegetation types, scenarios and the use of different polarisations (vertical and horizontal). Non–line of sight (NLoS) scenario conditions investigate the wind effect based on different vegetation densities including that of the Acacia tree, Dogbane tree and tall grass. Whereas the line of sight (LoS) scenario investigates the effect of wind when the grass is swaying and affecting the ground-reflected component of the signal. Vegetation type and scenarios are envisaged to simulate real life working conditions of wireless sensor network systems in outdoor foliated environments. The results from the measurements are presented in statistical models involving first and second order statistics. We found that in most of the cases, the fading amplitude could be approximated by both Lognormal and Nakagami distribution, whose m parameter was found to depend on received power fluctuations. Lognormal distribution is known as the result of slow fading characteristics due to shadowing. This study concludes that fading caused by variations in received power due to wind in wireless sensor networks systems are found to be insignificant. There is no notable difference in Nakagami m values for low, calm, and windy wind speed categories. It is also shown in the second order analysis, the duration of the deep fades are very short, 0.1 second for 10 dB attenuation below RMS level for vertical polarization and 0.01 second for 10 dB attenuation below RMS level for horizontal polarization. Another key finding is that the received signal strength for horizontal polarisation demonstrates more than 3 dB better performances than the vertical polarisation for LoS and near LoS (thin vegetation) conditions and up to 10 dB better for denser vegetation conditions.

Asymptotic behaviour of the posterior distribution in overfitted mixture models

Relevância:

30.00% 30.00%

Publicador:

A comparison of methods for classifying clinical samples based on proteomics data : a case study for statistical and machine learning approaches

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so called “omics” disciplines of the biological sciences. Such variability is uncovered by implementation of multivariable data mining techniques which come under two primary categories, machine learning strategies and statistical based approaches. Typically proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint, and as such, require pre-treatment to reduce the dimensionality prior to classification. Recently machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This is a problem that might be solved using a statistical model-based approach where not only is the importance of the individual protein explicit, they are combined into a readily interpretable classification rule without relying on a black box approach. Here we incorporate statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA) followed by both statistical and machine learning classification methods, and compared them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.

Distribution feeder loads classification and decomposition

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Load in distribution networks is normally measured at the 11kV supply points; little or no information is known about the type of customers and their contributions to the load. This paper proposes statistical methods to decompose an unknown distribution feeder load to its customer load sector/subsector profiles. The approach used in this paper should assist electricity suppliers in economic load management, strategic planning and future network reinforcements.

Bayesian hierarchical models in statistical quality control methods to improve healthcare in hospitals

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Quality oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers’ expectations by supplying reliable, good quality products and services is the key factor for an organization and even government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy and now are being applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the health sector characteristics and needs. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents some opportunities for transferring knowledge and future development in each sectors. Meanwhile considering capabilities of Bayesian approach particularly Bayesian hierarchical models and computational techniques in which all uncertainty are expressed as a structure of probability, facilitates decision making and cost-effectiveness analyses. Therefore, this research investigates the use of quality improvement cycle in a health vii setting using clinical data from a hospital. The need of clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine economical sample size for statistical analyses within the quality improvement cycle. Following ensuring clinical data quality, some characteristics of control charts in the health context including the necessity of monitoring attribute data and correlated quality characteristics are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram and various risk-adjusted control charts are constructed and investigated in monitoring binary outcomes of clinical interventions as well as postintervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework in estimation of change point following control chart’s signal. This estimate aims to facilitate root causes efforts in quality improvement cycle since it cuts the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters since probability distribution based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios including step change, linear trend and multiple change in a Poisson process are developed and investigated. The benefits of change point investigation is revisited and promoted in monitoring hospital outcomes where the developed Bayesian estimator reports the true time of the shifts, compared to priori known causes, detected by control charts in monitoring rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators are then followed in a healthcare surveillances for processes in which pre-intervention characteristics of patients are viii affecting the outcomes. In this setting, at first, the Bayesian estimator is extended to capture the patient mix, covariates, through risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by riskadjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix and the survival function is constructed using survival prediction model. The simulation study undertaken in each research component and obtained results highly recommend the developed Bayesian estimators as a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances as well as industrial and business contexts. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The empirical results and simulations indicate that the Bayesian estimators are a strong alternative in change point estimation within quality improvement cycle in healthcare surveillances. The superiority of the proposed Bayesian framework and estimators are enhanced when probability quantification, flexibility and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in general context of quality control may also be extended in the industrial and business domains where quality monitoring was initially developed.

Characteristics of spatial variation, size distribution, formation and growth of particles in urban environments

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis is the first comprehensive study of important parameters relating to aerosols' impact on climate and human health, namely spatial variation, particle size distribution and new particle formation. We determined the importance of spatial variation of particle number concentration in microscale environments, developed a method for particle size parameterisation and provided knowledge about the chemistry of new particle formation. This is a significant contribution to our understanding of processes behind the transformation and dynamics of urban aerosols. This PhD project included extensive measurements of air quality parameters using state of the art instrumentation at each of the 25 sites within the Brisbane metropolitan area and advanced statistical analysis.

Statistical analysis of reduction in tensile strength of cotton strips as a measure of soil microbial activity

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The cotton strip assay (CSA) is an established technique for measuring soil microbial activity. The technique involves burying cotton strips and measuring their tensile strength after a certain time. This gives a measure of the rotting rate, R, of the cotton strips. R is then a measure of soil microbial activity. This paper examines properties of the technique and indicates how the assay can be optimised. Humidity conditioning of the cotton strips before measuring their tensile strength reduced the within and between day variance and enabled the distribution of the tensile strength measurements to approximate normality. The test data came from a three-way factorial experiment (two soils, two temperatures, three moisture levels). The cotton strips were buried in the soil for intervals of time ranging up to 6 weeks. This enabled the rate of loss of cotton tensile strength with time to be studied under a range of conditions. An inverse cubic model accounted for greater than 90% of the total variation within each treatment combination. This offers support for summarising the decomposition process by a single parameter R. The approximate variance of the decomposition rate was estimated from a function incorporating the variance of tensile strength and the differential of the function for the rate of decomposition, R, with respect to tensile strength. This variance function has a minimum when the measured strength is approximately 2/3 that of the original strength. The estimates of R are almost unbiased and relatively robust against the cotton strips being left in the soil for more or less than the optimal time. We conclude that the rotting rate X should be measured using the inverse cubic equation, and that the cotton strips should be left in the soil until their strength has been reduced to about 2/3.

Spatially averaged model of complex-plasma discharge with self-consistent electron energy distribution

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A global, or averaged, model for complex low-pressure argon discharge plasmas containing dust grains is presented. The model consists of particle and power balance equations taking into account power loss on the dust grains and the discharge wall. The electron energy distribution is determined by a Boltzmann equation. The effects of the dust and the external conditions, such as the input power and neutral gas pressure, on the electron energy distribution, the electron temperature, the electron and ion number densities, and the dust charge are investigated. It is found that the dust subsystem can strongly affect the stationary state of the discharge by dynamically modifying the electron energy distribution, the electron temperature, the creation and loss of the plasma particles, as well as the power deposition. In particular, the power loss to the dust grains can take up a significant portion of the input power, often even exceeding the loss to the wall.

A statistical approach to quantify the impact on voltage quality caused by PV generators

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A probabilistic method is proposed to evaluate voltage quality of grid-connected photovoltaic (PV) power systems. The random behavior of solar irradiation is described in statistical terms and the resulting voltage fluctuation probability distribution is then derived. Reactive power capabilities of the PV generators are then analyzed and their operation under constant power factor mode is examined. By utilizing the reactive power capability of the PV-generators to the full, it is shown that network voltage quality can be greatly enhanced.

«
1
2
...
6
7
8
9
10
11
12
...
63
64
»