882 resultados para Methods: Data Analysis
Resumo:
There is a current lack of understanding regarding the use of unregistered vehicles on public roads and road-related areas, and the links between the driving of unregistered vehicles and a range of dangerous driving behaviours. This report documents the findings of data analysis conducted to investigate the links between unlicensed driving and the driving of unregistered vehicles, and is an important initial undertaking into understanding these behaviours. This report examines de-identified data from two sources: crash data; and offence data. The data was extracted from the Queensland Department of Transport and Main Roads (TMR) databases and covered the period from 2003 to 2008.
Resumo:
Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotical distribution of the correlation information criterion (CIC) is derived and a new method for selecting a working ICS is proposed by standardizing the selection criterion as the p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS structures, while with respect to the working ICS selection, the standardized CIC test is also shown to have satisfactory performance. Some simulation studies and applications to two real longitudinal datasets are made to illustrate how these criteria and tests might be useful.
Resumo:
A modeling paradigm is proposed for covariate, variance and working correlation structure selection for longitudinal data analysis. Appropriate selection of covariates is pertinent to correct variance modeling and selecting the appropriate covariates and variance function is vital to correlation structure selection. This leads to a stepwise model selection procedure that deploys a combination of different model selection criteria. Although these criteria find a common theoretical root based on approximating the Kullback-Leibler distance, they are designed to address different aspects of model selection and have different merits and limitations. For example, the extended quasi-likelihood information criterion (EQIC) with a covariance penalty performs well for covariate selection even when the working variance function is misspecified, but EQIC contains little information on correlation structures. The proposed model selection strategies are outlined and a Monte Carlo assessment of their finite sample properties is reported. Two longitudinal studies are used for illustration.
Resumo:
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
Resumo:
A graphical method is presented for Hall data analysis, including the temperature variation of activation energy due to screening. This method removes the discrepancies noted in the analysis of recently reported Hall data on Si(In).
Resumo:
The use of near infrared (NIR) hyperspectral imaging and hyperspectral image analysis for distinguishing between hard, intermediate and soft maize kernels from inbred lines was evaluated. NIR hyperspectral images of two sets (12 and 24 kernels) of whole maize kernels were acquired using a Spectral Dimensions MatrixNIR camera with a spectral range of 960-1662 nm and a sisuChema SWIR (short wave infrared) hyperspectral pushbroom imaging system with a spectral range of 1000-2498 nm. Exploratory principal component analysis (PCA) was used on absorbance images to remove background, bad pixels and shading. On the cleaned images. PCA could be used effectively to find histological classes including glassy (hard) and floury (soft) endosperm. PCA illustrated a distinct difference between glassy and floury endosperm along principal component (PC) three on the MatrixNIR and PC two on the sisuChema with two distinguishable clusters. Subsequently partial least squares discriminant analysis (PLS-DA) was applied to build a classification model. The PLS-DA model from the MatrixNIR image (12 kernels) resulted in root mean square error of prediction (RMSEP) value of 0.18. This was repeated on the MatrixNIR image of the 24 kernels which resulted in RMSEP of 0.18. The sisuChema image yielded RMSEP value of 0.29. The reproducible results obtained with the different data sets indicate that the method proposed in this paper has a real potential for future classification uses.
Resumo:
A new clustering technique, based on the concept of immediato neighbourhood, with a novel capability to self-learn the number of clusters expected in the unsupervized environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovations and computational economy. Test implementation of the scheme using C-l flight line training sample data in a simulated unsupervized mode has brought out the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervized learning capabilities to derive unified learning systems capable of operating in both supervized and unsupervized environments. This makes the technique an attractive proposition in the context of remotely sensed earth resources data analysis wherein it is essential to have such a unified learning system capability.
Resumo:
Bangalore is experiencing unprecedented urbanisation in recent times due to concentrated developmental activities with impetus on IT (Information Technology) and BT (Biotechnology) sectors. The concentrated developmental activities has resulted in the increase in population and consequent pressure on infrastructure, natural resources, ultimately giving rise to a plethora of serious challenges such as urban flooding, climate change, etc. One of the perceived impact at local levels is the increase in sensible heat flux from the land surface to the atmosphere, which is also referred as heat island effect. In this communication, we report the changes in land surface temperature (LST) with respect to land cover changes during 1973 to 2007. A novel technique combining the information from sub-pixel class proportions with information from classified image (using signatures of the respective classes collected from the ground) has been used to achieve more reliable classification. The analysis showed positive correlation with the increase in paved surfaces and LST. 466% increase in paved surfaces (buildings, roads, etc.) has lead to the increase in LST by about 2 ºC during the last 2 decades, confirming urban heat island phenomenon. LSTs’ were relatively lower (~ 4 to 7 ºC) at land uses such as vegetation (parks/forests) and water bodies which act as heat sinks.
Resumo:
In this paper, we consider applying derived knowledge base regarding the sensitivity and specificity of damage(s) to be detected by an SHM system being designed and qualified. These efforts are necessary toward developing capabilities in SHM system to classify reliably various probable damages through sequence of monitoring, i.e., damage precursor identification, detection of damage and monitoring its progression. We consider the particular problem of visual and ultrasonic NDE based SHM system design requirements, where the damage detection sensitivity and specificity data definitions for a class of structural components are established. Methodologies for SHM system specification creation are discussed in details. Examples are shown to illustrate how the physics of damage detection scheme limits particular damage detection sensitivity and specificity and further how these information can be used in algorithms to combine various different NDE schemes in an SHM system to enhance efficiency and effectiveness. Statistical and data driven models to determine the sensitivity and probability of damage detection (POD) has been demonstrated for plate with varying one-sided line crack using optical and ultrasonic based inspection techniques.
Resumo:
Anthropogenic aerosols play a crucial role in our environment, climate, and health. Assessment of spatial and temporal variation in anthropogenic aerosols is essential to determine their impact. Aerosols are of natural and anthropogenic origin and together constitute a composite aerosol system. Information about either component needs elimination of the other from the composite aerosol system. In the present work we estimated the anthropogenic aerosol fraction (AF) over the Indian region following two different approaches and inter-compared the estimates. We espouse multi-satellite data analysis and model simulations (using the CHIMERE Chemical transport model) to derive natural aerosol distribution, which was subsequently used to estimate AF over the Indian subcontinent. These two approaches are significantly different from each other. Natural aerosol satellite-derived information was extracted in terms of optical depth while model simulations yielded mass concentration. Anthropogenic aerosol fraction distribution was studied over two periods in 2008: premonsoon (March-May) and winter (November-February) in regard to the known distinct seasonality in aerosol loading and type over the Indian region. Although both techniques have derived the same property, considerable differences were noted in temporal and spatial distribution. Satellite retrieval of AF showed maximum values during the pre-monsoon and summer months while lowest values were observed in winter. On the other hand, model simulations showed the highest concentration of AF in winter and the lowest during pre-monsoon and summer months. Both techniques provided an annual average AF of comparable magnitude (similar to 0.43 +/- 0.06 from the satellite and similar to 0.48 +/- 0.19 from the model). For winter months the model-estimated AF was similar to 0.62 +/- 0.09, significantly higher than that (0.39 +/- 0.05) estimated from the satellite, while during pre-monsoon months satellite-estimated AF was similar to 0.46 +/- 0.06 and the model simulation estimation similar to 0.53 +/- 0.14. Preliminary results from this work indicate that model-simulated results are nearer to the actual variation as compared to satellite estimation in view of general seasonal variation in aerosol concentrations.
Resumo:
The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.
It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.
The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model leads to new principled algorithms for smoothing and clustering of spike data.