Abstract:
Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such electricity price spikes are very difficult to predict. So far, most research on electricity price forecasting has been based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as price spikes. The normal price can be predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecast with a data mining approach. This paper focuses on spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database and find the internal relationships between electricity price spikes and these proposed indices. The mining results are used to form the price spike forecast model, which is able to generate the forecast price spike, the level of the spike and an associated forecast confidence level. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
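As a hedged illustration of the Bayesian spike-classification step described above, the sketch below trains a naive Bayes classifier on two indices computed from demand, supply and reserve. The SDI and RDI formulas, the synthetic data and the spike threshold are all assumptions for illustration; the abstract does not give closed forms.

    # Illustrative spike classifier in the spirit of the Bayesian step above.
    # SDI/RDI definitions are assumed forms, not the paper's.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def supply_demand_indices(demand, supply, reserve):
        """Hypothetical composite indices (assumptions for illustration)."""
        sdi = (supply + reserve - demand) / demand   # assumed supply-demand balance index
        rdi = demand / demand.mean()                 # assumed relative demand index
        return np.column_stack([sdi, rdi])

    rng = np.random.default_rng(0)
    demand = rng.uniform(5000, 9000, 1000)           # MW, synthetic
    supply = demand * rng.uniform(0.95, 1.25, 1000)
    reserve = rng.uniform(200, 900, 1000)
    X = supply_demand_indices(demand, supply, reserve)
    # Synthetic labels: spikes become likely when the balance index is tight.
    y = (X[:, 0] < 0.08).astype(int)

    clf = GaussianNB().fit(X, y)
    p_spike = clf.predict_proba(X)[:, 1]             # forecast confidence for a spike
    print("spike rate:", y.mean(), "mean P(spike):", p_spike.mean())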
Abstract:
Objective: This paper compares four techniques used to assess change in neuropsychological test scores before and after coronary artery bypass graft surgery (CABG), and includes a rationale for the classification of a patient as overall impaired. Methods: A total of 55 patients were tested before and after surgery on the MicroCog neuropsychological test battery. A matched control group underwent the same testing regime to generate test–retest reliabilities and practice effects. Two techniques designed to assess statistical change were used: the Reliable Change Index (RCI), modified for practice, and the Standardised Regression-based (SRB) technique. These were compared against two fixed cutoff techniques (the standard deviation and 20% change methods). Results: The incidence of decline across test scores varied markedly depending on which technique was used to describe change. The SRB method identified more patients as declined on most measures. In comparison, the two fixed cutoff techniques displayed relatively reduced sensitivity in the detection of change. Conclusions: Overall change in an individual can be described provided the investigators choose a rational cutoff based on the likely spread of scores due to chance. A cutoff value of ≥20% of the test scores used provided an acceptable probability given the number of tests commonly encountered. Investigators must also choose a test battery that minimises shared variance among test scores.
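The practice-adjusted Reliable Change Index mentioned above has a standard form; the sketch below computes it under that form with illustrative numbers (the variable names and example values are ours, not the study's).

    # Reliable Change Index adjusted for practice effects, one of the four
    # techniques compared above. The control group supplies the retest
    # reliability and the mean practice effect.
    import math

    def rci_practice(x1, x2, sd_baseline, retest_r, practice_effect):
        """Practice-adjusted RCI: change beyond practice, in SE-of-difference units."""
        sem = sd_baseline * math.sqrt(1.0 - retest_r)   # standard error of measurement
        se_diff = math.sqrt(2.0 * sem ** 2)             # SE of the difference score
        return ((x2 - x1) - practice_effect) / se_diff

    # Example: a patient drops 8 points on a test with SD 10, reliability .80,
    # while controls improve 2 points on retest (practice).
    z = rci_practice(x1=100, x2=92, sd_baseline=10, retest_r=0.80, practice_effect=2.0)
    print(f"RCI = {z:.2f}; decline if RCI < -1.645 (one-tailed, p < .05)")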
Abstract:
Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns, with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including site sampling, variable selection, model selection, model implementation, internal model assessment, model prediction assessment, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, model validation on an independent data set, and assessment of the scale of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; and a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.
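One plausible reading of the model-integration step is a per-cell argmax over the 28 predicted-probability surfaces; the numpy sketch below illustrates that reading with random data. This is an assumption about the integration rule, not the authors' documented GIS procedure.

    # Combining per-community statistical models into a composite map by
    # assigning each grid cell to the community with the highest predicted
    # probability (assumed integration rule).
    import numpy as np

    rng = np.random.default_rng(1)
    n_models, rows, cols = 28, 50, 50
    # One predicted-probability surface per vegetation community model.
    prob_stack = rng.random((n_models, rows, cols))

    composite = prob_stack.argmax(axis=0)      # community index per cell
    confidence = prob_stack.max(axis=0)        # winning probability per cell
    print(composite.shape, "classes used:", np.unique(composite).size)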
Abstract:
Non-technical losses (NTL) identification and prediction are important tasks for many utilities. Data from a customer information system (CIS) can be used for NTL analysis. However, in order to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed analysis can be carried out. In this paper, we propose a feature selection based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of the results, giving faster processing, higher accuracy and simpler models with fewer features. A detailed feature selection analysis is presented in the paper, and both time-domain and load-shape data are compared based on the accuracy, consistency and statistical dependencies between features.
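As a hedged sketch of the kind of filter-style selection described, the snippet below scores synthetic customer features by mutual information with an NTL label and keeps the top subset; the paper's actual selection criteria may differ.

    # Filter-style feature selection: rank features by statistical dependence
    # with the NTL label and keep the strongest ones.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    rng = np.random.default_rng(0)
    X = rng.random((500, 24))                  # e.g. 24 load-shape features per customer
    y = (X[:, 3] + 0.2 * rng.random(500) > 0.8).astype(int)   # synthetic NTL flag

    selector = SelectKBest(mutual_info_classif, k=8).fit(X, y)
    kept = selector.get_support(indices=True)
    print("kept feature indices:", kept)       # feature 3 should rank highly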
Abstract:
This paper describes how modern machine learning techniques can be used in conjunction with statistical methods to forecast short-term movements in exchange rates, producing models suitable for use in trading. It compares the results achieved by two different techniques and shows how they can be used in a complementary fashion. The paper draws on experience of both inter- and intra-day forecasting from earlier studies conducted by Logica, and on the Chemical Bank Quantitative Research and Trading (QRT) group's experience in developing trading models.
Abstract:
Using techniques from Statistical Physics, the annealed VC entropy for hyperplanes in high dimensional spaces is calculated as a function of the margin for a spherical Gaussian distribution of inputs.
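For orientation, the quantity being evaluated can be restated in its standard form (the paper's contribution is the closed-form calculation of this expectation for spherical Gaussian inputs as a function of the margin):

    H_{\mathrm{ann}}(m) \;=\; \ln \mathbb{E}_{x_1,\dots,x_m}\!\left[ N_\kappa(x_1,\dots,x_m) \right]

where $N_\kappa(x_1,\dots,x_m)$ counts the dichotomies of the sample realisable by hyperplanes with margin at least $\kappa$, and the expectation is taken over the spherical Gaussian input distribution.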
Abstract:
As wireless network technologies evolve towards an All-IP framework, Next Generation Wireless Communication Devices demand better use of spectral resources by employing advanced techniques of silence suppression. This paper presents an analysis of VoIP call data and compares the statistical results based on observed patterns of talk spurts and silence lengths to those achieved by a modified on-off voice model for silence suppression in wireless networks. As talk spurts and silence lengths are sensitive to varying word lengths, temporal structure and other prosodic aspects of speech, the impact of the use of various languages, dialects and gender of speakers on these results is also assessed.
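The unmodified two-state (on-off) voice model that the paper adapts is easy to simulate: talk-spurt and silence durations are drawn from exponential distributions, and the voice activity factor follows. The mean durations below are illustrative, not the paper's fitted values.

    # Classical two-state on-off voice model: exponentially distributed
    # talk-spurt and silence durations (assumed means).
    import numpy as np

    rng = np.random.default_rng(0)
    mean_talk, mean_silence = 1.0, 1.35          # seconds (illustrative)
    talk = rng.exponential(mean_talk, 10_000)
    silence = rng.exponential(mean_silence, 10_000)

    activity = talk.sum() / (talk.sum() + silence.sum())
    print(f"voice activity factor ~ {activity:.2f}")   # fraction of time packets are sent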
Abstract:
Using methods of statistical physics, we study the average number and kernel size of general sparse random matrices over GF(q), with a given connectivity profile, in the thermodynamical limit of large matrices. We introduce a mapping of GF(q) matrices onto spin systems using the representation of the cyclic group of order q as the q-th complex roots of unity. This representation facilitates the derivation of the average kernel size of random matrices using the replica approach, under the replica symmetric ansatz, resulting in saddle point equations for general connectivity distributions. Numerical solutions are then obtained for particular cases by population dynamics. Similar techniques also allow us to obtain an expression for the exact and average number of random matrices for any general connectivity profile. We present numerical results for particular distributions.
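For prime q, the replica predictions can be checked empirically by sampling sparse matrices and computing kernel dimensions directly; the sketch below does this by Gaussian elimination mod q. It is a naive numerical companion, not the replica or population-dynamics calculation itself, and the fixed row connectivity is an assumed profile.

    # Empirical kernel dimension of a sparse random matrix over GF(q), q prime.
    import numpy as np

    def gf_rank(A, q):
        """Rank of A over GF(q), q prime, by row reduction mod q."""
        A = A.copy() % q
        rows, cols = A.shape
        r = 0
        for c in range(cols):
            pivot = next((i for i in range(r, rows) if A[i, c]), None)
            if pivot is None:
                continue
            A[[r, pivot]] = A[[pivot, r]]
            A[r] = (A[r] * pow(int(A[r, c]), q - 2, q)) % q   # normalise pivot row
            for i in range(rows):
                if i != r and A[i, c]:
                    A[i] = (A[i] - A[i, c] * A[r]) % q
            r += 1
        return r

    q, n, m, k = 5, 60, 60, 3                  # field size, shape, nonzeros per row
    rng = np.random.default_rng(0)
    A = np.zeros((m, n), dtype=int)
    for i in range(m):                          # fixed row connectivity k
        cols = rng.choice(n, size=k, replace=False)
        A[i, cols] = rng.integers(1, q, size=k)
    print("kernel dimension:", n - gf_rank(A, q))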
Abstract:
In this thesis we use statistical physics techniques to study the typical performance of four families of error-correcting codes based on very sparse linear transformations: Sourlas codes, Gallager codes, MacKay-Neal codes and Kanter-Saad codes. We map the decoding problem onto an Ising spin system with many-spins interactions. We then employ the replica method to calculate averages over the quenched disorder represented by the code constructions, the arbitrary messages and the random noise vectors. We find, as the noise level increases, a phase transition between successful decoding and failure phases. This phase transition coincides with upper bounds derived in the information theory literature in most of the cases. We connect the practical decoding algorithm known as probability propagation with the task of finding local minima of the related Bethe free-energy. We show that the practical decoding thresholds correspond to noise levels where suboptimal minima of the free-energy emerge. Simulations of practical decoding scenarios using probability propagation agree with theoretical predictions of the replica symmetric theory. The typical performance predicted by the thermodynamic phase transitions is shown to be attainable in computation times that grow exponentially with the system size. We use the insights obtained to design a method to calculate the performance and optimise parameters of the high performance codes proposed by Kanter and Saad.
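A minimal sketch of the probability-propagation (sum-product) decoder discussed above, run on a toy parity-check matrix over a binary symmetric channel; the matrix, noise level and received word are illustrative, and the Bethe free-energy connection is not computed here.

    # Sum-product decoding of a toy Gallager-style code over a BSC.
    import numpy as np

    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 1, 0, 1]])          # toy sparse parity-check matrix
    p = 0.1                                     # BSC flip probability
    received = np.array([0, 1, 0, 0, 0, 0])     # all-zero codeword with bit 1 flipped

    llr_ch = np.where(received == 0, 1.0, -1.0) * np.log((1 - p) / p)
    M, N = H.shape
    msg_vc = np.tile(llr_ch, (M, 1)) * H        # variable-to-check LLR messages

    for _ in range(20):
        t = np.tanh(np.clip(msg_vc, -30, 30) / 2)
        t = np.where(H == 1, t, 1.0)            # neutral element off the graph
        prod = t.prod(axis=1, keepdims=True)
        safe_t = np.where(np.abs(t) < 1e-12, 1e-12, t)
        ext = np.where(H == 1, prod / safe_t, 0.0)   # product excluding recipient
        msg_cv = 2 * np.arctanh(np.clip(ext, -0.999999, 0.999999))
        total = llr_ch + msg_cv.sum(axis=0)     # posterior LLR per bit
        msg_vc = (total - msg_cv) * H           # extrinsic variable-to-check update

    decoded = (total < 0).astype(int)
    print("decoded:", decoded, "parity ok:", not (H @ decoded % 2).any())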
Abstract:
Background - MHC Class I molecules present antigenic peptides to cytotoxic T cells, which forms an integral part of the adaptive immune response. Peptides are bound within a groove formed by the MHC heavy chain. Previous approaches to MHC Class I-peptide binding prediction have largely concentrated on the peptide anchor residues located at the P2 and C-terminus positions. Results - A large dataset comprising MHC-peptide structural complexes was created by re-modelling pre-determined X-ray crystallographic structures. Static energetic analysis, following energy minimisation, was performed on the dataset in order to characterise interactions between bound peptides and the MHC Class I molecule, partitioning the interactions within the groove into van der Waals, electrostatic and total non-bonded energy contributions. Conclusion - The QSAR techniques of Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS) were used to identify key interactions between the two molecules by comparing the calculated energy values with experimentally-determined BL50 data. Although the peptide termini binding interactions help ensure the stability of the MHC Class I-peptide complex, the central region of the peptide is also important in defining the specificity of the interaction. As thermodynamic studies indicate that peptide association and dissociation may be driven entropically, it may be necessary to incorporate entropic contributions into future calculations.
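As a hedged stand-in for the G/PLS step, the sketch below fits a plain partial-least-squares regression of synthetic interaction-energy terms against synthetic binding scores; the genetic variable-selection wrapper of GFA/G-PLS is omitted.

    # Relating per-position interaction energy terms to binding scores with PLS.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 27))                # e.g. vdW/electrostatic terms per position
    true_w = np.zeros(27)
    true_w[[1, 8, 16]] = [0.9, -0.6, 0.4]
    y = X @ true_w + 0.1 * rng.normal(size=80)   # synthetic binding scores

    pls = PLSRegression(n_components=3).fit(X, y)
    print("R^2 on training data:", round(pls.score(X, y), 3))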
Abstract:
We report results of an experimental study, complemented by detailed statistical analysis of the experimental data, on the development of a more effective control method of drug delivery using a pH-sensitive acrylic polymer. New copolymers based on acrylic acid and a fatty acid constructed from dodecyl castor oil, and a tercopolymer based on methyl methacrylate, acrylic acid and acrylamide, were prepared using this new approach. The water swelling characteristics of the fatty acid-acrylic acid copolymer and the tercopolymer in acid and alkali solutions have been studied by a step-change method. The antibiotic drug cephalosporin and paracetamol have also been incorporated into the polymer blend through dissolution, with the release of the antibiotic drug evaluated in bacterial strain media and buffer solution. Our results show that the rate of release of paracetamol is affected by the pH and by the nature of the polymer blend. Our experimental data have then been statistically analyzed to quantify the precise dependence of the polymer decay rates on the pH of the relevant polymer solvents. The time evolution of the polymer decay rates indicates a marked transition from a linear to a strictly non-linear regime, depending on whether the chosen sample is a general copolymer (linear) or a tercopolymer (non-linear). Non-linear data extrapolation techniques have been used to make probabilistic predictions about the variation in weight percentages of retained polymers at all future times, thereby quantifying the efficacy of the new method of drug delivery.
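A hedged sketch of the extrapolation idea: fit retained-weight data to a decay law and project forward. The exponential form and all numbers are assumptions for illustration, not the study's fitted model.

    # Fit a decay law to retained-weight data and extrapolate.
    import numpy as np
    from scipy.optimize import curve_fit

    def decay(t, w_inf, a, k):
        return w_inf + a * np.exp(-k * t)           # assumed non-linear decay form

    t = np.array([0, 2, 4, 8, 16, 24], float)       # hours (illustrative)
    w = np.array([100, 92, 86, 77, 68, 64], float)  # % polymer retained

    params, _ = curve_fit(decay, t, w, p0=(60, 40, 0.1))
    print(f"projected retention at t=72h: {decay(72, *params):.1f}%")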
Abstract:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (the Fisher inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
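Two of the four pooling techniques named above have compact standard forms; the sketch below implements Fisher's inverse chi-square method and (optionally weighted) Stouffer's Z for a single gene's p-values across experiments. In the weighted Liptak-Stouffer variant the weights would typically come from study sizes.

    # Standard p-value pooling formulas for combining independent experiments.
    import numpy as np
    from scipy import stats

    def fisher_pool(pvals):
        """Fisher inverse chi-square: -2*sum(ln p) ~ chi2 with 2k d.o.f."""
        stat = -2.0 * np.sum(np.log(pvals))
        return stats.chi2.sf(stat, df=2 * len(pvals))

    def stouffer_pool(pvals, weights=None):
        """Stouffer's Z: combine z-scores, optionally weighted (Liptak-Stouffer)."""
        z = stats.norm.isf(pvals)                 # one-sided p -> z
        w = np.ones_like(z) if weights is None else np.asarray(weights, float)
        return stats.norm.sf(np.dot(w, z) / np.sqrt(np.dot(w, w)))

    p = np.array([0.04, 0.10, 0.03])              # same gene in three experiments
    print("Fisher:", fisher_pool(p), "Stouffer:", stouffer_pool(p))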
Abstract:
A comprehensive investigation of sensitive ecosystems in South Florida with the main goal of determining the identity, spatial distribution, and sources of both organic biocides and trace elements in different environmental compartments is reported. This study presents the development and validation of a fractionation and isolation method for twelve polar acidic herbicides commonly applied in the vicinity of the study areas, including 2,4-D, MCPA, dichlorprop, mecoprop and picloram, in surface water. Solid phase extraction (SPE) was used to isolate the analytes from abiotic matrices containing large amounts of dissolved organic material. Atmospheric-pressure ionization (API) with electrospray ionization in negative mode (ESP-) in a quadrupole ion trap mass spectrometer was used to characterize the herbicides of interest. The application of Laser Ablation-ICP-MS methodology to the analysis of soils and sediments is also reported in this study. The analytical performance of the method was evaluated on certified standards and real soil and sediment samples. Residential soils were analyzed to evaluate the feasibility of using this powerful technique as a routine and rapid method to monitor potentially contaminated sites. Forty-eight sediments were also collected from semi-pristine areas in South Florida to conduct a screening of baseline levels of bioavailable elements in support of risk evaluation. The LA-ICP-MS data were used to perform a statistical evaluation of the elemental composition as a tool for environmental forensics. A LA-ICP-MS protocol was also developed and optimized for the elemental analysis of a wide range of elements in polymeric filters containing atmospheric dust. A quantitative strategy based on internal and external standards allowed for a rapid determination of airborne trace elements in filters containing both contemporary African dust and local dust emissions. These distributions were used to qualitatively and quantitatively assess differences in composition and to establish provenance and fluxes to protected regional ecosystems such as coral reefs and national parks.
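A hedged sketch of the internal/external standardisation idea as described: analyte intensities are normalised to an internal-standard signal, and a calibration slope from external standards converts the ratio to concentration. All values are invented for illustration.

    # Internal-standard-normalised external calibration (illustrative values).
    import numpy as np

    # External calibration: known concentrations vs internal-standard-normalised
    # intensities.
    conc_std = np.array([0.0, 10.0, 25.0, 50.0])          # µg/g
    ratio_std = np.array([0.00, 0.21, 0.52, 1.05])        # I_analyte / I_internal_std
    slope = np.polyfit(conc_std, ratio_std, 1)[0]

    ratio_sample = 0.44                                    # measured in a sediment run
    print(f"estimated concentration: {ratio_sample / slope:.1f} µg/g")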
Abstract:
This research is funded by UK Medical Research Council grant number MR/L011115/1. We would like to thank the 105 experts in behaviour change who committed their time and offered their expertise for study 2 of this research. We are also very grateful to all those who sent us peer-reviewed behaviour change intervention descriptions for study 1. Finally, we would like to thank Dr. Emma Beard and Dr. Dan Dediu for their statistical input, and all the researchers, particularly Holly Walton, who assisted in the coding of papers for study 1.