87 results for Statistical Prediction
Abstract:
Statistical tests of Load-Unload Response Ratio (LURR) signals are carried out to verify the statistical robustness of previous studies using the Lattice Solid Model (MORA et al., 2002b). In each case, 24 groups of samples with the same macroscopic parameters (tidal perturbation amplitude A, period T and tectonic loading rate k) but different particle arrangements are employed. Results of uni-axial compression experiments show that before the normalized time of catastrophic failure, the ensemble-average LURR value rises significantly, in agreement with observations of high LURR prior to large earthquakes. In shearing tests, two parameters are found to control the correlation between earthquake occurrence and tidal stress. The first, A/(kT), controls the phase shift between the peak seismicity rate and the peak amplitude of the perturbation stress; as this parameter increases, the phase shift decreases. The second, AT/k, controls the height of the probability density function (PDF) of the modeled seismicity; as this parameter increases, the PDF becomes sharper and narrower, indicating strong triggering. Statistical studies of LURR signals in shearing tests also suggest that, except in strong-triggering cases where LURR cannot be calculated because of insufficient data in unloading cycles, larger events are more likely than smaller ones to occur during periods of high LURR, supporting the LURR hypothesis.
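The LURR statistic itself is conventionally defined as a response measure summed over loading periods divided by the same measure summed over unloading periods. The sketch below computes it for a synthetic catalog, assuming the common definition Y_m = (Σ E_i^m over loading) / (Σ E_i^m over unloading) with m = 1/2 (Benioff strain) and loading taken as the rising phase of a sinusoidal perturbation; the catalog, period and energy range are placeholders, not the paper's simulation output.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, 500))        # synthetic event times (days)
E = 10 ** rng.uniform(5, 9, t.size)          # synthetic event energies (arbitrary units)

T = 0.5                                      # assumed perturbation period (days)
loading = np.cos(2 * np.pi * t / T) > 0      # stress rate > 0 for sigma ~ sin(2*pi*t/T)

m = 0.5                                      # m = 1/2 corresponds to Benioff strain
Y = (E[loading] ** m).sum() / (E[~loading] ** m).sum()
print(f"LURR Y_1/2 = {Y:.2f}")               # fluctuates around 1 for a random catalog
```

In a windowed analysis the same ratio would be computed over a sliding time window, with sustained values well above 1 read as the precursory signal.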
Abstract:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known, long-standing idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework comprises two stages. The first uses exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process in which non-parametric methods such as decision trees and generalized additive models are used to identify important variables and their relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. The paper is motivated by a medical problem in which interest centres on developing a risk stratification system for morbidity among 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods are applied specifically to this case study, they can be applied in any field, irrespective of the type of response.
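As a rough illustration of the two-stage idea (a non-parametric screen followed by a parametric predictive model), the sketch below uses a decision tree to rank candidate predictors and then fits a logistic risk model on the retained subset. The data, variable count and cut-offs are invented stand-ins, not the cardiac dataset from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1710, 10))              # stand-in demographic/clinical predictors
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=1710) > 1).astype(int)  # morbidity flag

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: non-parametric screen to find a parsimonious variable subset
tree = DecisionTreeClassifier(max_depth=4).fit(X_tr, y_tr)
keep = np.argsort(tree.feature_importances_)[-4:]

# Stage 2: final parametric risk model on the retained variables
risk_model = LogisticRegression().fit(X_tr[:, keep], y_tr)
print("held-out accuracy:", risk_model.score(X_te[:, keep], y_te))
```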
Abstract:
The Accelerating Moment Release (AMR) preceding earthquakes with magnitude above 5 in Australia that occurred during the last 20 years was analyzed to test the Critical Point Hypothesis. Twelve earthquakes in the catalog were chosen based on a criterion for the number of nearby events. Results show that seven sequences with numerous events recorded leading up to the main earthquake exhibited accelerating moment release. Two others occurred close in time and space to earthquakes that were preceded by AMR. The remaining three sequences had very few events in the catalog, so the lack of AMR detected in the analysis may be related to catalog incompleteness. Spatio-temporal scanning of AMR parameters shows that 80% of the areas in which AMR occurred experienced large events. In areas of similar background seismicity with no large events, 10 out of 12 cases exhibit no AMR; the other two are false alarms in which AMR was observed but no large event followed. The relationship between AMR and the Load-Unload Response Ratio (LURR) was also studied. Both methods predict similar critical-region sizes; however, the critical-point time estimated from AMR is slightly earlier than the time of the critical-point LURR anomaly.
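AMR analyses conventionally fit the cumulative Benioff strain with a power law of time to failure, eps(t) = A + B(tf - t)^m, with an exponent m < 1 indicating acceleration. A minimal curve-fitting sketch on synthetic data follows; all parameter values and the event catalog are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def amr(t, A, B, tf, m):
    # Power-law time-to-failure model; sign/abs keep the fit stable if tf < t.
    return A + B * np.sign(tf - t) * np.abs(tf - t) ** m

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 19, 60))                          # event times (years)
eps = amr(t, 5.0, -2.0, 20.0, 0.3) + rng.normal(0, 0.05, t.size)  # cumulative Benioff strain

p0 = [5.0, -2.0, 21.0, 0.5]                                  # rough initial guess
(A, B, tf, m), _ = curve_fit(amr, t, eps, p0=p0, maxfev=20000)
print(f"fitted failure time tf = {tf:.1f} yr, exponent m = {m:.2f}")  # m < 1 => AMR
```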
Abstract:
A modified UNIQUAC model has been extended to describe and predict the equilibrium relative humidity and moisture content of wood. The method is validated over a range of moisture contents from the oven-dried state to the fiber saturation point, and over a temperature range of 20-70 °C. Adjustable parameters and binary interaction parameters of the UNIQUAC model were estimated from experimental data for Caribbean pine and Hoop pine as well as data available in the literature. The two group-interaction parameters for the wood-moisture system were consistent with functional group contributions for H2O, -OH and -CHO. The result reconfirms that the main contributors to water adsorption in cell walls are the hydroxyl groups of the carbohydrates in cellulose and hemicelluloses, providing some physical insight into the intermolecular forces and energies between bound water and the wood material.
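For reference, the residual part of the standard UNIQUAC activity-coefficient model, which carries the binary interaction parameters mentioned above, can be evaluated as in the sketch below; the equilibrium relative humidity then follows from the water activity (the combinatorial part must be added to obtain the full ln gamma). All numerical values here are placeholders, not the fitted wood-moisture parameters of the paper.

```python
import numpy as np

def uniquac_residual_lngamma(x, q, a, T):
    """Residual UNIQUAC term: ln(gamma_i^R) = q_i * (1 - ln s_i - sum_j theta_j*tau_ij/s_j),
    with s_j = sum_k theta_k * tau_kj and tau_ij = exp(-a_ij / T)."""
    x, q, a = map(np.asarray, (x, q, a))
    theta = q * x / np.sum(q * x)            # surface-area fractions
    tau = np.exp(-a / T)                     # binary interaction parameters a_ij in kelvin
    s = theta @ tau                          # s[j] = sum_k theta[k] * tau[k, j]
    return q * (1.0 - np.log(s) - tau @ (theta / s))

# Illustrative call for a binary water / wood-polymer mixture; every number
# below is a made-up placeholder rather than an estimated parameter.
x = [0.2, 0.8]                               # mole fractions: water, wood polymer
q = [1.4, 3.0]                               # surface-area parameters
a = [[0.0, 300.0], [150.0, 0.0]]             # a_12, a_21 (K)
lng = uniquac_residual_lngamma(x, q, a, 323.15)
print("residual activity coefficient of water:", float(np.exp(lng[0])))
```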
Abstract:
Traditional vegetation mapping methods use high-cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model, using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns, with transitional gradients from one vegetation community to another; arbitrary and often unrealistic sharp boundaries can otherwise be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of northeastern Australia. The paper presents the full cycle of this vegetation modelling approach, including site sampling, variable selection, model selection, model implementation, internal model assessment, model prediction assessment, integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, model validation against an independent data set, and scale assessment of model predictions. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (r² = 0.83) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of vital information for conservation planning and management, a scientific basis for rehabilitation of disturbed and cleared areas, and a viable method for producing adequate vegetation maps for conservation and forestry planning in poorly studied areas.
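One way to picture the integration step is sketched below: given a predicted probability surface from each of the 28 community models, every grid cell of the composite map is assigned to the community with the highest predicted probability. This is only an illustrative reading of the integration; the paper's actual rule-based GIS procedure is more elaborate, and all arrays here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n_models, rows, cols = 28, 200, 300
prob = rng.random((n_models, rows, cols))    # stand-ins for the 28 model surfaces
prob /= prob.sum(axis=0)                     # normalize to a probability per grid cell

composite = prob.argmax(axis=0)              # composite pre-clearing map (community ids)
confidence = prob.max(axis=0)                # per-cell confidence, usable for assessment
print(composite.shape, "communities mapped:", np.unique(composite).size)
```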
Abstract:
Non-technical loss (NTL) identification and prediction are important tasks for many utilities. Data from a customer information system (CIS) can be used for NTL analysis; however, to perform NTL analysis accurately and efficiently, the original CIS data need to be pre-processed before any detailed analysis can be carried out. In this paper, we propose a feature-selection-based method for CIS data pre-processing that extracts the most relevant information for further analysis such as clustering and classification. By removing irrelevant and redundant features, feature selection is an essential step in the data mining process: it finds an optimal subset of features that improves the quality of results through faster processing, higher accuracy and simpler models with fewer features. A detailed feature selection analysis is presented in the paper, and both time-domain and load-shape data are compared on the basis of accuracy, consistency and the statistical dependencies between features.
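A minimal filter-style pass of the kind described might look like the sketch below, using mutual information as the relevance score; the score function, feature count and labels are assumptions for illustration, not the paper's choices.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(4)
X = rng.normal(size=(5000, 24))              # stand-in CIS features (e.g. monthly loads)
y = rng.integers(0, 2, 5000)                 # stand-in NTL / non-NTL labels

selector = SelectKBest(mutual_info_classif, k=8).fit(X, y)
X_reduced = selector.transform(X)            # fewer, more relevant features downstream
print("kept feature indices:", np.flatnonzero(selector.get_support()))
```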
Abstract:
Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary depending on the evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events and a Bayesian phylogenetic approach to delineate the breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data and demonstrate the applicability of the strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets and yields high-confidence results.
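As an example of the kind of statistical measure usable in the first phase, the sketch below implements MaxChi (Maynard Smith, 1992), which slides a candidate breakpoint along a pair of equal-length aligned sequences and chi-square tests whether mismatches fall unevenly on either side. The abstract does not name its three measures, so this is illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency

def maxchi(seq1, seq2):
    """Return (max chi-square, breakpoint) for two aligned, equal-length sequences."""
    mismatch = np.frombuffer(seq1.encode(), dtype="S1") != np.frombuffer(seq2.encode(), dtype="S1")
    best = (0.0, None)
    if not mismatch.any():
        return best
    for k in range(10, mismatch.size - 10):          # candidate breakpoints
        left, right = mismatch[:k], mismatch[k:]
        table = [[left.sum(), (~left).sum()],
                 [right.sum(), (~right).sum()]]      # mismatch/match x left/right
        chi2 = chi2_contingency(table)[0]
        if chi2 > best[0]:
            best = (chi2, k)
    return best

# Toy usage: identical first half, fully divergent second half.
chi2, bp = maxchi("ACGT" * 30, "ACGT" * 15 + "TGCA" * 15)
print(f"max chi-square {chi2:.1f} at position {bp}")
```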
Abstract:
In recent years, the phrase 'genomic medicine' has increasingly been used to describe a new development in medicine that holds great promise for human health. This new approach to health care uses knowledge of an individual's genetic make-up to identify those who are at higher risk of developing certain diseases and to intervene at an earlier stage to prevent these diseases. Identifying genes that are involved in disease aetiology will provide researchers with tools to develop better treatments and cures. A major role within this field is attributed to 'predictive genomic medicine', which proposes screening healthy individuals to identify those who carry alleles that increase their susceptibility to common diseases, such as cancers and heart disease. Physicians could then intervene even before the disease manifests and advise individuals with a higher genetic risk to change their behaviour - for instance, to exercise or to eat a healthier diet - or offer drugs or other medical treatment to reduce their chances of developing these diseases. These promises have fallen on fertile ground among politicians, health-care providers and the general public, particularly in light of the increasing costs of health care in developed societies. Various countries have established databases of the DNA and health information of whole populations as a first step towards genomic medicine. Biomedical research has also identified a large number of genes that could be used to predict someone's risk of developing a certain disorder. But it would be premature to assume that genomic medicine will soon become reality, as many problems remain to be solved. Our knowledge about most disease genes and their roles is far from sufficient to make reliable predictions about a patient's risk of actually developing a disease. In addition, genomic medicine will create new political, social, ethical and economic challenges that will have to be addressed in the near future.
Abstract:
Three main models of parameter setting have been proposed: the Variational model proposed by Yang (2002; 2004), the Structured Acquisition model endorsed by Baker (2001; 2005), and the Very Early Parameter Setting (VEPS) model advanced by Wexler (1998). The VEPS model contends that parameters are set early. The Variational model supposes that children employ statistical learning mechanisms to decide among competing parameter values, so this model anticipates delays in parameter setting when critical input is sparse, and gradual setting of parameters. On the Structured Acquisition model, delays occur because parameters form a hierarchy, with higher-level parameters set before lower-level parameters. Assuming that children freely choose the initial value, children will sometimes mis-set parameters. When that happens, however, the input is expected to trigger a precipitous rise in one parameter value and a corresponding decline in the other. We point to the kind of child language data needed to adjudicate among these competing models.
Abstract:
Multifrequency bioimpedance analysis has the potential to provide a non-invasive technique for determining body composition in live cattle. A bioimpedance meter developed for use in clinical medicine was adapted and evaluated in 2 experiments using a total of 31 cattle. Prediction equations were obtained for total body water, extracellular body water, intracellular body water, carcass water and carcass protein. There were strong correlations between the results obtained through chemical markers and through bioimpedance analysis when determined in cattle covering a wide range of liveweights and body conditions. The r² values obtained were 0.87 and 0.91 for total body water and extracellular body water respectively. Bioimpedance also correlated with carcass water, measured by chemical analysis (r² = 0.72), but less well with carcass protein (r² = 0.46). These correlations were improved by including liveweight and sex as variables in multiple regression analysis. However, the resultant equations were poor predictors of protein and water content in the carcasses of a group of small underfed beef cattle that had a narrow range of liveweights. In this case, although there was no statistical difference between the predicted and measured values overall, bioimpedance analysis did not detect the differences in carcass protein between the 2 groups that were apparent following chemical analysis. Further work is required to determine the sensitivity of the technique in small underfed cattle, and its potential use in heavier, well-fed cattle close to slaughter weight.
Abstract:
Multi-frequency bioimpedance analysis (MFBIA) was used to determine the impedance, reactance and resistance of 103 lamb carcasses (17.1-34.2 kg) immediately after slaughter and evisceration. Carcasses were halved and frozen, and one half was subsequently homogenized and analysed for water, crude protein and fat content. Three measures of carcass length were obtained. Diagonal length between the electrodes (right side biceps femoris to left side of neck) explained a greater proportion of the variance in water mass than did estimates of spinal length and was selected for use in the index L²/Z to predict the mass of chemical components in the carcass. Use of impedance (Z) measured at the characteristic frequency (Z_c) instead of at 50 kHz (Z_50) did not improve the power of the model to predict the mass of water, protein or fat in the carcass. While L²/Z_50 explained a significant proportion of the variation in the masses of body water (r² = 0.64), protein (r² = 0.34) and fat (r² = 0.35), its inclusion in multivariate indices offered small or no increases in predictive capacity when hot carcass weight (HCW) and a measure of rib fat depth (GR) were present in the model. Optimized equations were able to account for 65-90% of the variance observed in the weight of chemical components in the carcass. It is concluded that single-frequency impedance data do not provide better prediction of carcass composition than can be obtained from measures of HCW and GR. Indices of intracellular water mass derived from impedance at zero frequency and the characteristic frequency explained a similar proportion of the variance in carcass protein mass as did the index L²/Z_50.
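The prediction step described here reduces to an ordinary multiple regression of a chemical component on the index L²/Z_50 together with HCW and GR. A sketch on synthetic data follows; every coefficient, range and noise level is invented for illustration, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 103
L = rng.uniform(0.9, 1.3, n)                 # diagonal electrode length (m)
Z50 = rng.uniform(30, 60, n)                 # impedance at 50 kHz (ohm)
HCW = rng.uniform(17.1, 34.2, n)             # hot carcass weight (kg)
GR = rng.uniform(5, 25, n)                   # rib fat depth (mm)
water = 0.55 * HCW - 0.1 * GR + 200 * L**2 / Z50 + rng.normal(0, 0.5, n)  # fake target

X = np.column_stack([np.ones(n), L**2 / Z50, HCW, GR])
beta, *_ = np.linalg.lstsq(X, water, rcond=None)      # multiple-regression fit
r2 = 1 - ((water - X @ beta) ** 2).sum() / ((water - water.mean()) ** 2).sum()
print(f"coefficients: {beta.round(3)}, r^2 = {r2:.2f}")
```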
Abstract:
Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binders were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotherapeutic peptides.
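The ANN classification stage can be pictured as in the sketch below: peptides are one-hot encoded and a small multilayer perceptron is trained to separate binders from non-binders. The encoding, network size, sequences and labels are random stand-ins, and PERUN's EA-driven alignment step is omitted.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

AA = "ACDEFGHIKLMNPQRSTVWY"                  # the 20 standard amino acids
rng = np.random.default_rng(6)

def one_hot(peptide):
    """Encode a peptide as a flat binary position-by-residue matrix."""
    m = np.zeros((len(peptide), len(AA)))
    m[np.arange(len(peptide)), [AA.index(a) for a in peptide]] = 1
    return m.ravel()

peptides = ["".join(rng.choice(list(AA), 9)) for _ in range(200)]   # random 9-mer cores
X = np.array([one_hot(p) for p in peptides])
y = rng.integers(0, 2, len(peptides))                               # stand-in binder labels

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```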
Abstract:
The present study examined the relative importance of outcome expectancies and self-efficacy [1] in the prediction of alcohol dependence [2] and alcohol consumption in a sample of young adult drinkers drawn from a milieu previously reported as supportive of risky drinking. In predicting alcohol dependence, outcome expectancies were found to mediate the effect of self-efficacy, and the same pattern was found for both males and females. This suggests that male and female drinkers may become more similar as they progress along the drinking continuum from risky drinking to dependent drinking. However, in women, compared with men, a greater array of expectancy and self-efficacy scales was found to predict heavy drinking, as measured by quantity and frequency. These results suggest that heavy-drinking women are particularly at risk of developing drinking-related complications and that preventative education needs to take gender differences into account.
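The mediation claim can be checked with the familiar regression comparison in the Baron and Kenny style: the predictor's coefficient should shrink once the mediator enters the model. A sketch on synthetic data follows; the variable names and effect sizes are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
self_eff = rng.normal(size=n)                            # predictor
expectancies = 0.6 * self_eff + rng.normal(size=n)       # mediator
dependence = 0.7 * expectancies + rng.normal(size=n)     # outcome

direct = sm.OLS(dependence, sm.add_constant(self_eff)).fit()
full = sm.OLS(dependence,
              sm.add_constant(np.column_stack([self_eff, expectancies]))).fit()
print("self-efficacy coefficient alone:", round(direct.params[1], 2))
print("self-efficacy coefficient with mediator:", round(full.params[1], 2))  # shrinks
```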