972 results for Regression method
Abstract:
2000 Mathematics Subject Classification: 62J12, 62K15, 91B42, 62H99.
Abstract:
2000 Mathematics Subject Classification: 62F10, 62J05, 62P30
Abstract:
2000 Mathematics Subject Classification: Primary 60G55; secondary 60G25.
Abstract:
This paper focuses on the problem of estimating the parameters of the so-called "multinomial discrete choice" (MDC) model. In particular, the question is how to carry out point and interval estimation of the parameters when the model is mixed, i.e. includes both individual- and choice-specific explanatory variables, and a standard MDC computer program is not available. The basic idea behind the solution is to use the Cox proportional hazards method of survival analysis, which is available in any standard statistical package; provided the data structure satisfies certain special requirements, it yields the desired MDC solutions. The paper describes the features of the data set to be analysed.
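The estimation trick described above can be illustrated with a small sketch. The layout assumed is one row per (decision maker, alternative) pair, a constant pseudo-duration, the chosen alternative marked as the event, and each decision maker forming a stratum; individual-specific covariates are interacted with alternative dummies. This is a hedged illustration using the Python lifelines package (the paper itself refers to any standard Cox routine), and the data and variable names are hypothetical.

```python
# Sketch: fitting a multinomial discrete choice (conditional logit) model with a
# stratified Cox proportional hazards routine. Data are simulated for illustration.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n, alts = 200, ["a", "b", "c"]
rows = []
for i in range(n):
    income = rng.normal(50, 10)                     # individual-specific covariate
    prices = rng.uniform(1, 4, size=3)              # choice-specific covariate
    util = -0.8 * prices + np.array([0.0, 0.02 * income, -0.01 * income])
    prob = np.exp(util) / np.exp(util).sum()
    choice = rng.choice(3, p=prob)
    for j, a in enumerate(alts):
        rows.append({"chooser_id": i, "alt": a, "price": prices[j],
                     "income": income, "chosen": int(j == choice), "time": 1.0})
df = pd.DataFrame(rows)

# Individual-specific covariates are interacted with alternative dummies ("a" = reference),
# mirroring the mixed MDC setup; raw income is constant within a stratum and is dropped.
for a in ["b", "c"]:
    df[f"income_x_{a}"] = df["income"] * (df["alt"] == a).astype(float)

# Each decision maker is a stratum; the chosen alternative is the "event" and the
# others are censored at the same pseudo-time, so the Cox partial likelihood
# reduces to the multinomial (conditional logit) likelihood.
cph = CoxPHFitter()
cph.fit(df[["time", "chosen", "price", "income_x_b", "income_x_c", "chooser_id"]],
        duration_col="time", event_col="chosen", strata=["chooser_id"])
cph.print_summary()   # 'price' and the income interactions recover the MDC coefficients
```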
Abstract:
The L-moments based index-flood procedure was successfully applied to Regional Flood Frequency Analysis (RFFA) for the Island of Newfoundland in 2002 using data up to 1998. This thesis considered both Labrador and the Island of Newfoundland using the L-moments index-flood method with flood data up to 2013. For Labrador, the homogeneity test showed that Labrador can be treated as a single homogeneous region, and the generalized extreme value (GEV) distribution was found to be more robust than the other frequency distributions. Drainage area (DA) is the only significant variable for estimating the index flood at ungauged sites in Labrador. In previous studies, the Island of Newfoundland had been considered as four homogeneous regions (A, B, C and D) as well as the Water Survey of Canada's Y and Z sub-regions. Homogeneous regions based on Y and Z were found to provide more accurate quantile estimates than those based on the four homogeneous regions. Goodness-of-fit test results showed that the GEV distribution is most suitable for the sub-regions; however, the three-parameter lognormal (LN3) distribution performed better in terms of robustness. The best-fitting regional frequency distribution from 2002 has now been updated with the latest flood data, but quantile estimates with the new data were not very different from the previous study. Overall, in terms of quantile estimation, in both Labrador and the Island of Newfoundland, the index-flood procedure based on L-moments is highly recommended, as it provided consistent and more accurate results than other techniques, such as the regression-on-quantiles technique currently used by the government.
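As a hedged illustration of the L-moment machinery behind an index-flood calculation (not the thesis's regional pooling workflow), the sketch below computes sample L-moments from probability-weighted moments, estimates GEV parameters with Hosking's approximation, and rescales a growth-curve quantile by the index flood. The annual-maximum series is simulated.

```python
# L-moments -> GEV parameters -> T-year quantile, single-site illustration only.
import numpy as np
from math import gamma, log

def sample_lmoments(x):
    """First two L-moments and L-skewness via probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2          # mean, L-scale, L-skewness (tau3)

def gev_from_lmoments(l1, l2, t3):
    """GEV location/scale/shape from L-moments (Hosking's approximation)."""
    c = 2.0 / (3.0 + t3) - log(2) / log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    a = l2 * k / ((1 - 2 ** (-k)) * gamma(1 + k))
    xi = l1 - a * (1 - gamma(1 + k)) / k
    return xi, a, k

def gev_quantile(F, xi, a, k):
    return xi + a * (1 - (-np.log(F)) ** k) / k

rng = np.random.default_rng(1)
flows = rng.gumbel(loc=100, scale=30, size=40)        # hypothetical annual maxima
l1, l2, t3 = sample_lmoments(flows)
xi, a, k = gev_from_lmoments(1.0, l2 / l1, t3)        # growth curve (scaled by index flood l1)
q100 = l1 * gev_quantile(1 - 1 / 100, xi, a, k)       # 100-year flood at this site
print(f"index flood = {l1:.1f}, 100-year quantile = {q100:.1f}")
```

In a full regional analysis, the scaled data from all gauged sites in a homogeneous region would be pooled before fitting the growth curve.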
Abstract:
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach to down-scaling the problem is to first partition the dataset into subsets and then fit the model using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm to address these issues. The algorithm applies feature selection in parallel to each subset using regularized regression or a Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to the usual competitors.
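A minimal sketch of the sample-space partitioning steps just described, assuming scikit-learn and simulated data; the tuning choices (lasso with cross-validation, a 0.5 median-vote threshold, a serial loop standing in for parallel workers) are illustrative rather than the thesis's exact settings.

```python
# Per-subset lasso selection, median inclusion vote, per-subset refit, averaging.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p, m = 6000, 50, 6                        # samples, features, subsets
beta = np.zeros(p); beta[:5] = [3, -2, 1.5, 2, -1]
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

subsets = np.array_split(rng.permutation(n), m)   # horizontal (row) partition

# Step 1: variable selection in parallel on each subset (lasso here).
inclusion = np.zeros((m, p))
for i, idx in enumerate(subsets):
    sel = LassoCV(cv=5).fit(X[idx], y[idx])
    inclusion[i] = sel.coef_ != 0

# Step 2: median inclusion vote -> a single selected support.
support = np.median(inclusion, axis=0) >= 0.5

# Step 3: refit on the selected features in each subset, then average coefficients.
coefs = np.zeros((m, support.sum()))
for i, idx in enumerate(subsets):
    coefs[i] = LinearRegression().fit(X[idx][:, support], y[idx]).coef_
beta_hat = coefs.mean(axis=0)
print("selected features:", np.flatnonzero(support))
print("averaged estimates:", np.round(beta_hat, 2))
```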
While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
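The decorrelate-then-partition structure of {\em DECO} can be sketched roughly as follows. The exact scaling constants and the ridge refinement of the published algorithm are simplified here, so this is only an illustration of the shape of the computation, not the method itself.

```python
# Decorrelate the design with an inverse square root of the row-wise Gram matrix,
# split the columns across "workers", and run a sparse fit on each block.
import numpy as np
from numpy.linalg import eigh
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p, m = 300, 600, 4
Z = rng.normal(size=(n, 1))                 # latent factor -> strong cross-block correlation
X = 0.7 * Z + rng.normal(size=(n, p))
beta = np.zeros(p); beta[[0, 150, 300, 450]] = [4, -3, 2, -2]
y = X @ beta + rng.normal(size=n)

# Decorrelation step (simplified): F = (X X^T / p + r I)^{-1/2}, applied to X and y.
r = 1.0
G = X @ X.T / p + r * np.eye(n)
w, V = eigh(G)
F = V @ np.diag(w ** -0.5) @ V.T
Xd, yd = F @ X, F @ y

# Feature-space partition: each worker fits a lasso on its own column block.
blocks = np.array_split(np.arange(p), m)
beta_hat = np.zeros(p)
for cols in blocks:                          # would run in parallel in practice
    beta_hat[cols] = LassoCV(cv=5).fit(Xd[:, cols], yd).coef_
print("largest estimates at columns:", np.argsort(-np.abs(beta_hat))[:4])
```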
For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework, {\em DEME} (DECO-message), by leveraging both the {\em DECO} and {\em message} algorithms. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partitions the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted on a single computer in parallel. The results are then synthesized via the {\em DECO} and {\em message} algorithms in reverse order to produce the final output. The whole framework is extremely scalable.
Abstract:
Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers, which can strongly affect the least squares estimator in linear regression. Instead of modeling the mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and the covariates. QR can therefore be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental designs and observational studies. In ordinary linear regression, sample size may be determined based on either precision analysis or power analysis with closed-form formulas. There are also methods that calculate sample size for QR based on precision analysis, such as Jennen-Steinmetz and Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under a hypothesis test of covariate effects. Even though an error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions about the error distribution and covariate structure at the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate the error distribution. Since the proposed method is implemented in R, the user is able to choose either a parametric distribution or a nonparametric kernel density estimate for the error distribution. The user also needs to specify the covariate structure and effect size to carry out the sample size and power calculation. The performance of the proposed method is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers close to the nominal power level, for example 80%.
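The thesis implements its method in R; the Python sketch below only illustrates the general simulation-based power and sample-size idea (an assumed error distribution and covariate structure, repeated QR fits, and an empirical rejection rate), using statsmodels, and is not the paper's exact algorithm.

```python
# Simulation-based power for a quantile-regression covariate-effect test, and a
# crude grid search for the smallest n reaching the nominal 80% power.
import numpy as np
import statsmodels.api as sm

def qr_power(n, beta1, tau=0.5, n_sim=200, alpha=0.05, rng=None):
    """Empirical power of the Wald test of H0: beta1 = 0 at quantile tau."""
    rng = np.random.default_rng(0) if rng is None else rng
    rejections = 0
    for _ in range(n_sim):
        x = rng.normal(size=n)                 # assumed covariate structure
        err = rng.standard_t(df=3, size=n)     # assumed error distribution
        y = 1.0 + beta1 * x + err
        X = sm.add_constant(x)
        res = sm.QuantReg(y, X).fit(q=tau)
        if res.pvalues[1] < alpha:
            rejections += 1
    return rejections / n_sim

for n in range(50, 401, 50):
    power = qr_power(n, beta1=0.3)
    print(n, round(power, 2))
    if power >= 0.80:
        break
```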
Abstract:
The viscosity of ionic liquids (ILs) has been modeled as a function of temperature at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al., J. Chem. Eng. Data 2016, 61, 2160–2169), which used 154 experimental viscosity data points for 25 ionic liquids to regress a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies in the experimental data for the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and recommend one set of reliable data for each IL. These recommended data (819 data points in total) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from the literature to establish this model. The binary data consist of 128 training data points used for the optimization of binary interaction parameters and 838 test data points used for comparison with the purely predicted values. The relative average absolute deviation (RAAD) for training and test is 2.9% and 3.9%, respectively.
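As a small, hedged illustration of the VFT temperature dependence underlying the ion parameters (not the full UNIFAC–VISCO group-contribution model with binary interaction parameters), the sketch below fits ln η = A + B/(T − T0) to made-up viscosity data with scipy and reports an RAAD-style deviation.

```python
# Fit Vogel-Fulcher-Tammann parameters to a single liquid's viscosity-temperature data.
import numpy as np
from scipy.optimize import curve_fit

def ln_vft(T, A, B, T0):
    """ln(eta) = A + B / (T - T0), the VFT form."""
    return A + B / (T - T0)

# hypothetical viscosity data (temperature in K, viscosity in mPa*s)
T = np.array([293.15, 303.15, 313.15, 323.15, 333.15, 343.15])
eta = np.array([48.0, 30.5, 20.8, 14.9, 11.1, 8.6])

popt, _ = curve_fit(ln_vft, T, np.log(eta), p0=(-2.0, 800.0, 150.0))
A, B, T0 = popt
pred = np.exp(ln_vft(T, *popt))
raad = 100 * np.mean(np.abs(pred - eta) / eta)     # relative average absolute deviation
print(f"A={A:.2f}, B={B:.1f} K, T0={T0:.1f} K, RAAD={raad:.2f}%")
```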
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
We present a detailed analysis of the application of a multi-scale Hierarchical Reconstruction method to a family of ill-posed linear inverse problems. When observations on the unknown quantity of interest and the observation operators are known, these inverse problems concern the recovery of the unknown from its observations. Although the observation operators we consider are linear, they are inevitably ill-posed in various ways. We recall in this context the classical Tikhonov regularization method with a stabilizing function, which targets the specific ill-posedness of the observation operators and preserves desired features of the unknown. Having studied the mechanism of Tikhonov regularization, we propose a multi-scale generalization of it, the so-called Hierarchical Reconstruction (HR) method. The HR method can be traced back to the Hierarchical Decomposition method in image processing. It successively extracts information from the previous hierarchical residual into the current hierarchical term at a finer hierarchical scale. The hierarchical sum of all these terms provides a reasonable approximate solution to the unknown when the observation matrix satisfies certain conditions with specific stabilizing functions. Compared with the Tikhonov regularization method on the same inverse problems, the HR method is shown to decrease the total number of iterations, reduce the approximation error, and offer control of the approximation distance between the hierarchical sum and the unknown, thanks to its ladder of finitely many hierarchical scales. We report numerical experiments supporting these claims about the advantages of the HR method over the Tikhonov regularization method.
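A schematic sketch of the hierarchical idea, under simplifying assumptions (a quadratic stabilizing function, so each Tikhonov subproblem has a closed form, and a geometric ladder of regularization parameters): each level solves a Tikhonov problem on the current residual, and the hierarchical terms are summed. The toy problem and parameter choices are illustrative only, not the paper's experiments.

```python
# Hierarchical sequence of Tikhonov steps on an ill-conditioned toy problem.
import numpy as np

rng = np.random.default_rng(0)
n = 80
A = np.vander(np.linspace(0, 1, n), n, increasing=True)   # severely ill-conditioned
x_true = np.sin(2 * np.pi * np.linspace(0, 1, n))
b = A @ x_true + 1e-6 * rng.normal(size=n)

def tikhonov(A, r, lam):
    """Minimize ||A u - r||^2 + lam * ||u||^2 (closed form for the L2 stabilizer)."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ r)

def hierarchical_reconstruction(A, b, lam0=1e-2, levels=8):
    x_sum, residual = np.zeros(A.shape[1]), b.copy()
    for k in range(levels):
        u_k = tikhonov(A, residual, lam0 * 2.0 ** (-k))   # finer scale at each level
        x_sum += u_k                                      # hierarchical sum
        residual = b - A @ x_sum                          # residual passed to the next level
    return x_sum

x_hr = hierarchical_reconstruction(A, b)
x_tik = tikhonov(A, b, 1e-2)
print("HR error:", np.linalg.norm(x_hr - x_true),
      "single-step Tikhonov error:", np.linalg.norm(x_tik - x_true))
```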
Abstract:
Background: Body composition is affected by disease and affects responses to medical treatments, dosages of medicines, etc., while an abnormal body composition contributes to the causation of many chronic diseases. While we have reliable biochemical tests for certain nutritional parameters of body composition, such as iron or iodine status, and we have harnessed nuclear physics to estimate the body's content of trace elements, the very basic quantification of body fat content and muscle mass remains highly problematic. Both body fat and muscle mass are vitally important, as they have opposing influences on chronic disease, but they have seldom been estimated as part of population health surveillance. Instead, most national surveys have merely reported BMI and waist circumference, or sometimes the waist/hip ratio; these indices are convenient but do not have any specific biological meaning. Anthropometry offers a practical and inexpensive method for muscle and fat estimation in clinical and epidemiological settings; however, its use is limited by a shortage of reference data, misuse of terminology, unclear assumptions, and the absence of properly validated anthropometric equations. To date, anthropometric methods are not sensitive enough to detect muscle and fat loss. Aims: The aim of this thesis is to estimate adipose/fat and muscle mass in health, in disease, and during weight loss by: 1. evaluating and critiquing the literature to identify the best published prediction equations for adipose/fat and muscle mass estimation; 2. deriving and validating adipose tissue and muscle mass prediction equations; and 3. evaluating the prediction equations, along with anthropometric indices and the best equations retrieved from the literature, in health, in metabolic illness, and during weight loss. Methods: A systematic review using the Cochrane Review method was conducted for muscle mass estimation papers that used MRI as the reference method; fat mass estimation papers were critically reviewed. Data from subjects of mixed ethnicity, age and body mass who underwent whole-body magnetic resonance imaging to quantify adipose tissue and muscle mass (dependent variables) and anthropometry (independent variables) were used in the derivation/validation analysis. Multiple regression and Bland-Altman plots were applied to evaluate the prediction equations. To determine how well the equations identify metabolic illness, the English and Scottish health surveys were studied; multiple regression and binary logistic regression were applied to assess model fit and associations. The populations were also divided into quintiles and relative risk was analysed. Finally, the prediction equations were evaluated by applying them to a pilot study of 10 subjects who underwent whole-body MRI, anthropometric measurements and muscle strength testing before and after weight loss, to determine how well the equations identify changes in adipose/fat mass and muscle mass. Results: The estimation of fat mass has serious problems. Despite advances in technology and science, prediction equations for the estimation of fat mass depend on limited historical reference data and remain dependent upon assumptions that have not yet been properly validated for different population groups. Muscle mass does not have the same conceptual problems; however, its measurement is still problematic and reference data are scarce.
The derivation and validation analysis in this thesis was satisfactory; compared with prediction equations in the literature, the new equations performed similarly or better. Applying the prediction equations in metabolic illness and during weight loss showed how well the equations identify metabolic illness, with significant associations with diabetes, hypertension, HbA1c and blood pressure, and moderate to high correlations with MRI-measured adipose tissue and muscle mass before and after weight loss. Conclusion: Adipose tissue mass, and to an extent muscle mass, can now be estimated for many purposes as population or group means. However, these equations must not be used for assessing fatness or categorising individuals. Further exploration in different populations and health surveys would be valuable.
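For illustration only, the sketch below mimics the derivation/validation workflow on hypothetical data: a prediction equation fitted by multiple regression against an MRI-measured reference, with agreement summarized Bland-Altman style (bias and limits of agreement). The variables and relationships are placeholders, not the thesis's equations.

```python
# Derive an anthropometric prediction equation and summarize agreement with the reference.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
weight = rng.normal(75, 12, n)          # kg
height = rng.normal(1.70, 0.09, n)      # m
waist = rng.normal(88, 11, n)           # cm
mri_muscle = 0.30 * weight + 12 * height - 0.05 * waist + rng.normal(0, 1.5, n)

# Derivation: multiple regression of the MRI-measured mass on anthropometry.
X = sm.add_constant(np.column_stack([weight, height, waist]))
fit = sm.OLS(mri_muscle, X).fit()
predicted = fit.predict(X)

# Validation: Bland-Altman statistics (ideally computed on an independent sample).
diff = predicted - mri_muscle
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"R2 = {fit.rsquared:.2f}, bias = {bias:.2f} kg, limits of agreement = +/- {loa:.2f} kg")
```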
Abstract:
Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross-validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three-level logistic regression model for clinical mastitis in dairy cattle: cross-validation, prediction using the full posterior predictive distribution, and two "mixed" predictive methods that incorporate higher-level random effects simulated from the underlying model distribution. Cross-validation is considered a gold-standard method but is computationally intensive, so a comparison is made between the posterior predictive assessments and cross-validation. The analyses revealed that the mixed prediction methods produced results close to cross-validation, whilst the full posterior predictive assessment gave predictions that were over-optimistic (closer to the observed disease rates) compared with cross-validation. A mixed prediction method that simulated random effects from both higher levels was best at identifying the outlying level-two (farm-year) units of interest. It is concluded that this mixed prediction method, simulating random effects from both higher levels, is straightforward and may be of value in model criticism of multilevel logistic regression, a technique commonly used for animal health data with a hierarchical structure.
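The distinction between full posterior prediction and "mixed" prediction can be sketched schematically: for each posterior draw, the higher-level (e.g. farm-year) random effects are redrawn from their fitted distribution rather than conditioned on, before simulating predicted case counts. The placeholder "posterior" draws and two-level layout below are assumptions for illustration; the paper's model has three levels and MCMC-based posteriors.

```python
# Mixed predictive check: redraw unit-level effects from N(0, sigma_u) per posterior draw.
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_units, n_per_unit = 1000, 50, 40

# Placeholder "posterior" draws: intercept, covariate effect, unit-effect SD.
beta0 = rng.normal(-2.0, 0.05, n_draws)
beta1 = rng.normal(0.8, 0.05, n_draws)
sigma_u = np.abs(rng.normal(0.6, 0.03, n_draws))
x = rng.normal(size=(n_units, n_per_unit))          # covariate values within each unit

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

pred_counts = np.zeros((n_draws, n_units))
for s in range(n_draws):
    # Simulate a *new* unit effect instead of conditioning on the unit's estimated effect.
    u_new = rng.normal(0.0, sigma_u[s], n_units)
    p = inv_logit(beta0[s] + beta1[s] * x + u_new[:, None])
    pred_counts[s] = rng.binomial(1, p).sum(axis=1)

# Predictive distribution of per-unit case counts, to be compared with the observed
# counts when flagging outlying (e.g. farm-year) units.
lower, upper = np.percentile(pred_counts, [2.5, 97.5], axis=0)
print("95% mixed-predictive interval for unit 0:", lower[0], "-", upper[0])
```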
A method for the estimation of potential evapotranspiration and/or open pan evaporation over Brazil.
Abstract:
This paper presents a simple regression model to estimate potential evapotranspiration and/or open pan evaporation for a wide network of stations in Brazil. The model uses readily available data, such as geocoordinates (latitude) and precipitation, as inputs. Potential evapotranspiration is highly correlated with precipitation during the summer months and with latitude during the winter months. It also shows some association with longitude and elevation, although the magnitude of that variation is very small. The model gave an R² varying from 0.460 to 0.902 for different months. The model was also extended to weekly periods of individual years and tested against the open pan evaporation data of Bebedouro and Mandacaru. The agreement between observed and predicted values appears to be good.
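A minimal sketch of this kind of monthly regression (potential evapotranspiration on latitude and precipitation), using statsmodels on simulated placeholder data rather than the Brazilian station network used in the paper.

```python
# Simple multiple regression of PET on latitude and monthly precipitation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_stations = 120
latitude = rng.uniform(-33, 5, n_stations)               # degrees (roughly Brazil's range)
precip = rng.gamma(2.0, 60.0, size=n_stations)           # monthly precipitation, mm
pet = 150 - 1.2 * latitude + 0.15 * precip + rng.normal(0, 10, n_stations)

X = sm.add_constant(np.column_stack([latitude, precip]))
fit = sm.OLS(pet, X).fit()
print("coefficients:", np.round(fit.params, 3), "R2 =", round(fit.rsquared, 3))
```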
Abstract:
The purpose of this study was to validate and cross-validate the Beunen-Malina-Freitas method for non-invasive prediction of adult height in girls. A sample of 420 girls aged 10–15 years from the Madeira Growth Study were measured at yearly intervals and again 8 years later. Anthropometric dimensions (lengths, breadths, circumferences, and skinfolds) were measured, skeletal age was assessed using the Tanner-Whitehouse 3 method, and menarcheal status (present or absent) was recorded. Adult height was measured and predicted using stepwise, forward, and maximum-R² regression techniques. Multiple correlations, mean differences, standard errors of prediction, and error boundaries were calculated. A sample from the Leuven Longitudinal Twin Study was used to cross-validate the regressions. Age-specific coefficients of determination (R²) between predicted and measured adult height varied between 0.57 and 0.96, while standard errors of prediction varied between 1.1 and 3.9 cm. The cross-validation confirmed the validity of the Beunen-Malina-Freitas method in girls aged 12–15 years, but at younger ages the cross-validation was less consistent. We conclude that the Beunen-Malina-Freitas method is valid for the prediction of adult height in girls aged 12–15 years and is applicable to European populations or populations of European ancestry.
Abstract:
Objective: Information on factors associated with suicide among young individuals in Ireland is limited. The aim of this study was to identify socio-demographic characteristics and circumstances of death associated with age among individuals who died by suicide. Methods: The study examined 121 consecutive suicides (2007–2012) occurring in the southern part of Ireland (Cork city and county). Data were obtained from coroners, family informants, and health care professionals. A comparison was made between 15-24-year-old and 25-34-year-old individuals. Socio-demographic characteristics of the deceased, methods of suicide, history of alcohol and drug abuse, and findings from toxicological analysis of blood and urine samples taken at post mortem were included. Pearson's χ² tests and binary logistic regression analysis were performed. Results: Alcohol and/or drugs were detected through toxicological analysis in the majority of the total sample (79.5%), with no marked difference between 15-24-year-old and 25-34-year-old individuals (74.1% and 86.2%, respectively). Compared with 25-34-year-old individuals, 15-24-year-old individuals were more likely to die by hanging (88.5%) and less likely to die by intentional drug overdose or carbon monoxide poisoning. Younger individuals who died between Saturday and Monday were more likely to have consumed alcohol before dying. Substance abuse histories were similar in the two age groups. Conclusion: Based on this research it is recommended that strategies to reduce substance abuse be applied among 25-34-year-old individuals at risk of suicide. The wide use of hanging among young people should be taken into consideration in future means-restriction strategies.