973 results for Fitting model
Abstract:
We investigate whether the relative contributions of genetic and shared environmental factors are associated with an increased risk of melanoma. Data from the Queensland Familial Melanoma Project comprising 15,907 subjects arising from 1912 families were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both provide natural frameworks for estimating relative risks while adjusting for the simultaneous effects of other covariates. A simple Markov chain Monte Carlo method was used to impute missing covariate data, and the Bayesian model was implemented via Gibbs sampling using the freeware package BUGS. In addition, we used a Bayesian model to investigate the relative contributions of genetic and environmental effects to the expression of naevi and freckles, which are known risk factors for melanoma.
Abstract:
Sorghum is the main dryland summer crop in NE Australia and a number of agricultural businesses would benefit from an ability to forecast production likelihood at regional scale. In this study we sought to develop a simple agro-climatic modelling approach for predicting shire (statistical local area) sorghum yield. Actual shire yield data, available for the period 1983-1997 from the Australian Bureau of Statistics, were used to train the model. Shire yield was related to a water stress index (SI) that was derived from the agro-climatic model. The model involved a simple fallow and crop water balance that was driven by climate data available at recording stations within each shire. Parameters defining the soil water holding capacity, maximum number of sowings (MXNS) in any year, planting rainfall requirement, and critical period for stress during the crop cycle were optimised as part of the model fitting procedure. Cross-validated correlations (CVR) ranged from 0.5 to 0.9 at shire scale. When aggregated to regional and national scales, 78-84% of the annual variation in sorghum yield was explained. The model was used to examine trends in sorghum productivity and the approach to using it in an operational forecasting system was outlined.
Abstract:
Document classification is a supervised machine learning process in which predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on the candidate models. However, this kind of evaluation may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while remaining competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.
Abstract:
In this paper, we examine the problem of fitting a hypersphere to a set of noisy measurements of points on its surface. Our work generalises an estimator of Delogne (Proc. IMEKO-Symp. Microwave Measurements, 1972, 117-123) which he proposed for circles and which has been shown by Kasa (IEEE Trans. Instrum. Meas. 25, 1976, 8-14) to be convenient for its ease of analysis and computation. We also generalise Chan's 'circular functional relationship' to describe the distribution of points. We derive the Cramer-Rao lower bound (CRLB) under this model and we derive approximations for the mean and variance for fixed sample sizes when the noise variance is small. We perform a statistical analysis of the estimate of the hypersphere's centre. We examine the existence of the mean and variance of the estimator for fixed sample sizes. We find that the mean exists when the number of sample points is greater than M + 1, where M is the dimension of the hypersphere. The variance exists when the number of sample points is greater than M + 2. We find that the bias approaches zero as the noise variance diminishes and that the variance approaches the CRLB. We provide simulation results to support our findings.
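As a minimal sketch of the algebraic estimator the paper generalises: the Delogne/Kasa fit reduces to linear least squares via the identity |x|² = 2c·x + (r² − |c|²). The function name `kasa_fit` and the numpy formulation below are illustrative, not the authors' code.

```python
import numpy as np

def kasa_fit(points):
    """Delogne/Kasa algebraic hypersphere fit via linear least squares.

    points: (n, M) array of noisy samples from the surface of an
    (M-1)-sphere embedded in R^M. Returns (centre, radius).
    """
    X = np.asarray(points, dtype=float)
    n, M = X.shape
    # |x|^2 = 2 c.x + t  is linear in c and t = r^2 - |c|^2
    A = np.hstack([2.0 * X, np.ones((n, 1))])
    b = np.sum(X**2, axis=1)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre, t = theta[:M], theta[M]
    radius = np.sqrt(t + centre @ centre)
    return centre, radius
```

On noiseless points the fit is exact; the paper's analysis concerns its bias and variance under measurement noise.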
Abstract:
We describe a template model for perception of edge blur and identify a crucial early nonlinearity in this process. The main principle is to spatially filter the edge image to produce a 'signature', and then find which of a set of templates best fits that signature. Psychophysical blur-matching data strongly support the use of a second-derivative signature, coupled to Gaussian first-derivative templates. The spatial scale of the best-fitting template signals the edge blur. This model predicts blur-matching data accurately for a wide variety of Gaussian and non-Gaussian edges, but it suffers a bias when edges of opposite sign come close together in sine-wave gratings and other periodic images. This anomaly suggests a second general principle: the region of an image that 'belongs' to a given edge should have a consistent sign or direction of luminance gradient. Segmentation of the gradient profile into regions of common sign is achieved by implementing the second-derivative 'signature' operator as two first-derivative operators separated by a half-wave rectifier. This multiscale system of nonlinear filters predicts perceived blur accurately for periodic and aperiodic waveforms. We also outline its extension to 2-D images and infer the 2-D shape of the receptive fields.
Abstract:
Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straightforward to fit, as taking logarithms of the Y variable linearises the relationship, which can then be treated by the methods of linear regression.
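The log-linearisation step can be sketched as follows (a minimal illustration with a hypothetical helper name, not code from the article; it assumes all Y values are positive):

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by linearising: log(y) = log(a) + b*x.

    Ordinary least squares on the log scale; requires y > 0.
    Returns the estimated (a, b).
    """
    x = np.asarray(x, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))
    b, log_a = np.polyfit(x, log_y, 1)  # slope, intercept
    return np.exp(log_a), b
```

Note that least squares on the log scale weights observations differently from a direct non-linear fit, which is one reason the two approaches can give slightly different estimates on noisy data.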
Abstract:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance, and indeed the objective of the investigation may be to provide a 'curve of best fit' for predictive purposes. In such cases, the fitting of successive polynomials may be the best approach. There are various strategies for deciding on the polynomial of best fit, depending on the objectives of the investigation.
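One hedged illustration of such a strategy: fit successive polynomial degrees and choose the one minimising the Akaike Information Criterion, which penalises extra coefficients. The text does not prescribe this particular criterion, and `best_polynomial` is an illustrative name.

```python
import numpy as np

def best_polynomial(x, y, max_degree=6):
    """Fit polynomials of degree 1..max_degree and keep the degree
    that minimises AIC = n*log(RSS/n) + 2*(number of coefficients).

    Returns (best_degree, coefficients). A small floor on RSS avoids
    log(0) for near-perfect fits.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    best = None
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
        aic = n * np.log(rss / n + 1e-12) + 2 * (d + 1)
        if best is None or aic < best[0]:
            best = (aic, d, coeffs)
    return best[1], best[2]
```

Alternative strategies include an extra-sum-of-squares F-test between successive degrees or cross-validated prediction error; which is appropriate depends on the objective.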
Abstract:
We discuss aggregation of data from neuropsychological patients and the process of evaluating models using data from a series of patients. We argue that aggregation can be misleading but not aggregating can also result in information loss. The basis for combining data needs to be theoretically defined, and the particular method of aggregation depends on the theoretical question and characteristics of the data. We present examples, often drawn from our own research, to illustrate these points. We also argue that statistical models and formal methods of model selection are a useful way to test theoretical accounts using data from several patients in multiple-case studies or case series. Statistical models can often measure fit in a way that explicitly captures what a theory allows; the parameter values that result from model fitting often measure theoretically important dimensions and can lead to more constrained theories or new predictions; and model selection allows the strength of evidence for models to be quantified without forcing this into the artificial binary choice that characterizes hypothesis testing methods. Methods that aggregate and then formally model patient data, however, are not automatically preferred to other methods. Which method is preferred depends on the question to be addressed, characteristics of the data, and practical issues like availability of suitable patients, but case series, multiple-case studies, single-case studies, statistical models, and process models should be complementary methods when guided by theory development.
Abstract:
This paper provides the most comprehensive evidence to date on whether monetary aggregates are valuable for forecasting US inflation in the early to mid 2000s. We explore a wide range of definitions of money, including different methods of aggregation and different collections of included monetary assets. We use non-linear, artificial intelligence techniques, namely recurrent neural networks, evolution strategies and kernel methods, in our forecasting experiment. In the experiment, these three methodologies competed to find the best-fitting US inflation forecasting models, which were then compared to forecasts from a naive random walk model. The best models were non-linear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation. There is evidence in the literature that evolutionary methods can be used to evolve kernels; hence, future work should combine the evolutionary and kernel methods to obtain the benefits of both.
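As an illustrative stand-in for the kernel-based non-linear autoregressive models mentioned above (the paper's actual recurrent-network, evolution-strategy and kernel machinery is far richer), a minimal Nadaraya-Watson kernel regression on lagged values might look like this; the function name and Gaussian kernel are assumptions for the sketch:

```python
import numpy as np

def nw_forecast(series, lags=2, bandwidth=1.0):
    """One-step-ahead forecast by Nadaraya-Watson kernel regression
    on lag vectors: weight past transitions by similarity of their
    lag vector to the most recent one, then average the outcomes.
    """
    y = np.asarray(series, dtype=float)
    # training pairs: rows of X are consecutive lag vectors,
    # t holds the value that followed each of them
    X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
    t = y[lags:]
    q = y[-lags:]  # most recent lag vector (the forecast query)
    d2 = np.sum((X - q) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
    return float(np.sum(w * t) / np.sum(w))
```

The random-walk benchmark in the paper corresponds to simply forecasting the last observed value, so any candidate model only earns its keep by beating that.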
Abstract:
The quantitative analysis of receptor-mediated effect is based on experimental concentration-response data in which the independent variable, the concentration of a receptor ligand, is linked with a dependent variable, the biological response. The steps between the drug–receptor interaction and the subsequent biological effect are to some extent unknown. The shape of the fitting curve of the experimental data may give some insight into the nature of the concentration–receptor–response (C-R-R) mechanism. It can be evaluated by non-linear regression analysis of the experimental data points of the independent and dependent variables, which could be considered as a history of the interaction between the drug and receptors. However, this information is not enough to evaluate such important parameters of the mechanism as the dissociation constant (affinity) and efficacy. There are two ways to provide more detailed information about the C-R-R mechanism: (i) an experimental way for obtaining data with new or
Abstract:
Starting from the database of Operophtera brumata L. records collected between 1973 and 2000 by the Light Trap Network in Hungary, we introduce a simple theta-logistic population dynamical model based on endogenous and exogenous factors only. We create an indicator set from which we can choose the elements that improve the fitting results most effectively. Then we extend the basic model with additive climatic factors. Parameter optimization is based on minimizing the root mean square error, and the best model is chosen according to the Akaike Information Criterion. Finally, we run the calibrated extended model with daily outputs of the regional climate model RegCM3.1, using 1961-1990 as the reference period and 2021-2050 and 2071-2100 as future projections. The results of the three time intervals are fitted with Beta distributions and compared statistically. The expected changes are discussed.
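A minimal sketch of the core ingredients named above, assuming a common discrete theta-logistic form (the paper's exact specification, indicators and climatic terms may differ):

```python
import numpy as np

def theta_logistic_step(n, r, K, theta):
    """One step of a discrete theta-logistic map:
    N_{t+1} = N_t * exp(r * (1 - (N_t / K)**theta)).
    """
    return n * np.exp(r * (1.0 - (n / K) ** theta))

def rmse(params, series):
    """Root mean square error of one-step-ahead predictions,
    the quantity minimised during parameter optimization."""
    r, K, theta = params
    pred = theta_logistic_step(series[:-1], r, K, theta)
    return np.sqrt(np.mean((pred - series[1:]) ** 2))
```

In the paper's workflow this RMSE would be minimised over (r, K, theta) plus any climatic terms, and competing model variants compared by AIC.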
Abstract:
Ecological models have often been used to answer questions at the forefront of recent research, such as the possible effects of climate change. Tactical models are a very useful methodology in comparison with complex models that require a relatively large set of input parameters. In this study, a theoretical strategic model (TEGM) was adapted to field data on the basis of a 24-year monitoring database of phytoplankton in the Danube River at the station of Göd, Hungary (at river kilometer 1669, hereafter "rkm"). The Danubian Phytoplankton Growth Model (DPGM) is able to describe the seasonal dynamics of phytoplankton biomass (mg L⁻¹) based on daily temperature, but takes the availability of light into consideration as well. In order to improve fitting, the 24-year database was split into two parts in accordance with environmental sustainability: the period 1979-1990 has a higher level of nutrient excess than 1991-2002. The authors assume that, in these two periods, phytoplankton responded to temperature in two different ways, and thus two submodels were developed, DPGM-sA and DPGM-sB. Observed and simulated data correlated quite well. Findings suggest that a linear temperature rise brings drastic change to phytoplankton only in the case of high nutrient load, and that it is mostly realized through an increase in yearly total biomass.
Abstract:
Ensemble stream modeling and data cleaning are sensor information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as a bush or natural forest fire, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory, and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction by cross-validating attributes with single or multiple target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the f-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data cleaning, and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors, led to the formation of streams for sensor-enabled applications, which further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
Abstract:
Perception of simultaneity and temporal order is studied with simultaneity judgment (SJ) and temporal-order judgment (TOJ) tasks. In the former, observers report whether presentation of two stimuli was subjectively simultaneous; in the latter, they report which stimulus was subjectively presented first. SJ and TOJ tasks typically give discrepant results, which has prompted the view that performance is mediated by different processes in each task. We looked at these discrepancies from a model that yields psychometric functions whose parameters characterize the timing, decisional, and response processes involved in SJ and TOJ tasks. We analyzed 12 data sets from published studies in which both tasks had been used in within-subjects designs, all of which had reported differences in performance across tasks. Fitting the model jointly to data from both tasks, we tested the hypothesis that common timing processes sustain simultaneity and temporal order judgments, with differences in performance arising from task-dependent decisional and response processes. The results supported this hypothesis, also showing that model psychometric functions account for aspects of SJ and TOJ data that classical analyses overlook. Implications for research on perception of simultaneity and temporal order are discussed.
Abstract:
A modified UNIFAC–VISCO group contribution method was developed for the correlation and prediction of the viscosity of ionic liquids as a function of temperature at 0.1 MPa. In this original approach, cations and anions were regarded as distinct molecular groups. The significance of this approach comes from the ability to calculate the viscosity of mixtures of ionic liquids as well as of pure ionic liquids. Binary interaction parameters for selected cations and anions were determined by fitting the experimental viscosity data available in the literature for selected ionic liquids. The temperature dependence of the viscosity contributions of the cations and anions was fitted to Vogel–Fulcher–Tammann (VFT) behavior. The binary interaction parameters and VFT-type fitting parameters were then used to determine the viscosity of pure ionic liquids and of mixtures with different combinations of cations and anions, to validate the prediction method. In this work, the viscosity data of pure ionic liquids and of binary mixtures of ionic liquids are successfully calculated from 293.15 K to 363.15 K at 0.1 MPa. All calculated viscosities showed excellent agreement with experimental data, with a relative absolute average deviation lower than 1.7%.
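For illustration only, the VFT temperature law η(T) = η₀·exp(B / (T − T₀)) can be fitted by scanning candidate T₀ values, since for fixed T₀ the model is linear in 1/(T − T₀) on the log scale. This sketch covers only the VFT step, not the UNIFAC–VISCO group-contribution machinery, and `fit_vft` is an illustrative name.

```python
import numpy as np

def fit_vft(T, eta, T0_grid=None):
    """Fit eta = eta0 * exp(B / (T - T0)) to viscosity data.

    For each trial T0, ln(eta) = ln(eta0) + B / (T - T0) is a
    straight line in u = 1/(T - T0); keep the T0 with the smallest
    residual sum of squares. Returns (eta0, B, T0).
    """
    T = np.asarray(T, dtype=float)
    ln_eta = np.log(np.asarray(eta, dtype=float))
    if T0_grid is None:
        # T0 must lie below the data range for the law to be defined
        T0_grid = np.linspace(T.min() - 200.0, T.min() - 1.0, 400)
    best = None
    for T0 in T0_grid:
        u = 1.0 / (T - T0)
        B, ln_eta0 = np.polyfit(u, ln_eta, 1)  # slope, intercept
        rss = np.sum((ln_eta0 + B * u - ln_eta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, np.exp(ln_eta0), B, T0)
    return best[1], best[2], best[3]
```

A grid scan is a deliberately simple choice here; a general non-linear least-squares solver over all three parameters would be the more usual approach in practice.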