965 resultados para linear calibration model


Relevância:

90.00% 90.00%

Publicador:

Resumo:

E. L. DeLosh, J. R. Busemeyer, and M. A. McDaniel (1997) found that when learning a positive, linear relationship between a continuous predictor (x) and a continuous criterion (y), trainees tend to underestimate y on items that ask the trainee to extrapolate. In 3 experiments, the authors examined the phenomenon and found that the tendency to underestimate y is reliable only in the so-called lower extrapolation region-that is, new values of x that lie between zero and the edge of the training region. Existing models of function learning, such as the extrapolation-association model (DeLosh et al., 1997) and the population of linear experts model (M. L. Kalish, S. Lewandowsky, & J. Kruschke, 2004), cannot account for these results. The authors show that with minor changes, both models can predict the correct pattern of results.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically visualisation methods like principal components analysis are used but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate far more structure than a two dimensional principal components plot could, and deal at the same time with missing data. We show that using the generative topographic mapping provides us with an optimal method to explore the data while being able to replace missing values in a dataset, particularly where a large proportion of the data is missing.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

For a submitted query to multiple search engines finding relevant results is an important task. This paper formulates the problem of aggregation and ranking of multiple search engines results in the form of a minimax linear programming model. Besides the novel application, this study detects the most relevant information among a return set of ranked lists of documents retrieved by distinct search engines. Furthermore, two numerical examples aree used to illustrate the usefulness of the proposed approach.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A rapid method for the analysis of biomass feedstocks was established to identify the quality of the pyrolysis products likely to impact on bio-oil production. A total of 15 Lolium and Festuca grasses known to exhibit a range of Klason lignin contents were analysed by pyroprobe-GC/MS (Py-GC/MS) to determine the composition of the thermal degradation products of lignin. The identification of key marker compounds which are the derivatives of the three major lignin subunits (G, H, and S) allowed pyroprobe-GC/MS to be statistically correlated to the Klason lignin content of the biomass using the partial least-square method to produce a calibration model. Data from this multivariate modelling procedure was then applied to identify likely "key marker" ions representative of the lignin subunits from the mass spectral data. The combined total abundance of the identified key markers for the lignin subunits exhibited a linear relationship with the Klason lignin content. In addition the effect of alkali metal concentration on optimum pyrolysis characteristics was also examined. Washing of the grass samples removed approximately 70% of the metals and changed the characteristics of the thermal degradation process and products. Overall the data indicate that both the organic and inorganic specification of the biofuel impacts on the pyrolysis process and that pyroprobe-GC/MS is a suitable analytical technique to asses lignin composition. © 2007 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods can not directly be employed when dealing with missing data and they struggle to capture global non-linear structures in the data, however they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two dimensional principal components plot. The model can deal with uncertainty, missing data and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm resulting in better fit to the data and better imputation capabilities for missing data. Additionally an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further a novel approach, based on missing data, will be introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The measurement of different aspects of information society has been problematic over along time, and the International Telecommunication Union (ITU) is spearheading in developing a single ICT index. In Geneva during the first World Summit on Information Society (WSIS) in December 2003, the heads of states declared their commitment to the importance of benchmarking and measuring progress toward the information society. Consequently, they re-affirmed their Geneva commitments in their second summit held in Tunis in 2005. In this paper, we propose a multiplicative linear programming model to measure Opportunity Index. We also compared our results with the common measure of ICT opportunity index and we found that the two indices are consistent in their measurement of digital opportunity though differences still exist among regions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 62J12, 62F35

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The cell:cell bond between an immune cell and an antigen presenting cell is a necessary event in the activation of the adaptive immune response. At the juncture between the cells, cell surface molecules on the opposing cells form non-covalent bonds and a distinct patterning is observed that is termed the immunological synapse. An important binding molecule in the synapse is the T-cell receptor (TCR), that is responsible for antigen recognition through its binding with a major-histocompatibility complex with bound peptide (pMHC). This bond leads to intracellular signalling events that culminate in the activation of the T-cell, and ultimately leads to the expression of the immune eector function. The temporal analysis of the TCR bonds during the formation of the immunological synapse presents a problem to biologists, due to the spatio-temporal scales (nanometers and picoseconds) that compare with experimental uncertainty limits. In this study, a linear stochastic model, derived from a nonlinear model of the synapse, is used to analyse the temporal dynamics of the bond attachments for the TCR. Mathematical analysis and numerical methods are employed to analyse the qualitative dynamics of the nonequilibrium membrane dynamics, with the specic aim of calculating the average persistence time for the TCR:pMHC bond. A single-threshold method, that has been previously used to successfully calculate the TCR:pMHC contact path sizes in the synapse, is applied to produce results for the average contact times of the TCR:pMHC bonds. This method is extended through the development of a two-threshold method, that produces results suggesting the average time persistence for the TCR:pMHC bond is in the order of 2-4 seconds, values that agree with experimental evidence for TCR signalling. The study reveals two distinct scaling regimes in the time persistent survival probability density prole of these bonds, one dominated by thermal uctuations and the other associated with the TCR signalling. Analysis of the thermal fluctuation regime reveals a minimal contribution to the average time persistence calculation, that has an important biological implication when comparing the probabilistic models to experimental evidence. In cases where only a few statistics can be gathered from experimental conditions, the results are unlikely to match the probabilistic predictions. The results also identify a rescaling relationship between the thermal noise and the bond length, suggesting a recalibration of the experimental conditions, to adhere to this scaling relationship, will enable biologists to identify the start of the signalling regime for previously unobserved receptor:ligand bonds. Also, the regime associated with TCR signalling exhibits a universal decay rate for the persistence probability, that is independent of the bond length.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Spectral unmixing (SU) is a technique to characterize mixed pixels of the hyperspectral images measured by remote sensors. Most of the existing spectral unmixing algorithms are developed using the linear mixing models. Since the number of endmembers/materials present at each mixed pixel is normally scanty compared with the number of total endmembers (the dimension of spectral library), the problem becomes sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model through two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known and the main problem is to find the minimum number of endmembers under a reasonable small approximation error. Mathematically, the corresponding problem is called the $\ell_0$-norm problem which is NP-hard problem. Our main study for the first part of thesis is to find more accurate and reliable approximations of $\ell_0$-norm term and propose sparse unmixing methods via such approximations. The resulting methods are shown considerable improvements to reconstruct the fractional abundances of endmembers in comparison with state-of-the-art methods such as having lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., dictionary-aided semiblind unmixing scheme) will be generalized as the blind unmixing scenario that the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method for proposing new unmixing methods due to its noticeable supports such as considering the nonnegativity constraints of two decomposed matrices. Furthermore, we introduce new cost functions through some statistical and physical features of spectral signatures of materials (SSoM) and hyperspectral pixels such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM for the first few subbands. Finally, we introduce sparse unmixing methods for the blind scenario and evaluate the efficiency of the proposed methods via simulations over synthetic and real hyperspectral data sets. The results illustrate considerable enhancements to estimate the spectral library of materials and their fractional abundances such as smaller values of spectral angle distance (SAD) and abundance angle distance (AAD) as well.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

A novel surrogate model is proposed in lieu of computational fluid dynamic (CFD) code for fast nonlinear aerodynamic modeling. First, a nonlinear function is identified on selected interpolation points defined by discrete empirical interpolation method (DEIM). The flow field is then reconstructed by a least square approximation of flow modes extracted by proper orthogonal decomposition (POD). The proposed model is applied in the prediction of limit cycle oscillation for a plunge/pitch airfoil and a delta wing with linear structural model, results are validate against a time accurate CFD-FEM code. The results show the model is able to replicate the aerodynamic forces and flow fields with sufficient accuracy while requiring a fraction of CFD cost.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Classical regression analysis can be used to model time series. However, the assumption that model parameters are constant over time is not necessarily adapted to the data. In phytoplankton ecology, the relevance of time-varying parameter values has been shown using a dynamic linear regression model (DLRM). DLRMs, belonging to the class of Bayesian dynamic models, assume the existence of a non-observable time series of model parameters, which are estimated on-line, i.e. after each observation. The aim of this paper was to show how DLRM results could be used to explain variation of a time series of phytoplankton abundance. We applied DLRM to daily concentrations of Dinophysis cf. acuminata, determined in Antifer harbour (French coast of the English Channel), along with physical and chemical covariates (e.g. wind velocity, nutrient concentrations). A single model was built using 1989 and 1990 data, and then applied separately to each year. Equivalent static regression models were investigated for the purpose of comparison. Results showed that most of the Dinophysis cf. acuminata concentration variability was explained by the configuration of the sampling site, the wind regime and tide residual flow. Moreover, the relationships of these factors with the concentration of the microalga varied with time, a fact that could not be detected with static regression. Application of dynamic models to phytoplankton time series, especially in a monitoring context, is discussed.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Understanding the mode-locked response of excitable systems to periodic forcing has important applications in neuroscience. For example it is known that spatially extended place cells in the hippocampus are driven by the theta rhythm to generate a code conveying information about spatial location. Thus it is important to explore the role of neuronal dendrites in generating the response to periodic current injection. In this paper we pursue this using a compartmental model, with linear dynamics for each compartment, coupled to an active soma model that generates action potentials. By working with the piece-wise linear McKean model for the soma we show how the response of the whole neuron model (soma and dendrites) can be written in closed form. We exploit this to construct a stroboscopic map describing the response of the spatially extended model to periodic forcing. A linear stability analysis of this map, together with a careful treatment of the non-differentiability of the soma model, allows us to construct the Arnol'd tongue structure for 1:q states (one action potential for q cycles of forcing). Importantly we show how the presence of quasi-active membrane in the dendrites can influence the shape of tongues. Direct numerical simulations confirm our theory and further indicate that resonant dendritic membrane can enlarge the windows in parameter space for chaotic behavior. These simulations also show that the spatially extended neuron model responds differently to global as opposed to point forcing. In the former case spatio-temporal patterns of activity within an Arnol'd tongue are standing waves, whilst in the latter they are traveling waves.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010 are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI show the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for the impact studies of climate variability/change on agriculture and food science fields when short-time series or data with large uncertainty are available.