974 results for Variables selection


Relevance:

60.00%

Abstract:

Objective: To discuss generalized estimating equations (GEE) as an extension of generalized linear models by commenting on the paper by Ziegler and Vens, "Generalized Estimating Equations. Notes on the Choice of the Working Correlation Matrix". Methods: An international group of experts was invited to comment on this paper. Results: The discussants took several perspectives. Econometricians drew parallels to the generalized method of moments (GMM). Statisticians discussed model assumptions and the handling of missing data. Applied statisticians commented on practical aspects of data analysis. Conclusions: In general, careful modeling of the correlation is encouraged when considering estimation efficiency and other implications, and a comparison of how instruments are chosen in GMM and in GEE would be worthwhile. Some theoretical drawbacks of GEE need to be addressed further and require careful analysis of data. This applies particularly when data are missing at random.
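In GEE, the working correlation matrix R(α) is the within-cluster correlation structure assumed while estimating the regression parameters; it enters the working covariance of cluster i as V_i = φ A_i^{1/2} R(α) A_i^{1/2}, with A_i the diagonal matrix of marginal variances and φ a scale parameter. As a minimal sketch (not code from the discussed paper; the function names are illustrative), the three most commonly used structures can be built for a cluster of n observations as follows:

```python
def independence(n):
    """Working independence: observations in a cluster treated as uncorrelated."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, alpha):
    """Exchangeable: every pair of distinct observations has correlation alpha."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def ar1(n, alpha):
    """First-order autoregressive: correlation alpha ** |i - j| decays with lag."""
    return [[alpha ** abs(i - j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in ar1(3, 0.5):
        print(row)  # [1.0, 0.5, 0.25], then [0.5, 1.0, 0.5], then [0.25, 0.5, 1.0]
```

AR(1) is typically used for equally spaced repeated measurements and exchangeable for clusters whose members have no natural ordering; independence is the simplest choice and relies on the sandwich variance estimator for valid inference.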

Relevance:

60.00%

Abstract:

In this work, glucose, triglycerides and cholesterol (total and HDL) were quantified in both rat and human blood plasma, without any sample pretreatment, using near-infrared (NIR) spectroscopy combined with multivariate methods. For this purpose, different techniques and algorithms for data pre-processing, variable selection and multivariate regression modeling were compared with one another, including partial least squares regression (PLS), nonlinear regression by artificial neural networks (ANN), interval partial least squares regression (iPLS), the genetic algorithm (GA) and the successive projections algorithm (SPA), among others. For the rat blood plasma samples, the variable selection algorithms gave satisfactory results in terms of both the correlation coefficients (R²) and the root mean square error of prediction (RMSEP) for the three analytes, especially for triglycerides and HDL cholesterol. The RMSEP values for glucose, triglycerides and HDL cholesterol obtained with the best PLS model were 6.08, 16.07 and 2.03 mg dL⁻¹, respectively. For the human blood plasma samples, by contrast, the PLS models gave unsatisfactory predictions, with a nonlinear trend and bias. ANN regression was therefore applied as an alternative to PLS, given its ability to model data from nonlinear systems. The root mean square errors of monitoring (RMSEM) for glucose, triglycerides and total cholesterol for the best ANN models were 13.20, 10.31 and 12.35 mg dL⁻¹, respectively. Statistical tests (F and t) suggest that NIR spectroscopy combined with multivariate regression methods (PLS and ANN) is capable of quantifying these analytes (glucose, triglycerides and cholesterol) even in highly complex biological fluids such as blood plasma.
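The RMSEP figures quoted above are in the same units as the analyte (mg dL⁻¹). A minimal sketch of the usual definition, the square root of the mean squared difference between predicted and reference values over the validation samples, is shown below; whether the authors divided by the sample count exactly as here is an assumption, since the abstract does not state it.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction over an external validation set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical glucose values in mg/dL: reference method vs. NIR prediction.
    print(rmsep([100.0, 120.0], [105.0, 115.0]))  # 5.0
```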

Relevance:

60.00%

Abstract:

The aim of this study was to evaluate the potential of near-infrared reflectance spectroscopy (NIRS) as a rapid, non-destructive method for determining the soluble solids content (SSC), pH and titratable acidity of intact plums. Plum samples with total solids content ranging from 5.7 to 15%, pH from 2.72 to 3.84 and titratable acidity from 0.88 to 3.6% were collected from supermarkets in Natal, Brazil, and NIR spectra were acquired in the 714-2500 nm range. Several multivariate calibration techniques were compared across different data pre-processing methods and variable selection algorithms, such as interval partial least squares (iPLS), the genetic algorithm (GA), the successive projections algorithm (SPA) and ordered predictors selection (OPS). The validation models for SSC, pH and titratable acidity had correlation coefficients (R) of 0.95, 0.90 and 0.80 and root mean square errors of prediction (RMSEP) of 0.45 °Brix, 0.07 and 0.40%, respectively. From these results, it can be concluded that NIR spectroscopy can be used as a non-destructive alternative for measuring SSC, pH and titratable acidity in plums.
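Of the variable selection algorithms named above, SPA is the easiest to illustrate. The sketch below follows the commonly published description of its projection step: starting from one column (wavelength), each iteration projects the remaining columns onto the orthogonal complement of the most recently selected one and picks the column with the largest projected norm, so collinear wavelengths are deferred. It is a simplified illustration, not the authors' code; in full SPA the starting column and the number of variables are themselves chosen by a validation error, which is omitted here.

```python
import math


def spa(X, k0, m):
    """Projection step of the successive projections algorithm (sketch).

    X  : list of N rows, each a list of J variable values (e.g. absorbances).
    k0 : index of the starting column.
    m  : number of variables to select (at most min(N, J)).
    Returns the selected column indices in selection order.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Working copies of the columns; they are projected cumulatively.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [k0]
    for _ in range(m - 1):
        last = cols[selected[-1]]
        last_sq = sum(v * v for v in last)
        if last_sq == 0.0:
            break  # remaining columns are linearly dependent on the selected ones
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove from column j its component along the last selected column.
            coef = sum(a * b for a, b in zip(cols[j], last)) / last_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    # Column 1 is nearly collinear with column 0, so column 2 is picked first.
    X = [[1.0, 1.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 1.0]]
    print(spa(X, 0, 3))  # [0, 2, 1]
```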

Relevance:

60.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

60.00%

Abstract:

We present a Bayesian network model for continuous variables in which densities and conditional densities are estimated with B-spline mixtures of polynomials (MoPs). We use a novel approach that obtains conditional density estimates directly from B-spline properties. In particular, we implement naive Bayes and wrapper variable selection. Finally, we apply our techniques to the problem of predicting neuron morphological variables from electrophysiological ones.
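The B-spline basis that underlies such MoP densities can be evaluated with the Cox-de Boor recursion. The sketch below shows that recursion plus one hypothetical way to turn basis functions into a density: a non-negative weighted combination normalized with the identity that the B-spline of order k on knots t[i]..t[i+k] integrates to (t[i+k] - t[i]) / k. The `bspline_density` helper is illustrative only and is not the estimator proposed in the paper.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of order k (degree k - 1) on knot vector t,
    computed with the Cox-de Boor recursion (half-open support intervals)."""
    if k == 1:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    value = 0.0
    left_den = t[i + k - 1] - t[i]
    if left_den > 0:
        value += (x - t[i]) / left_den * bspline_basis(i, k - 1, t, x)
    right_den = t[i + k] - t[i + 1]
    if right_den > 0:
        value += (t[i + k] - x) / right_den * bspline_basis(i + 1, k - 1, t, x)
    return value


def bspline_density(weights, k, t, x):
    """Hypothetical density: non-negative combination of B-splines, divided by
    its total mass, using  integral of B_{i,k} = (t[i+k] - t[i]) / k."""
    mass = sum(w * (t[i + k] - t[i]) / k for i, w in enumerate(weights))
    return sum(w * bspline_basis(i, k, t, x) for i, w in enumerate(weights)) / mass


if __name__ == "__main__":
    knots = [0, 0, 0, 1, 2, 3, 3, 3]  # clamped knots, quadratic splines (k = 3)
    # Basis functions sum to one inside [0, 3), so equal weights give a uniform density.
    print(bspline_density([1.0] * 5, 3, knots, 1.5))  # about 0.3333
```

With half-open support intervals, the basis evaluates to zero at the right-most knot; implementations usually special-case that endpoint.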

Relevance:

40.00%

Abstract:

In this paper, orthogonal descriptors are compared with Leaps-and-Bounds regression analysis. For the nitrobenzene data set used in this study, the results obtained with orthogonal descriptors are better than those obtained with Leaps-and-Bounds regression. Leaps-and-Bounds regression can be used effectively to select variables in quantitative structure-activity/property relationship (QSAR/QSPR) studies. Since orthogonal descriptors performed even better here, orthogonalization of descriptors is also a good method for variable selection in QSAR/QSPR studies.
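Descriptor orthogonalization replaces each descriptor, taken in a chosen order, by its residual after regression on the descriptors already orthogonalized, so that the new descriptors are mutually uncorrelated. A minimal sketch, assuming the descriptors are centered first and using modified Gram-Schmidt (the descriptor order is an input and changes the result):

```python
def center(v):
    """Subtract the mean so that orthogonality means zero correlation."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(descriptors):
    """Successively orthogonalize descriptor columns (modified Gram-Schmidt).

    descriptors: list of descriptor vectors, one value per compound, in the
    order in which they should be orthogonalized.
    """
    ortho = []
    for d in descriptors:
        r = center(d)
        for q in ortho:
            qq = dot(q, q)
            if qq > 0:
                # Remove the part of r explained by the earlier descriptor q.
                coef = dot(r, q) / qq
                r = [x - coef * y for x, y in zip(r, q)]
        ortho.append(r)
    return ortho


if __name__ == "__main__":
    o = orthogonalize([[1, 2, 3, 4], [2, 1, 4, 3]])
    print(o[1])  # residual of the second descriptor, orthogonal to the first
```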

Relevance:

40.00%

Abstract:

We present a novel method for characterizing the light curves of Pan-STARRS1 Medium Deep Survey (PS1 MDS) extragalactic sources as stochastic variables (SVs) or burst-like (BL) transients, using multi-band image-differencing time-series data. We select detections in difference images that are associated with galaxy hosts using a star/galaxy catalog extracted from the deep PS1 MDS stacked images, and adopt a maximum a posteriori formulation to model their difference-flux time series in the four Pan-STARRS1 photometric bands gP1, rP1, iP1 and zP1. We use three deterministic light-curve models to fit BL transients, namely a Gaussian, a Gamma distribution and an analytic supernova (SN) model, and one stochastic light-curve model, the Ornstein-Uhlenbeck process, to fit the variability characteristic of active galactic nuclei (AGNs). We assess the quality of fit of the models band-wise and source-wise, using their estimated leave-one-out cross-validation likelihoods and corrected Akaike information criteria. We then apply a K-means clustering algorithm to these statistics to determine the source classification in each band. The final source classification is derived as a combination of the individual filter classifications, resulting in two measures of classification quality, obtained by averaging across the photometric filters (1) the classifications determined from the closest K-means cluster centers and (2) the squared distances from the cluster centers in the K-means clustering spaces. For a verification set of AGNs and SNe, we show that SVs and BL transients occupy distinct regions in the plane defined by these measures. We use our clustering method to characterize 4361 extragalactic sources detected in difference images in the first 2.5 yr of the PS1 MDS, classifying 1529 as BL and 2262 as SV, with a purity of 95.00% for AGNs and 90.97% for SNe based on our verification sets.
We combine our light-curve classifications with their nuclear or off-nuclear host galaxy offsets, to define a robust photometric sample of 1233 AGNs and 812 SNe. With these two samples, we characterize their variability and host galaxy properties, and identify simple photometric priors that would enable their real-time identification in future wide-field synoptic surveys.
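The corrected Akaike information criterion used above to score each light-curve model follows the standard formula AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1) / (n - k - 1), for k free parameters, n data points and maximized likelihood L; lower values are preferred. A minimal sketch of that formula and of picking the best-scoring model is shown below; the parameter counts and log-likelihoods in the example are hypothetical, not values from the survey.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion (lower is better)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + 2 * k * (k + 1) / (n - k - 1)


def best_model(fits, n):
    """fits maps model name -> (maximized log-likelihood, number of parameters)."""
    return min(fits, key=lambda name: aicc(fits[name][0], fits[name][1], n))


if __name__ == "__main__":
    # Hypothetical fits of one band's light curve with n = 10 epochs.
    fits = {"gaussian": (-10.0, 2), "ornstein_uhlenbeck": (-9.5, 3)}
    print(best_model(fits, 10))  # gaussian: the small likelihood gain does not pay for the extra parameter
```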

Relevance:

40.00%

Abstract:

The growing demand for steels with tighter compositional specifications has led Companhia Siderúrgica Nacional (CSN) to develop more efficient processes. To address this, this paper aims to identify the operational variables with the greatest impact on the desulfurization process, specifically in the torpedo car, together with their causes and solutions, and then to select and test, through laboratory and industrial trials, desulfurizing agents based on CaC₂, CaO, CaCO₃ and Mg in order to assess the cost per quantity of product desulfurized. The mixture with the best results was not the one with the highest CaC₂ content. This mixture is believed to have shown better efficiency because of increased bath agitation, produced by gas released from the CaCO₃ present in the mixture. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Relevance:

40.00%

Abstract:

In a matched experimental design, the effectiveness of matching in reducing bias and increasing power depends on the strength of the association between the matching variable and the outcome of interest. In particular, in the design of a community health intervention trial, the effectiveness of a matched design, in which communities are matched on some community characteristic, depends on the strength of the correlation between the matching characteristic and the change in the health behavior being measured. We attempt to estimate the correlation between community characteristics and changes in health behaviors in four datasets from community intervention trials and observational studies. Community characteristics that are highly correlated with changes in health behaviors would potentially be effective matching variables in studies of health intervention programs designed to change those behaviors. Among the community characteristics considered, the urban-rural character of the community was the most highly correlated with changes in health behaviors. The correlations of per capita income, percent low income and percent aged over 65 with changes in health behaviors were marginally statistically significant (p < 0.08).
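Why the strength of association matters can be seen from a standard variance identity (a textbook result, not taken from this study). If the outcome changes Y_1 and Y_2 of the two communities in a matched pair each have variance σ², and matching induces a correlation ρ between them, then

$$
\operatorname{Var}(Y_1 - Y_2) = \operatorname{Var}(Y_1) + \operatorname{Var}(Y_2) - 2\operatorname{Cov}(Y_1, Y_2) = 2\sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1 - \rho).
$$

Compared with unmatched pairs (ρ = 0), matching multiplies the variance of the within-pair difference by 1 - ρ when ρ > 0, so a matching variable that is only weakly correlated with the change in behavior yields little gain in precision or power.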

Relevance:

40.00%

Abstract:

Body fat distribution is a cardiovascular health risk factor in adults. It can be measured by various methods, including anthropometry, but it is not clear which anthropometric index is suitable for epidemiologic studies of fat distribution and cardiovascular disease. The purpose of the present study was to select, from a series of indices (those traditionally used in the literature and others constructed from the analysis), the measure of body fat distribution that is most highly correlated with lipid-related variables and is independent of overall fatness. Subjects were Mexican-American men and women (N = 1004) from a study of gallbladder disease in Starr County, Texas. Multivariate associations were sought between lipid profile measures (lipids, lipoproteins and apolipoproteins) and two sets of anthropometric variables (4 circumferences and 6 skinfolds), in order to assess the association between lipid-related measures and the two sets of anthropometric variables and to guide the construction of indices.
Two indices emerged from the analysis that appeared to be highly correlated with lipid profile measures independently of obesity: 2 × arm circumference − thigh skinfold in pre- and post-menopausal women, and the arm/thigh circumference ratio in men. Next, using the sum of all skinfolds to represent obesity together with the selected body fat distribution indices, the following hypotheses were tested: (1) state of obesity and centrally/upper distributed body fat are equally predictive of lipids, lipoproteins and apolipoproteins, and (2) the correlation among the lipid-related measures is not altered by obesity and body fat distribution.
With respect to the first hypothesis, the present study found that most lipids, lipoproteins and apolipoproteins were significantly associated with both overall fatness and the anatomical location of body fat in both sexes and both menopausal groups. However, among men and post-menopausal women, certain lipid profile measures (triglyceride and HDLT among post-menopausal women, and apos C-II, C-III and E among men) had substantially higher correlations with body fat distribution than with overall fatness.
With respect to the second hypothesis, both obesity and body fat distribution were found to alter the associations among plasma lipid variables in men and women. The data suggested that the patterns of correlations in men and post-menopausal women are more comparable. Among men, correlations involving apo A-I, HDLT and HDL₂ seemed greatly influenced by obesity, and those involving A-II by fat distribution; among post-menopausal women, correlations involving apos A-I and A-II were highly affected by the location of body fat.
Thus, these data show that obesity and fat distribution not only affect the levels of single measures but can also markedly influence the pattern of relationships among measures. That such changes are seen for both obesity and fat distribution is significant, since the indices employed were chosen because they were independent of one another.
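One standard way to quantify whether a fat-distribution index is related to a lipid measure "independent of overall fatness" is the first-order partial correlation, controlling for an obesity measure such as the sum of skinfolds. The sketch below is a generic illustration of that statistic, not the multivariate procedure used in the study; the variable roles (x = index, y = lipid measure, z = sum of skinfolds) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

The same value is obtained by regressing x and y separately on z and correlating the two sets of residuals, which is often the easier way to explain what "controlling for obesity" means.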

Relevance:

40.00%

Abstract:

This paper studies feature subset selection in classification using a multiobjective estimation of distribution algorithm. We consider six functions, namely area under the ROC curve, sensitivity, specificity, precision, F1 measure and Brier score, both for evaluating feature subsets and as the objectives of the problem. One characteristic of these objective functions is the presence of noise in their values, which must be handled appropriately during optimization. Our proposed algorithm comprises two main techniques specifically designed for the feature subset selection problem. The first is a solution ranking method based on interval values to handle the noise in the objectives. The second is a model estimation method that learns a joint probabilistic model of objectives and variables, which is used to generate new solutions and advance through the search space. To simplify model estimation, ℓ1-regularized regression is used to select a subset of problem variables before model learning. The proposed algorithm is compared with a well-known ranking method for interval-valued objectives and with a standard multiobjective genetic algorithm. In particular, the effects of the two new techniques are investigated experimentally. The experimental results show that the proposed algorithm obtains comparable or better performance on the tested datasets.
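For reference, the six objectives listed above can all be computed from true 0/1 labels and predicted class-1 probabilities. The sketch below uses the standard definitions, with AUC computed as the fraction of positive/negative pairs ranked correctly (ties count one half); the 0.5 decision threshold is an assumption, and how the paper estimates these values for a feature subset (for example, which resampling scheme) is not specified here.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, precision, F1, Brier score and AUC."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "brier": brier,
            "auc": auc(y_true, y_prob)}
```

Higher is better for all of these except the Brier score, so a multiobjective search would minimize its negation or treat it as a minimization objective.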

Relevance:

30.00%

Abstract:

Principal Topic: A small firm is unlikely to possess internally the full range of knowledge and skills that it requires or could benefit from in developing its business. The ability to acquire suitable external expertise when needed (defined as knowledge or competence that is rare in the firm and acquired from outside) thus becomes a competitive factor in itself. Access to external expertise enables the firm to focus on its core competencies and removes the need to internalize every skill and competence. However, research on how small firms access external expertise is still scarce. The present study contributes to this under-developed discussion by analysing the role of trust and strong ties in the small firm's selection and evaluation of sources of external expertise (henceforth referred to as the 'business advisor' or 'advisor'). Granovetter (1973, 1361) defines the strength of a network tie as 'a (probably linear) combination of the amount of time, the emotional intensity, the intimacy (mutual confiding) and the reciprocal services which characterize the tie'. Strong ties in the context of the present investigation refer to sources of external expertise who are well known to the owner-manager, and who may be either informal (e.g., family, friends) or professional advisors (e.g., consultants, enterprise support officers, accountants or solicitors). Previous research has suggested that strong and weak ties have different strengths, so the choice of business advisors could be critical to business performance. While previous research suggests that small businesses favour previously well-known business advisors, prior studies have also pointed out that excessive reliance on a network of well-known actors might hamper business development, as the range of expertise available through strong ties is limited. But are owner-managers of small businesses aware of this limitation, and does it matter to them?
Or does working with a well-known advisor compensate for it? Hence, our research model first examines the impact of the strength of tie on the business advisor's perceived performance. Next, we ask what encourages a small business owner-manager to seek advice from a strong tie. A recent exploratory study by Welter and Kautonen (2005) drew attention to the central role of trust in this context. However, while their study found support for the general proposition that trust plays an important role in the choice of advisors, how trust and its different dimensions actually affect this choice remained ambiguous. The present paper develops this discussion by considering the impact of the different dimensions of perceived trustworthiness, defined as benevolence, integrity and ability, on the strength of tie. Further, we suggest that the dimensions of perceived trustworthiness relevant to the choice of a strong tie vary between professional and informal advisors. Methodology/Key Propositions: Our propositions are examined empirically using survey data from 153 Finnish small businesses. The data are analysed with the partial least squares (PLS) approach to structural equation modelling in SmartPLS 2.0. Being non-parametric, the PLS algorithm is particularly well suited to analysing small datasets with non-normally distributed variables. Results and Implications: The path model shows that the stronger the tie, the more positively the advisor's performance is perceived. Hypothesis 1, that strong ties will be associated with higher perceptions of performance, is clearly supported. Benevolence is clearly the most significant predictor of the choice of a strong tie for external expertise. While ability also reaches a moderate level of statistical significance, integrity does not have a statistically significant impact on the choice of a strong tie. Hence, we found support for two of the three independent variables included in Hypothesis 2.
Path coefficients differed between the professional and informal advisor subsamples. The results of the exploratory group comparison show that Hypothesis 3a, that ability is more strongly associated with strong ties when choosing a professional advisor, was not supported. Hypothesis 3b, that benevolence is more strongly associated with strong ties when choosing an informal advisor, received some support, because the path coefficient in the informal advisor subsample was much larger than in the professional advisor subsample. Hypothesis 3c, postulating that integrity would be more strongly associated with strong ties in the choice of a professional advisor, was supported. Integrity is the most important dimension of trustworthiness in this context. However, integrity is of no concern, or even negative, when using strong ties to choose an informal advisor. The findings of this study have practical relevance to the enterprise support community. First, the strength of tie has a significant positive impact on the advisor's perceived performance, which implies that small business owners appreciate working with advisors in long-term relationships. Advisors are therefore well advised to invest in building and maintaining relationships in their work with small firms. Second, the results show that, especially in the context of professional advisors, the advisor's perceived integrity and benevolence weigh more than ability. This again emphasizes the need to invest time and effort in building a personal relationship with the owner-manager, rather than merely maintaining a professional image and credentials. Finally, this study demonstrates that the dimensions of perceived trustworthiness are orthogonal, with different effects on the strength of tie and ultimately on perceived performance.
This means that entrepreneurs and advisors should consider the specific dimensions of ability, benevolence and integrity, rather than rely on general perceptions of trustworthiness in their advice relationships.

Relevance:

30.00%

Abstract:

For a sustainable building industry, not only the environmental and economic indicators but also the societal indicators of buildings should be evaluated. Current indicators can conflict with one another, which makes it difficult to reach decisions that clearly quantify and assess sustainability. In sustainable building, the objectives of reducing both adverse environmental impact and cost are in conflict. Moreover, even when both objectives are satisfied, building management systems may present other problems, such as occupant convenience, building flexibility or technical maintenance, which are difficult to quantify as exact assessment data. These conflicting problems make building management more difficult for building managers and planners. This paper presents a methodology for evaluating a sustainable building that considers the socio-economic and environmental characteristics of buildings and is intended to assist building planners and practitioners in decision making. The methodology employs three main concepts: linguistic variables, fuzzy numbers and the analytic hierarchy process (AHP). The linguistic variables represent the degree of appropriateness of qualitative indicators, which are vague or uncertain. These linguistic variables are then translated into fuzzy numbers to reflect their uncertainty and aggregated into a final fuzzy decision value using a hierarchical structure. In a case study, the methodology is applied to the evaluation of a building. The result demonstrates that the approach can be a useful tool for evaluating the sustainability of a building.
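The pipeline described above (linguistic terms mapped to fuzzy numbers, then aggregated up a hierarchy) can be sketched with triangular fuzzy numbers (l, m, u). The example below makes several assumptions not stated in the abstract: a hypothetical five-term scale, AHP weights approximated by normalized row geometric means of a pairwise comparison matrix, aggregation by a weighted sum with crisp weights, and centroid defuzzification. It illustrates the general technique, not the authors' exact operators.

```python
import math

# Hypothetical linguistic scale: term -> triangular fuzzy number (l, m, u).
SCALE = {
    "very poor": (0.0, 0.0, 0.25),
    "poor": (0.0, 0.25, 0.5),
    "fair": (0.25, 0.5, 0.75),
    "good": (0.5, 0.75, 1.0),
    "very good": (0.75, 1.0, 1.0),
}


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalized row geometric means."""
    gm = [math.prod(row) ** (1 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]


def aggregate(tfns, weights):
    """Weighted sum of triangular fuzzy numbers with non-negative crisp weights."""
    return tuple(sum(w * t[i] for t, w in zip(tfns, weights)) for i in range(3))


def centroid(tfn):
    """Defuzzify a triangular fuzzy number (l, m, u) by its centroid."""
    return sum(tfn) / 3


if __name__ == "__main__":
    # Level 1: indicators within each criterion group (weights assumed).
    env = aggregate([SCALE["good"], SCALE["fair"]], [0.6, 0.4])
    econ = aggregate([SCALE["fair"]], [1.0])
    soc = aggregate([SCALE["very good"], SCALE["poor"]], [0.5, 0.5])
    # Level 2: groups weighted from a hypothetical pairwise comparison matrix.
    w = ahp_weights([[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]])
    final = aggregate([env, econ, soc], w)
    print(final, centroid(final))
```

A weighted sum of triangular fuzzy numbers with non-negative weights is again triangular, which is why the same `aggregate` function can be reused at every level of the hierarchy.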