973 results for score test information matrix artificial regression
Abstract:
We present the most comprehensive comparison to date of the predictive benefit of genetics in addition to currently used clinical variables, using genotype data for 33 single-nucleotide polymorphisms (SNPs) in 1,547 Caucasian men from the placebo arm of the REduction by DUtasteride of prostate Cancer Events (REDUCE®) trial. Moreover, we conducted a detailed comparison of three techniques for incorporating genetics into clinical risk prediction. The first method was a standard logistic regression model, which included separate terms for the clinical covariates and for each of the genetic markers. This approach ignores a substantial amount of external information concerning effect sizes for these Genome Wide Association Study (GWAS)-replicated SNPs. The second and third methods investigated two possible approaches to incorporating meta-analysed external SNP effect estimates: one via a weighted PCa 'risk' score based solely on the meta-analysis estimates, and the other incorporating both the current and prior data via informative priors in a Bayesian logistic regression model. All methods demonstrated a slight improvement in predictive performance upon incorporation of genetics. The two methods that incorporated external information showed the greatest increase in receiver-operating-characteristic AUC, from 0.61 to 0.64. The value of our methods comparison is likely to lie in observations of performance similarities, rather than differences, among three approaches with very different resource requirements. The two methods that included external information performed best, but only marginally, despite substantial differences in complexity.
Abstract:
BACKGROUND: Obesity is strongly associated with major depressive disorder (MDD) and various other diseases. Genome-wide association studies have identified multiple risk loci robustly associated with body mass index (BMI). In this study, we aimed to investigate whether a genetic risk score (GRS) combining multiple BMI risk loci might have utility in prediction of obesity in patients with MDD. METHODS: Linear and logistic regression models were conducted to predict BMI and obesity, respectively, in three independent large case-control studies of major depression (Radiant, GSK-Munich, PsyCoLaus). The analyses were first performed in the whole sample and then separately in depressed cases and controls. An unweighted GRS was calculated by summation of the number of risk alleles. A weighted GRS was calculated as the sum of risk alleles at each locus multiplied by their effect sizes. Receiver operating characteristic (ROC) analysis was used to compare the discriminatory ability of predictors of obesity. RESULTS: In the discovery phase, a total of 2,521 participants (1,895 depressed patients and 626 controls) were included from the Radiant study. Both unweighted and weighted GRS were highly associated with BMI (P <0.001) but explained only a modest amount of variance. Adding 'traditional' risk factors to GRS significantly improved the predictive ability with the area under the curve (AUC) in the ROC analysis, increasing from 0.58 to 0.66 (95% CI, 0.62-0.68; χ(2) = 27.68; P <0.0001). Although there was no formal evidence of interaction between depression status and GRS, there was further improvement in AUC in the ROC analysis when depression status was added to the model (AUC = 0.71; 95% CI, 0.68-0.73; χ(2) = 28.64; P <0.0001). We further found that the GRS accounted for more variance of BMI in depressed patients than in healthy controls. Again, GRS discriminated obesity better in depressed patients compared to healthy controls. 
We later replicated these analyses in two independent samples (GSK-Munich and PsyCoLaus) and found similar results. CONCLUSIONS: A GRS proved to be a highly significant predictor of obesity in people with MDD but accounted for only a modest amount of variance. Nevertheless, as more risk loci are identified, combining a GRS approach with information on non-genetic risk factors could become a useful strategy for identifying MDD patients at higher risk of developing obesity.
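The two scores described in the methods above (a count of risk alleles, and a count weighted by per-locus effect sizes) can be sketched as follows. This is only an illustration, not the study's code: the function names, the allele counts and the effect sizes are made-up assumptions.

```python
def unweighted_grs(risk_allele_counts):
    """Unweighted GRS: total number of risk alleles (0, 1 or 2 per locus)."""
    return sum(risk_allele_counts)


def weighted_grs(risk_allele_counts, effect_sizes):
    """Weighted GRS: risk-allele count at each locus times that locus's effect size."""
    if len(risk_allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is needed per locus")
    return sum(c * w for c, w in zip(risk_allele_counts, effect_sizes))


# Hypothetical participant genotyped at four loci (illustrative values only).
counts = [2, 0, 1, 1]
effects = [0.25, 0.5, 0.125, 0.125]  # assumed per-allele effect sizes
print(unweighted_grs(counts))
print(weighted_grs(counts, effects))
```

Either score could then serve as a single predictor in the linear or logistic regression models the abstract mentions.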
Abstract:
Aim To disentangle the effects of environmental and geographical processes driving phylogenetic distances among clades of maritime pine (Pinus pinaster). To assess the implications for conservation management of combining molecular information with species distribution models (SDMs; which predict species distribution based on known occurrence records and on environmental variables). Location Western Mediterranean Basin and European Atlantic coast. Methods We undertook two cluster analyses for eight genetically defined pine clades based on climatic niche and genetic similarities. We assessed niche similarity by means of a principal component analysis and Schoener's D metric. To calculate genetic similarity, we used the unweighted pair group method with arithmetic mean based on Nei's distance using 266 single nucleotide polymorphisms. We then assessed the contribution of environmental and geographical distances to phylogenetic distance by means of Mantel regression with variance partitioning. Finally, we compared the projection obtained from SDMs fitted from the species level (SDMsp) and composed from the eight clade-level models (SDMcm). Results Genetically and environmentally defined clusters were identical. Environmental and geographical distances explained 12.6% of the phylogenetic distance variation and, overall, geographical and environmental overlap among clades was low. Large differences were detected between SDMsp and SDMcm (57.75% of disagreement in the areas predicted as suitable). Main conclusions The genetic structure within the maritime pine subspecies complex is primarily a consequence of its demographic history, as seen by the high proportion of unexplained variation in phylogenetic distances. 
Nevertheless, our results highlight the contribution of local environmental adaptation in shaping the lower-order, phylogeographical distribution patterns and spatial genetic structure of maritime pine: (1) genetically and environmentally defined clusters are consistent, and (2) environment, rather than geography, explained a higher proportion of variation in phylogenetic distance. SDMs, key tools in conservation management, better characterize the fundamental niche of the species when they include molecular information.
Abstract:
A simple, precise, specific, repeatable and discriminating dissolution test for primaquine (PQ) matrix tablets was developed and validated according to ICH and FDA guidelines. Two UV assay methods were validated for the determination of PQ released in 0.1 M hydrochloric acid and water media. Both methods were linear (R²>0.999), precise (R.S.D.<1.87%) and accurate (97.65-99.97%). Dissolution efficiency (69-88%) and equivalence of formulations (f2) were assessed in the different media and apparatuses tested (basket at 100 rpm and paddle at 50 rpm). The discriminating condition was 900 mL of aqueous medium, basket at 100 rpm, and sampling times of 1, 4 and 8 h. Repeatability (R.S.D.<2.71%) and intermediate precision (R.S.D.<2.06%) of the dissolution method were satisfactory.
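For readers unfamiliar with the two profile metrics named above, the sketch below computes the f2 similarity factor and a trapezoidal dissolution efficiency. It assumes the usual definitions (f2 = 50·log10(100/√(1 + mean squared difference)); efficiency as the area under the curve relative to 100 % dissolution over the same period, with the curve starting at 0 h and 0 %). The profile values are invented; only the sampling times of 1, 4 and 8 h come from the abstract.

```python
import math


def f2_similarity(reference, test):
    """Similarity factor f2 between two dissolution profiles (% dissolved)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must share the same non-empty time points")
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


def dissolution_efficiency(times_h, percent_dissolved):
    """Trapezoidal area under the curve (from 0 h, 0 %) as a percentage of
    the rectangle defined by 100 % dissolution and the last sampling time."""
    t = [0.0] + list(times_h)
    y = [0.0] + list(percent_dissolved)
    area = sum((t[i] - t[i - 1]) * (y[i] + y[i - 1]) / 2.0 for i in range(1, len(t)))
    return 100.0 * area / (100.0 * t[-1])


times = [1, 4, 8]               # sampling times in hours
reference = [30.0, 70.0, 95.0]  # hypothetical % dissolved
test = [40.0, 80.0, 85.0]       # hypothetical % dissolved
print(f2_similarity(reference, test))
print(dissolution_efficiency(times, reference))
```

Identical profiles give f2 = 100, and the value falls as the profiles diverge.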
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, owing to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how the techniques can be implemented efficiently. The contributions of this thesis are as follows. 
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
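The link between bipartite ranking and AUC mentioned above can be made concrete: AUC equals the fraction of (positive, negative) pairs that a scoring function orders correctly, so one minus AUC is the pairwise ranking error. The sketch below is a plain quadratic-time illustration, not one of the thesis's efficient algorithms; the function name and the data are assumptions.

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical predicted scores and binary labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(pairwise_auc(scores, labels))  # 3 of the 4 pairs are ordered correctly -> 0.75
```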
Abstract:
Artificial neural networks (ANNs) are mathematical models capable of estimating non-linear response surfaces. Their advantage is that they can produce responses that differ from those of statistical models. The objective of this study was therefore to develop and test ANNs for estimating the rainfall erosivity index (EI30) as a function of geographical location in the state of Rio de Janeiro, Brazil, and to generate a thematic visualization map. Latitude, longitude and altitude were acceptable ANN inputs for estimating EI30 and allowed visualization of the spatial variability of EI30. Thus, ANNs are a potential option for estimating climatic variables as a substitute for traditional interpolation methods.
Abstract:
The along-scan radiometric gradient causes severe interpretation problems in Landsat images of tropical forests. It creates a decreasing trend in pixel values with the column number of the image. In practical applications it has been corrected by assuming the trend to be linear within structurally similar forests. This has improved the relation between floristic and remote sensing information, but only in some cases. I use three Landsat images and 105 floristic inventories to test the assumption of linearity, and to examine how the gradient and linear corrections affect the relation between floristic and Landsat data. Results suggest that the gradient is linear in the infrared bands. Also, the relation between floristic and Landsat data could be conditioned by the distribution of the sampling sites and the direction in which images are mosaicked. Additionally, there seems to be a conjunction between the radiometric gradient and a natural east-west vegetation gradient common in Western Amazonia. This conjunction might have artificially enhanced correlations between field and remotely sensed information in previous studies. Linear corrections may remove such artificial enhancement, but along with true and relevant spectral information about floristic patterns, because they cannot separate the radiometric gradient from a natural one.
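A linear correction of the kind discussed above can be sketched as fitting a straight line to column-mean pixel values against column number and subtracting the fitted slope. The details are assumptions made for illustration (a single band, column means over the whole image, centring on the middle column) and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def remove_column_trend(image):
    """Subtract the fitted linear trend in column means from every pixel.

    `image` is a list of rows for one band; the trend is centred on the
    middle column so the overall mean pixel value is preserved.
    """
    n_cols = len(image[0])
    cols = list(range(n_cols))
    col_means = [sum(row[j] for row in image) / len(image) for j in cols]
    slope, _ = fit_line(cols, col_means)
    centre = sum(cols) / n_cols
    return [[row[j] - slope * (j - centre) for j in cols] for row in image]


# Hypothetical 2 x 4 band in which values fall by 2 per column.
band = [[10.0, 8.0, 6.0, 4.0], [12.0, 10.0, 8.0, 6.0]]
print(remove_column_trend(band))
```

As the abstract warns, such a correction removes any linear east-west change in pixel values, whether it comes from the sensor or from the vegetation itself.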
Abstract:
The mortality rate of older patients with intertrochanteric fractures has been increasing with the aging of the population in China. The purpose of this study was: 1) to develop an artificial neural network (ANN) using clinical information to predict the 1-year mortality of elderly patients with intertrochanteric fractures, and 2) to compare the ANN's predictive ability with that of logistic regression models. The ANN model was tested against actual outcomes of an intertrochanteric femoral fracture database in China. The ANN model was generated with eight clinical inputs and a single output. The ANN's performance was compared with that of a logistic regression model created with the same inputs in terms of accuracy, sensitivity, specificity, and discriminability. The study population was composed of 2150 patients (679 males and 1471 females): 1432 in the training group and 718 new patients in the testing group. The ANN model that had eight neurons in the hidden layer had the highest accuracies among the four ANN models: 92.46% and 85.79% in the training and testing datasets, respectively. The areas under the receiver operating characteristic curves of the automatically selected ANN model for the two datasets were 0.901 (95%CI=0.814-0.988) and 0.869 (95%CI=0.748-0.990), higher than the 0.745 (95%CI=0.612-0.879) and 0.728 (95%CI=0.595-0.862) of the logistic regression model. The ANN model can be used for predicting 1-year mortality in elderly patients with intertrochanteric fractures. It outperformed logistic regression on multiple performance measures when given the same variables.
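Three of the comparison measures named above (accuracy, sensitivity and specificity) follow directly from the confusion-matrix counts. The sketch below shows the arithmetic on invented labels; it is not the study's evaluation code, and the function name is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true events detected
        "specificity": tn / (tn + fp),  # share of non-events correctly cleared
    }


# Hypothetical outcomes (1 = died within one year) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```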
Abstract:
Ordered probit regression was used to analyze data from sensory acceptance tests designed to study the effect of brand name on the acceptability of beer samples. Eight different brands of Pilsen beer were evaluated by 101 consumers in two sessions of acceptance tests: a blind evaluation and a brand information test. Ordered probit regression, although a relatively sophisticated technique compared to others used to analyze sensory data, was chosen to enable the observation of consumers' behavior using graphical interpretations of estimated probabilities plotted against hedonic scales. It can be concluded that brands B, C, and D had a positive effect on the sensory acceptance of the product, whereas brands A, F, G, and H had a negative influence on consumers' evaluation of the samples. On the other hand, brand E had little influence on consumers' assessment.
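The estimated probabilities mentioned above come from the standard ordered probit form, in which the probability of each hedonic category is the difference between two normal CDF values at adjacent cutpoints. The sketch below assumes that textbook form; the cutpoints and the linear predictor are invented and do not correspond to the study's estimates.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(linear_predictor, cutpoints):
    """P(y = k) for the len(cutpoints) + 1 ordered categories."""
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in increasing order")
    cdf = [normal_cdf(c - linear_predictor) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical four-category scale with three cutpoints.
cutpoints = [-1.0, 0.0, 1.0]
print(ordered_probit_probs(0.0, cutpoints))  # baseline
print(ordered_probit_probs(0.5, cutpoints))  # assumed positive brand effect
```

A positive shift in the linear predictor moves probability mass towards the higher categories, which is what a plot of probabilities against the hedonic scale would show.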
Abstract:
This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.
Abstract:
We study the problem of testing the error distribution in a multivariate linear regression (MLR) model. The tests are functions of appropriately standardized multivariate least squares residuals whose distribution is invariant to the unknown cross-equation error covariance matrix. Empirical multivariate skewness and kurtosis criteria are then compared to simulation-based estimates of their expected values under the hypothesized distribution. Special cases considered include testing multivariate normal, Student t, normal mixture and stable error models. In the Gaussian case, finite-sample versions of the standard multivariate skewness and kurtosis tests are derived. To do this, we exploit simple, double and multi-stage Monte Carlo test methods. For non-Gaussian distribution families involving nuisance parameters, confidence sets are derived for the nuisance parameters and the error distribution. The procedures considered are evaluated in a small simulation experiment. Finally, the tests are applied to an asset pricing model with observable risk-free rates, using monthly returns on New York Stock Exchange (NYSE) portfolios over five-year subperiods from 1926-1995.
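The idea behind a simple Monte Carlo test can be illustrated in a deliberately reduced setting: a univariate sample, absolute skewness as the statistic, and a normal null hypothesis. Because skewness does not depend on location or scale, its null distribution can be simulated from standard normal draws. This is a sketch under those assumptions, not the paper's multivariate residual-based procedure; the function names, sample size and number of replications are illustrative.

```python
import random


def sample_skewness(x):
    """Moment-based skewness coefficient m3 / m2**1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def monte_carlo_pvalue(observed_stat, simulated_stats):
    """Monte Carlo p-value: (1 + #{simulated >= observed}) / (N + 1)."""
    n_ge = sum(1 for s in simulated_stats if s >= observed_stat)
    return (n_ge + 1) / (len(simulated_stats) + 1)


rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(50)]  # hypothetical skewed sample
observed = abs(sample_skewness(data))
# 99 replications of the statistic simulated under the normal null.
simulated = [
    abs(sample_skewness([rng.gauss(0.0, 1.0) for _ in range(50)]))
    for _ in range(99)
]
print(monte_carlo_pvalue(observed, simulated))
```

With 99 replications the smallest attainable p-value is 1/100, so the number of replications bounds the significance levels that can be used.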
Abstract:
We study the application of matrix decomposition algorithms, such as Non-negative Matrix Factorization (NMF), to frequency-domain representations of musical audio signals. These algorithms, driven by a reconstruction error function, learn a set of basis functions and a set of corresponding coefficients that approximate the input signal. We compare the use of three reconstruction error functions when NMF is applied to monophonic and harmonized scales: least squares, Kullback-Leibler divergence, and a recently introduced phase-dependent divergence measure. New methods for interpreting the resulting decompositions are presented and compared with previously used methods that require knowledge of the acoustic domain. Finally, we analyze the generalization ability of the learned basis functions with respect to three musical parameters: amplitude, duration and instrument type. To this end, we introduce two algorithms for labeling the basis functions that outperform the previous approach in most of our tests, the instrument task with monophonic audio being the only notable exception.
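As a rough illustration of NMF driven by a least-squares reconstruction error, the sketch below uses the widely known multiplicative update rules on a tiny matrix. The choice of multiplicative updates is an assumption (the abstract does not say which optimizer was used), and the toy "spectrogram", function names and parameters are invented.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Least-squares reconstruction error between V and W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (m x n) by W (m x rank) times H (rank x n), all non-negative."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h


# Toy magnitude "spectrogram": 4 frequency bins x 3 frames
# (note A alone, note B alone, then both together).
v = [[1.0, 0.0, 1.0], [2.0, 0.0, 2.0], [0.0, 3.0, 3.0], [0.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
print(squared_error(v, w, h))
```

Here the columns of W play the role of the learned basis functions and the rows of H hold the corresponding coefficients over time.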
Abstract:
Falls among older adults are a major problem. It is therefore not surprising that identifying the factors that increase the risk of falling has attracted so much attention. Frailer seniors who need support to live in the community have nonetheless remained the poor relation of research, although Quebec authorities have more recently made them a priority target for intervention. Prospective observational studies are particularly well suited to studying risk factors for falls among older adults. Their optimal identification is, however, complicated by the fact that exposure to risk factors can vary during follow-up and that the same individual can experience more than one event. Twenty years ago, researchers tried to raise their peers' awareness of this issue, but their efforts were in vain. Even today, little attention is paid to these considerations, and studies focus on the proportion of people who have fallen or on the time elapsed until the first fall. A substantial amount of relevant information is thereby discarded. In this thesis, we examine the methods in use and propose an extension of the Cox hazards model. We illustrate this method with a study of risk factors likely to be associated with falls in a group of 959 older adults who used public home-support services. We compare the results obtained with the method of Wei, Lin and Weissfeld with those obtained with other methods, including conventional logistic regression, pooled logistic regression, negative binomial regression and Andersen-Gill regression. The investigation is characterized by repeated measurements of risk factors in participants' homes and by monthly telephone follow-ups to document the occurrence of falls. 
The exposure factors studied, whether fixed or time-varying, include sociodemographic characteristics, body mass index, nutritional risk, alcohol consumption, home environmental hazards, gait and balance, and medication use. Nearly all users (99.6%) had at least one high-risk factor. Exposure to multiple risks was widespread, with an average of 2.7 distinct high-risk factors per participant. Factors statistically associated with the risk of falls include male sex, younger age groups, a history of previous falls, a low score on the Berg Balance Scale, a low body mass index, use of benzodiazepine-type medications, the number of hazards present in the home, and living in a private residence for seniors. Our results reveal, however, that common methods of analyzing risk factors for falls (and, in some cases, for falls requiring medical attention) create appreciable biases. The biases in the measures of association considered stem from the way exposure and outcome are measured and defined, as well as from the way the statistical methods of analysis take them into account. A final part, as innovative as it is distinct in the nature of the statistical tools used, completes the work. In it, we identify profiles of seniors at risk of becoming recurrent fallers, that is, those who had at least two falls in the six months following their initial assessment. A classification and regression tree analysis coupled with a survival analysis revealed five distinctive profiles, with relative risks ranging from 0.7 to 5.1. 
Living in a seniors' residence, having a history of multiple falls or balance disorders, and consuming alcohol are the main factors associated with an increased probability of falling early and of becoming a recurrent faller. Whether in terms of screening for fall risk factors or of the population targeted, this thesis contributes new knowledge on a highly topical public health issue. We encourage researchers interested in identifying risk factors for falls among older adults to use the statistical method of Wei, Lin and Weissfeld, because it accounts for time-varying exposures and recurrent events. Further research will also be needed to determine the best screening test for a given risk factor in this clientele.
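Recurrent-event methods differ mainly in how each participant's follow-up is laid out before a Cox-type model is fitted. The sketch below builds the two layouts as they are usually described: counting-process intervals for Andersen-Gill, and one row per event number timed from study entry for Wei, Lin and Weissfeld. It covers data preparation only; the function names, the time unit and the example participant are assumptions, and time-varying covariates are left out.

```python
def andersen_gill_rows(fall_times, follow_up_end):
    """Counting-process layout: consecutive (start, stop, event) intervals."""
    if any(t > follow_up_end for t in fall_times):
        raise ValueError("falls cannot occur after the end of follow-up")
    rows, start = [], 0.0
    for t in sorted(fall_times):
        rows.append((start, t, 1))  # interval ending in a fall
        start = t
    if start < follow_up_end:
        rows.append((start, follow_up_end, 0))  # censored final interval
    return rows


def wlw_rows(fall_times, follow_up_end, max_events):
    """Marginal layout: (event number k, time since entry, event) for k = 1..max_events."""
    times = sorted(fall_times)
    rows = []
    for k in range(1, max_events + 1):
        if k <= len(times):
            rows.append((k, times[k - 1], 1))
        else:
            rows.append((k, follow_up_end, 0))  # k-th fall not observed
    return rows


# Hypothetical participant: falls at months 2 and 5, followed for 6 months.
print(andersen_gill_rows([2.0, 5.0], 6.0))
print(wlw_rows([2.0, 5.0], 6.0, max_events=3))
```

In the marginal layout the participant remains at risk for a third fall from study entry, whereas in the counting-process layout each interval starts where the previous one ended.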
Abstract:
Metal matrix composites (MMC) having aluminium (Al) in the matrix phase and silicon carbide particles (SiCp) in the reinforcement phase, i.e. Al-SiCp type MMC, have gained popularity in the recent past. In this competitive age, manufacturing industries strive to produce superior quality products at a reasonable price. This is possible by achieving higher productivity while performing machining at optimum combinations of process variables. The low weight and high strength MMC are found suitable for a variety of components
Abstract:
This research investigates what information German Fairtrade coffee consumers search for during pre-purchase information seeking and to what extent information is retrieved. Furthermore, the sequence of the information search as well as the degree of cognitive involvement is highlighted. The role of labeling, the importance of additional ethical information and its quality in terms of concreteness as well as the importance of product price and organic origin are addressed. A set of information relevant to Fairtrade consumers was tested by means of the Information Display Matrix (IDM) method with 389 Fairtrade consumers. Results show that prior to purchase, information on product packages plays an important role and is retrieved rather extensively, but search strategies that reduce the information processing effort are applied as well. Furthermore, general information is preferred over specific information. Results of two regression analyses indicate that purchase decisions are related to search behavior variables rather than to socio-demographic variables and purchase motives. In order to match product information with consumers’ needs, marketers should offer information that is reduced to the central aspects of Fairtrade.