860 results for Multiple Additive Regression Trees (MART)


Relevance:

100.00% 100.00%

Abstract:

This article evaluates different parameter estimation strategies for a multiple linear regression model. The parameters were estimated using data from a clinical trial designed to verify whether the mechanical test of the maximum force property (EM-FM) is associated with femoral mass, femoral diameter, and the experimental group of ovariectomized rats of the species Rattus norvegicus albinus, Wistar variety. Three methodologies are compared for estimating the model parameters: the classical methodology, based on the least squares method; the Bayesian methodology, based on Bayes' theorem; and the bootstrap method, based on resampling procedures.
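As a rough illustration of two of the three strategies named in this abstract, the sketch below fits a multiple linear regression by least squares and then approximates the uncertainty of the coefficients with a case-resampling bootstrap. The data are synthetic stand-ins, not the study's femoral measurements, and the function names `ols_fit` and `bootstrap_coefs` are hypothetical. The Bayesian approach is not shown.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X already holds an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bootstrap_coefs(X, y, n_boot=500, seed=0):
    """Refit OLS on rows resampled with replacement; returns an (n_boot, p) array."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample n row indices with replacement
        draws[b] = ols_fit(X[idx], y[idx])
    return draws

# Synthetic data (an assumption for illustration): y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

beta_hat = ols_fit(X, y)                 # classical least-squares estimate
boot = bootstrap_coefs(X, y)             # bootstrap distribution of the estimate
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)  # percentile intervals
```

The percentile interval is only one of several bootstrap interval constructions; it is used here because it needs no extra assumptions beyond the resampled fits.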

Relevance:

100.00% 100.00%

Abstract:

Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, no satisfactory solution has yet been found that balances accuracy and system complexity. Both characteristics are extremely important for lightweight mobile devices, whose processing power and energy supply are limited. Hence, an indoor localization system with high computational complexity can drain the battery completely within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity.
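The abstract does not say which boosting variant is used. As one possible reading of "multiple weighted decision trees", the sketch below implements AdaBoost with one-level decision trees (stumps) on synthetic signal-strength readings from three hypothetical access points, classifying a device into one of two rooms. All names and data here are assumptions for illustration.

```python
import numpy as np

def stump_predict(X, j, t, pol):
    """Predict pol where feature j <= t, and -pol otherwise."""
    return np.where(X[:, j] <= t, pol, -pol)

def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w (labels in {-1, +1})."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                err = w[stump_predict(X, j, t, pol) != y].sum()
                if err < best[0]:
                    best = (err, j, t, pol)
    return best

def adaboost_fit(X, y, n_rounds=20):
    """Return a list of (tree weight, feature, threshold, polarity) tuples."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)    # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this tree in the vote
        w *= np.exp(-alpha * y * stump_predict(X, j, t, pol))  # upweight mistakes
        w /= w.sum()
        model.append((alpha, j, t, pol))
    return model

def adaboost_predict(model, X):
    """Sign of the weighted vote over all stumps."""
    score = sum(a * stump_predict(X, j, t, pol) for a, j, t, pol in model)
    return np.where(score >= 0, 1, -1)

# Synthetic readings in dBm from three access points (illustrative values only)
rng = np.random.default_rng(0)
room_a = rng.normal([-50, -70, -80], 5.0, size=(100, 3))
room_b = rng.normal([-75, -55, -65], 5.0, size=(100, 3))
X = np.vstack([room_a, room_b])
y = np.concatenate([np.ones(100), -np.ones(100)])

model = adaboost_fit(X, y)
accuracy = (adaboost_predict(model, X) == y).mean()  # training accuracy
```

Prediction costs one comparison per stump, which is consistent with the low computational cost the abstract emphasizes for battery-powered devices.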

Relevance:

100.00% 100.00%

Abstract:

A combinatorial protocol (CP) is introduced here and interfaced with multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR rests primarily on restricting the entry of correlated variables at the model development stage. It has been used to analyze the Selwood et al. data set [16], and the resulting models are compared with those reported from the GFA [8] and MUSEUM [9] approaches. For this data set, CP-MLR identified three highly independent models (27, 28 and 31) with Q2 values in the range of 0.632-0.518. These models are also divergent and unique. Even though the present study does not share any models with the GFA [8] and MUSEUM [9] results, several descriptors are common to all these studies, including the present one. A simulation is also carried out on the same data set to explain model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions for data sets with 50 to 60 descriptors in a reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR, one can identify divergent models and handle data sets larger than the present one without excessive computer time.
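The general idea of keeping correlated variables out of the same model can be sketched as follows. This is not the CP-MLR procedure itself, whose details the abstract does not give. It simply enumerates descriptor subsets, skips any subset containing a pair whose absolute correlation exceeds a cutoff, and ranks the rest by ordinary R² (the abstract reports cross-validated Q2, which is not computed here). The function names, the cutoff value, and the data are assumptions.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def filtered_subsets(X, y, size, max_abs_corr=0.5):
    """Fit every subset of `size` columns whose pairwise |r| stays within the cutoff."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    results = []
    for cols in combinations(range(X.shape[1]), size):
        if any(corr[a, b] > max_abs_corr for a, b in combinations(cols, 2)):
            continue  # correlated pair: this subset never reaches model fitting
        results.append((r_squared(X[:, list(cols)], y), cols))
    return sorted(results, reverse=True)  # best R^2 first

# Synthetic descriptors: column 1 is nearly a copy of column 0
rng = np.random.default_rng(7)
n = 200
X = rng.normal(size=(n, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = 2.0 * X[:, 0] + X[:, 3] + 0.2 * rng.normal(size=n)

models = filtered_subsets(X, y, size=2, max_abs_corr=0.5)
```

Exhaustive enumeration grows combinatorially with the number of descriptors, so the correlation filter also serves to prune the search.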

Relevance:

100.00% 100.00%

Abstract:

Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures, since anyone with a database and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and should know the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study [5]. This advice should be taken only as a rough guide, but it does indicate that the variables included should be selected with great care, as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.

Relevance:

100.00% 100.00%

Abstract:

The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.

Relevance:

100.00% 100.00%

Abstract:

2002 Mathematics Subject Classification: 62J05, 62G35.

Relevance:

100.00% 100.00%

Abstract:

This study focuses on multiple linear regression models relating six climate indices (temperature humidity THI, environmental stress ESI, equivalent temperature index ETI, heat load HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow’s milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrapping through resampling. Cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R2 (0.50, 0.49) respectively. In summer, milk yield with independent variables of THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in agriculture and food science when short time series or data with large uncertainty are available.
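To make the LASSO selection step concrete, here is a minimal coordinate-descent sketch on synthetic data. It assumes standardized predictors and a centered response, and minimizes (1/2n)·||y − Xb||² + lam·||b||₁. The penalty value, the data, and the function name `lasso_cd` are illustrative assumptions. The AIC comparison, bootstrapping, and cross-validation mentioned in the abstract are not shown.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO objective described above."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]   # residual ignoring feature j
            rho = X[:, j] @ partial_resid / n
            z = X[:, j] @ X[:, j] / n
            # Soft-thresholding: small correlations are set exactly to zero
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return b

# Synthetic predictors (not climate indices): only the first two drive the response
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
y = y - y.mean()                           # center response

coefs = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(coefs)           # indices of predictors kept by the penalty
```

The retained coefficients are shrunk toward zero by roughly `lam`, which is why LASSO is often followed by a refit on the selected predictors.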

Relevance:

100.00% 100.00%

Abstract:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. 
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods from the biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

Relevance:

100.00% 100.00%

Abstract:

Long term, high quality estimates of burned area are needed for improving both prognostic and diagnostic fire emissions models and for assessing feedbacks between fire and the climate system. We developed global, monthly burned area estimates aggregated to 0.5° spatial resolution for the time period July 1996 through mid-2009 using four satellite data sets. From 2001–2009, our primary data source was 500-m burned area maps produced using Moderate Resolution Imaging Spectroradiometer (MODIS) surface reflectance imagery; more than 90% of the global area burned during this time period was mapped in this fashion. During times when the 500-m MODIS data were not available, we used a combination of local regression and regional regression trees developed over periods when burned area and Terra MODIS active fire data were available to indirectly estimate burned area. Cross-calibration with fire observations from the Tropical Rainfall Measuring Mission (TRMM) Visible and Infrared Scanner (VIRS) and the Along-Track Scanning Radiometer (ATSR) allowed the data set to be extended prior to the MODIS era. With our data set we estimated that the global annual area burned for the years 1997–2008 varied between 330 and 431 Mha, with the maximum occurring in 1998. We compared our data set to the recent GFED2, L3JRC, GLOBCARBON, and MODIS MCD45A1 global burned area products and found substantial differences in many regions. Lastly, we assessed the interannual variability and long-term trends in global burned area over the past 13 years. This burned area time series serves as the basis for the third version of the Global Fire Emissions Database (GFED3) estimates of trace gas and aerosol emissions.

Relevance:

100.00% 100.00%

Abstract:

The aim of this thesis is to explain the prolific offending of certain offenders. We argue that prolific offending is explained by the more frequent formation of criminogenic situations. These situations refer to the moment when an offender interacts with a criminal opportunity in a context favorable to crime. More precisely, it is the moment when the offender faces this opportunity but the crime has not yet been committed. The formation of criminogenic situations is facilitated by the interaction and interdependence of three elements: the person's propensity for offending, their criminalized entourage, and their lifestyle. Thus, prolific offending cannot be adequately explained without taking into account the interaction between individual risk and contextual risk. The general objective of this thesis is to demonstrate the importance of modeling the interaction between individual risk and contextual risk in order to explain the more prolific offending of certain offenders. To this end, 155 offenders under the responsibility of two institutions of the Services correctionnels du Québec and four Quebec youth centres completed an assessment protocol of self-administered questionnaires. First (chapter three), we described and compared the nature of the self-reported offending of the offenders in our sample. This first results chapter showed that this pool of offenders is similar to other offender samples with respect to the nature of their offending, in particular the volume, variety, and seriousness of their crimes.
Indeed, the majority of participants report a low volume of crimes against persons and against property, while a small group stands out with a very high lambda (13.1% of the offenders in the sample are responsible for 60.3% of all reported crimes). About four in five offenders report having committed at least one crime against a person and one crime against property. Moreover, more than 50% of the latter report crimes in at least four subcategories. Finally, although the offenders in our sample have a relatively low average crime severity index (IGC, indice de gravité de la criminalité; median = 77), nearly 40% of them report having committed at least one of the two most serious crimes recorded in this study (discharging a firearm and robbery). The second specific objective, addressed in chapter four, was to explore the interaction between offenders' personal characteristics, entourage, and lifestyle in the formation of criminogenic situations. People with a higher propensity for offending appear more likely to be surrounded by criminalized people and to have a more idle lifestyle. The criminalized entourage also appears to influence these offenders' lifestyle. Thus, the interdependence between these three elements facilitates the more frequent formation of criminogenic situations and creates conditions conducive to the emergence of prolific offending. The last specific objective of the thesis, covered in chapter five, was to analyze the impact of the formation of criminogenic situations on the nature of offending. Multiple linear regression analyses and regression trees highlighted the contribution of personal characteristics, entourage, and lifestyle to explaining the nature of offending.
On the one hand, the regression analyses (additive models) suggest that all of the elements favoring the formation of criminogenic situations make a unique contribution to explaining offending. On the other hand, the regression trees allowed us to better understand the interaction between the elements in explaining prolific offending. Indeed, a lower standing on some elements can be compensated by a higher standing on others. Moreover, the accumulation of elements favoring the formation of criminogenic situations does not occur linearly. These conclusions are supported by proportions of explained variance higher than those of the multiple linear regressions. In conclusion, focusing on only one element (the person and their propensity for offending, or the context and its opportunities), or combining the elements in a simply additive way, does not do justice to the complexity of the emergence of prolific offending. By empirically testing this generally accepted idea, this thesis underlines the importance of considering the interaction between individual risk and contextual risk in explaining prolific offending.
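The contrast this abstract draws between additive regression and regression trees can be illustrated on synthetic data in which the outcome is high only when two factors are both elevated, a pure interaction. The sketch below is not the thesis's analysis and uses none of its data. It compares the explained variance of a linear model with that of a small depth-2 regression tree. All names are hypothetical.

```python
import numpy as np

def r2(y, pred):
    """Proportion of variance in y explained by pred."""
    return 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def best_split(X, y):
    """Feature and threshold minimizing the summed squared error of the two halves."""
    best = (np.inf, None, None)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:      # drop the max so both sides are non-empty
            left = X[:, j] <= t
            sse = ((y[left] - y[left].mean()) ** 2).sum() \
                + ((y[~left] - y[~left].mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t)
    return best

def fit_tree(X, y, depth):
    """Nested tuples (feature, threshold, left, right); leaves are mean values."""
    if depth == 0 or len(y) < 10:
        return y.mean()
    _, j, t = best_split(X, y)
    if j is None:
        return y.mean()
    left = X[:, j] <= t
    return (j, t, fit_tree(X[left], y[left], depth - 1),
            fit_tree(X[~left], y[~left], depth - 1))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree

# Synthetic outcome: elevated only when BOTH factors are above zero
rng = np.random.default_rng(3)
n = 400
X = rng.uniform(-1, 1, size=(n, 2))
y = 10.0 * ((X[:, 0] > 0) & (X[:, 1] > 0)) + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])           # additive linear model
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
r2_linear = r2(y, A @ beta)

tree = fit_tree(X, y, depth=2)
r2_tree = r2(y, np.array([predict_tree(tree, x) for x in X]))
```

On data like these, the tree recovers the joint condition with two successive splits, while the additive model can only assign each factor a separate slope. This mirrors the abstract's finding that the trees explained more variance than the multiple linear regressions.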

Relevance:

100.00% 100.00%

Abstract:

Multiple regression analysis is a statistical technique that allows a dependent variable to be predicted from more than one independent variable and the influential independent variables to be identified. In this study, multiple regression analysis is applied to experimental data to predict the room mean velocity and to determine the parameters that most influence it. More than 120 experiments for four different heat source locations were carried out in a test chamber with a high-level wall-mounted air supply terminal at air change rates of 3-6 ach. The influence of environmental parameters such as supply air momentum, room heat load, Archimedes number, and local temperature ratio was examined by two methods: a simple regression analysis incorporated into scatter matrix plots, and multiple stepwise regression analysis. It is concluded that, when a heat source is located along the jet centre line, the supply momentum mainly influences the room mean velocity regardless of the plume strength. However, when the heat source is located outside the jet region, the local temperature ratio (the inverse of the local heat removal effectiveness) is a major influencing parameter.

Relevance:

100.00% 100.00%

Abstract:

In previous Statnotes, the application of correlation and regression methods to the analysis of two variables (X,Y) was described. These methods can be used to determine whether there is a linear relationship between the two variables, whether the relationship is positive or negative, to test the degree of significance of the linear relationship, and to obtain an equation relating Y to X. This Statnote extends the methods of linear correlation and regression to situations where there are two or more X variables, i.e., 'multiple linear regression'.
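For reference, the extension from one X variable to several can be written as follows. The symbols (intercept a, slopes b_i, error term e, and k predictors) are notation assumed here for illustration and are not taken from the Statnote itself.

```latex
% Simple linear regression (one X variable)
Y = a + bX + e

% Multiple linear regression (k X variables)
Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k + e
```

Each slope b_i is interpreted as the expected change in Y for a unit change in X_i with the other X variables held constant.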

Relevance:

100.00% 100.00%

Abstract:

Objectives: The objectives of this study were to specifically investigate the differences in culture, attitudes and social networks between Australian and Taiwanese men and women and identify the factors that predict midlife men and women’s quality of life in both countries. Methods: A stratified random sample strategy based on probability proportional sampling (PPS) was conducted to investigate 278 Australian and 398 Taiwanese midlife men and women’s quality of life. Multiple regression modelling and classification and regression trees (CARTs) were performed to examine the potential differences on culture, attitude, social networks, social demographic factors and religion/spirituality in midlife men and women’s quality of life in both Australia and Taiwan. Results: The results of this study suggest that culture involves multiple functions and interacts with attitudes, social networks and individual factors to influence a person’s quality of life. Significant relationships were found between the interaction between cultural circumstances and a person’s internal and external factors. The research found that good social support networks and a healthy optimistic disposition may significantly enhance midlife men and women’s quality of life. Conclusion: The study indicated that there is a significant relationship between culture, attitude, social networks and quality of life in midlife Australian and Taiwanese men and women. People who had higher levels of horizontal individualism and collectivism, positive attitudes and better social support had better psychological, social, physical and environmental health, while it emerged that vertical individualists with competitive characteristics would experience a lower quality of life. 
This study has highlighted areas where opportunities exist to further reflect upon contemporary social health policies for Australian and Taiwanese societies and also within the global perspective, in order to provide enhanced quality care for growing midlife populations.

Relevance:

100.00% 100.00%

Abstract:

Habitat models are widely used in ecology; however, there are relatively few studies of rare species, primarily because of a paucity of survey records and the lack of robust means of assessing the accuracy of modelled spatial predictions. We investigated the potential of compiled ecological data in developing habitat models for Macadamia integrifolia, a vulnerable mid-stratum tree endemic to lowland subtropical rainforests of southeast Queensland, Australia. We compared the performance of two binomial models, Classification and Regression Trees (CART) and Generalised Additive Models (GAM), with Maximum Entropy (MAXENT) models developed from (i) presence records and available absence data and (ii) presence records and background data. The GAM model was the best performer across the range of evaluation measures employed; however, all models were assessed as potentially useful for informing in situ conservation of M. integrifolia. A significant loss in the amount of M. integrifolia habitat has occurred (p < 0.05), with only 37% of former habitat (pre-clearing) remaining in 2003. Remnant patches are significantly smaller, have larger edge-to-area ratios and are more isolated from each other compared to pre-clearing configurations (p < 0.05). Whilst the network of suitable habitat patches is still largely intact, there are numerous smaller patches that are more isolated in the contemporary landscape compared with their connectedness before clearing. These results suggest that in situ conservation of M. integrifolia may be best achieved through a landscape approach that considers the relative contribution of small remnant habitat fragments to the species as a whole, as facilitating connectivity among the entire network of habitat patches.