923 results for model selection in binary regression


Relevância: 100.00%

Resumo:

The solution of a TU cooperative game can be a distribution of the value of the grand coalition, i.e. a distribution of the payoff (utility) that all the players together achieve. In a regression model, the evaluation of the explanatory variables can be a distribution of the overall fit, i.e. the fit of the model in which every regressor variable is involved. Furthermore, we can view regression models as TU cooperative games in which the explanatory (regressor) variables are the players. In this paper we introduce the class of regression games, characterize it, and apply the Shapley value to evaluate the explanatory variables in regression models. To support our approach we consider Young's (1985) axiomatization of the Shapley value, and conclude that the Shapley value is a reasonable tool for evaluating the explanatory variables of regression models.
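The Shapley-value evaluation of regressors described above can be sketched numerically: with R² as the characteristic function of the "regression game", each variable's value is its average marginal contribution to the fit over all coalitions. The following is a minimal illustration on synthetic data (the data, sample size, and the use of R² as the coalition value are assumptions for the sketch, not taken from the paper):

```python
import itertools
import math
import numpy as np

def r2(X, y, cols):
    """Characteristic function v(S): the R^2 of an OLS fit on the regressors in S."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1.0 - resid.var() / y.var()

def shapley_r2(X, y):
    """Exact Shapley value of each regressor in the 'regression game' with v = R^2."""
    p = X.shape[1]
    phi = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for r in range(p):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(p - r - 1) / math.factorial(p)
                phi[i] += w * (r2(X, y, list(S) + [i]) - r2(X, y, list(S)))
    return phi

# Synthetic example: three regressors with decreasing true influence.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
phi = shapley_r2(X, y)
print(phi, phi.sum(), r2(X, y, [0, 1, 2]))
```

By the efficiency property of the Shapley value, the per-regressor contributions sum exactly to the full-model R², which the final print makes easy to check.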

Resumo:

Venture capitalists can be regarded as financiers of young, high-risk enterprises, seeking investments with high growth potential and offering professional support above and beyond their capital investment. The aim of this study is to analyse the occurrence of information asymmetry between venture capital investors and entrepreneurs, with special regard to the problem of adverse selection. In the course of my empirical research, I conducted in-depth interviews with 10 venture capital investors. The aim of the research was to elicit their opinions on information asymmetry, how they deal with problems arising from adverse selection, and what measures they take to manage these within the investment process. In the interviews we also touched upon how investors evaluate state intervention, and how much they believe company managers are influenced by state support.

Resumo:

The major objectives of this dissertation were to develop optimal spatial techniques to model the spatial-temporal changes of the lake sediments and their nutrients from 1988 to 2006, and to evaluate the impacts of the hurricanes that occurred during 1998–2006. The mud zone shrank by about 10.5% from 1988 to 1998 and grew by about 6.2% from 1998 to 2006. Mud area, volume, and weight were calculated using validated kriging models. From 1988 to 1998, mud thickness increased by up to 26 cm in the central lake area, while mud area and volume decreased by about 13.78% and 10.26%, respectively. From 1998 to 2006, mud depth declined by up to 41 cm in the central lake area, and mud volume fell by about 27%. Mud weight increased by up to 29.32% from 1988 to 1998, but fell by over 20% from 1998 to 2006. The reduction of mud sediments is likely due to re-suspension and redistribution by waves and currents produced by large storm events, particularly Hurricanes Frances and Jeanne in 2004 and Wilma in 2005. Regression, kriging, geographically weighted regression (GWR), and regression-kriging models were calibrated and validated for the spatial analysis of the lake's sediment TP and TN. GWR models provide the most accurate predictions for TP and TN based on model performance and error analysis. TP values declined from an average of 651 to 593 mg/kg from 1998 to 2006, especially in the lake's western and southern regions. From 1988 to 1998, TP declined in the northern and southern areas and increased in the central-western part of the lake; TP weight increased by about 37.99%–43.68% from 1988 to 1998 and decreased by about 29.72%–34.42% from 1998 to 2006. From 1988 to 1998, TN decreased in most areas, especially the northern and southern lake regions; the western littoral zone had the biggest increase, up to 40,000 mg/kg. From 1998 to 2006, TN declined from an average of 9,363 to 8,926 mg/kg, especially in the central and southern regions. The biggest increases occurred in the northern lake and southern edge areas. TN weight increased by about 15%–16.2% from 1988 to 1998 and decreased by about 7%–11% from 1998 to 2006.
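The kriging interpolation behind the mud-thickness and nutrient surfaces can be sketched in a few lines. Below is a minimal ordinary-kriging predictor with a spherical variogram; the variogram parameters, coordinates, and thickness values are made-up illustrations, not the dissertation's calibrated models:

```python
import numpy as np

def gamma(h, nugget=0.0, sill=1.0, rng_=50.0):
    """Spherical variogram model (illustrative parameter values)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    return np.where(h >= rng_, sill, np.where(h == 0, 0.0, g))

def ordinary_kriging(xy, z, x0):
    """Predict z at location x0 from sampled locations xy with values z."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system in variogram form, with a Lagrange multiplier row/column
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(xy - x0, axis=1))
    w = np.linalg.solve(A, b)[:n]
    return w @ z

# Toy example: mud thickness (cm) at four sampled points, predicted at the centre
xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z = np.array([20.0, 24.0, 18.0, 22.0])
print(ordinary_kriging(xy, z, np.array([5.0, 5.0])))
```

Solving the kriging system gives weights that honour the spatial correlation structure; at this symmetric centre point the weights are equal and the prediction reduces to the mean of the four samples.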

Resumo:

Support foundations are a type of private-law legal entity created to support research, teaching, and extension projects and the institutional, scientific, and technological development of Brazil. Seen as links between company, university, and government, support foundations emerged on the Brazilian scene from the principle of establishing an economic development platform based on three pillars: science, technology, and innovation (ST&I). In practical terms, they operate as debureaucratisation tools, making management between public entities more agile, especially academic management, in line with the Triple Helix approach. Accordingly, this study aims to understand how Triple Helix relations intervene in the fund-raising process of Brazilian support foundations. To understand these relations, we used the university-industry-government interaction models recommended by Sábato and Botana (1968), the Triple Helix approach proposed by Etzkowitz and Leydesdorff (2000), and the national-innovation-systems perspective discussed by Freeman (1987, 1995), Nelson (1990, 1993), and Lundvall (1992). The research object consists of the 26 state research-support foundations associated with the National Council of the State Foundations of Supporting Research (CONFAP) and the 102 foundations supporting higher-education institutions (IES) associated with the National Council of Foundations of Support for Institutions of Higher Education and Scientific and Technological Research (CONFIES), totaling 128 entities. As a research strategy, this is an applied study with a quantitative approach. Primary data were collected through an e-mail survey; 75 observations were obtained, corresponding to 58.59% of the research universe.
The bootstrap method was considered in order to validate the use of the sample in the analysis of results. Data were analysed with descriptive statistics and multivariate techniques: cluster analysis, canonical correlation, and binary logistic regression. The canonical roots obtained indicated that the dependency between the relation variables (with the Triple Helix actors) and the financial resources invested in innovation projects is low, supporting the study's null hypothesis that Triple Helix relations have not interfered, positively or negatively, in raising funds for innovation projects. On the other hand, the cluster analysis indicates that the entities with the largest numbers and financial volumes of projects are mostly large foundations (over 100 employees) that support up to five IES, publish management reports, and rely more heavily on public-sector financing in their capital structure. Finally, the logistic model obtained in this study showed high predictive capacity (80.0%), allowing the academic community to replicate the analysis in similar settings.
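A minimal sketch of the kind of analysis described, a binary logistic classifier whose classification rate is examined by bootstrap resampling, is shown below on synthetic data. The two predictors, the gradient-ascent fitting routine, and the data-generating numbers are assumptions for illustration; only the 75-observation sample size echoes the survey:

```python
import numpy as np

def fit_logit(X, y, iters=500, lr=0.1):
    """Logistic regression by plain gradient ascent (illustrative only)."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (y - p) / len(y)
    return beta

def accuracy(beta, X, y):
    Xb = np.column_stack([np.ones(len(y)), X])
    return np.mean((Xb @ beta > 0) == (y == 1))

rng = np.random.default_rng(1)
X = rng.normal(size=(75, 2))                     # 75 observations, as in the survey
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=75) > 0).astype(float)

beta = fit_logit(X, y)
acc = accuracy(beta, X, y)

# Nonparametric bootstrap of the in-sample classification rate
boot = []
for _ in range(200):
    idx = rng.integers(0, 75, size=75)
    b = fit_logit(X[idx], y[idx])
    boot.append(accuracy(b, X[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy={acc:.2f}, 95% bootstrap CI=({lo:.2f}, {hi:.2f})")
```

The bootstrap interval conveys how much a headline classification rate such as the study's 80.0% could vary under resampling of a 75-observation sample.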

Resumo:

In longitudinal data analysis, our primary interest is in the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly for correlated discrete outcome data. Marginal modeling approaches such as generalized estimating equations (GEEs) have received much attention in the context of longitudinal regression. These methods are based on estimates of the first two moments of the data and a working correlation structure, with confidence regions and hypothesis tests based on asymptotic normality. The methods are sensitive to misspecification of the variance function and the working correlation structure: under such misspecifications, the estimates can be inefficient and inconsistent, and inference may give incorrect results. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for estimating the regression parameters and constructing a confidence region for the parameter of interest. We extend our approach to variable selection for high-dimensional longitudinal data with many covariates, where it is necessary to identify a submodel that adequately represents the data; including redundant variables may harm the model's accuracy and the efficiency of inference. We propose a penalized empirical likelihood (PEL) variable selection based on GEEs, in which variable selection and coefficient estimation are carried out simultaneously. We discuss its characteristics and asymptotic properties, and present an algorithm for optimizing PEL.
Simulation studies show that when the model assumptions are correct, our method performs as well as existing methods, and when the model is misspecified, it has clear advantages. We have applied the method to two case examples.
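The GEE machinery the abstract builds on can be sketched directly from its estimating equations. The toy implementation below handles a binary response with a logit link and an exchangeable working correlation; the clustered data are simulated from a random-intercept model, and all sizes and parameter values are assumptions for illustration (a real analysis would use a full implementation such as statsmodels' GEE):

```python
import numpy as np

def gee_logit_exchangeable(X, y, groups, iters=25):
    """Minimal GEE for a binary response with a logit link and an
    exchangeable working correlation (an illustrative sketch only)."""
    beta = np.zeros(X.shape[1])
    ids = np.unique(groups)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        # Moment estimate of the exchangeable correlation from Pearson residuals
        r = (y - mu) / np.sqrt(mu * (1.0 - mu))
        num, den = 0.0, 0
        for g in ids:
            rg = r[groups == g]
            num += rg.sum() ** 2 - (rg ** 2).sum()
            den += len(rg) * (len(rg) - 1)
        alpha = min(max(num / den, 0.0), 0.9)   # guard against a degenerate R
        H = np.zeros((X.shape[1], X.shape[1]))
        u = np.zeros(X.shape[1])
        for g in ids:
            m = groups == g
            Xg, yg, mug = X[m], y[m], mu[m]
            a = mug * (1.0 - mug)               # variance function = dmu/deta
            R = np.full((m.sum(), m.sum()), alpha)
            np.fill_diagonal(R, 1.0)
            V = np.sqrt(np.outer(a, a)) * R     # working covariance A^1/2 R A^1/2
            D = Xg * a[:, None]                 # derivative of mu w.r.t. beta
            Vi = np.linalg.inv(V)
            H += D.T @ Vi @ D
            u += D.T @ Vi @ (yg - mug)
        beta = beta + np.linalg.solve(H, u)
    return beta, alpha

# Clustered binary data from a random-intercept model (all values illustrative)
rng = np.random.default_rng(2)
G, n = 100, 4
groups = np.repeat(np.arange(G), n)
b = rng.normal(size=G)                          # cluster effects inducing correlation
x = rng.normal(size=G * n)
X = np.column_stack([np.ones(G * n), x])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + x + b[groups])))).astype(float)

beta, alpha = gee_logit_exchangeable(X, y, groups)
print(beta, alpha)
```

The slope estimate is the marginal (population-averaged) effect, attenuated relative to the cluster-specific slope of 1.0 used in the simulation, which is exactly the marginal-model behaviour the abstract refers to.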

Resumo:

In this study, a multi-model ensemble was implemented and verified, following one of the research priorities of the Subseasonal to Seasonal Prediction Project (S2S). A linear regression was applied to a set of ensemble reforecasts over past dates produced by the monthly forecast systems of CNR-ISAC and ECMWF-IFS, each containing one control member and four perturbed members. The variables chosen for the analysis are geopotential height at 500 hPa, temperature at 850 hPa, and 2-metre temperature, on a 1° × 1° latitude-longitude grid, using the winters from 1990 to 2010. ERA-Interim reanalyses are used both to fit the regression and to validate the results, through non-probabilistic scores such as the root-mean-square error (RMSE) and the anomaly correlation. Model Output Statistics (MOS) and Direct Model Output (DMO) techniques are then applied to the multi-model ensemble to obtain probabilistic forecasts of the weekly-mean 2-metre temperature anomalies. The MOS methods used are logistic regression and non-homogeneous Gaussian regression, while the DMO methods are democratic voting and the Tukey plotting position. These techniques are also applied to the individual models so that comparisons can be made with probabilistic scores such as the ranked probability skill score, the discrete ranked probability skill score, and the reliability diagram. Both kinds of scores show that the multi-model outperforms the individual models. Moreover, the highest probabilistic scores are obtained using a logistic regression on the ensemble mean alone. By applying the regression to datasets of reduced size, we built a learning curve showing that increasing the number of dates in the training phase would not yield further improvements.
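As a sketch of the MOS step that performed best here, logistic regression on the ensemble mean, the toy example below regresses a binary "above-median anomaly" event on a synthetic ensemble mean and scores the resulting probabilities against climatology with a Brier skill score (the data-generating numbers are assumptions; the study's own verification used RPSS-type scores on real reforecasts):

```python
import numpy as np

def fit_logit_1d(x, y, iters=2000, lr=0.5):
    """One-predictor logistic regression by gradient ascent (illustrative)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        b0 += lr * np.mean(y - p)
        b1 += lr * np.mean((y - p) * x)
    return b0, b1

rng = np.random.default_rng(3)
n = 500
truth = rng.normal(size=n)                        # "observed" weekly T2m anomaly
ens_mean = truth + rng.normal(scale=0.7, size=n)  # ensemble mean with forecast error
event = (truth > 0).astype(float)                 # above-median anomaly event

b0, b1 = fit_logit_1d(ens_mean, event)
prob = 1.0 / (1.0 + np.exp(-(b0 + b1 * ens_mean)))

brier = np.mean((prob - event) ** 2)              # Brier score of the MOS forecast
brier_clim = np.mean((event.mean() - event) ** 2) # climatological reference
bss = 1.0 - brier / brier_clim                    # Brier skill score
print(f"BSS = {bss:.2f}")
```

A positive skill score means the calibrated probabilities beat the climatological forecast, the same comparison logic as the RPSS used in the study.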

Resumo:

Successful conservation of migratory birds demands that we understand how habitat factors on the breeding grounds influence breeding success. Multiple factors are known to directly influence breeding success in territorial songbirds; for example, greater food availability and fewer predators can have direct effects. However, many of these same habitat factors can also result in higher conspecific density, which may ultimately reduce breeding success through density dependence. In that case, habitat has a negative indirect effect on breeding success through its effects on conspecific density and territory size. A key uncertainty facing land managers is therefore whether important habitat attributes influence breeding success directly or indirectly through territory size. We used radio-telemetry, point counts, vegetation sampling, predator observations, and insect sampling over two years to provide data on habitat selection of a steeply declining songbird species, the Canada Warbler (Cardellina canadensis). These data were then applied in a hierarchical path-modeling framework with an AIC model-selection approach to determine the habitat attributes that best predict breeding success. Canada Warblers had smaller territories in areas with high shrub cover, in the presence of red squirrels (Tamiasciurus hudsonicus), at shoreline sites relative to forest-interior sites, and as conspecific density increased. Breeding success was lower for birds with smaller territories, which suggests competition for limited food resources, but there was no direct evidence that food availability influenced territory size or breeding success.
The negative relationship between shrub cover and territory size in our study may arise because these habitat conditions are spatially heterogeneous: individuals pack into patches of preferred breeding habitat scattered throughout the landscape, reducing territory size and, with it, resource availability per territory. Our results therefore highlight the importance of considering both direct and indirect effects for Canada Warblers; efforts to increase the amount of breeding habitat may ultimately result in lower breeding success if habitat availability is limited and negative density-dependent effects occur.
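The AIC model-selection step can be sketched as follows: fit the candidate models and rank them by AIC, which trades goodness of fit against the number of parameters. The predictors and effect sizes below are hypothetical stand-ins, and plain OLS replaces the study's hierarchical path models:

```python
import numpy as np

def aic_ols(X, y):
    """AIC of a Gaussian OLS model; k counts coefficients plus the error variance."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (X.shape[1] + 1)

rng = np.random.default_rng(4)
n = 120
shrub = rng.normal(size=n)                 # hypothetical predictor: shrub cover
food = rng.normal(size=n)                  # hypothetical predictor: food availability
territory = -0.8 * shrub + rng.normal(scale=0.5, size=n)  # simulated territory size

ones = np.ones(n)
models = {
    "intercept only": np.column_stack([ones]),
    "shrub": np.column_stack([ones, shrub]),
    "shrub + food": np.column_stack([ones, shrub, food]),
}
aics = {name: aic_ols(M, territory) for name, M in models.items()}
print(sorted(aics.items(), key=lambda kv: kv[1]))
```

Because only shrub cover carries signal in this simulation, the shrub-only model should rank at or near the top, while the food term adds a parameter without improving fit.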

Resumo:

Background: Although many studies have investigated sexual communication between parents and children in Kenya, none have focused singularly on grandparent-grandchild communication when grandparents are primary caregivers. Further, few studies have asked about specific topics related to sex, instead asking generally about "sex-related topics" or focusing on HIV/AIDS. This research investigates communication on ten specific sex-related topics between grandparents who are primary caregivers and their grandchildren. The primary research aim was to identify facilitators of and barriers to grandparent-grandchild communication associated with frequency of communication. A secondary exploratory question was whether frequency of communication and youth satisfaction with communication were associated with youths' desire for more communication in the future. Methods: The study was conducted in urban and peri-urban central Kenya. A convenience sample of 193 grandparents and 166 twelve- to fifteen-year-old grandchildren was identified by community health workers. A cross-sectional survey assessed nine potential barriers or facilitators to communication (e.g., frequency of communication, perceived grandparent knowledge, grandparent sense of responsibility to communicate on a given topic) on ten specified sex-related topics (e.g., peer pressure on sex topics, romantic relationships, condoms). Bivariate and multivariable analyses identified significant associations between communication variables and the outcomes of interest. Results: Bivariate regression showed that higher grandchild age, grandchild gender, higher perceived grandparent knowledge, higher perceived grandparent comfort, higher grandparent-reported sense of responsibility, higher grandparent-reported belief that a child should be aware of a given topic before initiating sex, and higher youth comfort during communication were significantly associated with higher communication frequency.
In the multivariable model, higher grandchild age, gender, higher comfort during communication, and higher perceived grandparent knowledge remained significantly associated with higher communication frequency. For the secondary research question, higher communication frequency and higher youth satisfaction were both significantly associated with greater youth desire for more communication in bivariate regression, and higher youth satisfaction with communication remained significantly associated with greater desire for more communication in the adjusted analysis. Conclusions: This study found that several potential barriers to and facilitators of communication are associated with both the frequency of communication and youths' desire for more of it. The association of grandchild age, gender, and perceived grandparent knowledge with frequency of communication is similar to findings from other studies that have examined sex-related communication between parent primary caregivers and children. This finding has important implications for understanding grandparent-grandchild communication, and communication on specific topics, in a Kenyan population. The positive association between youth satisfaction with and desire for more communication has important education policy and intervention implications, suggesting that if youth are satisfied with the communication they have with their caregivers, they may want to learn more.

Resumo:

Mixtures of Zellner's g-priors have been studied extensively in linear models and have been shown to have numerous desirable properties for Bayesian variable selection and model averaging. Several extensions of g-priors to Generalized Linear Models (GLMs) have been proposed in the literature; however, the choice of prior distribution of g and the resulting properties for inference have received considerably less attention. In this paper, we extend mixtures of g-priors to GLMs by assigning the truncated Compound Confluent Hypergeometric (tCCH) distribution to 1/(1+g) and illustrate how this prior distribution encompasses several special cases of mixtures of g-priors in the literature, such as the Hyper-g, truncated Gamma, Beta-prime, and Robust priors. Under an integrated Laplace approximation to the likelihood, the posterior distribution of 1/(1+g) is in turn a tCCH distribution, and approximate marginal likelihoods are thus available analytically. We discuss the local geometric properties of the g-prior in GLMs and show that specific choices of the hyper-parameters satisfy the various desiderata for model selection proposed by Bayarri et al., such as asymptotic model selection consistency, information consistency, intrinsic consistency, and measurement invariance. We also illustrate inference using these priors and contrast them to others in the literature via simulation and real examples.
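For reference, the building blocks being mixed are the following (a standard statement of Zellner's g-prior in the Gaussian linear model and the shrinkage reparameterization; the tCCH density itself is omitted):

```latex
% Zellner's g-prior on the regression coefficients
\beta \mid g, \sigma^{2} \;\sim\; \mathcal{N}\!\bigl(0,\; g\,\sigma^{2}\,(X^{\top}X)^{-1}\bigr),
\qquad
% mixtures of g-priors place a prior on g, here via the shrinkage factor
z \;=\; \frac{1}{1+g} \in (0,1),
\qquad
\mathbb{E}\bigl[\beta \mid y, g\bigr] \;=\; \frac{g}{1+g}\,\hat{\beta}_{\mathrm{OLS}} .
```

The paper's construction assigns a tCCH distribution to z = 1/(1+g), which is how the Hyper-g, truncated Gamma, Beta-prime, and Robust priors arise as special cases.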

Resumo:

Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. Unlike the least squares estimator in linear regression, which outliers can affect on a large scale, QR is robust to outliers. Instead of modeling the mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and the covariates, so it can be widely used to solve problems in econometrics, the environmental sciences, and the health sciences. Sample size is an important factor in the planning stage of experimental designs and observational studies. In ordinary linear regression, sample size may be determined by either precision analysis or power analysis with closed-form formulas. Methods that calculate sample size for QR based on precision analysis also exist, such as that of Jennen-Steinmetz and Wellek (2005), and a method based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under hypothesis tests of covariate effects. Even though an error-distribution assumption is not necessary for QR analysis itself, researchers must make assumptions about the error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate the error distribution. Since the proposed method is implemented in R, the user can choose either a parametric distribution or nonparametric kernel density estimation for the error distribution, and must also specify the covariate structure and effect size to carry out the sample-size and power calculation. The performance of the proposed method is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers close to the nominal power level, for example, 80%.
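The flavour of such a power-based sample-size calculation can be sketched with the standard large-sample variance of the quantile-regression slope, which involves τ(1-τ) and the error density at the τ-th error quantile. The formula below is a generic asymptotic sketch under an assumed error density and covariate variance, not the specific procedure proposed in the paper:

```python
import math
from statistics import NormalDist

def qr_sample_size(beta1, tau, f_tau, var_x, alpha=0.05, power=0.80):
    """Asymptotic sample size for testing H0: beta1 = 0 in quantile regression
    at quantile tau, using the large-sample slope variance
    tau*(1-tau) / (f_tau^2 * var_x * n); f_tau is the assumed error density
    at the tau-th error quantile (illustrative formula, not the paper's)."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    n = (za + zb) ** 2 * tau * (1 - tau) / (f_tau ** 2 * beta1 ** 2 * var_x)
    return math.ceil(n)

# Median regression with standard normal errors: f_tau is the N(0,1) density at 0
f0 = 1 / math.sqrt(2 * math.pi)
n = qr_sample_size(beta1=0.3, tau=0.5, f_tau=f0, var_x=1.0)
print(n)
```

For a slope of 0.3 at the median with standard normal errors and unit covariate variance, this gives a requirement near 137 observations at 80% power; since n is proportional to 1/beta1², doubling the effect size cuts the requirement by a factor of four.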

Resumo:

SELECTOR is a software package for studying the evolution of multiallelic genes under balancing or positive selection while simulating complex evolutionary scenarios that integrate demographic growth and migration in a spatially explicit population framework. Parameters can be varied both in space and time to account for geographical, environmental, and cultural heterogeneity. SELECTOR can be used within an approximate Bayesian computation estimation framework. We first describe the principles of SELECTOR and validate the algorithms by comparing its outputs for simple models with theoretical expectations. Then, we show how it can be used to investigate genetic differentiation of loci under balancing selection in interconnected demes with spatially heterogeneous gene flow. We identify situations in which balancing selection reduces genetic differentiation between population groups compared with neutrality and explain conflicting outcomes observed for human leukocyte antigen loci. These results and three previously published applications demonstrate that SELECTOR is efficient and robust for building insight into human settlement history and evolution.
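The approximate Bayesian computation loop that a simulator such as SELECTOR plugs into can be sketched generically: draw parameters from a prior, run the forward simulator, and keep the draws whose summary statistics land close to the observed ones. The simulator below is a trivial binomial stand-in (not SELECTOR itself), and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

theta_true = 0.3
obs = rng.binomial(500, theta_true) / 500   # "observed" allele-frequency summary

def simulate(theta):
    """Trivial binomial stand-in for a forward simulator such as SELECTOR."""
    return rng.binomial(500, theta) / 500

# ABC rejection: draw from the prior, keep draws whose simulated summary
# falls within the tolerance of the observed summary.
prior = rng.uniform(0.0, 1.0, size=20000)
keep = np.array([t for t in prior if abs(simulate(t) - obs) < 0.02])
print(len(keep), keep.mean())
```

The retained draws approximate the posterior; shrinking the tolerance (or adding more summaries) tightens the approximation at the cost of a lower acceptance rate, which is the usual ABC trade-off.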

Resumo:

Rotation is a key parameter in the evolution of massive stars, affecting their evolution, chemical yields, ionizing photon budget, and final fate. We determined the projected rotational velocity, υₑ sin i, of ~330 O-type objects, i.e. ~210 spectroscopic single stars and ~110 primaries in binary systems, in the Tarantula Nebula or 30 Doradus (30 Dor) region. The observations were taken with VLT/FLAMES and constitute the largest homogeneous dataset of multi-epoch spectroscopy of O-type stars currently available. The most distinctive feature of the υₑ sin i distributions of the presumed-single stars and the primaries in 30 Dor is a low-velocity peak at around 100 km s⁻¹. Stellar winds are not expected to have spun down the bulk of the stars significantly since their arrival on the main sequence, so the peak in the single-star sample likely represents the outcome of the formation process. Whereas the spin distribution of presumed-single stars shows a well-developed tail of stars rotating faster than 300 km s⁻¹, the sample of primaries features no such high-velocity tail. The tail of the presumed-single-star distribution is attributed for the most part, and could potentially be entirely due, to spun-up binary products that appear as single stars or that have merged. This would be consistent with the lack of such post-interaction products in the binary sample, which is expected to be dominated by pre-interaction systems. The peak in the primaries' distribution is broader and shifted toward somewhat higher spin rates than that of the presumed-single stars. Systems displaying large radial-velocity variations, typical of short-period systems, appear mostly responsible for these differences.

Resumo:

The business model of an organization is an important strategic tool for its success, and should therefore be understood by business professionals and information technology professionals alike. In this context, and considering the importance of information technology in contemporary business models, this article aims to verify the use of business model components in the information technology (IT) project management process in enterprises. To achieve this goal, this exploratory research investigated the use of the business model concept in IT project management through a survey of 327 professionals conducted from February to April 2012. It was observed that the business model concept, as well as its practices and building blocks, is not explored to its full potential, possibly because it is relatively new. One benefit of this conceptual tool is that it gives different areas an understanding of the core business, enabling a higher level of knowledge of the enterprise's essential activities among both IT professionals and the business area.

Resumo:

This thesis develops bootstrap methods for factor models, which have been widely used to generate forecasts since the pioneering diffusion-index paper of Stock and Watson (2002). These models accommodate a large number of macroeconomic and financial variables as predictors, a useful feature for incorporating the diverse information available to economic agents. The thesis therefore proposes econometric tools that improve inference in factor models using latent factors extracted from a large panel of observed predictors. It is divided into three complementary chapters, the first two co-authored with Sílvia Gonçalves and Benoit Perron. The first chapter studies how bootstrap methods can be used for inference in models that forecast h periods into the future. To this end, it examines bootstrap inference in a factor-augmented regression context where the errors may be autocorrelated. It generalizes the results of Gonçalves and Perron (2014) and proposes and justifies two residual-based approaches: the block wild bootstrap and the dependent wild bootstrap. Our simulations show improved coverage rates for confidence intervals of the estimated coefficients using these approaches, compared with asymptotic theory and the wild bootstrap, in the presence of serial correlation in the regression errors. The second chapter proposes bootstrap methods for constructing prediction intervals that relax the assumption of normality of the innovations. We propose bootstrap prediction intervals for an observation h periods in the future and for its conditional mean, assuming these forecasts are made using a set of factors extracted from a large panel of variables.
Because we treat these factors as latent, our forecasts depend on both the estimated factors and the estimated regression coefficients. Under regularity conditions, Bai and Ng (2006) proposed the construction of asymptotic intervals under Gaussian innovations. The bootstrap allows us to relax this assumption and to construct valid prediction intervals under more general hypotheses. Moreover, even under Gaussianity, the bootstrap yields more accurate intervals when the cross-sectional dimension is relatively small, because it accounts for the bias of the ordinary least squares estimator, as shown in a recent study by Gonçalves and Perron (2014). The third chapter suggests consistent selection procedures for factor-augmented regressions in finite samples. We first show that the usual cross-validation method is inconsistent, but that its generalization, leave-d-out cross-validation, selects the smallest set of estimated factors spanning the space generated by the true factors. The second criterion, whose validity we also establish, generalizes the bootstrap approximation of Shao (1996) to factor-augmented regressions. Simulations show an improvement in the probability of parsimoniously selecting the estimated factors compared with the available selection methods. The empirical application revisits the relationship between macroeconomic and financial factors and the excess return on the US stock market. Among the factors estimated from a large panel of US macroeconomic and financial data, factors strongly correlated with interest-rate spreads and the Fama-French factors have good predictive power for excess returns.
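The residual-based wild bootstrap at the heart of the first chapter can be sketched in its simplest form, an ordinary regression with heteroskedastic errors and Rademacher weights. The block and dependent variants, and the estimated-factor step, are omitted; all data here are simulated assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * (1 + 0.5 * np.abs(x))  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Wild bootstrap: keep X fixed, multiply each residual by an independent
# Rademacher weight, refit, and collect the slope estimates.
slopes = []
for _ in range(999):
    y_star = X @ beta + resid * rng.choice([-1.0, 1.0], size=n)
    slopes.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"95% wild-bootstrap CI for the slope: ({lo:.2f}, {hi:.2f})")
```

The sign flips preserve each observation's error scale, which is what makes the wild bootstrap robust to heteroskedasticity; the block and dependent wild bootstraps of the thesis extend the same idea to serially correlated errors.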

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-08