45 resultados para REGRESSION MULTINOMIAL ANALYSIS
Resumo:
The application of compositional data analysis through log ratio trans-formations corresponds to a multinomial logit model for the shares themselves.This model is characterized by the property of Independence of Irrelevant Alter-natives (IIA). IIA states that the odds ratio in this case the ratio of shares is invariant to the addition or deletion of outcomes to the problem. It is exactlythis invariance of the ratio that underlies the commonly used zero replacementprocedure in compositional data analysis. In this paper we investigate using thenested logit model that does not embody IIA and an associated zero replacementprocedure and compare its performance with that of the more usual approach ofusing the multinomial logit model. Our comparisons exploit a data set that com-bines voting data by electoral division with corresponding census data for eachdivision for the 2001 Federal election in Australia
Resumo:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models
Resumo:
The application of Discriminant function analysis (DFA) is not a new idea in the studyof tephrochrology. In this paper, DFA is applied to compositional datasets of twodifferent types of tephras from Mountain Ruapehu in New Zealand and MountainRainier in USA. The canonical variables from the analysis are further investigated witha statistical methodology of change-point problems in order to gain a betterunderstanding of the change in compositional pattern over time. Finally, a special caseof segmented regression has been proposed to model both the time of change and thechange in pattern. This model can be used to estimate the age for the unknown tephrasusing Bayesian statistical calibration
Resumo:
In CoDaWork’05, we presented an application of discriminant function analysis (DFA) to 4 differentcompositional datasets and modelled the first canonical variable using a segmented regression modelsolely based on an observation about the scatter plots. In this paper, multiple linear regressions areapplied to different datasets to confirm the validity of our proposed model. In addition to dating theunknown tephras by calibration as discussed previously, another method of mapping the unknown tephrasinto samples of the reference set or missing samples in between consecutive reference samples isproposed. The application of these methodologies is demonstrated with both simulated and real datasets.This new proposed methodology provides an alternative, more acceptable approach for geologists as theirfocus is on mapping the unknown tephra with relevant eruptive events rather than estimating the age ofunknown tephra.Kew words: Tephrochronology; Segmented regression
Resumo:
This paper studies the extent to which social networks influence the employment stability and wages of immigrants in Spain. By doing so, I consider an aspect that has not been previously addressed in the empirical literature, namely the connection between immigrants' social networks and labor market outcomes in Spain. For this purpose, I use micro-data from the National Immigrant Survey carried out in 2007. The analysis is conducted in two stages. First, the impact of social networks on the probability of keeping the first job obtained in Spain is studied through a multinomial logit regression. Second, quantile regressions are used to estimate a wage equation. The empirical results suggest that once the endogeneity problem has been accounted for, immigrants' social networks influence their labor market outcomes. On arrival, immigrants experience a mismatch in the labor market. In addition, different effects of social networks on wages by gender and wage distribution are found. While contacts on arrival and informal job access mechanisms positively influence women's wages, a wage penalty is observed for men.
Resumo:
This paper analyses the extent to which individual and workplacecharacteristics and regional policies influence the use and duration ofparental leave in Spain. The research is based on a sample of 125,165people, and 6,959 parental leaves stemming from the ‘Sample ofWorking Life Histories’ (SWLH), 2006. The SWLH consists of administrative register data which include information from threedifferent sources: Social Security, Municipality and Income TaxRegisters. We adopt a simultaneous equations approach to analyse theuse (logistic regression) and duration (event history analysis) ofparental leave, which allows us to control for endogeneity and censoredobservations. We argue that the Spanish parental leave scheme increases gender and social inequalities insofar as reinforces genderrole specialization, and only encourages the reconciling of work andfamily life among workers with a good position in the labour market(educated employees with high and stable working status).
Resumo:
The paper examines the relationship between family formation (i.e., living with a partner and having children) and women’s occupational career in southern Europe (i.e., Greece, Italy, Portugal and Spain). The relationship is explored by analysing the impact that different family structures and male [nvolvement in caring activities have on women’s early occupational trajectories (i.e., remaining in the same occupational status, experiencing downward or upward mobility, or withdrawing from paid work). This research shows that male involvement in caring activities does not really push women ahead in their career, but the absolute lack of male support seems to negatively affect women’s permanence in paid work. These results apply to all southern European countries except Portugal, where the absolute absence of the partners’ support in caring activities does not seem to alter women’s determination to remain in paid work. The methodology applied consists of the estimation of multinomial logit regression models and the analysis is based on eight waves (1994-2001) of the European Community Household Panel (ECHP).
Resumo:
We offer new evidence on multi-level determinants of the gender division of housework. Using data from the 2004 European Social Survey (ESS) for 26 European, we study the micro and macro-level factors which increase the likelihood of men doing an equal or greater share of housework than their female partners. A sample of 11,915 young men and women is analysed with a multi-level logistic regression in order to test at individual level the classic relative-income, time-availability and gender-role values, and a new couple conflict hypothesis. At individual level we find significant relationships between relative resources, values, couple's disagreement, and the division of housework which support more economic dependency than "doing gender" perspectives. At the macro-level, we find important composition effects and also support for gender empowerment, family model and social stratification explanations of cross-country differences.
Resumo:
We consider the application of normal theory methods to the estimation and testing of a general type of multivariate regressionmodels with errors--in--variables, in the case where various data setsare merged into a single analysis and the observable variables deviatepossibly from normality. The various samples to be merged can differ on the set of observable variables available. We show that there is a convenient way to parameterize the model so that, despite the possiblenon--normality of the data, normal--theory methods yield correct inferencesfor the parameters of interest and for the goodness--of--fit test. Thetheory described encompasses both the functional and structural modelcases, and can be implemented using standard software for structuralequations models, such as LISREL, EQS, LISCOMP, among others. An illustration with Monte Carlo data is presented.
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
Logistic regression is included into the analysis techniques which are valid for observationalmethodology. However, its presence at the heart of thismethodology, and more specifically in physical activity and sports studies, is scarce. With a view to highlighting the possibilities this technique offers within the scope of observational methodology applied to physical activity and sports, an application of the logistic regression model is presented. The model is applied in the context of an observational design which aims to determine, from the analysis of use of the playing area, which football discipline (7 a side football, 9 a side football or 11 a side football) is best adapted to the child"s possibilities. A multiple logistic regression model can provide an effective prognosis regarding the probability of a move being successful (reaching the opposing goal area) depending on the sector in which the move commenced and the football discipline which is being played.
Resumo:
Sickness absence (SA) is an important social, economic and public health issue. Identifying and understanding the determinants, whether biological, regulatory or, health services-related, of variability in SA duration is essential for better management of SA. The conditional frailty model (CFM) is useful when repeated SA events occur within the same individual, as it allows simultaneous analysis of event dependence and heterogeneity due to unknown, unmeasured, or unmeasurable factors. However, its use may encounter computational limitations when applied to very large data sets, as may frequently occur in the analysis of SA duration. To overcome the computational issue, we propose a Poisson-based conditional frailty model (CFPM) for repeated SA events that accounts for both event dependence and heterogeneity. To demonstrate the usefulness of the model proposed in the SA duration context, we used data from all non-work-related SA episodes that occurred in Catalonia (Spain) in 2007, initiated by either a diagnosis of neoplasm or mental and behavioral disorders. As expected, the CFPM results were very similar to those of the CFM for both diagnosis groups. The CPU time for the CFPM was substantially shorter than the CFM. The CFPM is an suitable alternative to the CFM in survival analysis with recurrent events,especially with large databases.
Resumo:
The final year project came to us as an opportunity to get involved in a topic which has appeared to be attractive during the learning process of majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics.Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom in the last decades and the consequent exponential increase in the amount of data collected and stored day by day. Data analysts able to deal with Big Data and to find useful results from it are verydemanded in these days and, according to our understanding, the work they do, although sometimes controversial in terms of ethics, is a clear source of value added both for private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument valid for the analysis of large datasets which is directly related to computer science: Partial Correlation Networks.The structure of the project has been determined by our objectives through the development of it. At first, the characteristics of the studied instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting SPACE model as a tool for estimating interconnections in between elements in large data sets. Afterwards, an illustrated simulation is performed in order to show the power and efficiency of the model presented. And at last, the model is put into practice by analyzing a relatively large data set of real world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series. In short, our main goals are to present the model and evaluate if Partial Correlation Network Analysis is an effective, useful instrument and allows finding valuable results from Big Data.As a result, the findings all along this project suggest the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) to work well under the assumption of sparsity of data. Moreover, partial correlation networks are shown to be a very valid tool to represent cross-sectional interconnections in between elements in large data sets.The scope of this project is however limited, as there are some sections in which deeper analysis would have been appropriate. Considering intertemporal connections in between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real data application are examples of aspects in which this project could be completed.To sum up, the analyzed statistical tool has been proved to be a very useful instrument to find relationships that connect the elements present in a large data set. And after all, partial correlation networks allow the owner of this set to observe and analyze the existing linkages that could have been omitted otherwise.
Resumo:
This paper analyses the effect of R&D investment on firm growth. We use an extensive sample of Spanish manufacturing and service firms. The database comprises diverse waves of Spanish Community Innovation Survey and covers the period 2004–2008. First, a probit model corrected for sample selection analyses the role of innovation on the probability of being a high-growth firm (HGF). Second, a quantile regression technique is applied to explore the determinants of firm growth. Our database shows that a small number of firms experience fast growth rates in terms of sales or employees. Our results reveal that R&D investments positively affect the probability of becoming a HGF. However, differences appear between manufacturing and service firms. Finally, when we study the impact of R&D investment on firm growth, quantile estimations show that internal R&D presents a significant positive impact for the upper quantiles, while external R&D shows a significant positive impact up to the median. Keywords : High-growth firms, Firm growth, Innovation activity. JEL Classifications : L11, L25, L26, O30