8 results for Multinomial Logit
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
We aim to provide further empirical evidence on forecasting financial failure, to develop forecast models, and to examine the impact of events such as audit reports. A joint consideration of classic financial ratios and relevant external indicators leads us to build a basic prediction model focused on non-financial Galician SMEs. The explanatory variables are financial indicators that are relevant from the viewpoint of financial logic and financial failure theory. The paper explores three mathematical models: discriminant analysis, Logit, and linear multivariate regression. We conclude that, although both offer high explanatory and predictive power, Logit and discriminant analysis (MDA) models should be used and interpreted jointly.
Abstract:
In order to study the impact of premature birth and low income on mother–infant interaction, four Portuguese samples were gathered: full-term, middle-class (n=99); premature, middle-class (n=63); full-term, low income (n=22); and premature, low income (n=21). Infants were filmed in a free play situation with their mothers, and the results were scored using the CARE Index. By means of multinomial regression analysis, socioeconomic status (SES) was found to be the best predictor of maternal sensitivity and infant cooperative behavior within a set of medical and social factors. Contrary to the expectations of the cumulative risk perspective, two risk factors together (premature birth and low SES) were no more negative for mother–infant interaction than low SES alone. In this study, as previous studies have shown, maternal sensitivity and infant cooperative behavior were highly correlated, as were maternal control and infant compliance. Our results further indicate that, when maternal lack of responsiveness is high, the infant displays passive behavior, whereas when maternal lack of responsiveness is medium, the infant displays difficult behavior. Indeed, our findings suggest that, in these cases, the link between types of maternal and infant interactive behavior depends more on the degree of maternal lack of responsiveness than on birth status or SES. The results are discussed from a developmental and evolutionary perspective.
Abstract:
Master's in Accounting and Financial Analysis
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.
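As a rough illustration of how latent relevance variables can enter a mixture model, the following is one common "feature saliency" formulation from the clustering literature. It is an assumption that the abstract's model takes this form; the symbols below ($\rho_j$, $q$, $\lambda_j$, and the within-component independence of features) are not taken from the abstract.

```latex
\[
p(\mathbf{y} \mid \Theta)
  = \sum_{k=1}^{K} \alpha_k \prod_{j=1}^{J}
    \Bigl[ \rho_j \, p(y_j \mid \theta_{kj})
         + (1 - \rho_j) \, q(y_j \mid \lambda_j) \Bigr],
\qquad
\sum_{k=1}^{K} \alpha_k = 1, \quad 0 \le \rho_j \le 1 .
\]
```

Here $\alpha_k$ are the mixing proportions, $p(y_j \mid \theta_{kj})$ is the multinomial (categorical) distribution of feature $j$ in component $k$, $q(y_j \mid \lambda_j)$ is a distribution shared by all components, and $\rho_j$ is the probability that feature $j$ is relevant. A feature with $\rho_j$ near 0 is described by the common distribution alone and so carries no information about cluster membership.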
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Boulton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with that of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
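For context on the BIC baseline, the sketch below shows how the number of clusters is typically chosen from a set of pre-estimated candidate models, which is the procedure the abstract contrasts with its integrated approach. It uses the standard Schwarz form of BIC and assumes that features are independent within each component; the function names and the `logliks` input are hypothetical, and the maximized log-likelihoods must come from fitting each candidate model separately.

```python
import math


def n_free_parameters(k, n_categories):
    """Free parameters of a k-component mixture of independent categoricals.

    k - 1 mixing proportions, plus (C_j - 1) probabilities per feature
    and per component, where C_j is the number of categories of feature j.
    """
    return (k - 1) + k * sum(c - 1 for c in n_categories)


def bic(loglik, k, n_categories, n_obs):
    """Schwarz's criterion: -2 log L + (free parameters) * log(n). Lower is better."""
    return -2.0 * loglik + n_free_parameters(k, n_categories) * math.log(n_obs)


def select_k(logliks, n_categories, n_obs):
    """Pick the candidate k with the smallest BIC.

    logliks: dict mapping each candidate k to the maximized log-likelihood
    of a model that was already fitted with that k (hypothetical input).
    """
    return min(logliks, key=lambda k: bic(logliks[k], k, n_categories, n_obs))
```

Because every candidate value of k requires its own model fit before `select_k` can be called, the cost grows with the number of candidates, which is consistent with the speed advantage the abstract reports for the single-run algorithm.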
Abstract:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. To estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach over the criteria mentioned above is its speed of execution, which is especially relevant when dealing with large data sets.
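To make the underlying model concrete, here is a minimal sketch of plain EM for a finite mixture of categorical (single-trial multinomial) distributions. It is not the algorithm of the abstract: the number of components `k` is fixed in advance, there is no MML-based selection, and features are assumed independent within each component. All names and the small smoothing constants are illustrative assumptions.

```python
import math
import random


def em_categorical_mixture(data, n_categories, k, n_iter=50, seed=0):
    """Plain EM with a fixed number of components k (no model selection).

    data: list of rows, each a list of integer category codes (0-based).
    n_categories: number of categories of each feature.
    Returns (weights, theta, resp, loglik); loglik is evaluated at the
    parameters used in the last E-step.
    """
    rng = random.Random(seed)
    n, d = len(data), len(n_categories)
    weights = [1.0 / k] * k
    # theta[c][j][v] = P(feature j = v | component c), random starting point
    theta = []
    for _ in range(k):
        comp = []
        for j in range(d):
            raw = [rng.random() + 0.5 for _ in range(n_categories[j])]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)

    resp, loglik = [], 0.0
    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed in log space
        resp, loglik = [], 0.0
        for row in data:
            logs = [
                math.log(weights[c])
                + sum(math.log(theta[c][j][v]) for j, v in enumerate(row))
                for c in range(k)
            ]
            m = max(logs)
            norm = m + math.log(sum(math.exp(x - m) for x in logs))
            loglik += norm
            resp.append([math.exp(x - norm) for x in logs])
        # M-step: weighted relative frequencies, lightly smoothed to avoid log(0)
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = (nc + 1e-6) / (n + k * 1e-6)
            for j in range(d):
                counts = [1e-6] * n_categories[j]
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                total = sum(counts)
                theta[c][j] = [x / total for x in counts]
    return weights, theta, resp, loglik
```

A typical use is to call `em_categorical_mixture(rows, n_categories, k)` and assign each row to the component with the largest entry in its `resp` vector. Running this once per candidate `k` and comparing criteria afterwards is the multi-fit workflow that the integrated algorithm described above is designed to avoid.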
Abstract:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data from official statistics shows its usefulness.
Abstract:
Master's in Accounting and Management of Financial Institutions