841 resultados para Rank estimation
Resumo:
A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it is possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation
Resumo:
A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it is possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation
Resumo:
In this paper a novel rank estimation technique for trajectories motion segmentation within the Local Subspace Affinity (LSA) framework is presented. This technique, called Enhanced Model Selection (EMS), is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built by LSA. The results on synthetic and real data show that without any a priori knowledge, EMS automatically provides an accurate and robust rank estimation, improving the accuracy of the final motion segmentation
Resumo:
In this paper a novel rank estimation technique for trajectories motion segmentation within the Local Subspace Affinity (LSA) framework is presented. This technique, called Enhanced Model Selection (EMS), is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built by LSA. The results on synthetic and real data show that without any a priori knowledge, EMS automatically provides an accurate and robust rank estimation, improving the accuracy of the final motion segmentation
Resumo:
En aquesta tesi s’estudia el problema de la segmentació del moviment. La tesi presenta una revisió dels principals algoritmes de segmentació del moviment, s’analitzen les característiques principals i es proposa una classificació de les tècniques més recents i importants. La segmentació es pot entendre com un problema d’agrupament d’espais (manifold clustering). Aquest estudi aborda alguns dels reptes més difícils de la segmentació de moviment a través l’agrupament d’espais. S’han proposat nous algoritmes per a l’estimació del rang de la matriu de trajectòries, s’ha presenta una mesura de similitud entre subespais, s’han abordat problemes relacionats amb el comportament dels angles canònics i s’ha desenvolupat una eina genèrica per estimar quants moviments apareixen en una seqüència. L´ultima part de l’estudi es dedica a la correcció de l’estimació inicial d’una segmentació. Aquesta correcció es du a terme ajuntant els problemes de la segmentació del moviment i de l’estructura a partir del moviment.
Resumo:
Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are illustrated with two examples and their validity for cases with practical sample sizes is demonstrated via a simulation study.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Background: Food portion size estimation involves a complex mental process that may influence food consumption evaluation. Knowing the variables that influence this process can improve the accuracy of dietary assessment. The present study aimed to evaluate the ability of nutrition students to estimate food portions in usual meals and relate food energy content with errors in food portion size estimation. Methods: Seventy-eight nutrition students, who had already studied food energy content, participated in this cross-sectional study on the estimation of food portions, organised into four meals. The participants estimated the quantity of each food, in grams or millilitres, with the food in view. Estimation errors were quantified, and their magnitude were evaluated. Estimated quantities (EQ) lower than 90% and higher than 110% of the weighed quantity (WQ) were considered to represent underestimation and overestimation, respectively. Correlation between food energy content and error on estimation was analysed by the Spearman correlation, and comparison between the mean EQ and WQ was accomplished by means of the Wilcoxon signed rank test (P < 0.05). Results: A low percentage of estimates (18.5%) were considered accurate (+/- 10% of the actual weight). The most frequently underestimated food items were cauliflower, lettuce, apple and papaya; the most often overestimated items were milk, margarine and sugar. A significant positive correlation between food energy density and estimation was found (r = 0.8166; P = 0.0002). Conclusions: The results obtained in the present study revealed a low percentage of acceptable estimations of food portion size by nutrition students, with trends toward overestimation of high-energy food items and underestimation of low-energy items.
Resumo:
This paper develops an approach to rank testing that nests all existing rank tests andsimplifies their asymptotics. The approach is based on the fact that implicit in every ranktest there are estimators of the null spaces of the matrix in question. The approach yieldsmany new insights about the behavior of rank testing statistics under the null as well as localand global alternatives in both the standard and the cointegration setting. The approach alsosuggests many new rank tests based on alternative estimates of the null spaces as well as thenew fixed-b theory. A brief Monte Carlo study illustrates the results.
Resumo:
Machine learning provides tools for automated construction of predictive models in data intensive areas of engineering and science. The family of regularized kernel methods have in the recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is from a set of past observations to learn a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings, examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods, based on this approach, has in the past proven to be challenging. Moreover, it is not clear what techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank, that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results, Part II consists of the five original research articles that are the main contribution of this thesis.
Resumo:
Using vector autoregressive (VAR) models and Monte-Carlo simulation methods we investigate the potential gains for forecasting accuracy and estimation uncertainty of two commonly used restrictions arising from economic relationships. The Örst reduces parameter space by imposing long-term restrictions on the behavior of economic variables as discussed by the literature on cointegration, and the second reduces parameter space by imposing short-term restrictions as discussed by the literature on serial-correlation common features (SCCF). Our simulations cover three important issues on model building, estimation, and forecasting. First, we examine the performance of standard and modiÖed information criteria in choosing lag length for cointegrated VARs with SCCF restrictions. Second, we provide a comparison of forecasting accuracy of Ötted VARs when only cointegration restrictions are imposed and when cointegration and SCCF restrictions are jointly imposed. Third, we propose a new estimation algorithm where short- and long-term restrictions interact to estimate the cointegrating and the cofeature spaces respectively. We have three basic results. First, ignoring SCCF restrictions has a high cost in terms of model selection, because standard information criteria chooses too frequently inconsistent models, with too small a lag length. Criteria selecting lag and rank simultaneously have a superior performance in this case. Second, this translates into a superior forecasting performance of the restricted VECM over the VECM, with important improvements in forecasting accuracy ñreaching more than 100% in extreme cases. Third, the new algorithm proposed here fares very well in terms of parameter estimation, even when we consider the estimation of long-term parameters, opening up the discussion of joint estimation of short- and long-term parameters in VAR models.
Resumo:
We study the joint determination of the lag length, the dimension of the cointegrating space and the rank of the matrix of short-run parameters of a vector autoregressive (VAR) model using model selection criteria. We consider model selection criteria which have data-dependent penalties for a lack of parsimony, as well as the traditional ones. We suggest a new procedure which is a hybrid of traditional criteria and criteria with data-dependant penalties. In order to compute the fit of each model, we propose an iterative procedure to compute the maximum likelihood estimates of parameters of a VAR model with short-run and long-run restrictions. Our Monte Carlo simulations measure the improvements in forecasting accuracy that can arise from the joint determination of lag-length and rank, relative to the commonly used procedure of selecting the lag-length only and then testing for cointegration.
Resumo:
We study the joint determination of the lag length, the dimension of the cointegrating space and the rank of the matrix of short-run parameters of a vector autoregressive (VAR) model using model selection criteria. We consider model selection criteria which have data-dependent penalties as well as the traditional ones. We suggest a new two-step model selection procedure which is a hybrid of traditional criteria and criteria with data-dependant penalties and we prove its consistency. Our Monte Carlo simulations measure the improvements in forecasting accuracy that can arise from the joint determination of lag-length and rank using our proposed procedure, relative to an unrestricted VAR or a cointegrated VAR estimated by the commonly used procedure of selecting the lag-length only and then testing for cointegration. Two empirical applications forecasting Brazilian inflation and U.S. macroeconomic aggregates growth rates respectively show the usefulness of the model-selection strategy proposed here. The gains in different measures of forecasting accuracy are substantial, especially for short horizons.
Resumo:
We study the joint determination of the lag length, the dimension of the cointegrating space and the rank of the matrix of short-run parameters of a vector autoregressive (VAR) model using model selection criteria. We consider model selection criteria which have data-dependent penalties as well as the traditional ones. We suggest a new two-step model selection procedure which is a hybrid of traditional criteria and criteria with data-dependant penalties and we prove its consistency. Our Monte Carlo simulations measure the improvements in forecasting accuracy that can arise from the joint determination of lag-length and rank using our proposed procedure, relative to an unrestricted VAR or a cointegrated VAR estimated by the commonly used procedure of selecting the lag-length only and then testing for cointegration. Two empirical applications forecasting Brazilian in ation and U.S. macroeconomic aggregates growth rates respectively show the usefulness of the model-selection strategy proposed here. The gains in di¤erent measures of forecasting accuracy are substantial, especially for short horizons.
Resumo:
We study the joint determination of the lag length, the dimension of the cointegrating space and the rank of the matrix of short-run parameters of a vector autoregressive (VAR) model using model selection criteria. We suggest a new two-step model selection procedure which is a hybrid of traditional criteria and criteria with data-dependant penalties and we prove its consistency. A Monte Carlo study explores the finite sample performance of this procedure and evaluates the forecasting accuracy of models selected by this procedure. Two empirical applications confirm the usefulness of the model selection procedure proposed here for forecasting.