74 results for correlation-based feature selection
Abstract:
In this paper, a novel rank estimation technique for trajectory motion segmentation within the Local Subspace Affinity (LSA) framework is presented. This technique, called Enhanced Model Selection (EMS), is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built by LSA. The results on synthetic and real data show that, without any a priori knowledge, EMS automatically provides an accurate and robust rank estimation, improving the accuracy of the final motion segmentation.
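The abstract does not give implementation details, but the notion of estimating the rank of a trajectory matrix can be illustrated with a generic singular-value heuristic. The sketch below is not the EMS criterion itself; the `energy` threshold and the toy data are illustrative assumptions.

```python
import numpy as np

def estimate_trajectory_rank(W, energy=0.99):
    """Rough rank estimate of a 2F x P trajectory matrix W from its singular values.

    Generic thresholding heuristic (not the EMS criterion of the paper): keep the
    smallest number of singular values whose squared sum reaches the given
    fraction of the total energy.
    """
    s = np.linalg.svd(W, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)

# Toy trajectory matrix: 30 frames (2F = 60 rows), 50 features, true rank 4
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(60, 4)))
V, _ = np.linalg.qr(rng.normal(size=(50, 4)))
W = U @ np.diag([10.0, 8.0, 6.0, 4.0]) @ V.T
print(estimate_trajectory_rank(W))  # -> 4
```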
Abstract:
This contribution compares existing and newly developed techniques for geometrically representing mean-variance-skewness portfolio frontiers, based on the rather widely adopted methodology of polynomial goal programming (PGP) on the one hand and the more recent approach based on the shortage function on the other hand. Moreover, we explain the working of these different methodologies in detail and provide graphical illustrations. Inspired by these illustrations, we prove a generalization of the well-known two-fund separation theorem from traditional mean-variance portfolio theory.
Abstract:
Background: The human FOXI1 gene codes for a transcription factor involved in the physiology of the inner ear, testis, and kidney. Using three interspecies comparisons, it has been suggested that this may be a gene under human-specific selection. We sought to confirm this finding by using an extended set of orthologous sequences. Additionally, we searched for signals of natural selection within humans by sequencing the gene in 20 Europeans, 20 East Asians and 20 Yorubas, and by analysing SNP variation in a 2 Mb region centered on FOXI1 in 39 worldwide human populations from the HGDP-CEPH diversity panel.
Results: The genome sequences recently available from other primate and non-primate species showed that FOXI1 divergence patterns are compatible with neutral evolution. Sequence-based neutrality tests were not significant in Europeans, East Asians or Yorubas. However, the Long Range Haplotype (LRH) test, as well as the iHS and XP-Rsb statistics, revealed significantly extended tracks of homozygosity around FOXI1 in Africa, suggesting a recent episode of positive selection acting on this gene. A functionally relevant SNP, as well as several SNPs either on the putatively selected core haplotypes or with significant iHS or XP-Rsb values, displayed allele frequencies strongly correlated with the absolute geographical latitude of the populations sampled.
Conclusions: We present evidence for recent positive selection in the FOXI1 gene region in Africa. Climate might be related to this recent adaptive event in humans. Of the multiple functions of FOXI1, its role in kidney-mediated water-electrolyte homeostasis is the most obvious candidate for explaining a climate-related adaptation.
Abstract:
We propose an edge detector based on the selection of well-contrasted pieces of level lines, following the proposal of Desolneux-Moisan-Morel (DMM) [1]. The DMM edge detector has the problem of over-representation, that is, every edge is detected several times in slightly different positions. In this paper we propose two modifications of the original DMM edge detector in order to solve this problem. The first modification is a post-processing of the output using a general method to select the best representative of a bundle of curves. The second modification is the use of Canny's edge detector instead of the norm of the gradient to build the statistics. The two modifications are independent and can be applied separately. Elementary reasoning and some experiments show that the best results are obtained when both modifications are applied together.
Abstract:
Interviewing in professional labor markets is a costly process for firms. Moreover, poor screening can have a persistent negative impact on firms' bottom lines and candidates' careers. In a simple dynamic model where firms can pay a cost to interview applicants who have private information about their own ability, potentially large inefficiencies arise from information-based unemployment, where able workers are rejected by firms because of their lack of offers in previous interviews. This effect may make the market less efficient than random matching. We show that the first best can be achieved using either a mechanism with transfers or one without transfers.
Abstract:
The emphasis on integrated care implies new incentives that promote coordination between levels of care. Considering a population as a whole, the resource allocation system has to adapt to this environment. This research aims to design a model that allows for morbidity-related prospective and concurrent capitation payment. The model can be applied in publicly funded health systems and managed competition settings.
Methods: We analyze the application of hybrid risk adjustment versus either prospective or concurrent risk adjustment formulae in the context of funding total health expenditures for the population of an integrated healthcare delivery organization in Catalonia during the years 2004 and 2005.
Results: The hybrid model reimburses integrated care organizations while avoiding excessive risk transfer and maximizing incentives for efficiency in provision. At the same time, it eliminates incentives for risk selection for a specific set of high-risk individuals through the use of concurrent reimbursement, in order to assure a proper classification of patients.
Conclusion: Prospective risk adjustment is used to transfer the financial risk to the health provider and therefore provide incentives for efficiency. Within the context of a National Health System, such transfer of financial risk is illusory, and the government has to cover the deficits. Hybrid risk adjustment is useful to provide the right combination of incentives for efficiency and an appropriate level of risk transfer for integrated care organizations.
Abstract:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provide explicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate, and study in particular the connection between the richness of the class of density estimates and the performance bound. For example, our method allows one to pick the bandwidth and kernel order in the kernel estimate simultaneously and still assure that for {\it all densities}, the $L_1$ error of the corresponding kernel estimate is not larger than about three times the error of the estimate with the optimal smoothing factor and kernel, plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variable kernel estimates.
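The combinatorial selection method itself is involved; purely as a hedged illustration of data-splitting bandwidth selection (not the estimate analysed in the paper), one can compare candidate kernel estimates built on one half of the sample with a crude reference density from the other half in an approximate $L_1$ sense. The evaluation grid, the histogram reference and the candidate bandwidths below are illustrative assumptions.

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated at points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def select_bandwidth(data, bandwidths, grid):
    """Illustrative data-splitting selector (not the paper's method): each candidate
    KDE is built on one half of the sample, and its approximate L1 distance to a
    histogram of the held-out half is minimised over the candidate bandwidths."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(data))
    a, b = data[idx[: len(data) // 2]], data[idx[len(data) // 2:]]
    dx = grid[1] - grid[0]
    hist, edges = np.histogram(b, bins=40, range=(grid[0], grid[-1]), density=True)
    ref = hist[np.clip(np.searchsorted(edges, grid, side="right") - 1, 0, len(hist) - 1)]
    errors = [np.sum(np.abs(gaussian_kde(grid, a, h) - ref)) * dx for h in bandwidths]
    return bandwidths[int(np.argmin(errors))]

data = np.random.default_rng(1).normal(size=500)
grid = np.linspace(-4.0, 4.0, 400)
print(select_bandwidth(data, [0.05, 0.1, 0.2, 0.4, 0.8], grid))
```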
Abstract:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical {\sc vc} dimension, empirical {\sc vc} entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
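The last sentence can be made concrete. Writing e1(f) and e2(f) for the error rates of a classifier f on the two halves of the training data, the maximal discrepancy max_f (e2(f) - e1(f)) equals 1 - 2 min_f err_flip(f), where err_flip is the training error on a copy of the sample whose second-half labels have been flipped. A minimal sketch, using a toy one-dimensional stump class as an assumed hypothesis class (the paper does not fix one here):

```python
import numpy as np

def maximal_discrepancy_stumps(X, y):
    """Maximal discrepancy of one-dimensional decision stumps, obtained by
    empirical risk minimisation on a copy of the sample whose second-half
    labels are flipped (labels are assumed to be +/-1).

    Uses the identity max_f (e2(f) - e1(f)) = 1 - 2 * min_f err_flip(f)."""
    n = len(y)
    y_flip = y.copy()
    y_flip[n // 2:] *= -1                      # flip labels of the second half
    best_err = 1.0
    for t in np.unique(X):                     # exact ERM over stumps sign(s * (x - t))
        for s in (-1, 1):
            pred = np.where(s * (X - t) > 0, 1, -1)
            best_err = min(best_err, float(np.mean(pred != y_flip)))
    return 1.0 - 2.0 * best_err

rng = np.random.default_rng(0)
X = rng.normal(size=200)
y = rng.choice([-1, 1], size=200)              # pure-noise labels
print(maximal_discrepancy_stumps(X, y))        # a moderate positive value
```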
Abstract:
This work proposes novel network analysis techniques for multivariate time series. We define the network of a multivariate time series as a graph where vertices denote the components of the process and edges denote non-zero long-run partial correlations. We then introduce a two-step LASSO procedure, called NETS, to estimate high-dimensional sparse Long Run Partial Correlation networks. This approach is based on a VAR approximation of the process and allows us to decompose the long-run linkages into the contributions of the dynamic and contemporaneous dependence relations of the system. The large-sample properties of the estimator are analysed, and we establish conditions for consistent selection and estimation of the non-zero long-run partial correlations. The methodology is illustrated with an application to a panel of U.S. blue chips.
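The abstract does not spell out the estimator; as a loose illustration of the general two-step idea (and not the NETS procedure itself), one could fit a lasso-penalised VAR(1) equation by equation and then run nodewise lasso regressions on the residuals to obtain a sparse contemporaneous structure. The penalty levels, the VAR order and the edge rule below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_var_network(Y, lam_var=0.05, lam_res=0.05):
    """Rough two-step illustration (not the NETS estimator):
    (1) equation-by-equation lasso fit of a VAR(1);
    (2) nodewise lasso regressions on the residuals for a sparse
        contemporaneous dependence structure."""
    T, N = Y.shape
    X, Z = Y[:-1], Y[1:]                         # lagged regressors and targets
    A = np.zeros((N, N))                         # VAR(1) coefficient matrix
    resid = np.zeros_like(Z)
    for i in range(N):
        m = Lasso(alpha=lam_var, fit_intercept=False).fit(X, Z[:, i])
        A[i] = m.coef_
        resid[:, i] = Z[:, i] - m.predict(X)
    C = np.zeros((N, N))
    for i in range(N):
        others = [j for j in range(N) if j != i]
        m = Lasso(alpha=lam_res, fit_intercept=False).fit(resid[:, others], resid[:, i])
        C[i, others] = m.coef_
    edges = (np.abs(C) + np.abs(C.T)) > 0        # undirected edge if either coefficient is non-zero
    np.fill_diagonal(edges, False)
    return A, edges

Y = np.random.default_rng(0).normal(size=(300, 5))
A, edges = sparse_var_network(Y)
print(edges.astype(int))
```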
Abstract:
Monitoring thunderstorm activity is an essential part of operational weather surveillance, given the potential hazards of thunderstorms, including lightning, hail, heavy rainfall, strong winds or even tornadoes. This study has two main objectives: firstly, the description of a methodology, based on radar and total lightning data, to characterise thunderstorms in real time; secondly, the application of this methodology to 66 thunderstorms that affected Catalonia (NE Spain) in the summer of 2006. An object-oriented tracking procedure is employed, where different observation data types generate four different types of objects (radar 1-km CAPPI reflectivity composites, radar reflectivity volumetric data, cloud-to-ground lightning data and intra-cloud lightning data). In the framework proposed, these objects are the building blocks of a higher-level object, the thunderstorm. The methodology is demonstrated with a dataset of thunderstorms whose main characteristics, along the complete life cycle of the convective structures (development, maturity and dissipation), are described statistically. The development and dissipation stages present similar durations in most cases examined. In contrast, the duration of the maturity phase is much more variable and is related to the thunderstorm intensity, defined here in terms of lightning flash rate. Most of the intra-cloud (IC) and cloud-to-ground (CG) flash activity is registered in the maturity stage. In the development stage few CG flashes are observed (2% to 5%), while in the dissipation phase a few more CG flashes are observed (10% to 15%). Additionally, a selection of thunderstorms is used to examine general life cycle patterns, obtained from the analysis of thunderstorm parameters normalized with respect to the thunderstorm total duration and the maximum value of the variables considered. Among other findings, the study indicates that the normalized duration of the three stages of the thunderstorm life cycle is similar in most thunderstorms, with the longest duration corresponding to the maturity stage (approximately 80% of the total time).
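The normalisation used for the life-cycle comparison is simple to reproduce. The sketch below (with an invented `normalize_life_cycle` helper and toy flash-rate numbers) rescales a thunderstorm parameter series with respect to the total duration and the maximum value, as described above.

```python
import numpy as np

def normalize_life_cycle(times, values, n_points=100):
    """Normalise a thunderstorm parameter series with respect to the storm's total
    duration and the maximum of the variable, then resample onto a common 0-1 axis.
    Illustrative sketch only; the variable choice and the sampling are assumptions."""
    t = (np.asarray(times, dtype=float) - times[0]) / (times[-1] - times[0])
    v = np.asarray(values, dtype=float) / np.max(values)
    t_common = np.linspace(0.0, 1.0, n_points)
    return t_common, np.interp(t_common, t, v)

# toy total-flash-rate series: minutes since detection, flashes per 5 min
times = [0, 5, 10, 15, 20, 25, 30, 35, 40]
rate = [1, 4, 10, 22, 30, 26, 12, 5, 1]
t_norm, rate_norm = normalize_life_cycle(times, rate)
```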
Abstract:
Distance-based regression is a prediction method consisting of two steps: from the distances between observations we obtain the latent variables, which become the regressors in an ordinary least squares linear model. The distances are computed from the original predictors using a suitable dissimilarity function. Since, in general, the regressors are related to the response in a non-linear way, their selection with the usual F test is not possible. In this work we propose a solution to this predictor selection problem by defining generalized test statistics and adapting a non-parametric bootstrap method for the estimation of the corresponding p-values. We include a numerical example with automobile insurance data.
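A minimal sketch of the two steps just described, assuming Euclidean dissimilarities and classical (principal-coordinate) scaling to obtain the latent variables; the number of latent coordinates `k` is an illustrative choice, and the selection tests proposed in the paper are not implemented here.

```python
import numpy as np

def distance_based_regression(D, y, k):
    """Two-step sketch: classical scaling of the dissimilarity matrix D to k latent
    coordinates, then ordinary least squares of y on those coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # doubly centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]           # k largest eigenvalues
    Xlat = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
    Xd = np.column_stack([np.ones(n), Xlat])     # add an intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, Xd @ beta                       # coefficients and fitted values

# toy example with Euclidean dissimilarities
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
beta, fitted = distance_based_regression(D, y, k=3)
print(round(float(np.corrcoef(fitted, y)[0, 1]), 3))  # close to 1 on this toy data
```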
Abstract:
We have shown that finite-size effects in the correlation functions away from equilibrium may be introduced through dimensionless numbers: the Nusselt numbers, accounting for both the nature of the boundaries and the size of the system. From an analysis based on fluctuating hydrodynamics, we conclude that the mean-square fluctuations satisfy scaling laws, since they depend only on the dimensionless numbers in addition to reduced variables. We focus on the case of diffusion modes and describe some physical situations in which finite-size effects may be relevant.
Abstract:
Complexity of biological function relies on large networks of interacting molecules. However, the evolutionary properties of these networks are not fully understood. It has been shown that selective pressures depend on the position of genes in the network. We have previously shown that in the Drosophila insulin/target of rapamycin (TOR) signal transduction pathway there is a correlation between pathway position and the strength of purifying selection, with the downstream genes being the most constrained. In this study, we investigated the evolutionary dynamics of this well-characterized pathway in vertebrates. More specifically, we determined the impact of natural selection on the evolution of 72 genes of this pathway. We found that in vertebrates there is a gradient of selective constraint in the insulin/TOR pathway similar to that found in Drosophila. This feature is neither the result of a polarity in the impact of positive selection nor of a series of factors affecting selective constraint levels (gene expression level and breadth, codon bias, protein length, and connectivity). We also found that pathway genes encoding physically interacting proteins tend to evolve under similar selective constraints. The results indicate that the architecture of the vertebrate insulin/TOR pathway constrains the molecular evolution of its components. Therefore, the polarity detected in Drosophila is neither specific to this genus nor incidental to it. Hence, although the underlying biological mechanisms remain unclear, they may be similar in both vertebrates and Drosophila.
Abstract:
In October 1998, Hurricane Mitch triggered numerous landslides (mainly debris flows) in Honduras and Nicaragua, resulting in a high death toll and in considerable damage to property. The potential application of relatively simple and affordable spatial prediction models for landslide hazard mapping in developing countries was studied. Our attention was focused on a region in NW Nicaragua, one of the most severely hit places during the Mitch event. A landslide map was obtained at 1:10 000 scale in a Geographic Information System (GIS) environment from the interpretation of aerial photographs and detailed field work. In this map the terrain failure zones were distinguished from the areas within the reach of the mobilized materials. A Digital Elevation Model (DEM) with a pixel size of 20 m × 20 m was also employed for the study area. A comparative analysis was carried out between the terrain failures caused by Hurricane Mitch and a selection of four terrain factors, extracted from the DEM, which contributed to the terrain instability. Land propensity to failure was determined with the aid of a bivariate analysis and GIS tools, and represented in a terrain failure susceptibility map. In order to estimate the areas that could be affected by the path or deposition of the mobilized materials, we considered the fact that under intense rainfall events debris flows tend to travel long distances following the maximum slope and merging with the drainage network. Using the TauDEM extension for ArcGIS software, we automatically generated flow lines following the maximum slope in the DEM, starting from the areas prone to failure in the terrain failure susceptibility map. The areas crossed by the flow lines from each terrain failure susceptibility class correspond to the runout susceptibility classes represented in a runout susceptibility map. The study of terrain failure and runout susceptibility enabled us to obtain a spatial prediction for landslides, which could contribute to landslide risk mitigation.
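The flow lines mentioned above were generated with the TauDEM extension for ArcGIS. Purely as an illustration of the underlying steepest-descent idea (not the TauDEM algorithm), a D8-style path can be traced on a DEM grid as in the sketch below; the toy DEM and the stopping rule are assumptions.

```python
import numpy as np

def steepest_descent_path(dem, start, max_steps=10000):
    """Trace a flow line from `start` by repeatedly moving to the lowest of the
    eight neighbouring cells (D8-style steepest descent); stops at pits or flats.
    `dem` is a 2-D elevation array, `start` a (row, col) tuple. Illustrative only."""
    rows, cols = dem.shape
    r, c = start
    path = [(r, c)]
    for _ in range(max_steps):
        best = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and dem[nr, nc] < dem[r, c]:
                    if best is None or dem[nr, nc] < dem[best]:
                        best = (nr, nc)
        if best is None:                          # pit or flat area: stop
            break
        r, c = best
        path.append(best)
    return path

# toy DEM: an inclined plane with a little noise
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
dem = x[None, :] + 0.5 * x[:, None] + rng.normal(scale=0.1, size=(50, 50))
print(steepest_descent_path(dem, (40, 45))[:5])
```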