950 results for kernel density method
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions for cross-validation when using this approach. Fourth, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
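To make the pairwise idea concrete, here is a minimal sketch of the kind of objective RankRLS minimizes: a regularized least-squares penalty over all pairwise score differences. The linear parameterization, the function name, and the use of NumPy are illustration choices, not the thesis's implementation (which also relies on matrix-algebra shortcuts not shown here).

```python
import numpy as np

def pairwise_ls_loss(w, X, y, lam):
    """Regularized pairwise least-squares objective: penalize the
    squared mismatch between predicted and true score differences
    over all example pairs, plus an L2 penalty on the weights."""
    scores = X @ w                                  # linear model f(x) = w . x
    pred_diff = scores[:, None] - scores[None, :]   # f(x_i) - f(x_j)
    true_diff = y[:, None] - y[None, :]             # y_i - y_j
    return np.sum((true_diff - pred_diff) ** 2) + lam * (w @ w)
```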
Abstract:
When modeling machines in their natural working environment, collisions become a very important factor in simulation accuracy. Expanding the simulation to include the operating environment makes a general collision model, able to handle a wide variety of cases, central to the development of simulation environments. With the addition of the operating environment, the challenges for the collision modeling method also change: more simultaneous contacts with more objects occur in more complicated situations, which makes the real-time requirement more difficult to meet. Common problems in current collision modeling methods include dependency on geometry shape or mesh density, computational cost that grows exponentially with the number of contacts, the lack of a proper friction model, and failures in certain configurations such as closed kinematic loops. All of these problems mean that current modeling methods fail in certain situations. A method that never fails in any situation is not realistic, but improvements can be made over the current methods.
Abstract:
An axisymmetric supersonic flow of rarefied gas past a finite cylinder was calculated using the direct simulation Monte Carlo method. The drag force; the coefficients of pressure, skin friction, and heat transfer; and the fields of density, temperature, and velocity were calculated as functions of the Reynolds number for a fixed Mach number. The variation of the Reynolds number is related to the variation of the Knudsen number, which characterizes the gas rarefaction. The present results show that all quantities in the transition regime (Knudsen number close to unity) differ significantly from those in the hydrodynamic regime, where the Knudsen number is small.
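The link between the Reynolds and Knudsen numbers at fixed Mach number follows the standard kinetic-theory (von Kármán) relation; a small sketch, where the monatomic value γ = 5/3 and the exact prefactor are assumptions that depend on the molecular model:

```python
import math

def knudsen_number(mach, reynolds, gamma=5.0 / 3.0):
    """Von Karman relation: Kn = sqrt(gamma * pi / 2) * Ma / Re.
    At fixed Ma, decreasing Re increases Kn, i.e. the rarefaction."""
    return math.sqrt(gamma * math.pi / 2.0) * mach / reynolds

# Sweeping Re at Ma = 2 moves the flow from the hydrodynamic
# regime (Kn << 1) toward the transition regime (Kn ~ 1).
for re in (100.0, 10.0, 2.5):
    print(f"Re = {re:6.1f}  ->  Kn = {knudsen_number(2.0, re):.3f}")
```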
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve the solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically using Gaussian kernels. This allows the application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first is the extraction of curvilinear structures from noisy data mixed with background clutter. The second is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and to the reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include the development of an efficient semidefinite optimization method for embedding graphs into Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. The asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated as the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
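Since the ridge-based methods above are built on Gaussian kernel density estimates, a minimal one-dimensional sketch of such an estimator may help; ridge finding itself additionally needs the gradient and Hessian of this density, which are omitted here.

```python
import numpy as np

def gaussian_kde(x, data, bandwidth):
    """Nonparametric density estimate with Gaussian kernels:
    f_hat(x) = (1 / (n * h)) * sum_i phi((x - x_i) / h),
    where phi is the standard normal density."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - data[None, :]) / bandwidth
    phi = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return phi.sum(axis=1) / (len(data) * bandwidth)

# Example: recover a bimodal shape from 1-D samples.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 200)])
density = gaussian_kde(np.linspace(-6, 6, 101), data, bandwidth=0.4)
```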
Abstract:
Familial hypercholesterolemia (FH) is a metabolic disorder inherited as an autosomal dominant trait and characterized by an increased plasma low-density lipoprotein (LDL) level. The disease is caused by several different mutations in the LDL receptor gene. Although early identification of individuals carrying the defective gene could be useful in reducing the risk of atherosclerosis and myocardial infarction, the techniques available for determining the number of functional LDL receptor molecules are difficult to carry out and expensive. Polymorphisms associated with this gene may be used for the unequivocal diagnosis of FH in several populations. The aim of our study was to evaluate the genotype distribution and relative allele frequencies of three polymorphisms of the LDL receptor gene, HincII1773 (exon 12), AvaII (exon 13) and PvuII (intron 15), in 50 unrelated Brazilian individuals with a diagnosis of heterozygous FH and in 130 normolipidemic controls. Genomic DNA was extracted from blood leukocytes by a modified salting-out method. The polymorphisms were detected by PCR-RFLP. The FH subjects showed a higher frequency of the A+A+ (AvaII), H+H+ (HincII1773) and P1P1 (PvuII) homozygous genotypes than the control group (P < 0.05). In addition, FH probands presented higher frequencies of the A+ (0.58), H+ (0.61) and P1 (0.78) alleles than normolipidemic individuals (0.45, 0.45 and 0.64, respectively). The strong association observed between these alleles and FH suggests that the AvaII, HincII1773 and PvuII polymorphisms could be useful for monitoring the inheritance of FH in Brazilian families.
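The reported allele frequencies follow from straightforward genotype counting; a sketch with purely hypothetical counts (not the study's data), where each homozygote carries two copies of the allele and each heterozygote one:

```python
def allele_frequency(n_plus_plus, n_plus_minus, n_minus_minus):
    """Frequency of the '+' allele of a biallelic marker from
    genotype counts: (2 * n(+/+) + n(+/-)) / (2 * N)."""
    n_total = n_plus_plus + n_plus_minus + n_minus_minus
    return (2 * n_plus_plus + n_plus_minus) / (2 * n_total)

# Hypothetical example: 25 +/+, 30 +/- and 45 -/- individuals.
print(allele_frequency(25, 30, 45))  # 0.40
```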
Abstract:
The aim of this work is to invert the ionospheric electron density profile from riometer (relative ionospheric opacity meter) measurements. The new riometer instrument KAIRA (Kilpisjärvi Atmospheric Imaging Receiver Array) is used to measure the cosmic HF radio noise absorption that takes place in the D-region ionosphere between 50 and 90 km. To invert the electron density profile, synthetic data are used to estimate the unknown parameter Neq with a spline height method, which represents the electron density profile at different altitudes. Moreover, a smoothing prior method is used to sample from the posterior distribution by truncating the prior covariance matrix. The smoothing prior approach makes the posterior easier to explore with the MCMC (Markov chain Monte Carlo) method.
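As an illustration of the sampling step, here is a generic random-walk Metropolis sketch; the actual inversion couples the sampler to a smoothing prior over the height-discretized profile, so the log-posterior below is only a stand-in:

```python
import numpy as np

def metropolis(log_post, theta0, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis: draws from a posterior given only
    its unnormalized log density, using Gaussian proposals."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(proposal)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = proposal, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Stand-in posterior: a standard normal in three dimensions.
draws = metropolis(lambda t: -0.5 * (t @ t), np.zeros(3), 5000, step=0.5)
```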
Abstract:
Numerical simulation of plasma sources is very important. Such models allow different plasma parameters to be varied with a high degree of accuracy. Moreover, they allow measurements to be conducted without disturbing the system balance. Recently, scientific and practical interest has increased in so-called two-chamber plasma sources. In one chamber (the small, or discharge, chamber) an external power source is embedded, and plasma forms there. In the other (the large, or diffusion, chamber) plasma exists due to the transport of particles and energy through the boundary between the chambers. In this particular work, two-chamber plasma sources with argon and oxygen as active mediums were constructed. These models give interesting results for electric field profiles and, as a consequence, for the density profiles of charged particles.
Abstract:
There are many opportunities to utilize coconut in Nzema to support farmers. Coconut oil, which is mainly used for food preparation in Nzema, could be utilized as fuel to help overcome the energy crisis in Ghana. Coconut oil in Nzema is currently used in neither transportation nor electricity generation. A small portion of the waste husks and shells is used as fuel in homes for heating, but the greater amount is left to rot or is burned on the coconut plantation. In addition, some of the granulated coconut kernel is sometimes used as pig feed, while the rest is left as waste at the oil processing site. In this thesis, the author identified alternative uses of coconut, for instance the use of coconut husks and shells for charcoal production, and the use of coconut trunks as construction material. It is envisaged that exploring these alternatives will not only reduce carbon emissions in the country but will also contribute significantly to the sustainability of the local agro-industry.
Abstract:
The effects of moisture, cation concentration, density, temperature and grain size on the electrical resistivity of soils are examined using laboratory-prepared soils. An inexpensive method for preparing soils of different compositions was developed by mixing various size fractions in the laboratory. Moisture and cation concentration are related to soil resistivity by power functions, whereas soil resistivity and temperature, density, % gravel, sand, silt, and clay are related by exponential functions. A total of 1066 cases (8528 data) from all the experiments were used in a step-wise multiple linear regression to determine the effect of each variable on soil resistivity. Six of the eight variables studied account for 92.57% of the total variance in soil resistivity, with a correlation coefficient of 0.96. The other two variables (silt and gravel) did not increase the variance accounted for. Moisture content was found to be the most important variable affecting soil resistivity, followed by % clay. These two variables account for 90.81% of the total variance in soil resistivity, with a correlation coefficient of 0.95. Based on these results, an equation to predict soil resistivity using moisture and % clay was developed. To test the predictive equation, resistivity measurements were made on natural soils both in situ and in the laboratory. The data show that field and laboratory measurements are comparable. The predicted regression line closely coincides with resistivity data from area A and area B soils (clayey and silty-clayey sands). Resistivity data and the predicted regression line in the case of clayey soils (clays > 40%) do not coincide, especially at less than 15% moisture. The regression equation overestimates the resistivity of soils from area C and underestimates it for area D soils. Laboratory-prepared high-clay soils give similar trends. The deviations are probably caused by the heterogeneous distribution of moisture and differences in the type of clays present in these soils.
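A minimal sketch of the two-variable prediction idea, combining the functional forms reported above (power law in moisture, exponential in % clay); the model form is our reading of the abstract, and the fitted coefficients are placeholders, not the thesis's values:

```python
import numpy as np

def fit_resistivity_model(moisture, clay, resistivity):
    """Fit log(rho) = a + b * log(moisture) + c * clay by least
    squares: a power law in moisture, an exponential in % clay."""
    moisture = np.asarray(moisture, dtype=float)
    clay = np.asarray(clay, dtype=float)
    A = np.column_stack([np.ones_like(moisture), np.log(moisture), clay])
    coef, *_ = np.linalg.lstsq(A, np.log(np.asarray(resistivity, float)),
                               rcond=None)
    return coef  # (a, b, c)

def predict_resistivity(coef, moisture, clay):
    a, b, c = coef
    return np.exp(a + b * np.log(moisture) + c * clay)
```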
Abstract:
Many unit root and cointegration tests require an estimate of the spectral density function at frequency zero of some process. Kernel estimators based on weighted sums of autocovariances constructed using estimated residuals from an AR(1) regression are commonly used. However, it is known that with substantially correlated errors, the OLS estimate of the AR(1) parameter is severely biased. In this paper, we first show that this least-squares bias induces a significant increase in the bias and mean-squared error of kernel-based estimators.
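For concreteness, here is a generic version of such a kernel estimator with Bartlett weights; this is the standard construction the paper analyzes, sketched under our own naming:

```python
import numpy as np

def bartlett_lrv(u, n_lags):
    """Kernel estimator of (2*pi times) the spectral density at
    frequency zero: weighted sum of sample autocovariances of the
    residuals u, with Bartlett weights w_j = 1 - j / (M + 1)."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    n = len(u)
    est = u @ u / n                      # gamma_0
    for j in range(1, n_lags + 1):
        gamma_j = u[j:] @ u[:-j] / n
        est += 2.0 * (1.0 - j / (n_lags + 1)) * gamma_j
    return est

# In the setting above, u would be the estimated residuals from an
# AR(1) regression y_t = rho_hat * y_{t-1} + u_t, rho_hat from OLS.
```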
Abstract:
This thesis presents Bernstein estimators, which are recent alternatives to the various classical estimators of distribution and density functions. More precisely, we study their properties and compare them to those of the empirical distribution function and of the kernel estimator. We derive an asymptotic expression for the first two moments of the Bernstein estimator of the distribution function. As with the classical estimators, we show that this estimator satisfies the Chung-Smirnov property under certain conditions. We then show that the Bernstein estimator is better than the empirical distribution function in terms of mean squared error. Studying the asymptotic behaviour of Bernstein estimators, we show that, for a suitable choice of the degree of the polynomial, these estimators are asymptotically normal. Numerical studies on some classical distributions allow us to confirm that Bernstein estimators can be preferable to the classical estimators.
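A sketch of the Bernstein estimator of the distribution function for data supported on [0, 1]; the function names are ours, and m is the polynomial degree whose choice governs the asymptotics discussed above:

```python
import numpy as np
from scipy.stats import binom

def bernstein_cdf(x, data, m):
    """Bernstein estimator of the distribution function:
    F_hat(x) = sum_{k=0}^{m} F_n(k/m) * C(m, k) * x^k * (1 - x)^(m - k),
    where F_n is the empirical distribution function."""
    data = np.sort(np.asarray(data, dtype=float))
    k = np.arange(m + 1)
    # Empirical CDF evaluated at the grid points k / m.
    ecdf_at_grid = np.searchsorted(data, k / m, side="right") / len(data)
    # Binomial(m, x) probabilities supply the Bernstein weights.
    weights = binom.pmf(k[None, :], m, np.asarray(x, dtype=float)[:, None])
    return weights @ ecdf_at_grid
```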
Abstract:
We show that, at high densities, fully variational solutions of solidlike type can be obtained from a density functional formalism originally designed for liquid 4He. Motivated by this finding, we propose an extension of the method that accurately describes the solid phase and the freezing transition of liquid 4He at zero temperature. The density profile of the interface between the liquid and the (0001) surface of the 4He crystal is also investigated, and its surface energy is evaluated. The interfacial tension is found to be in semiquantitative agreement with experiments and with other microscopic calculations. This opens the possibility of using unbiased density functional (DF) methods to study highly nonhomogeneous systems, like 4He interacting with strongly attractive impurities and/or substrates, or the nucleation of the solid phase in the metastable liquid.
Abstract:
LLDPE was blended with poly(vinyl alcohol) (PVA), and the mechanical, thermal, and spectroscopic properties and the biodegradability of the blends were investigated. The biodegradability of LLDPE/PVA blends was studied in two environments, viz. (1) a culture medium containing Vibrio sp. and (2) a soil environment, over a period of 15 weeks. Nanoanatase with photocatalytic activity was synthesized by a hydrothermal method using titanium isopropoxide. The synthesized TiO2 was characterized by X-ray diffraction (XRD), BET studies, FTIR studies and scanning electron microscopy (SEM). The crystallite size of the titania was calculated to be ≈ 6 nm from the XRD results, and the surface area was found to be about 310 m²/g by the BET method. SEM shows that the nanoanatase particles prepared by this method are spherical in shape. Linear low-density polyethylene films containing polyvinyl alcohol and a pro-oxidant (TiO2 or cobalt stearate, with or without vegetable oil) were prepared. The films were then subjected to natural weathering and UV exposure, followed by biodegradation in the culture medium as well as in the soil environment. The degradation was monitored by mechanical property measurements, thermal studies, rate of weight loss, FTIR and SEM studies. Higher weight loss, texture change and greater increments in carbonyl index values were observed in samples containing cobalt stearate and vegetable oil. The present study demonstrates that the combination of LLDPE/PVA blends with (i) nanoanatase/vegetable oil and (ii) cobalt stearate/vegetable oil leads to extensive photodegradation. These samples show substantial degradation upon subsequent exposure to Vibrio sp. Thus, a combined photodegradation and biodegradation process is a promising step towards obtaining a biodegradable grade of LLDPE.
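Crystallite sizes from XRD line broadening are conventionally obtained with the Scherrer equation; since the abstract does not name its method, the following reconstruction, the Cu Kα wavelength, and the example peak width are all assumptions:

```python
import math

def scherrer_size_nm(fwhm_deg, two_theta_deg, wavelength_nm=0.15406, K=0.9):
    """Scherrer equation D = K * lambda / (beta * cos(theta)):
    crystallite size from the FWHM of an XRD reflection.
    Defaults: Cu K-alpha wavelength, shape factor K = 0.9."""
    beta = math.radians(fwhm_deg)             # FWHM in radians
    theta = math.radians(two_theta_deg / 2)   # Bragg angle
    return K * wavelength_nm / (beta * math.cos(theta))

# Hypothetical anatase (101) peak near 2-theta = 25.3 deg with a
# FWHM of ~1.4 deg gives a size of roughly 6 nm.
print(scherrer_size_nm(1.4, 25.3))
```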
Abstract:
We investigate the adsorption of helium in nanoscopic polygonal pores at zero temperature using a finite-range density functional theory. The adsorption potential is computed by means of a technique denoted the elementary source method. We analyze a rhombic pore with Cs walls, where we show the existence of multiple interfacial configurations at some linear densities, which correspond to metastable states. Shape transitions and hysteretic loops appear in patterns that are richer and more complex than in a cylindrical tube with the same transverse area.