75 resultados para Dunkl Kernel


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Due to the availability of huge number of web services, finding an appropriate Web service according to the requirements of a service consumer is still a challenge. Moreover, sometimes a single web service is unable to fully satisfy the requirements of the service consumer. In such cases, combinations of multiple inter-related web services can be utilised. This paper proposes a method that first utilises a semantic kernel model to find related services and then models these related Web services as nodes of a graph. An all-pair shortest-path algorithm is applied to find the best compositions of Web services that are semantically related to the service consumer requirement. The recommendation of individual and composite Web services composition for a service request is finally made. Empirical evaluation confirms that the proposed method significantly improves the accuracy of service discovery in comparison to traditional keyword-based discovery methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The commercialization of aerial image processing is highly dependent on the platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allow UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system that is based on machine learning approaches including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters are analysed including the number of Guassian mixtures, support vector kernels including linear, radial basis function Kernel (RBF) and polynormial kernel (poly), and the order of RBF kernel and polynormial kernel. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. Compared to the baseline system, the proposed system provides significant improvement in term of the chance to detect a safe landing area, and the performance is more stable than the baseline in the presence of changes to the UAV altitude.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Imaging genetics is a new field of neuroscience that blends methods from computational anatomy and quantitative genetics to identify genetic influences on brain structure and function. Here we analyzed brain MRI data from 372 young adult twins to identify cortical regions in which gray matter volume is influenced by genetic differences across subjects. Thickness maps, reconstructed from surface models of the cortical gray/white and gray/CSF interfaces, were smoothed with a 25 mm FWHM kernel and automatically parcellated into 34 regions of interest per hemisphere. In structural equation models fitted to volume values at each surface vertex, we computed components of variance due to additive genetic (A), shared (C) and unique (E) environmental factors, and tested their significance. Cortical regions in the vicinity of the perisylvian language cortex, and at the frontal and temporal poles, showed significant additive genetic variance, suggesting that volume measures from these regions may provide quantitative phenotypes to narrow the search for quantitative trait loci that influence brain structure.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics that are based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Historically, determining the country of origin of a published work presented few challenges, because works were generally published physically – whether in print or otherwise – in a distinct location or few locations. However, publishing opportunities presented by new technologies mean that we now live in a world of simultaneous publication – works that are first published online are published simultaneously to every country in world in which there is Internet connectivity. While this is certainly advantageous for the dissemination and impact of information and creative works, it creates potential complications under the Berne Convention for the Protection of Literary and Artistic Works (“Berne Convention”), an international intellectual property agreement to which most countries in the world now subscribe. Under the Berne Convention’s national treatment provisions, rights accorded to foreign copyright works may not be subject to any formality, such as registration requirements (although member countries are free to impose formalities in relation to domestic copyright works). In Kernel Records Oy v. Timothy Mosley p/k/a Timbaland, et al. however, the Florida Southern District Court of the United States ruled that first publication of a work on the Internet via an Australian website constituted “simultaneous publication all over the world,” and therefore rendered the work a “United States work” under the definition in section 101 of the U.S. Copyright Act, subjecting the work to registration formality under section 411. This ruling is in sharp contrast with an earlier decision delivered by the Delaware District Court in Håkan Moberg v. 33T LLC, et al. which arrived at an opposite conclusion. The conflicting rulings of the U.S. courts reveal the problems posed by new forms of publishing online and demonstrate a compelling need for further harmonization between the Berne Convention, domestic laws and the practical realities of digital publishing. In this chapter, we argue that even if a work first published online can be considered to be simultaneously published all over the world it does not follow that any country can assert itself as the “country of origin” of the work for the purpose of imposing domestic copyright formalities. More specifically, we argue that the meaning of “United States work” under the U.S. Copyright Act should be interpreted in line with the presumption against extraterritorial application of domestic law to limit its application to only those works with a real and substantial connection to the United States. There are gaps in the Berne Convention’s articulation of “country of origin” which provide scope for judicial interpretation, at a national level, of the most pragmatic way forward in reconciling the goals of the Berne Convention with the practical requirements of domestic law. We believe that the uncertainties arising under the Berne Convention created by new forms of online publishing can be resolved at a national level by the sensible application of principles of statutory interpretation by the courts. While at the international level we may need a clearer consensus on what amounts to “simultaneous publication” in the digital age, state practice may mean that we do not yet need to explore textual changes to the Berne Convention.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Images from cell biology experiments often indicate the presence of cell clustering, which can provide insight into the mechanisms driving the collective cell behaviour. Pair-correlation functions provide quantitative information about the presence, or absence, of clustering in a spatial distribution of cells. This is because the pair-correlation function describes the ratio of the abundance of pairs of cells, separated by a particular distance, relative to a randomly distributed reference population. Pair-correlation functions are often presented as a kernel density estimate where the frequency of pairs of objects are grouped using a particular bandwidth (or bin width), Δ>0. The choice of bandwidth has a dramatic impact: choosing Δ too large produces a pair-correlation function that contains insufficient information, whereas choosing Δ too small produces a pair-correlation signal dominated by fluctuations. Presently, there is little guidance available regarding how to make an objective choice of Δ. We present a new technique to choose Δ by analysing the power spectrum of the discrete Fourier transform of the pair-correlation function. Using synthetic simulation data, we confirm that our approach allows us to objectively choose Δ such that the appropriately binned pair-correlation function captures known features in uniform and clustered synthetic images. We also apply our technique to images from two different cell biology assays. The first assay corresponds to an approximately uniform distribution of cells, while the second assay involves a time series of images of a cell population which forms aggregates over time. The appropriately binned pair-correlation function allows us to make quantitative inferences about the average aggregate size, as well as quantifying how the average aggregate size changes with time.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Terrain traversability estimation is a fundamental requirement to ensure the safety of autonomous planetary rovers and their ability to conduct long-term missions. This paper addresses two fundamental challenges for terrain traversability estimation techniques. First, representations of terrain data, which are typically built by the rover’s onboard exteroceptive sensors, are often incomplete due to occlusions and sensor limitations. Second, during terrain traversal, the rover-terrain interaction can cause terrain deformation, which may significantly alter the difficulty of traversal. We propose a novel approach built on Gaussian process (GP) regression to learn, and consequently to predict, the rover’s attitude and chassis configuration on unstructured terrain using terrain geometry information only. First, given incomplete terrain data, we make an initial prediction under the assumption that the terrain is rigid, using a learnt kernel function. Then, we refine this initial estimate to account for the effects of potential terrain deformation, using a near-to-far learning approach based on multitask GP regression. We present an extensive experimental validation of the proposed approach on terrain that is mostly rocky and whose geometry changes as a result of loads from rover traversals. This demonstrates the ability of the proposed approach to accurately predict the rover’s attitude and configuration in partially occluded and deformable terrain.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The phosphine distribution in a cylindrical silo containing grain is predicted. A three-dimensional mathematical model, which accounts for multicomponent gas phase transport and the sorption of phosphine into the grain kernel is developed. In addition, a simple model is presented to describe the death of insects within the grain as a function of their exposure to phosphine gas. The proposed model is solved using the commercially available computational fluid dynamics (CFD) software, FLUENT, together with our own C code to customize the solver in order to incorporate the models for sorption and insect extinction. Two types of fumigation delivery are studied, namely, fan- forced from the base of the silo and tablet from the top of the silo. An analysis of the predicted phosphine distribution shows that during fan forced fumigation, the position of the leaky area is very important to the development of the gas flow field and the phosphine distribution in the silo. If the leak is in the lower section of the silo, insects that exist near the top of the silo may not be eradicated. However, the position of a leak does not affect phosphine distribution during tablet fumigation. For such fumigation in a typical silo configuration, phosphine concentrations remain low near the base of the silo. Furthermore, we find that half-life pressure test readings are not an indicator of phosphine distribution during tablet fumigation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The problem of unsupervised anomaly detection arises in a wide variety of practical applications. While one-class support vector machines have demonstrated their effectiveness as an anomaly detection technique, their ability to model large datasets is limited due to their memory and time complexity for training. To address this issue for supervised learning of kernel machines, there has been growing interest in random projection methods as an alternative to the computationally expensive problems of kernel matrix construction and sup-port vector optimisation. In this paper we leverage the theory of nonlinear random projections and propose the Randomised One-class SVM (R1SVM), which is an efficient and scalable anomaly detection technique that can be trained on large-scale datasets. Our empirical analysis on several real-life and synthetic datasets shows that our randomised 1SVM algorithm achieves comparable or better accuracy to deep auto encoder and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

State-of-the-art image-set matching techniques typically implicitly model each image-set with a Gaussian distribution. Here, we propose to go beyond these representations and model image-sets as probability distribution functions (PDFs) using kernel density estimators. To compare and match image-sets, we exploit Csiszar´ f-divergences, which bear strong connections to the geodesic distance defined on the space of PDFs, i.e., the statistical manifold. Furthermore, we introduce valid positive definite kernels on the statistical manifold, which let us make use of more powerful classification schemes to match image-sets. Finally, we introduce a supervised dimensionality reduction technique that learns a latent space where f-divergences reflect the class labels of the data. Our experiments on diverse problems, such as video-based face recognition and dynamic texture classification, evidence the benefits of our approach over the state-of-the-art image-set matching methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many conventional statistical machine learning al- gorithms generalise poorly if distribution bias ex- ists in the datasets. For example, distribution bias arises in the context of domain generalisation, where knowledge acquired from multiple source domains need to be used in a previously unseen target domains. We propose Elliptical Summary Randomisation (ESRand), an efficient domain generalisation approach that comprises of a randomised kernel and elliptical data summarisation. ESRand learns a domain interdependent projection to a la- tent subspace that minimises the existing biases to the data while maintaining the functional relationship between domains. In the latent subspace, ellipsoidal summaries replace the samples to enhance the generalisation by further removing bias and noise in the data. Moreover, the summarisation enables large-scale data processing by significantly reducing the size of the data. Through comprehensive analysis, we show that our subspace-based approach outperforms state-of-the-art results on several activity recognition benchmark datasets, while keeping the computational complexity significantly low.