56 resultados para Dimensionality
Resumo:
The adulteration of extra virgin olive oil with other vegetable oils is a certain problem with economic and health consequences. Current official methods have been proved insufficient to detect such adulterations. One of the most concerning and undetectable adulterations with other vegetable oils is the addition of hazelnut oil. The main objective of this work was to develop a novel dimensionality reduction technique able to model oil mixtures as a part of an integrated pattern recognition solution. This final solution attempts to identify hazelnut oil adulterants in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. The proposed Continuous Locality Preserving Projections (CLPP) technique allows the modelling of the continuous nature of the produced in house admixtures as data series instead of discrete points. This methodology has potential to be extended to other mixtures and adulterations of food products. The maintenance of the continuous structure of the data manifold lets the better visualization of this examined classification problem and facilitates a more accurate utilisation of the manifold for detecting the adulterants.
Resumo:
Sparse representation based visual tracking approaches have attracted increasing interests in the community in recent years. The main idea is to linearly represent each target candidate using a set of target and trivial templates while imposing a sparsity constraint onto the representation coefficients. After we obtain the coefficients using L1-norm minimization methods, the candidate with the lowest error, when it is reconstructed using only the target templates and the associated coefficients, is considered as the tracking result. In spite of promising system performance widely reported, it is unclear if the performance of these trackers can be maximised. In addition, computational complexity caused by the dimensionality of the feature space limits these algorithms in real-time applications. In this paper, we propose a real-time visual tracking method based on structurally random projection and weighted least squares techniques. In particular, to enhance the discriminative capability of the tracker, we introduce background templates to the linear representation framework. To handle appearance variations over time, we relax the sparsity constraint using a weighed least squares (WLS) method to obtain the representation coefficients. To further reduce the computational complexity, structurally random projection is used to reduce the dimensionality of the feature space while preserving the pairwise distances between the data points in the feature space. Experimental results show that the proposed approach outperforms several state-of-the-art tracking methods.
Resumo:
This work examines the conformational ensemble involved in β-hairpin folding by means of advanced molecular dynamics simulations and dimensionality reduction. A fully atomistic description of the protein and the surrounding solvent molecules is used, and this complex energy landscape is sampled by means of parallel tempering metadynamics simulations. The ensemble of configurations explored is analyzed using the recently proposed sketch-map algorithm. Further simulations allow us to probe how mutations affect the structures adopted by this protein. We find that many of the configurations adopted by a mutant are the same as those adopted by the wild-type protein. Furthermore, certain mutations destabilize secondary-structure-containing configurations by preventing the formation of hydrogen bonds or by promoting the formation of new intramolecular contacts. Our analysis demonstrates that machine-learning techniques can be used to study the energy landscapes of complex molecules and that the visualizations that are generated in this way provide a natural basis for examining how the stabilities of particular configurations of the molecule are affected by factors such as temperature or structural mutations.
Resumo:
In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decision. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is however computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biologically/clinically relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
Resumo:
Dynamic economic load dispatch (DELD) is one of the most important steps in power system operation. Various optimisation algorithms for solving the problem have been developed; however, due to the non-convex characteristics and large dimensionality of the problem, it is necessary to explore new methods to further improve the dispatch results and minimise the costs. This article proposes a hybrid differential evolution (DE) algorithm, namely clonal selection-based differential evolution (CSDE), to solve the problem. CSDE is an artificial intelligence technique that can be applied to complex optimisation problems which are for example nonlinear, large scale, non-convex and discontinuous. This hybrid algorithm combines the clonal selection algorithm (CSA) as the local search technique to update the best individual in the population, which enhances the diversity of the solutions and prevents premature convergence in DE. Furthermore, we investigate four mutation operations which are used in CSA as the hyper-mutation operations. Finally, an efficient solution repair method is designed for DELD to satisfy the complicated equality and inequality constraints of the power system to guarantee the feasibility of the solutions. Two benchmark power systems are used to evaluate the performance of the proposed method. The experimental results show that the proposed CSDE/best/1 approach significantly outperforms nine other variants of CSDE and DE, as well as most other published methods, in terms of the quality of the solution and the convergence characteristics.
Resumo:
New, automated forms of data-analysis are required in order to understand the high-dimensional trajectories that are obtained from molecular dynamics simulations on proteins. Dimensionality reduction algorithms are particularly appealing in this regard as they allow one to construct unbiased, low-dimensional representations of the trajectory using only the information encoded in the trajectory. The downside of this approach is that different sets of coordinates are required for each different chemical systems under study precisely because the coordinates are constructed using information from the trajectory. In this paper we show how one can resolve this problem by using the sketch-map algorithm that we recently proposed to construct a low-dimensional representation of the structures contained in the protein data bank (PDB). We show that the resulting coordinates are as useful for analysing trajectory data as coordinates constructed using landmark configurations taken from the trajectory and that these coordinates can thus be used for understanding protein folding across a range of systems.
Resumo:
Transonic tests in linear cascade wind tunnels can suffer
from significant test section boundary interference effects in pitch. A slotted tailboard has been designed and optimised with an in-house Euler numerical method to reduce such ef- fects. Wind tunnel measurements on an overspeed Mach 1.27 discharge from a Rolls-Royce T2 cascade, featuring strong end-wall shock-induced interference, showed a 77% reduction in the flow pitchwise periodicity error with the optimised tail- board, with respect to the baseline open-jet cascade flow. Two-dimensional Euler predictions were also cross-validated against a three-dimensional Reynolds averaged computation, to explore the three-dimensionality of the discharge
Resumo:
Real-space grids are a powerful alternative for the simulation of electronic systems. One of the main advantages of the approach is the flexibility and simplicity of working directly in real space where the different fields are discretized on a grid, combined with competitive numerical performance and great potential for parallelization. These properties constitute a great advantage at the time of implementing and testing new physical models. Based on our experience with the Octopus code, in this article we discuss how the real-space approach has allowed for the recent development of new ideas for the simulation of electronic systems. Among these applications are approaches to calculate response properties, modeling of photoemission, optimal control of quantum systems, simulation of plasmonic systems, and the exact solution of the Schrödinger equation for low-dimensionality systems.
Resumo:
The application of custom classification techniques and posterior probability modeling (PPM) using Worldview-2 multispectral imagery to archaeological field survey is presented in this paper. Research is focused on the identification of Neolithic felsite stone tool workshops in the North Mavine region of the Shetland Islands in Northern Scotland. Sample data from known workshops surveyed using differential GPS are used alongside known non-sites to train a linear discriminant analysis (LDA) classifier based on a combination of datasets including Worldview-2 bands, band difference ratios (BDR) and topographical derivatives. Principal components analysis is further used to test and reduce dimensionality caused by redundant datasets. Probability models were generated by LDA using principal components and tested with sites identified through geological field survey. Testing shows the prospective ability of this technique and significance between 0.05 and 0.01, and gain statistics between 0.90 and 0.94, higher than those obtained using maximum likelihood and random forest classifiers. Results suggest that this approach is best suited to relatively homogenous site types, and performs better with correlated data sources. Finally, by combining posterior probability models and least-cost analysis, a survey least-cost efficacy model is generated showing the utility of such approaches to archaeological field survey.
Resumo:
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques.
Resumo:
To maintain the pace of development set by Moore's law, production processes in semiconductor manufacturing are becoming more and more complex. The development of efficient and interpretable anomaly detection systems is fundamental to keeping production costs low. As the dimension of process monitoring data can become extremely high anomaly detection systems are impacted by the curse of dimensionality, hence dimensionality reduction plays an important role. Classical dimensionality reduction approaches, such as Principal Component Analysis, generally involve transformations that seek to maximize the explained variance. In datasets with several clusters of correlated variables the contributions of isolated variables to explained variance may be insignificant, with the result that they may not be included in the reduced data representation. It is then not possible to detect an anomaly if it is only reflected in such isolated variables. In this paper we present a new dimensionality reduction technique that takes account of such isolated variables and demonstrate how it can be used to build an interpretable and robust anomaly detection system for Optical Emission Spectroscopy data.