933 results for high-dimensional space geometry


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation was the lack of viable approaches for analysing High Throughput Screening datasets, which may include thousands of high-dimensional data points. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, which rely on inspecting tables and graphical plots to analyse relationships between measured activities and the structure of compounds, are not feasible for a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially those with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them cope with only a few properties of compounds at one time. We believe that a latent variable model with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model (LTM) can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can imagine the distribution of data from magnification factor and curvature plots. Rather than obtaining the useful information from just a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, the refined visualisation plots can be displayed in deeper levels and sub-clusters may be found.
The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive, top-down fashion. The user subjectively identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-steps. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters. It is very difficult for the user to judge where centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
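
The alternation between E- and M-steps can be illustrated on a much simpler model than the hierarchical LTM. The sketch below fits a plain mixture of Bernoulli distributions to binary data; it is a stand-in chosen for brevity, not the thesis's latent trait model, and the function name `em_bernoulli_mixture` and all of its parameters are assumptions made for this illustration.

```python
import numpy as np

def em_bernoulli_mixture(X, n_components, n_iter=50, seed=0):
    """Fit a Bernoulli mixture to binary data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_components, 1.0 / n_components)
    means = rng.uniform(0.25, 0.75, size=(n_components, d))
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_p = X @ np.log(means).T + (1 - X) @ np.log(1 - means).T + np.log(weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # Log-likelihood of the parameters used in this E-step.
        log_likelihoods.append(float(log_norm.sum()))
        # M-step: re-estimate mixing weights and Bernoulli means.
        nk = resp.sum(axis=0) + 1e-12  # small constant avoids division by zero
        weights = nk / nk.sum()
        means = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return weights, means, log_likelihoods
```

The recorded log-likelihood should not decrease from one iteration to the next, which is the property that makes EM suitable for maximising the likelihood at each level of a hierarchy.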

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Vladimir Todorov, Petar Stoev - This note contains an elementary construction of a set with the properties stated in the title. We note in addition that the set obtained in this way remains totally disconnected even after it is augmented with a finite number of elements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We present a test for identifying clusters in high dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
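
The projection effect described above can be reproduced with a small simulation. The sketch below draws a single spherical normal sample (so no true clusters exist), runs a basic 2-means, and measures how separated the two groups look when projected onto the line joining the centroids. It is not the test proposed in the paper; the function names, the sample size and the separation index (gap between projected cluster means divided by the pooled within-cluster standard deviation) are assumptions chosen for illustration.

```python
import numpy as np

def two_means(X, n_iter=100, seed=0):
    """Basic Lloyd iterations for k = 2, initialised from two data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=2, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for k in range(2):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

def projected_separation(X, labels, centroids):
    """Gap between projected cluster means over the pooled within-cluster std."""
    direction = centroids[1] - centroids[0]
    direction = direction / np.linalg.norm(direction)
    proj = X @ direction
    p0, p1 = proj[labels == 0], proj[labels == 1]
    pooled_var = (p0.var() * len(p0) + p1.var() * len(p1)) / len(proj)
    return abs(p1.mean() - p0.mean()) / np.sqrt(pooled_var)

def separation_for_dimension(d, n=100, seed=0):
    """Apparent separation found by 2-means on n points from N(0, I_d)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    labels, centroids = two_means(X, seed=seed)
    return projected_separation(X, labels, centroids)
```

Comparing `separation_for_dimension(2)` with `separation_for_dimension(2000)` should show a clearly larger index in the high-dimensional case, even though both samples come from one unimodal distribution.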

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Principal component analysis (PCA) is well recognized in dimensionality reduction, and kernel PCA (KPCA) has also been proposed in statistical data analysis. However, KPCA fails to detect the nonlinear structure of data well when outliers exist. To reduce this problem, this paper presents a novel algorithm, named iterative robust KPCA (IRKPCA). IRKPCA works well in dealing with outliers, and can be carried out in an iterative manner, which makes it suitable for processing incremental input data. As in the traditional robust PCA (RPCA), a binary field is employed for characterizing the outlier process, and the optimization problem is formulated as maximizing the marginal distribution of a Gibbs distribution. In this paper, this optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process is in a high-dimensional feature space, and therefore the kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and a robust form of the kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA. © 2010 Taylor & Francis.
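
For orientation, the non-robust baseline that IRKPCA builds on can be sketched in a few lines. The code below is standard batch kernel PCA with a Gaussian (RBF) kernel, not the iterative robust algorithm of the paper; the function name `rbf_kernel_pca` and the `gamma` parameter are assumptions for this example.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading kernel principal components."""
    # Pairwise squared Euclidean distances and the RBF kernel matrix.
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix, which centers the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    eigvals = np.maximum(eigvals[::-1][:n_components], 0.0)
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Coordinates of the training points along each kernel component.
    return eigvecs * np.sqrt(eigvals)
```

Because every training point enters the kernel matrix with equal weight, a few outliers can distort the leading components, which is the weakness the robust variant is designed to address.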

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.

A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
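
The two geometric quantities named above, the product of the sines and the sum of the squared sines of the principal angles, can be computed directly from bases of the two subspaces. The sketch below uses the usual QR-plus-SVD recipe and assumes both basis matrices have full column rank; the function names are illustrative, and the code computes only these quantities, not the misclassification probabilities derived in the dissertation.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spans of A and B."""
    # Orthonormal bases for the two subspaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def sine_summaries(angles):
    """Return (product of sines, sum of squared sines) of the given angles."""
    sines = np.sin(angles)
    return float(np.prod(sines)), float(np.sum(sines ** 2))
```

For two planes in three-dimensional space that share one direction, the smallest principal angle is zero, so the product of sines vanishes while the sum of squared sines can still be positive.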

The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with a large number of parameters, e.g., a deep neural network, can describe a very complicated data distribution well. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from degraded performance when generalizing to unseen (test) data.

From the perspective of complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine, so as to reduce the generalization error and thereby mitigate overfitting. Two different approaches to preventing overfitting are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are presented, showing a clear advantage of the proposed approaches when the training set is small.

Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more local neighborhoods are used, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighborhoods. The deviation of each datum from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical changepoint detection technique, which only works for one-dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
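
As a point of reference for the one-dimensional case that the framework extends, a classical changepoint detector can be sketched as a one-sided CUSUM test for an upward shift in the mean. This is a textbook procedure used here only as an illustration; the function name and the parameters `target_mean`, `drift` and `threshold` are assumptions, and the dissertation's manifold-based statistics are not reproduced.

```python
def cusum_first_alarm(signal, target_mean, drift, threshold):
    """Return the index of the first alarm of a one-sided CUSUM test, or None."""
    statistic = 0.0
    for t, x in enumerate(signal):
        # Accumulate evidence of an upward shift; reset to zero when it vanishes.
        statistic = max(0.0, statistic + (x - target_mean - drift))
        if statistic > threshold:
            return t
    return None
```

In the multidimensional setting described above, the scalar fed to such a test would be a per-sample deviation from the learned model rather than the raw signal.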

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Human activities represent a significant burden on the global water cycle, with large and increasing demands placed on limited water resources by manufacturing, energy production and domestic water use. In addition to changing the quantity of available water resources, human activities lead to changes in water quality by introducing a large and often poorly-characterized array of chemical pollutants, which may negatively impact biodiversity in aquatic ecosystems, leading to impairment of valuable ecosystem functions and services. Domestic and industrial wastewaters represent a significant source of pollution to the aquatic environment due to inadequate or incomplete removal of chemicals introduced into waters by human activities. Currently, incomplete chemical characterization of treated wastewaters limits comprehensive risk assessment of this ubiquitous impact to water. In particular, a significant fraction of the organic chemical composition of treated industrial and domestic wastewaters remains uncharacterized at the molecular level. Efforts aimed at reducing the impacts of water pollution on aquatic ecosystems critically require knowledge of the composition of wastewaters to develop interventions capable of protecting our precious natural water resources.

The goal of this dissertation was to develop a robust, extensible and high-throughput framework for the comprehensive characterization of organic micropollutants in wastewaters by high-resolution accurate-mass mass spectrometry. High-resolution mass spectrometry provides the most powerful analytical technique available for assessing the occurrence and fate of organic pollutants in the water cycle. However, significant limitations in data processing, analysis and interpretation have limited this technique in achieving comprehensive characterization of organic pollutants occurring in natural and built environments. My work aimed to address these challenges by development of automated workflows for the structural characterization of organic pollutants in wastewater and wastewater impacted environments by high-resolution mass spectrometry, and to apply these methods in combination with novel data handling routines to conduct detailed fate studies of wastewater-derived organic micropollutants in the aquatic environment.

In Chapter 2, chemoinformatic tools were implemented along with novel non-targeted mass spectrometric analytical methods to characterize, map, and explore an environmentally-relevant “chemical space” in municipal wastewater. This was accomplished by characterizing the molecular composition of known wastewater-derived organic pollutants and substances that are prioritized as potential wastewater contaminants, using these databases to evaluate the pollutant-likeness of structures postulated for unknown organic compounds that I detected in wastewater extracts using high-resolution mass spectrometry approaches. Results showed that application of multiple computational mass spectrometric tools to structural elucidation of unknown organic pollutants arising in wastewaters improved the efficiency and veracity of screening approaches based on high-resolution mass spectrometry. Furthermore, structural similarity searching was essential for prioritizing substances sharing structural features with known organic pollutants or industrial and consumer chemicals that could enter the environment through use or disposal.

I then applied this comprehensive methodological and computational non-targeted analysis workflow to micropollutant fate analysis in domestic wastewaters (Chapter 3), surface waters impacted by water reuse activities (Chapter 4) and effluents of wastewater treatment facilities receiving wastewater from oil and gas extraction activities (Chapter 5). In Chapter 3, I showed that application of chemometric tools aided in the prioritization of non-targeted compounds arising at various stages of conventional wastewater treatment by partitioning high dimensional data into rational chemical categories based on knowledge of organic chemical fate processes, resulting in the classification of organic micropollutants based on their occurrence and/or removal during treatment. Similarly, in Chapter 4, high-resolution sampling and broad-spectrum targeted and non-targeted chemical analysis were applied to assess the occurrence and fate of organic micropollutants in a water reuse application, wherein reclaimed wastewater was applied for irrigation of turf grass. Results showed that the organic micropollutant composition of surface waters receiving runoff from wastewater-irrigated areas appeared to be minimally impacted by wastewater-derived organic micropollutants. Finally, Chapter 5 presents results on the comprehensive organic chemical composition of oil and gas wastewaters treated for surface water discharge. Concurrent analysis of effluent samples by complementary, broad-spectrum analytical techniques revealed low levels of hydrophobic organic contaminants but elevated concentrations of polymeric surfactants, which may affect the fate and analysis of contaminants of concern in oil and gas wastewaters.

Taken together, my work represents significant progress in the characterization of polar organic chemical pollutants associated with wastewater-impacted environments by high-resolution mass spectrometry. Application of these comprehensive methods to examine micropollutant fate processes in wastewater treatment systems, water reuse environments, and water applications in oil/gas exploration yielded new insights into the factors that influence transport, transformation, and persistence of organic micropollutants in these systems across an unprecedented breadth of chemical space.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The current generation of computers provides performance high enough to build computationally expensive computer vision applications for mobile robotics. Building a map of the environment is a common task for a robot and is essential for allowing robots to move through these environments. Traditionally, mobile robots have used a combination of several sensors based on different technologies. Lasers, sonars and contact sensors have typically been used in mobile robotic architectures; however, color cameras are an important sensor because we want robots to use the same information that humans use to sense and move through different environments. Color cameras are cheap and flexible, but a lot of work needs to be done to give robots enough visual understanding of the scenes. Computer vision algorithms are computationally complex, but nowadays robots have access to different and powerful architectures that can be used for mobile robotics purposes. The advent of low-cost RGB-D sensors like Microsoft Kinect, which provide 3D colored point clouds at high frame rates, has made computer vision even more relevant in the mobile robotics field. The combination of visual and 3D data allows systems to use both computer vision and 3D processing and therefore to be aware of more details of the surrounding environment. The research described in this thesis was motivated by the need for scene mapping. Being aware of the surrounding environment is a key feature in many mobile robotics applications, from simple robotic navigation to complex surveillance applications. In addition, the acquisition of a 3D model of a scene is useful in many areas, such as video game scene modeling, where well-known places are reconstructed and added to game systems, or advertising, where once the 3D model of a room has been obtained the system can add pieces of furniture using augmented reality techniques.
In this thesis we perform an experimental study of state-of-the-art registration methods to find which one best fits our scene mapping purposes. Different methods are tested and analyzed on scenes with different distributions of visual and geometric appearance. In addition, this thesis proposes two methods for 3D data compression and representation of 3D maps. Our 3D representation proposal is based on the Growing Neural Gas (GNG) method. This type of Self-Organizing Map (SOM) has been successfully used for clustering, pattern recognition and topology representation of various kinds of data. Until now, Self-Organizing Maps have been primarily computed offline, and their application to 3D data has mainly focused on noise-free models without considering time constraints. Self-organizing neural models have the ability to provide a good representation of the input space. In particular, GNG is a suitable model because of its flexibility, rapid adaptation and excellent quality of representation. However, this type of learning is time consuming, especially for high-dimensional input data. Since real applications often work under time constraints, it is necessary to adapt the learning process in order to complete it in a predefined time. This thesis proposes a hardware implementation leveraging the computing power of modern GPUs, which takes advantage of a paradigm known as General-Purpose Computing on Graphics Processing Units (GPGPU). Our proposed geometrical 3D compression method seeks to reduce the 3D information using plane detection as the basic structure to compress the data. This is because our target environments are man-made and therefore contain many points that belong to planar surfaces. Our proposed method achieves good compression results in those man-made scenarios. The detected and compressed planes can also be used in other applications, such as surface reconstruction or plane-based registration algorithms.
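
The idea of using planes as the basic structure for compression can be illustrated with a least-squares plane fit: once a group of points is known to lie close to a plane, the plane parameters can stand in for most of the geometry. The sketch below is a generic SVD-based fit, not the detection method of the thesis; the function names and the plane convention `normal . x + offset = 0` are assumptions made for this example.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (n, 3) point set: normal . x + offset = 0."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    offset = -float(normal @ centroid)
    return normal, offset

def plane_distances(points, normal, offset):
    """Unsigned distance of each point to the plane (normal has unit length)."""
    return np.abs(points @ normal + offset)
```

Points whose distance falls below a chosen tolerance could then be represented by the four plane parameters plus their in-plane coordinates, rather than by three full coordinates each.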
Finally, we have also demonstrated the benefits of GPU technologies by obtaining a high-performance implementation of a common CAD/CAM technique called Virtual Digitizing.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The utilization of wood from reforested species by the furniture industry is a recent trend. Thus, the present study determined the specific gravity and shrinkage of wood of 18-year-old Eucalyptus grandis, Eucalyptus dunnii and Eucalyptus urophylla, for use as components in solid wood furniture making. The tests to evaluate the specific gravity and shrinkage of wood in the radial and axial variation of the eucalyptus trees were performed according to NBR 7190/96. The results of the analysis of wood from eucalypt species were subjected to the Homogeneity Test, ANOVA, Tukey and Pearson correlation and compared to the performance of sucupira wood (Bowdichia nitida) and cumaru wood (Dipteryx odorata), often used in the furniture industry. The following results were found: Eucalyptus grandis had a lower value of shrinkage, being more suitable for furniture components that require high dimensional stability, as well as parts of larger surface. The wood of this species showed a rate of dimensional variation compatible with the native species used in the furniture industry. The radial variation of the wood was also verified, and a high correlation between specific gravity and shrinkage was found. Longitudinally, the base of the trunk of the eucalyptus trees was shown to be the region of greatest dimensional stability.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As dictated by string theory and various higher dimensional scenarios, black holes in D > 4-dimensional space-times must have higher curvature corrections. The first and dominant term is quadratic in curvature and is called the Gauss-Bonnet (GB) term. We shall show that although the Gauss-Bonnet correction changes the black hole's geometry only softly, the emission of gravitons is suppressed by many orders even at quite small values of the GB coupling. The huge suppression of the graviton emission is due to the multiplication of two effects: the quick cooling of the black hole when one turns on the GB coupling and the exponential decrease of the gray-body factor of the tensor type of gravitons at small and moderate energies. At higher D the emission of tensor gravitons is dominant, so that the overall lifetime of black holes with Gauss-Bonnet corrections is many orders larger than was expected. This effect should be relevant for future experiments at the Large Hadron Collider (LHC).
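
For reference, the Gauss-Bonnet term mentioned above is conventionally written as the following quadratic combination of curvature invariants, added to the Einstein-Hilbert action with a coupling constant. The symbols (Ricci scalar R, Ricci tensor R_{\mu\nu}, Riemann tensor R_{\mu\nu\rho\sigma}, coupling \alpha) and the overall normalization follow one common convention and are assumptions here, since the abstract does not state them.

```latex
\mathcal{L}_{GB} = R^{2} - 4 R_{\mu\nu} R^{\mu\nu} + R_{\mu\nu\rho\sigma} R^{\mu\nu\rho\sigma},
\qquad
S = \frac{1}{16 \pi G} \int d^{D}x \, \sqrt{-g} \, \left( R + \alpha \, \mathcal{L}_{GB} \right)
```

In D = 4 this combination is a topological invariant and does not affect the field equations, which is why the correction only matters for D > 4.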

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We investigate the stability of the D-dimensional Reissner-Nordstrom-anti-de Sitter (RNAdS) metrics as solutions of the Einstein-Maxwell equations. We have shown that asymptotically anti-de Sitter (AdS) black holes are dynamically stable for all values of charge and anti-de Sitter radius in D = 5, 6, ..., 11 dimensional space-times. This does not contradict the dynamical instability of RNAdS black holes found by Gubser in N=8 gauged supergravity, because the latter instability comes from the tachyon mode of the scalar field coupled to the system. Asymptotically AdS black holes are known to be thermodynamically unstable for some region of parameters, yet, as we have shown here, they are stable against gravitational perturbations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper addresses robust model-order reduction of a high dimensional nonlinear partial differential equation (PDE) model of a complex biological process. Based on a nonlinear, distributed parameter model of the same process, which was validated against experimental data from an existing pilot-scale BNR activated sludge plant, we developed a state-space model with 154 state variables in this work. A general algorithm for robustly reducing the nonlinear PDE model is presented and, based on an investigation of five state-of-the-art model-order reduction techniques, we are able to reduce the original model to a model with only 30 states without incurring pronounced modelling errors. The singular perturbation approximation balanced truncation technique is found to give the lowest modelling errors in low frequency ranges and hence is deemed most suitable for controller design and other real-time applications. (C) 2002 Elsevier Science Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The navigation of autonomous vehicles in unstructured environments remains an open problem. The complexity of the real world is still a challenge. The difficult characterisation of irregular terrain and of dynamic, poorly distinguishable objects (and the absence of localisation references) has been the subject of study and of the development of several methods for modelling three-dimensional space efficiently and in real time. The work carried out in this dissertation is part of the strategy of the Laboratório de Sistemas Autónomos (LSA) for the research and development of sensory systems that increase the perception capabilities of robotic platforms. The development of a three-dimensional modelling system aims to give the LINCE (Land INtelligent Cooperative Explorer) and TIGRE (Terrestrial Intelligent General proposed Robot Explorer) projects greater autonomy and greater exploration and mapping capability. We present some sensors used for acquiring three-dimensional models, as well as some of the most widely used methods for the mapping process, and their application on robotic platforms. Throughout this dissertation, techniques for obtaining three-dimensional models are presented and validated. The problem of analysing the colour and geometry of objects, and of creating realistic models that represent them, is addressed. We developed a system that obtains three-dimensional volumetric data from multiple readings of a medium-range two-dimensional Laser Range Finder. We combine the resulting data sets into a coherent, referenced point cloud. Segmentation techniques were developed and implemented that make it possible to inspect a point cloud and classify it according to its geometric characteristics and the type of structures it represents.
Some techniques for creating Digital Elevation Maps are presented, and a new method was developed that takes advantage of the segmentation performed

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The paper formulates a genetic algorithm that evolves two types of objects in a plane. The fitness function promotes a relationship between the objects that is optimal when some kind of interface between them occurs. Furthermore, the algorithm adopts a hexagonal tessellation of the two-dimensional space to provide an efficient method for modelling neighbours. The genetic algorithm produces special patterns that resemble those revealed in percolation phenomena or in the symbiosis found in lichens. Besides the analysis of the spatial layout, the time evolution is modelled by adopting a distance measure and by modelling in the Fourier domain from the perspective of fractional calculus. The results reveal a consistent and easy-to-interpret set of model parameters for distinct operating conditions.
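
One reason a hexagonal tessellation simplifies neighbour modelling is that every cell has exactly six neighbours, all at the same distance. The sketch below shows this with axial coordinates `(q, r)`; the coordinate system and the function names are assumptions for illustration, as the abstract does not say how the paper indexes its cells.

```python
# Offsets to the six neighbours of a hexagonal cell in axial coordinates (q, r).
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def hex_neighbors(cell):
    """Return the six cells adjacent to `cell`."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]

def hex_distance(a, b):
    """Number of steps between two cells on the hexagonal grid."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2
```

A fitness function that rewards interfaces between the two object types could, for instance, count for each cell how many of its six neighbours hold an object of the other type.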

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper a complex-order van der Pol oscillator is considered. The complex derivative D^{α±jβ}, with α, β ∈ R^+, is a generalization of the concept of integer derivative, where α=1, β=0. By applying the concept of complex derivative, we obtain a high-dimensional parameter space. Amplitude and period values of the periodic solutions of the two versions of the complex-order van der Pol oscillator are studied for variation of these parameters. Fourier transforms of the periodic solutions of the two oscillators are also analyzed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dissertation presented as a partial requirement for obtaining the degree of Master in Geographic Information Science and Systems