866 results for High-Dimensional Space Geometrical Informatics (HDSGI)


Relevance: 100.00%

Abstract:

Failure to detect patients at risk of attempting suicide can result in tragic consequences. Identifying risks earlier and more accurately helps prevent serious incidents from occurring and is the objective of the GRiST clinical decision support system (CDSS). One of the problems it faces is high variability in the type and quantity of data submitted for patients, who are assessed in multiple contexts along the care pathway. Although GRiST identifies up to 138 patient cues to collect, only about half of them are relevant for any one patient, and their role may be risk management rather than risk evaluation. This paper explores the data collection behaviour of clinicians using GRiST to see whether it can elucidate which variables are important for risk evaluations and when. The GRiST CDSS is based on a cognitive model of human expertise manifested as a sophisticated hierarchical knowledge structure, or tree. This structure is used by the GRiST interface to provide top-down controlled access to the patient data. Our research explores relationships between the answers given to these higher-level 'branch' questions to see whether they can help direct assessors to the most important data, depending on the patient profile and assessment context. The outcome is a model for dynamic data collection driven by the knowledge hierarchy. It has the potential to improve other clinical decision support systems operating in domains with high-dimensional data that are only partially collected, and in a variety of combinations.
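
A minimal sketch of the kind of hierarchy-driven collection the abstract describes, assuming a small hypothetical tree of branch questions whose answers gate which lower-level cues are requested; the node names and yes/no gating rule are invented for illustration and are not the actual GRiST knowledge structure.

# Hypothetical illustration of knowledge-hierarchy-driven data collection: a
# 'branch' question is asked first, and its answer decides whether the
# lower-level cues beneath it are requested at all.
RISK_TREE = {
    "question": "Any current suicidal thoughts?",
    "cues": ["frequency of thoughts", "specific plan"],
    "children": [
        {
            "question": "History of self-harm?",
            "cues": ["time since last episode", "method used"],
            "children": [],
        }
    ],
}

def collect(node, ask):
    """Walk the tree top-down; only request the leaf cues (and descend further)
    when the branch question is answered affirmatively."""
    answers = {}
    branch_answer = ask(node["question"])
    answers[node["question"]] = branch_answer
    if str(branch_answer).strip().lower().startswith("y"):
        for cue in node["cues"]:
            answers[cue] = ask(cue)
        for child in node["children"]:
            answers.update(collect(child, ask))
    return answers

# Example: drive the collection from a console prompt.
if __name__ == "__main__":
    print(collect(RISK_TREE, ask=lambda prompt: input(prompt + " ")))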

Relevance: 100.00%

Abstract:

Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting the functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection, among other applications. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation-maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants on both synthetic data and an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way, where appropriate noise models are used for each type of data, in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend the GGTM model to estimate feature saliencies while training the visualisation model; this is called GGTM with feature saliency (GGTM-FS). We demonstrate the effectiveness of these proposed models for both synthetic and real datasets. We evaluate visualisation quality using quality metrics such as a distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known, we also use KL divergence and nearest-neighbour classification error to determine the separation between classes. We demonstrate the efficacy of these proposed models for both synthetic and real biological datasets, with a main focus on the MHC class-I dataset.
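
The "log-transformations at certain steps of EM" are, in spirit, the familiar log-sum-exp device for keeping responsibility computations stable when distances in a high-dimensional data space make the raw exponentials underflow. A minimal, hedged sketch of such a log-space E-step (not the thesis's actual GTM implementation):

import numpy as np

def log_responsibilities(sq_dist, beta):
    """Responsibilities R[k, n] of latent-grid component k for data point n,
    computed entirely in log space. sq_dist[k, n] holds squared distances
    between mapped latent centres and data points; beta is the inverse noise
    variance. Illustrative of a GTM-style E-step, not the thesis code."""
    log_p = -0.5 * beta * sq_dist                  # unnormalised log densities (K x N)
    log_norm = np.logaddexp.reduce(log_p, axis=0)  # log-sum-exp over components, per point
    return np.exp(log_p - log_norm)                # safe even when log_p is very negative

# Example: raw exp(-0.5 * beta * d^2) would underflow for distances this large,
# but the log-space version still returns well-defined responsibilities.
sq_dist = np.array([[1.0e4, 1.2e4], [1.1e4, 1.0e4]])
print(log_responsibilities(sq_dist, beta=1.0).sum(axis=0))  # columns sum to 1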

Relevance: 100.00%

Abstract:

This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. A realistic noise model is explored and incorporated into non-linear and topographic visualisation algorithms building on the approach of [9]. Concepts are illustrated using a real world dataset of 32 hydrophones monitoring a shallow-water environment in which targets are present and dynamic.
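
As a rough illustration of mapping a relative-dissimilarity representation into a low-dimensional visual space, the sketch below uses plain classical multidimensional scaling as a stand-in for the paper's topographic visualisation model; the dissimilarity matrix is synthetic and the hydrophone count is only borrowed from the abstract.

import numpy as np

def classical_mds(D, dim=2):
    """Embed items described only by a pairwise dissimilarity matrix D into
    `dim` coordinates via classical MDS (double-centring + eigendecomposition)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:dim]            # keep the largest eigenvalues
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

# Example with a synthetic symmetric dissimilarity matrix for 32 'hydrophones'.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 100))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
coords = classical_mds(D)                        # 32 x 2 map for visual inspection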

Relevance: 100.00%

Abstract:

Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
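
One common way to formalise "feature saliency" in a latent variable model of this kind (the general feature-saliency mixture idea, not necessarily the exact GGTMFS formulation) is to model each observed feature as a mixture of a component-dependent density and a common, component-independent background density, with the mixing weight acting as the saliency:

% Sketch of a feature-saliency likelihood: \rho_d is the saliency of feature d,
% p(\cdot \mid k) the component-dependent density and q(\cdot) a shared background
% density; \rho_d \to 1 marks a relevant feature, \rho_d \to 0 a noisy one.
p(\mathbf{x}_n) \;=\; \sum_{k=1}^{K} \pi_k \prod_{d=1}^{D}
\Big[\, \rho_d \, p(x_{nd} \mid k) \;+\; (1-\rho_d)\, q(x_{nd}) \,\Big]

The saliencies \rho_d can then be updated alongside the other parameters in the EM iterations, which is the role they play in the model described above.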

Relevance: 100.00%

Abstract:

2010 Mathematics Subject Classification: 62J99.

Relevance: 100.00%

Abstract:

2000 Mathematics Subject Classification: 30A05, 33E05, 30G30, 30G35, 33E20.

Relevance: 100.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm to address these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection methods, calculates the 'median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
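
A hedged sketch of a median-selection subset-aggregation scheme in the spirit of the message algorithm described above, with lasso standing in for the per-subset selection step; the subset count, selectors and refit choices are illustrative, not the thesis's exact procedure.

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def message_estimator(X, y, m=10, random_state=0):
    """Partition samples into m subsets, select features per subset, keep
    features selected on a majority of subsets (median inclusion), then
    average per-subset refits of the selected features."""
    rng = np.random.default_rng(random_state)
    n, p = X.shape
    subsets = np.array_split(rng.permutation(n), m)

    # Step 1: feature selection on each subset (run in parallel in practice)
    inclusion = np.zeros((m, p))
    for j, sub in enumerate(subsets):
        lasso = LassoCV(cv=3).fit(X[sub], y[sub])
        inclusion[j] = (lasso.coef_ != 0).astype(float)

    # Step 2: 'median' inclusion index -> strict majority vote across subsets
    selected = np.where(np.median(inclusion, axis=0) > 0.5)[0]
    beta = np.zeros(p)
    if selected.size == 0:
        return beta, selected

    # Step 3: refit the selected features on each subset and average coefficients
    coefs = np.zeros((m, selected.size))
    for j, sub in enumerate(subsets):
        coefs[j] = LinearRegression().fit(X[sub][:, selected], y[sub]).coef_
    beta[selected] = coefs.mean(axis=0)
    return beta, selected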

While sample space partitioning is useful for handling datasets with large sample sizes, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
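
A minimal sketch of DECO-style feature-space partitioning under stated assumptions: the decorrelation uses a ridge-regularised inverse square root of X X^T, feature blocks are assigned at random, and lasso is used as the per-block high-dimensional fitter. The ridge value and block assignment are illustrative rather than the thesis's tuned choices.

import numpy as np
from sklearn.linear_model import LassoCV

def deco_fit(X, y, m=4, ridge=1.0, random_state=0):
    """Decorrelate the design so feature blocks are (approximately) orthogonal,
    then run a sparse fit independently on each of the m feature blocks."""
    rng = np.random.default_rng(random_state)
    n, p = X.shape

    # Decorrelation step: whiten rows with (X X^T / p + ridge * I)^{-1/2}
    G = X @ X.T / p + ridge * np.eye(n)
    w, V = np.linalg.eigh(G)
    G_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    X_tilde = G_inv_sqrt @ X
    y_tilde = G_inv_sqrt @ y

    # Partition features at random into m blocks and fit each block separately
    beta = np.zeros(p)
    for block in np.array_split(rng.permutation(p), m):
        if block.size == 0:
            continue
        beta[block] = LassoCV(cv=3).fit(X_tilde[:, block], y_tilde).coef_
    return beta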

For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework DEME (DECO-message) that leverages both the DECO and message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.

Relevance: 100.00%

Abstract:

Grinding, the final machining process applied to a part, uses cutting fluids for lubrication, cooling and chip removal. However, these fluids are extremely aggressive towards the environment. With technological advances, the worldwide trend is to produce ever more sophisticated parts, with tight geometric and dimensional tolerances, good surface finish and low cost, and above all without harming the environment. To that end, the recycling of the cutting fluid, notable for its cost, is intrinsic to the grinding process. By varying the infeed rate in the external cylindrical plunge grinding of ABNT D6 steel, rationalising the application of two cutting fluids and using a vitrified-bond CBN (cubic boron nitride) superabrasive wheel, the following output parameters were evaluated: tangential cutting force, acoustic emission, surface roughness, roundness, tool wear, residual stress, and the surface integrity of the workpieces assessed by scanning electron microscopy (SEM). By analysing the performance of the fluid, the wheel and the plunge velocity, the best machining conditions were found, enabling a reduction in the volume of cutting fluid and in machining time without compromising the geometric and dimensional parameters, the surface finish or the surface integrity of the components.

Relevance: 100.00%

Abstract:

Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in that representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for conducting faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part we address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as 'semantic binary codes' and the unique code for all same-class images as the 'class binary code'. We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, where the Hamming distance is computed only to the class binary codes. We further propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times. In the second part, we address the problem of supervised retrieval by taking into account the relationships between classes. For a given query image, we want to retrieve images that preserve the relative order, i.e. we want to retrieve all same-class images first and then images of related classes before images of different classes. We learn such relationship-aware binary codes by minimizing the difference between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take related-class retrieval results into account and show significant gains over the state of the art. High-dimensional descriptors such as Fisher Vectors or Vectors of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes, to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. A practical hierarchical model that compresses such high-dimensional vectors using divide-and-conquer techniques with a Random Select and Adjust (RSA) procedure is presented. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods at higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting in which no training videos are available for the event. To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute the similarity between the query event and each video in the concept space, and videos similar to the query event are classified as belonging to the event. We show that we significantly boost performance using concept features from other modalities.
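
A hedged sketch of the class-based Hamming lookup described above, assuming each class has already been assigned a binary 'class code' and the query image has been hashed to a code of the same length; the code length and the class codes here are made up purely for illustration.

import numpy as np

def rank_classes_by_hamming(query_code, class_codes):
    """Rank classes by Hamming distance between the query's binary code and each
    class binary code. Distances are computed only to the C class codes, not to
    every database image, which is the source of the speed-up described above."""
    # query_code: (B,) array of 0/1; class_codes: (C, B) array of 0/1
    distances = np.count_nonzero(class_codes != query_code, axis=1)
    return np.argsort(distances), distances

# Illustrative example: 3 classes with 8-bit class codes and one hashed query.
class_codes = np.array([[0, 0, 0, 0, 1, 1, 1, 1],
                        [1, 1, 1, 1, 0, 0, 0, 0],
                        [0, 1, 0, 1, 0, 1, 0, 1]])
query_code = np.array([0, 0, 0, 1, 1, 1, 1, 1])
order, dists = rank_classes_by_hamming(query_code, class_codes)
# Images stored under the nearest class code(s) would then be returned first.
print(order, dists)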

Relevance: 100.00%

Abstract:

Space geometry is a branch of mathematics that studies the properties and measurements of figures related to most of the three-dimensional objects around us. For this reason our work, entitled "Elaboración de una guía y material didáctico de la Geometría del Espacio para el Laboratorio de Matemática de la carrera de Matemáticas y Física de la Universidad de Cuenca" (Development of a guide and didactic material on space geometry for the Mathematics Laboratory of the Mathematics and Physics programme of the Universidad de Cuenca), provides a new strategy that teachers can use in the teaching-learning process of this subject. Chapter one of our graduation project analyses general aspects of education, as well as the pedagogical currents present in the educational process, and then discusses didactics and the importance of working with concrete materials in mathematics, specifically in space geometry, together with the resources best suited to teaching this subject. Chapter two shows, by means of a non-probabilistic sample, that students have difficulty understanding the contents of space geometry and that a good option for the teaching-learning process in this subject is the use of concrete materials and a didactic guide that facilitates comprehension of the contents. Finally, chapter three presents a set of ten practical exercises on planes and solids which follow, in an ordered and sequential manner, the steps required of an innovative laboratory practice in order to reinforce the teaching-learning process.

Relevance: 100.00%

Abstract:

An unstructured mesh finite volume discretisation method for simulating diffusion in anisotropic media in two-dimensional space is discussed. This technique is considered as an extension of the fully implicit hybrid control-volume finite-element method and it retains the local continuity of the flux at the control volume faces. A least squares function reconstruction technique together with a new flux decomposition strategy is used to obtain an accurate flux approximation at the control volume face, ensuring that the overall accuracy of the spatial discretisation maintains second order. This paper highlights that the new technique coincides with the traditional shape function technique when the correction term is neglected and that it significantly increases the accuracy of the previous linear scheme on coarse meshes when applied to media that exhibit very strong to extreme anisotropy ratios. It is concluded that the method can be used on both regular and irregular meshes, and appears independent of the mesh quality.
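
For orientation, a typical face-flux decomposition of this kind (a generic control-volume form under stated assumptions, not necessarily the paper's exact strategy) splits the anisotropic diffusive flux through face f into a primary two-point term along the line joining the two adjacent cell centres plus a cross-diffusion correction evaluated from the reconstructed gradient:

% Generic two-point-plus-correction split of the diffusive face flux; \mathsf{K}
% is the diffusivity tensor, \mathbf{n}_f the face normal, A_f the face area,
% \mathbf{d}_{PN} the vector between cell centres P and N, and \overline{\nabla\phi}
% the reconstructed (e.g. least squares) gradient at the face.
F_f \;=\; -\,(\mathsf{K}\nabla\phi)\cdot\mathbf{n}_f\,A_f
\;\approx\;
-\,\frac{(\mathsf{K}\mathbf{n}_f)\cdot\mathbf{n}_f}{\mathbf{d}_{PN}\cdot\mathbf{n}_f}\,(\phi_N-\phi_P)\,A_f
\;-\;\Big[(\mathsf{K}\overline{\nabla\phi})\cdot\mathbf{n}_f
-\frac{(\mathsf{K}\mathbf{n}_f)\cdot\mathbf{n}_f}{\mathbf{d}_{PN}\cdot\mathbf{n}_f}\,\overline{\nabla\phi}\cdot\mathbf{d}_{PN}\Big]A_f

Neglecting the bracketed correction term recovers a plain two-point (shape-function-like) flux, which is consistent with the coincidence noted in the abstract.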

Relevance: 100.00%

Abstract:

This thesis addresses computational challenges arising from Bayesian analysis of complex real-world problems. Many of the models and algorithms designed for such analysis are 'hybrid' in nature, in that they are a composition of components whose individual properties may be easily described, but the performance of the model or algorithm as a whole is less well understood. The aim of this research project is to offer a better understanding of the performance of hybrid models and algorithms. The goal of this thesis is to analyse the computational aspects of hybrid models and hybrid algorithms in the Bayesian context. The first objective of the research focuses on computational aspects of hybrid models, notably a continuous finite mixture of t-distributions. In the mixture model, an inference of interest is the number of components, as this may relate to both the quality of model fit to data and the computational workload. The analysis of t-mixtures using Markov chain Monte Carlo (MCMC) is described and the model is compared to the Normal case based on goodness of fit. Through simulation studies, it is demonstrated that the t-mixture model can be more flexible and more parsimonious in terms of the number of components, particularly for skewed and heavy-tailed data. The study also reveals important computational issues associated with the use of t-mixtures, which have not been adequately considered in the literature. The second objective of the research focuses on computational aspects of hybrid algorithms for Bayesian analysis. Two approaches are considered: a formal comparison of the performance of a range of hybrid algorithms and a theoretical investigation of the performance of one of these algorithms in high dimensions. For the first approach, the delayed rejection algorithm, the pinball sampler, the Metropolis-adjusted Langevin algorithm, and the hybrid version of the population Monte Carlo (PMC) algorithm are selected as a set of examples of hybrid algorithms. The statistical literature shows that statistical efficiency is often the only criterion for judging an algorithm. In this thesis the algorithms are also considered and compared from a more practical perspective. This extends to the study of how individual algorithms contribute to the overall efficiency of hybrid algorithms, and highlights weaknesses that may be introduced when these components are combined into a single algorithm. The second approach to considering computational aspects of hybrid algorithms involves an investigation of the performance of the PMC in high dimensions. It is well known that as a model becomes more complex, computation may become increasingly difficult in real time. In particular, importance sampling based algorithms, including the PMC, are known to be unstable in high dimensions. This thesis examines the PMC algorithm in a simplified setting, a single step of the general sampling, and explores a fundamental problem that occurs in applying importance sampling to a high-dimensional problem. The precision of the computed estimate from the simplified setting is measured by the asymptotic variance of the estimate under conditions on the importance function. Additionally, the exponential growth of the asymptotic variance with dimension is demonstrated, and we illustrate that the optimal covariance matrix for the importance function can be estimated in a special case.
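
A small, hedged illustration of the instability mentioned above: importance sampling a standard Gaussian target with a slightly wider Gaussian proposal, where the variance of the normalised importance weights (and hence of the estimate) grows rapidly with dimension. The target/proposal pair and sample size are chosen purely for illustration.

import numpy as np

def weight_variance(dim, n_samples=100_000, proposal_scale=1.2, seed=0):
    """Empirical variance of normalised importance weights for a standard normal
    target in `dim` dimensions and an isotropic normal proposal with a larger scale."""
    rng = np.random.default_rng(seed)
    x = proposal_scale * rng.standard_normal((n_samples, dim))
    # log target density minus log proposal density (both isotropic Gaussians)
    log_w = (-0.5 * np.sum(x**2, axis=1)
             + 0.5 * np.sum((x / proposal_scale) ** 2, axis=1)
             + dim * np.log(proposal_scale))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w.var()

for d in (1, 5, 10, 20, 50):
    # The weight variance degrades quickly as the dimension grows.
    print(d, weight_variance(d))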

Relevance: 100.00%

Abstract:

The computation of compact and meaningful representations of high dimensional sensor data has recently been addressed through the development of Nonlinear Dimensional Reduction (NLDR) algorithms. The numerical implementation of spectral NLDR techniques typically leads to a symmetric eigenvalue problem that is solved by traditional batch eigensolution algorithms. The application of such algorithms in real-time systems necessitates the development of sequential algorithms that perform feature extraction online. This paper presents an efficient online NLDR scheme, Sequential-Isomap, based on incremental singular value decomposition (SVD) and the Isomap method. Example simulations demonstrate the validity and significant potential of this technique in real-time applications such as autonomous systems.
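
An online scheme of this sort typically hinges on updating a thin SVD one sample at a time instead of recomputing it in batch. Below is a minimal, hedged sketch of a Brand-style column-append SVD update, offered as a generic building block rather than the paper's Sequential-Isomap implementation.

import numpy as np

def incremental_svd(U, S, Vt, c, tol=1e-10):
    """Append one new column c to a data matrix whose thin SVD is U @ diag(S) @ Vt
    and return the updated thin SVD without refactorising the whole matrix."""
    p = U.T @ c                       # projection of the new column onto the current basis
    r = c - U @ p                     # residual orthogonal to the current basis
    r_norm = np.linalg.norm(r)
    k = S.size
    # Small (k+1) x (k+1) core matrix whose SVD gives the updated factors
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(S)
    K[:k, k] = p
    K[k, k] = r_norm
    Uk, Sk, Vkt = np.linalg.svd(K)
    # Extend the left basis with the normalised residual (zero column if degenerate)
    q = r / r_norm if r_norm > tol else np.zeros_like(c)
    U_ext = np.column_stack([U, q])
    # Extend the right factor with a new row/column for the appended sample
    n = Vt.shape[1]
    V_ext = np.zeros((k + 1, n + 1))
    V_ext[:k, :n] = Vt
    V_ext[k, n] = 1.0
    return U_ext @ Uk, Sk, Vkt @ V_ext

# Example: grow an SVD incrementally, one sample (column) at a time.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 5))
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U, S, Vt = incremental_svd(U, S, Vt, rng.normal(size=50))   # now an SVD of 6 columns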

Relevance: 100.00%

Abstract:

Maps are used to represent three-dimensional space and are integral to a range of everyday experiences. They are increasingly used in mathematics, being prominent both in school curricula and as a form of assessing students' understanding of mathematical ideas. In order to successfully interpret maps, students need to understand that maps represent space, have their own perspective and scale, and use their own set of symbols and texts. Despite the fact that maps have an increased prevalence in society and school, there is evidence to suggest that students have difficulty interpreting maps. This study investigated 43 primary-aged students' (aged 9-12 years) verbal and gestural behaviours as they engaged with and solved map tasks. Within a multiliteracies framework that focuses on spatial, visual, linguistic, and gestural elements, the study investigated how students interpret map tasks. Specifically, the study sought to understand the skills and approaches students used to solve map tasks and the gestural behaviours they employed as they engaged with the tasks. The investigation was undertaken using the Knowledge Discovery in Data (KDD) design. The design of this study capitalised on existing research data to carry out a more detailed analysis of students' interpretation of map tasks. Video data from an existing data set were reorganised according to two distinct episodes (Task Solution and Task Explanation) and analysed within the multiliteracies framework. Content Analysis was used with these data and, through anticipatory data reduction techniques, patterns of behaviour were identified in relation to each specific map task by looking at task solution, task correctness and gesture use. The findings of this study revealed that students had a relatively sound understanding of general mapping knowledge, such as identifying landmarks and using keys, compass points and coordinates. However, their understanding of mathematical concepts pertinent to map tasks, including location, direction, and movement, was less developed. Successful students were able to interpret the map tasks and apply relevant mathematical understanding to navigate the spatial demands of the tasks, while unsuccessful students were only able to interpret and understand basic map conventions. In terms of gesture use, the more difficult the task, the more likely students were to exhibit gestural behaviours to solve it. The most common form of gestural behaviour was deictic, that is, a pointing gesture. Deictic gestures not only aided the students' capacity to explain how they solved the map tasks but also served as a tool that helped them navigate and monitor their spatial movements when solving the tasks. There were a number of implications for theory, learning and teaching, and test and curriculum design arising from the study. From a theoretical perspective, the findings suggest that gesturing is an important element of multimodal engagement with mapping tasks. In terms of teaching and learning, implications include the need for students to utilise gesturing techniques when first faced with new or novel map tasks. As students become more proficient in solving such tasks, they should be encouraged to move beyond a reliance on such gestures in order to progress to more sophisticated understandings of map tasks. Additionally, teachers need to provide students with opportunities to interpret and attend to multiple modes of information when interpreting map tasks.

Relevance: 100.00%

Abstract:

Focusing on the conditions that an optimization problem may satisfy, the so-called convergence conditions are proposed, and a stochastic optimization algorithm, named the DSZ algorithm, is subsequently presented to deal with both unconstrained and constrained optimization. The principle is discussed through the theoretical model of the DSZ algorithm, from which the practical model is derived. The efficiency of the practical model is demonstrated by comparison with similar algorithms, such as Enhanced Simulated Annealing (ESA), Monte Carlo Simulated Annealing (MCS), Sniffer Global Optimization (SGO), Directed Tabu Search (DTS), and the Genetic Algorithm (GA), using a set of well-known unconstrained and constrained optimization test cases. Further attention is given to strategies for optimizing high-dimensional unconstrained problems with the DSZ algorithm.
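
Since the abstract does not spell out the DSZ update rule, the sketch below shows a generic simulated-annealing-style stochastic optimiser of the kind used as comparison baselines above (ESA, MCS); it is not the DSZ algorithm itself, and the cooling schedule, step size and test function are illustrative only.

import numpy as np

def simulated_annealing(f, x0, n_iter=5000, step=0.5, t0=1.0, cooling=0.999, seed=0):
    """Generic simulated-annealing baseline for unconstrained minimisation of f.
    Worse moves are accepted with probability exp(-delta / T), with T decaying
    geometrically; all hyperparameters are illustrative defaults."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, t = f(x), t0
    best_x, best_f = x.copy(), fx
    for _ in range(n_iter):
        cand = x + step * rng.standard_normal(x.shape)   # random perturbation
        fc = f(cand)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling                                      # geometric cooling schedule
    return best_x, best_f

# Example on a standard high-dimensional test function (sphere), dimension 20.
sphere = lambda z: float(np.sum(np.square(z)))
x_opt, f_opt = simulated_annealing(sphere, x0=np.full(20, 3.0))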