850 results for High-dimensional data visualization
Abstract:
Increasingly, it is recognized that new automated forms of analysis are required to understand the high-dimensional output obtained from atomistic simulations. Recently, we introduced a new dimensionality reduction algorithm, sketch-map, that was designed specifically to work with data from molecular dynamics trajectories. In what follows, we provide more details on how this algorithm works and on how to set its parameters. We also test it on two well-studied Lennard-Jones clusters and show that the coordinates we extract using this algorithm are extremely robust. In particular, we demonstrate that the coordinates constructed for one particular Lennard-Jones cluster can be used to describe the configurations adopted by a second, different cluster and even to tell apart different phases of bulk Lennard-Jonesium.
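As a rough illustration of the kind of projection sketch-map performs (a minimal sketch under assumed sigmoid parameters and random placeholder data, not the authors' implementation), one can minimize the mismatch between sigmoid-transformed high- and low-dimensional pairwise distances:

```python
# Minimal sketch of a sketch-map-style projection (illustrative only).
# High- and low-dimensional pairwise distances are passed through sigmoid
# switching functions and the mismatch between them is minimized.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def sigmoid(r, sigma=2.0, a=2.0, b=2.0):
    # Switching function: ~0 for r << sigma, ~1 for r >> sigma.
    # The parameter values here are arbitrary placeholders.
    return 1.0 - (1.0 + (2.0**(a / b) - 1.0) * (r / sigma)**a)**(-b / a)

def stress(x_flat, F_high, n, d=2):
    x = x_flat.reshape(n, d)
    f_low = sigmoid(pdist(x))
    return np.sum((F_high - f_low)**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))            # placeholder "trajectory frames"
F_high = sigmoid(pdist(X))               # transformed high-dimensional distances

x0 = rng.normal(scale=0.1, size=50 * 2)  # random initial 2-D embedding
res = minimize(stress, x0, args=(F_high, 50), method="L-BFGS-B")
embedding = res.x.reshape(50, 2)
print("final stress:", res.fun)
```

The choices of sigma, a and b above are arbitrary; the paper itself discusses how these parameters should be set.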
Abstract:
A new scheme, sketch-map, for obtaining a low-dimensional representation of the region of phase space explored during an enhanced dynamics simulation is proposed. We show evidence, from an examination of the distribution of pairwise distances between frames, that some features of the free-energy surface are inherently high-dimensional. This makes dimensionality reduction problematic because the data do not satisfy the assumptions made in conventional manifold learning algorithms. We therefore propose that when dimensionality reduction is performed on trajectory data one should think of the resultant embedding as a quickly sketched set of directions rather than a road map. In other words, the embedding tells one about the connectivity between states but does not provide the vectors that correspond to the slow degrees of freedom. This realization informs the development of sketch-map, which endeavors to reproduce the proximity information from the high-dimensionality description in a space of lower dimensionality even when a faithful embedding is not possible.
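One plausible way to formalize the proximity-matching objective described above (the exact functional form and weighting used by sketch-map are given in the paper; the switching-function form below is an assumption) is

```latex
\chi^2(\{\mathbf{x}_i\}) = \sum_{i<j}
  \bigl[ F\!\left(\lVert \mathbf{X}_i - \mathbf{X}_j \rVert\right)
       - f\!\left(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert\right) \bigr]^2 ,
\qquad
F(R) = 1 - \Bigl( 1 + \bigl(2^{a/b} - 1\bigr)\,(R/\sigma)^a \Bigr)^{-b/a}
```

where the X_i are the high-dimensional frames, the x_i their low-dimensional images, and F and f are sigmoid switching functions that deliberately discard proximity information at very short and very long range.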
Abstract:
The requirement to provide multimedia services with QoS support in mobile networks has led to standardization and deployment of high-speed data access technologies such as the High Speed Downlink Packet Access (HSDPA) system. HSDPA improves downlink packet data and multimedia services support in WCDMA-based cellular networks. As is the trend in emerging wireless access technologies, HSDPA supports end-user multi-class sessions comprising parallel flows with diverse Quality of Service (QoS) requirements, such as real-time (RT) voice or video streaming concurrent with non-real-time (NRT) data service being transmitted to the same user, with differentiated queuing at the radio link interface. Hence, in this paper we present and evaluate novel radio link buffer management schemes for QoS control of multimedia traffic comprising concurrent RT and NRT flows in the same HSDPA end-user session. The new buffer management schemes—Enhanced Time Space Priority (E-TSP) and Dynamic Time Space Priority (D-TSP)—are designed to improve radio link and network resource utilization as well as optimize end-to-end QoS performance of both RT and NRT flows in the end-user session. Both schemes are based on a Time-Space Priority (TSP) queuing system, which provides joint delay and loss differentiation between the flows by queuing (partially) loss-tolerant RT flow packets for higher transmission priority but with restricted access to the buffer space, whilst allowing unlimited access to the buffer space for the delay-tolerant NRT flow but with queuing for lower transmission priority. Experiments by means of extensive system-level HSDPA simulations demonstrate that, with the proposed TSP-based radio link buffer management schemes, significant end-to-end QoS performance gains accrue to end-user traffic with simultaneous RT and NRT flows, in addition to improved resource utilization in the radio access network.
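To make the Time-Space Priority idea concrete (a simplified toy model, not the E-TSP/D-TSP schemes evaluated in the paper; capacities and names are invented), the sketch below admits RT packets only up to a restricted share of the buffer while always transmitting them first:

```python
# Toy Time-Space Priority (TSP) buffer: real-time (RT) packets get transmission
# priority but only limited buffer space; non-real-time (NRT) packets may use
# the remaining space but are served with lower priority.
from collections import deque

class TSPBuffer:
    def __init__(self, capacity=100, rt_limit=20):
        self.capacity = capacity      # total buffer space (packets)
        self.rt_limit = rt_limit      # restricted share available to RT packets
        self.rt = deque()
        self.nrt = deque()

    def enqueue(self, packet, is_rt):
        total = len(self.rt) + len(self.nrt)
        if is_rt:
            # RT: admitted only while its restricted share is not exhausted.
            if len(self.rt) < self.rt_limit and total < self.capacity:
                self.rt.append(packet)
                return True
            return False              # loss-tolerant RT packet dropped
        # NRT: may occupy any remaining space (delay-tolerant, loss-sensitive).
        if total < self.capacity:
            self.nrt.append(packet)
            return True
        return False

    def dequeue(self):
        # Time priority: RT packets are always transmitted before NRT ones.
        if self.rt:
            return self.rt.popleft()
        if self.nrt:
            return self.nrt.popleft()
        return None

buf = TSPBuffer()
buf.enqueue("voice-1", is_rt=True)
buf.enqueue("ftp-1", is_rt=False)
print(buf.dequeue())   # -> "voice-1" (RT served first)
```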
Abstract:
High-quality data from appropriate archives are needed for the continuing improvement of radiocarbon calibration curves. We discuss here the basic assumptions behind 14C dating that necessitate calibration and the relative strengths and weaknesses of archives from which calibration data are obtained. We also highlight the procedures, problems and uncertainties involved in determining atmospheric and surface ocean 14C/12C in these archives, including a discussion of the various methods used to derive an independent absolute timescale and uncertainty. The types of data required for the current IntCal database and calibration curve model are tabulated with examples.
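As a minimal illustration of why calibration is needed (a toy example; the placeholder curve below stands in for the real IntCal data), the conventional radiocarbon age assumes a constant atmospheric 14C/12C ratio and must then be mapped onto a calendar age via a calibration curve:

```python
# Toy illustration of radiocarbon calibration (the "calibration curve" here is
# an invented placeholder, not IntCal).
import numpy as np

def conventional_age(f14c):
    # Conventional radiocarbon age (yr BP) assumes constant atmospheric
    # 14C/12C and uses the Libby mean life of 8033 years.
    return -8033.0 * np.log(f14c)

# Because atmospheric 14C/12C has in fact varied, the conventional age must be
# converted to a calendar age with a calibration curve; a fake monotonic toy
# curve stands in for the real one here.
cal_years = np.arange(0, 12000, 10.0)                          # calendar axis (toy)
curve_c14 = cal_years * 0.97 + 30 * np.sin(cal_years / 500)    # toy 14C ages

measured = conventional_age(0.7471)                            # ~2340 14C yr BP
idx = np.argmin(np.abs(curve_c14 - measured))
print(f"conventional age {measured:.0f} 14C yr BP "
      f"-> ~{cal_years[idx]:.0f} cal yr BP on the toy curve")
```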
Abstract:
In this paper a multiple classifier machine learning methodology for Predictive Maintenance (PdM) is presented. PdM is a prominent strategy for dealing with maintenance issues given the increasing need to minimize downtime and associated costs. One of the challenges with PdM is generating so-called 'health factors', or quantitative indicators of the status of a system associated with a given maintenance issue, and determining their relationship to operating costs and failure risk. The proposed PdM methodology allows dynamic decision rules to be adopted for maintenance management and can be used with high-dimensional and censored data problems. This is achieved by training multiple classification modules with different prediction horizons to provide different performance trade-offs in terms of frequency of unexpected breaks and unexploited lifetime, and then employing this information in an operating-cost-based maintenance decision system to minimise expected costs. The effectiveness of the methodology is demonstrated using a simulated example and a benchmark semiconductor manufacturing maintenance problem.
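As a rough sketch of the multiple-classifier idea (not the authors' methodology; the classifier type, horizons, costs and data below are all invented placeholders), one classifier can be trained per prediction horizon and the maintenance decision taken by comparing expected costs:

```python
# Sketch: one failure-prediction classifier per horizon, then a cost-based
# maintenance decision.  Data, horizons and costs are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))                     # sensor / health-factor features
time_to_failure = rng.integers(1, 200, size=500)   # synthetic remaining life

horizons = [10, 30, 60]                            # prediction horizons (cycles)
models = {}
for h in horizons:
    y = (time_to_failure <= h).astype(int)         # "fails within h cycles?"
    models[h] = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

C_BREAK, C_EARLY = 100.0, 20.0   # toy costs: unexpected break vs unexploited lifetime

def maintenance_decision(x):
    # Maintain as soon as the expected cost of waiting exceeds the cost of
    # acting early, checking the shortest horizon first.
    for h in horizons:
        p_fail = models[h].predict_proba(x.reshape(1, -1))[0, 1]
        if p_fail * C_BREAK > C_EARLY:
            return f"maintain now (P(fail within {h}) = {p_fail:.2f})"
    return "keep running"

print(maintenance_decision(X[0]))
```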
Abstract:
Virtual metrology (VM) aims to predict metrology values using sensor data from production equipment and physical metrology values of preceding samples. VM is a promising technology for the semiconductor manufacturing industry as it can reduce the frequency of in-line metrology operations and provide supportive information for other operations such as fault detection, predictive maintenance and run-to-run control. The prediction models for VM can be drawn from a large variety of linear and nonlinear regression methods, and the selection of a proper regression method for a specific VM problem is not straightforward, especially when the candidate predictor set is high-dimensional, correlated and noisy. Using process data from a benchmark semiconductor manufacturing process, this paper evaluates the performance of four typical regression methods for VM: multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), neural networks (NN) and Gaussian process regression (GPR). It is observed that GPR performs the best among the four methods and that, remarkably, the performance of linear regression approaches that of GPR as the subset of selected input variables is increased. The observed competitiveness of high-dimensional linear regression models, which does not hold true in general, is explained in the context of extreme learning machines and functional link neural networks.
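A minimal sketch of such a comparison (synthetic placeholder data rather than the benchmark semiconductor dataset; the NN is omitted for brevity and the kernels/hyperparameters are assumptions) might look like this:

```python
# Sketch: comparing typical VM regression methods on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))                 # high-dimensional, noisy "sensors"
w = np.zeros(50)
w[:5] = [2.0, -1.5, 1.0, 0.5, 3.0]             # only a few predictors matter
y = X @ w + 0.3 * rng.normal(size=300)         # synthetic metrology value

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "MLR": LinearRegression(),
    "LASSO": LassoCV(cv=5),
    "GPR": GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name}: test MSE = {mse:.3f}")
```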
Abstract:
Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created using large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
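The core idea of combining the two data sources can be sketched as a joint factorization with a shared word embedding (a plain alternating least-squares toy, without the sparsity and non-negativity constraints of the actual JNNSE model; all matrices below are random placeholders):

```python
# Sketch: a shared latent representation A is fit so that A @ D_text
# approximates the corpus matrix and A @ D_brain approximates the
# brain-activation matrix.  Simplified illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_words, k = 200, 10
A_true = rng.random((n_words, k))
X_text = A_true @ rng.random((k, 300)) + 0.01 * rng.normal(size=(n_words, 300))
X_brain = A_true @ rng.random((k, 80)) + 0.01 * rng.normal(size=(n_words, 80))

A = rng.random((n_words, k))
for _ in range(50):
    # Update the modality-specific dictionaries given the shared embedding A.
    D_text = np.linalg.lstsq(A, X_text, rcond=None)[0]
    D_brain = np.linalg.lstsq(A, X_brain, rcond=None)[0]
    # Update the shared embedding given both dictionaries (stacked problem).
    D = np.hstack([D_text, D_brain])
    X = np.hstack([X_text, X_brain])
    A = np.linalg.lstsq(D.T, X.T, rcond=None)[0].T

err = np.linalg.norm(A @ D_text - X_text) + np.linalg.norm(A @ D_brain - X_brain)
print("joint reconstruction error:", round(err, 3))
```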
Abstract:
The assimilation of discrete higher-fidelity data points with model predictions can be used to reduce the uncertainty of the model input parameters that generate accurate predictions. The problem investigated here involves the prediction of limit-cycle oscillations using a High-Dimensional Harmonic Balance method (HDHB). The efficiency of the HDHB method is exploited to enable calibration of structural input parameters using a Bayesian inference technique. Markov-chain Monte Carlo is employed to sample the posterior distributions. Parameter estimation is carried out on both a pitch/plunge aerofoil and a Goland wing configuration. In both cases significant refinement was achieved in the distribution of possible structural parameters, allowing better predictions of their true deterministic values.
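As an illustration of the Bayesian calibration loop (the aeroelastic HDHB solver is replaced by a cheap placeholder forward model, and the priors, noise level and parameter names are invented), a basic random-walk Metropolis sampler might look like this:

```python
# Sketch of Metropolis MCMC for calibrating structural parameters against
# observed limit-cycle-oscillation data.  Everything numeric is invented.
import numpy as np

rng = np.random.default_rng(0)

def forward_model(theta, velocities):
    # Placeholder standing in for the HDHB prediction of LCO amplitude
    # vs. velocity; theta = (stiffness-like, damping-like) toy parameters.
    k, c = theta
    return np.sqrt(np.maximum(velocities - k, 0.0)) / (1.0 + c)

velocities = np.linspace(1.0, 3.0, 15)
theta_true = np.array([1.2, 0.3])
observed = forward_model(theta_true, velocities) + 0.02 * rng.normal(size=15)

def log_posterior(theta, sigma=0.02):
    if np.any(theta <= 0) or np.any(theta > 10):   # flat prior on (0, 10]^2
        return -np.inf
    resid = observed - forward_model(theta, velocities)
    return -0.5 * np.sum((resid / sigma) ** 2)

theta = np.array([2.0, 1.0])
samples = []
for _ in range(20000):
    proposal = theta + 0.05 * rng.normal(size=2)   # random-walk proposal
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

samples = np.array(samples[5000:])                 # discard burn-in
print("posterior mean:", samples.mean(axis=0))
print("posterior std: ", samples.std(axis=0))
```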
Abstract:
Slow release drugs must be manufactured to meet target specifications with respect to dissolution curve profiles. In this paper we consider the problem of identifying the drivers of dissolution curve variability of a drug from historical manufacturing data. Several data sources are considered: raw material parameters, coating data, loss on drying and pellet size statistics. The methodology employed is to develop predictive models using LASSO, a powerful machine learning algorithm for regression with high-dimensional datasets. LASSO provides sparse solutions facilitating the identification of the most important causes of variability in the drug fabrication process. The proposed methodology is illustrated using manufacturing data for a slow release drug.
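A minimal sketch of this use of LASSO (with invented feature names and synthetic data, not the historical manufacturing data analysed in the paper) is the following:

```python
# Sketch: using LASSO to flag which manufacturing parameters drive dissolution
# variability.  Features and data are placeholders.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = ([f"raw_material_{i}" for i in range(10)]
            + [f"coating_{i}" for i in range(10)]
            + ["loss_on_drying"] + [f"pellet_size_q{i}" for i in range(5)])
X = rng.normal(size=(120, len(features)))
# Synthetic "dissolution at 4 h": only a few parameters actually matter.
y = 2.0 * X[:, 3] - 1.5 * X[:, 20] + 0.8 * X[:, 21] + 0.2 * rng.normal(size=120)

lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
drivers = sorted(zip(features, lasso.coef_), key=lambda t: -abs(t[1]))
for name, coef in drivers[:5]:
    print(f"{name:18s} {coef:+.3f}")   # sparse: most coefficients are ~0
```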
Abstract:
New, automated forms of data analysis are required in order to understand the high-dimensional trajectories that are obtained from molecular dynamics simulations on proteins. Dimensionality reduction algorithms are particularly appealing in this regard as they allow one to construct unbiased, low-dimensional representations of the trajectory using only the information encoded in the trajectory. The downside of this approach is that a different set of coordinates is required for each chemical system under study, precisely because the coordinates are constructed using information from the trajectory. In this paper we show how one can resolve this problem by using the sketch-map algorithm that we recently proposed to construct a low-dimensional representation of the structures contained in the protein data bank (PDB). We show that the resulting coordinates are as useful for analysing trajectory data as coordinates constructed using landmark configurations taken from the trajectory and that these coordinates can thus be used for understanding protein folding across a range of systems.
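The out-of-sample projection step that such an approach relies on can be sketched as follows (illustrative only: the landmark embedding, switching-function parameters and random data are placeholders, not PDB-derived coordinates):

```python
# Sketch of out-of-sample projection: a new structure is placed in a fixed
# low-dimensional map by matching its transformed distances to a set of
# landmark structures whose projections are already known.
import numpy as np
from scipy.optimize import minimize

def switch(r, sigma=2.0, a=2.0, b=2.0):
    return 1.0 - (1.0 + (2.0**(a / b) - 1.0) * (r / sigma)**a)**(-b / a)

rng = np.random.default_rng(0)
landmarks_hd = rng.normal(size=(100, 60))     # landmark structures (placeholder)
landmarks_ld = rng.normal(size=(100, 2))      # their fixed 2-D projections

def project(frame_hd):
    F = switch(np.linalg.norm(landmarks_hd - frame_hd, axis=1))
    def stress(x):
        f = switch(np.linalg.norm(landmarks_ld - x, axis=1))
        return np.sum((F - f) ** 2)
    best = min((minimize(stress, rng.normal(size=2)) for _ in range(5)),
               key=lambda r: r.fun)            # a few restarts to dodge local minima
    return best.x

print(project(rng.normal(size=60)))
```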
Abstract:
This paper demonstrates the unparalleled value of full-scale data acquired from ocean trials of Aquamarine Power's Oyster 800 Wave Energy Converter (WEC) at the European Marine Energy Centre (EMEC), Orkney, Scotland.
High quality prototype and wave data were simultaneously recorded in over 750 distinct sea states (comprising different wave height, wave period and tidal height combinations) and include periods of operation where the hydraulic Power Take-Off (PTO) system was both pressurised (damped operation) and de-pressurised (undamped operation).
A detailed model-prototype correlation procedure is presented in which the full-scale prototype behaviour is compared to predictions from both experimental and numerical modelling techniques via a high-temporal-resolution wave-by-wave reconstruction. This provides a definitive verification of the capabilities of such research techniques and allows a robust and meaningful uncertainty analysis to be performed on their outputs.
The importance of a good data capture methodology, both in terms of handling and accuracy is also presented. The techniques and procedures implemented by Aquamarine Power for real-time data management are discussed, including lessons learned on the instrumentation and infrastructure required to collect high-value data.
Abstract:
Thesis (Master's)--University of Washington, 2015
Abstract:
A marked process of technological evolution can be observed today all over the globe. Companies, whether small, medium or large, are increasingly dependent on computerized systems to carry out their business processes, and consequently generate business-related information in which, very often, the data have no relationship to one another. Most conventional computer systems are not designed to manage and store strategic information, which prevents that information from serving as a strategic resource. Decisions are therefore made on the basis of managers' experience, when they could be based on historical facts stored by the various systems. In general, organizations hold a great deal of data but in most cases extract little information from it, which is a problem in competitive markets. As organizations seek to evolve and outperform the competition in their decision-making, the term Business Intelligence (BI) arises in this context. GisGeo Information Systems is a company that develops GIS-based (geographic information systems) software following an open-source tools philosophy. Its main product is based on the geographic location of various types of vehicles, on data collection and, consequently, on its analysis (kilometres travelled, duration of a trip between two defined points, fuel consumption, etc.). The topic of this project arises in this context, with the objective of giving a different perspective to the existing data by combining BI concepts with the system implemented at the company, in line with its philosophy. This project addresses some of the most important concepts underlying BI, such as the dimensional model, the data warehouse, the ETL process and OLAP, following Ralph Kimball's methodology. Some of the main open-source tools available on the market are also studied, along with their relative advantages and disadvantages. In conclusion, the solution developed according to the criteria set out by the company is presented as a proof of concept of the applicability of Business Intelligence to the field of Geographic Information Systems (GIS), using an open-source tool that supports data visualization through dashboards.
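A minimal sketch of the Kimball-style dimensional modelling and ETL step mentioned above, applied to vehicle-tracking data (table and column names are invented for illustration and do not reflect the company's actual schema), could be:

```python
# Sketch of a small ETL step building a star schema: dimension tables plus a
# fact table of trips, then an OLAP-style aggregation over it.
import pandas as pd

# Extract: raw records as they might come from the GPS/fleet system.
raw = pd.DataFrame({
    "plate": ["AA-01-BB", "AA-01-BB", "CC-02-DD"],
    "trip_date": ["2015-03-01", "2015-03-02", "2015-03-01"],
    "km": [120.5, 80.2, 45.0],
    "fuel_l": [9.1, 6.0, 3.4],
})

# Transform/Load: dimension tables with surrogate keys...
dim_vehicle = (raw[["plate"]].drop_duplicates().reset_index(drop=True)
               .rename_axis("vehicle_key").reset_index())
dim_date = (raw[["trip_date"]].drop_duplicates().reset_index(drop=True)
            .rename_axis("date_key").reset_index())

# ...and a fact table holding the measures, keyed by the dimensions.
fact_trip = (raw.merge(dim_vehicle, on="plate").merge(dim_date, on="trip_date")
             [["vehicle_key", "date_key", "km", "fuel_l"]])

# OLAP-style query over the star schema: kilometres per vehicle.
print(fact_trip.merge(dim_vehicle, on="vehicle_key")
      .groupby("plate")["km"].sum())
```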
Abstract:
This user manual for the "Ocean Data View" (ODV) data visualization software describes the exploration, analysis and visualization of oceanographic data in the format of the "World Ocean Database" (WOD) global ocean database collection. The manual comprises 6 practical exercises that describe, step by step, the creation of metavariables, the import of data and its visualization through latitude/longitude maps and scatter plots, vertical sections and time series. Extensive use of ODV for the visualization of oceanographic data by IMARPE scientific staff is encouraged.
Abstract:
Lexical processing among bilinguals is often affected by complex patterns of individual experience. In this paper we discuss the psychocentric perspective on language representation and processing, which highlights the centrality of individual experience in psycholinguistic experimentation. We discuss applications to the investigation of lexical processing among multilinguals and explore the advantages of using high-density experiments with multilinguals. High-density experiments are designed to co-index measures of lexical perception and production, as well as participant profiles. We discuss the challenges associated with the characterization of participant profiles and present a new data visualization technique that we term Facial Profiles. This technique is based on Chernoff faces, which were developed over 40 years ago. The Facial Profile technique seeks to overcome some of the challenges associated with the use of Chernoff faces, while maintaining the core insight that recoding multivariate data as facial features can engage the human face recognition system and thus enhance our ability to detect and interpret patterns within multivariate datasets. We demonstrate that Facial Profiles can code participant characteristics in lexical processing studies by recoding variables such as reading ability, speaking ability, and listening ability into iconically related relative sizes of eye, mouth, and ear, respectively. The balance of ability in bilinguals can be captured by creating composite facial profiles or Janus Facial Profiles. We demonstrate the use of Facial Profiles and Janus Facial Profiles in the characterization of participant effects in the study of lexical perception and production.
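The Chernoff-style recoding described above can be sketched as follows (a toy rendering that maps 0-1 ability scores to eye, mouth and ear sizes; the scaling constants and layout are assumptions, not the authors' implementation):

```python
# Sketch of the Facial Profile idea: recode reading, speaking and listening
# ability scores (0-1) as the relative sizes of eyes, mouth and ears.
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Ellipse

def facial_profile(ax, reading, speaking, listening, label=""):
    ax.add_patch(Circle((0, 0), 1.0, fill=False, lw=2))            # head outline
    for x in (-0.4, 0.4):                                          # eyes ~ reading
        ax.add_patch(Circle((x, 0.3), 0.08 + 0.25 * reading, color="black"))
    ax.add_patch(Ellipse((0, -0.45), 0.2 + 0.8 * speaking, 0.15,   # mouth ~ speaking
                         fill=False, lw=2))
    for x in (-1.0, 1.0):                                          # ears ~ listening
        ax.add_patch(Ellipse((x, 0), 0.15, 0.2 + 0.6 * listening, fill=False, lw=2))
    ax.set_xlim(-1.6, 1.6)
    ax.set_ylim(-1.4, 1.4)
    ax.set_aspect("equal")
    ax.axis("off")
    ax.set_title(label)

fig, axes = plt.subplots(1, 2, figsize=(7, 3.5))
facial_profile(axes[0], reading=0.9, speaking=0.4, listening=0.6, label="Profile A")
facial_profile(axes[1], reading=0.3, speaking=0.8, listening=0.9, label="Profile B")
plt.show()
```

Placing two such profiles side by side, as in the call above, gives a rough analogue of the composite (Janus) profiles used to show the balance of abilities across a bilingual's two languages.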