970 results for Multivariate data


Relevance:

60.00%

Publisher:

Abstract:

Pregnane X receptor (PXR) is an important nuclear receptor xenosensor that regulates the expression of metabolic enzymes and transporters involved in the metabolism of xenobiotics and endobiotics. In this study, ultra-performance liquid chromatography (UPLC) coupled with electrospray time-of-flight mass spectrometry (TOFMS) revealed altered urinary metabolomes in both Pxr-null and wild-type mice treated with the mouse PXR activator pregnenolone 16alpha-carbonitrile (PCN). Multivariate data analysis revealed that PCN significantly attenuated the urinary vitamin E metabolite alpha-carboxyethyl hydroxychroman (CEHC) glucuronide, together with a novel metabolite, in wild-type but not Pxr-null mice. Deconjugation experiments with beta-glucuronidase and beta-glucosidase suggested that the novel urinary metabolite was gamma-CEHC beta-D-glucoside (Glc). The identity of gamma-CEHC Glc was confirmed by chemical synthesis and by comparing tandem mass fragmentation of the urinary metabolite with the authentic standard. The lower urinary CEHC was likely due to PXR-mediated repression of hepatic sterol carrier protein 2, which is involved in peroxisomal beta-oxidation of branched-chain fatty acids (BCFA). Using a combination of metabolomic analysis and a genetically modified mouse model, this study showed that activation of PXR attenuates the levels of two vitamin E conjugates, which may be useful as biomarkers of PXR activation, and identified a novel vitamin E metabolite, gamma-CEHC Glc.

Relevance:

60.00%

Publisher:

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. 
The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data are represented by the standard one-value-per-variable paradigm and are widely employed in a host of clinical models and tools; they are often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The remaining two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable and constitute the measured observations that are typically available to end users when they review time series data; they are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU", provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are infeasible due to the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model.
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies for each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after the complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit", presents the results obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included.
In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy compared to the baseline multivariate model, but diminished classification accuracy compared to adding the trend analysis features alone (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
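As a minimal sketch of the trend-analysis idea described above, a least-squares slope computed over a short monitoring window can serve as one time-series-derived candidate feature; the vital-sign windows below are invented for illustration and are not the study's data:

```python
import numpy as np

def trend_feature(values, times=None):
    """Least-squares slope of a monitoring window: one simple
    time-series analysis result usable as a latent candidate feature."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values)) if times is None else np.asarray(times, dtype=float)
    slope, _intercept = np.polyfit(t, values, 1)  # degree-1 (linear) fit
    return slope

# Hypothetical SpO2 windows, one reading per minute (not the study's data):
stable_spo2 = [98, 97, 98, 98, 97, 98]          # flat trend
deteriorating_spo2 = [98, 96, 94, 91, 88, 85]   # falling trend

flat = trend_feature(stable_spo2)
falling = trend_feature(deteriorating_spo2)
```

As the text stresses, the window duration and sampling resolution must be fixed during the design phase before such an operation is applied.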

Relevance:

60.00%

Publisher:

Abstract:

New biotechnologies make it possible to obtain information for characterising genetic materials from multiple markers, whether molecular and/or morphological. The ordination of genetic material through the exploration of multidimensional patterns of variability is addressed by various multivariate analysis techniques. Multivariate dimension-reduction techniques (DRT) and their graphical representation are of substantial importance for visualising multivariate data in low-dimensional spaces, since they ease the interpretation of the interrelations among the variables (markers) and among the cases or observations under analysis. Principal Component Analysis, Principal Coordinate Analysis and Generalized Procrustes Analysis are all DRTs applicable to data from molecular and/or morphological markers. Minimum spanning trees and biplots are techniques for obtaining geometric representations of DRT results. This work describes these multivariate techniques and illustrates their application on two data sets, one molecular and one morphological, used to characterise fungal genetic material.
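As a small sketch of one of the dimension-reduction techniques named above (Principal Component Analysis, computed here via SVD), applied to a hypothetical presence/absence marker matrix; the numbers are invented and are not the paper's data:

```python
import numpy as np

# Hypothetical binary molecular-marker matrix
# (rows = fungal isolates, columns = marker presence/absence).
X = np.array([
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
], dtype=float)

# Principal Component Analysis via SVD of the column-centred matrix:
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                    # isolate coordinates (ordination)
loadings = Vt.T                   # marker contributions (biplot arrows)
explained = s**2 / np.sum(s**2)   # variance share per component
```

Plotting the first two columns of `scores` together with `loadings` gives the biplot the text refers to; a minimum spanning tree can then be drawn over the same scores.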

Relevance:

60.00%

Publisher:

Abstract:

Bayesian networks are a widely used model for representing conditional dependence relationships among the variables in multivariate data. Learning them from a data set or from experts has been studied in depth since their conception. However, situations arise that demand a single consensus model built from several data partitions or from a set of experts; this is the problem of model fusion or aggregation. Results on Bayesian network aggregation, although varied in nature, are scarce compared with those on learning. Here, two methods are proposed for the aggregation of Gaussian Bayesian networks, that is, Bayesian networks whose underlying modelled distribution is a multivariate Gaussian. Both methods are effective and precise, and they produce networks with fewer parameters than the models obtained by individual learning. They also constitute a novel approach in that they combine notions traditionally explored separately in the state of the art. Their simplicity and the gain in compactness of the resulting representation make them especially attractive for future applications in scalable computing environments.
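The abstract does not specify the two proposed methods, so the sketch below is only a generic moment-matching baseline for the aggregation problem: it pools several individually learned multivariate Gaussians into one whose mean and covariance match the first two moments of their (weighted) mixture. This is explicitly NOT the thesis's method, just an illustration of the kind of object being aggregated:

```python
import numpy as np

def pool_gaussians(means, covs, weights=None):
    """Moment-matching aggregation of several multivariate Gaussians
    (generic baseline, not the thesis's proposed method): the pooled
    mean/covariance match the first two moments of the mixture of the
    input models."""
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(c, dtype=float) for c in covs]
    k = len(means)
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    mu = sum(wi * mi for wi, mi in zip(w, means))
    # law of total covariance: within-model spread + between-model spread
    sigma = sum(wi * (ci + np.outer(mi - mu, mi - mu))
                for wi, ci, mi in zip(w, covs, means))
    return mu, sigma

# Two 2-D Gaussian models learned on different data partitions (toy numbers):
mu1, c1 = [0.0, 0.0], np.eye(2)
mu2, c2 = [2.0, 0.0], np.eye(2)
mu_pool, cov_pool = pool_gaussians([mu1, mu2], [c1, c2])
```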

Relevance:

60.00%

Publisher:

Abstract:

The assertive and efficient use of learning strategies often depends on understanding and taking into account psychological and motivational aspects. The proper use of learning strategies is reflected in academic performance, in the mastery of constructs and models, and in critical and scientific maturity. This thesis argues that there is a relationship between the self-regulated learning strategies and the self-determined learning strategies that predominate among master's and doctoral students in Accounting. The study is justified because, besides opening a line of research still unexplored in the context of Human Accounting, its results highlight an original understanding of the relationship between learning and personal regulation and motivation. Its main objective is to present a diagnosis, the dimensions, and the correlations of the self-regulated and self-determined learning strategies of students in stricto sensu graduate programs in Accounting in Brazil. The survey had 516 respondents: 383 master's and 133 doctoral students. Two psychometric instruments were applied: the Self-Regulated Learning Strategies (SRLS) inventory and the Motivated Strategies for Learning Questionnaire (MSLQ). The operational research model led to the formulation of eight hypotheses: the first supports the thesis itself, while the others posit the influence of age, gender, course type, stage in the course, type of undergraduate institution, Capes course score, and parents' education levels on the levels of Self-Regulated Learning (SRL) and Self-Determination Theory (SDT). Multivariate data analysis corroborated the thesis and the influence of gender on the SRL level. The meta-conclusion of this thesis ratifies the referenced studies, confirming that learning can be mastered and controlled by the individual through the adoption of individual regulation and motivation strategies.
An important contribution of this research is to offer empirical conclusions that can help teachers, students, researchers, educational institutions, and graduate programs understand more systematically the aspects of self-regulated and self-determined learning that characterise Accounting students. Important limitations of this study can be seen as opportunities for future research: the sample involves a specific population, the survey may suffer from common-method bias, and participation by professional master's students was low. Future studies may adopt other methodological strategies and/or involve more diverse samples or a longer time span.

Relevance:

60.00%

Publisher:

Abstract:

Univariate linkage analysis is used routinely to localise genes for human complex traits. Often, many traits are analysed, but the significance of linkage for each trait is not corrected for multiple trait testing, which increases the experiment-wise type-I error rate. In addition, univariate analyses do not realise the full power provided by multivariate data sets. Multivariate linkage is the ideal solution, but it is computationally intensive, so genome-wide analysis and the evaluation of empirical significance are often prohibitive. We describe two simple methods that efficiently address these problems by combining P-values from multiple univariate linkage analyses. The first method estimates empirical pointwise and genome-wide significance between one trait and one marker when multiple traits have been tested. It is as robust as an appropriate Bonferroni adjustment, with the advantage that no assumptions are required about the number of independent tests performed. The second method estimates the significance of linkage between multiple traits and one marker and can therefore be used to localise regions that harbour pleiotropic quantitative trait loci (QTL). We show that this method has greater power than individual univariate analyses to detect a pleiotropic QTL across different situations. In addition, when traits are moderately correlated and the QTL influences all traits, it can outperform formal multivariate variance components (VC) analysis. This approach is computationally feasible for any number of traits and is not affected by the residual correlation between traits. We illustrate the utility of our approach with a genome scan of three asthma traits measured in families with a twin proband.
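The abstract does not give the exact combination rule, so as one classic illustration of pooling univariate linkage P-values at a single marker, Fisher's method is sketched below; note the paper evaluates empirical significance (e.g. by permutation) rather than relying only on an asymptotic null like this one:

```python
import numpy as np
from scipy import stats

def fisher_combine(pvalues):
    """Fisher's combination: -2 * sum(log p) follows a chi-squared
    distribution with 2k degrees of freedom under the joint null of
    k independent tests (an illustration, not the paper's exact rule)."""
    p = np.asarray(pvalues, dtype=float)
    statistic = -2.0 * np.log(p).sum()
    return stats.chi2.sf(statistic, df=2 * len(p))

# Three traits tested at one marker (made-up P-values):
combined = fisher_combine([0.01, 0.04, 0.20])
```

In practice the independence assumption fails for correlated traits, which is exactly why an empirical (permutation-based) null, as the abstract describes, is preferable.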

Relevance:

60.00%

Publisher:

Abstract:

Finite mixture models are increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data, where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
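For reference, the non-robust normal mixture baseline the text starts from can be sketched with scikit-learn on toy data; the multivariate t mixtures and mixtures of factor analyzers advocated above are not available in scikit-learn and require dedicated software:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated 2-D clusters (toy data). The normal family shown
# here is outlier-sensitive; the t family discussed in the text
# down-weights outliers via heavier tails.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2)),
])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gm.predict(X)  # each blob should land in its own component
```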

Relevance:

60.00%

Publisher:

Abstract:

Wilbur Zelinsky formulated a Hypothesis of the Mobility Transition in 1971, in which he tried to relate all aspects of mobility to the Demographic Transition and modernisation. This dissertation applies the theoretical framework proposed by Zelinsky, extended to encompass a family of transitions, to understand the migration patterns of city regions. Two city regions, Brisbane and Stockholm, are selected as case studies, representing important city regions of similar size but drawn from contrasting historical settings. A comparison of the case studies with the theoretical framework aims to determine how the relative contributions of net migration, the source areas of migrants, and the migration intensity change with modernisation. In addition, the research aims to identify the aspects of modernisation affecting migration. These aspects of migration are analysed with a "historical approach" and a "multivariate approach". An extensive investigation into the city regions' historical background provides the source from which evidence for a relationship between migration and modernisation is extracted. With this historical approach, similarities and differences in migration patterns are identified. The other research approach analyses multivariate data, from the last two decades, on migration flows and modernisation. Correlations between migration and key aspects of modernisation are tested with multivariate regression, based on an alternative version of a spatial interaction model. The project demonstrates that the changing functions of cities and structural modernisation influence migration. Similar patterns are found regarding the relative contributions of net migration and natural increase to population growth. The research finds links between these changes in the relative contribution of net migration and demographic modernisation.
The findings on variations in the urban and rural source areas of migrants to city regions do not contradict the expected pattern, but data limitations prevent definite conclusions from being drawn. The assessment of variations in migration intensity did not support the expected pattern: based on Swedish data, the hypothesised increase in migration intensity is rejected. International migration data also show patterns different from those derived from the theoretical framework. The findings from both research approaches suggest that structural modernisation affected migration flows more than demographic modernisation. The findings lead to the formulation of hypothesised patterns of migration to city regions. The study provides an important research contribution by applying the two research approaches to city regions. It also combines the study of internal and international migration to address the research objectives within a framework of transitional change.
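The "alternative version of a spatial interaction model" is not spelled out in the abstract; as a generic illustration, a gravity-style spatial interaction model can be fitted log-linearly by ordinary least squares. All numbers below are invented:

```python
import numpy as np

# Gravity-style model (illustrative form, not the dissertation's exact one):
# flow ~ k * pop_origin * pop_destination / distance**b, fitted in logs.
pop_o = np.array([100, 250, 80, 400, 150], dtype=float)   # origin populations
pop_d = 500.0                                             # one destination city region
dist = np.array([50, 120, 30, 300, 90], dtype=float)      # distances (km)
true_b = 1.5
flows = 0.01 * pop_o * pop_d / dist**true_b               # synthetic, noise-free flows

# log F = log k' + a * log(pop_o) - b * log(dist); pop_d is constant here,
# so it is absorbed into the intercept.
A = np.column_stack([np.ones_like(dist), np.log(pop_o), np.log(dist)])
coef, *_ = np.linalg.lstsq(A, np.log(flows), rcond=None)
distance_decay = -coef[2]   # recovers b
```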

Relevance:

60.00%

Publisher:

Abstract:

This thesis presents an investigation of synchronisation and causality, motivated by problems in computational neuroscience. The thesis addresses both theoretical and practical signal processing issues regarding the estimation of interdependence from a set of multivariate data generated by a complex underlying dynamical system. This topic is driven by a series of problems in neuroscience, which represent the principal motivation behind the material in this work. The underlying system is the human brain, and the generative process of the data is based on modern electromagnetic neuroimaging methods. In this thesis, the underlying functional mechanisms of the brain are described within the recent mathematical formalism of dynamical systems on complex networks. This is justified principally on the grounds of the complex hierarchical and multiscale nature of the brain, and it offers new methods of analysis for modelling its emergent phenomena. A fundamental approach to studying neural activity is to investigate the connectivity pattern developed by the brain's complex network. Three types of connectivity are important to study: 1) anatomical connectivity, referring to the physical links forming the topology of the brain network; 2) effective connectivity, concerning the way the neural elements communicate with each other using the brain's anatomical structure, through phenomena of synchronisation and information transfer; 3) functional connectivity, an epistemic concept which refers to the interdependence between data measured from the brain network. The main contribution of this thesis is to present, apply and discuss novel algorithms for functional connectivity, designed to extract different specific aspects of the interaction between the underlying generators of the data. Firstly, a univariate statistic is developed to allow for indirect assessment of synchronisation in the local network from a single time series.
This approach is useful for inferring coupling within a local cortical area as observed by a single measurement electrode. Secondly, different existing methods of phase synchronisation are considered from the perspective of experimental data analysis and the inference of coupling from observed data. These methods are designed to address the estimation of medium- to long-range connectivity, and their differences are particularly relevant in the context of volume conduction, which is known to produce spurious detections of connectivity. Finally, an asymmetric temporal metric is introduced in order to detect the direction of the coupling between different regions of the brain. The method developed in this thesis is based on a machine learning extension of the well known concept of Granger causality. The discussion is developed alongside examples of synthetic and experimental real data. The synthetic data are simulations of complex dynamical systems intended to mimic the behaviour of simple cortical neural assemblies; they are helpful for testing the techniques developed in this thesis. The real datasets illustrate the problem of brain connectivity in the case of important neurological disorders such as epilepsy and Parkinson's disease. The methods of functional connectivity in this thesis are applied to intracranial EEG recordings in order to extract features which characterize the underlying spatiotemporal dynamics before, during and after an epileptic seizure, and to predict seizure location and onset prior to conventional electrographic signs. The methodology is also applied to an MEG dataset containing healthy, Parkinson's and dementia subjects with the aim of distinguishing patterns of pathological from physiological connectivity.
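As a concrete example of the Hilbert-transform-based phase synchronisation measures discussed above, the phase locking value (PLV) between two signals can be sketched as follows (toy signals, not the thesis's EEG/MEG data):

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """Phase synchronisation index: magnitude of the mean unit phasor of
    the instantaneous phase difference (Hilbert transform based).
    1 = perfect phase locking, values near 0 = no phase relation."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Toy signals: y_locked is a phase-shifted copy of x; y_noise is unrelated.
t = np.linspace(0, 1, 1000, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)
y_locked = np.sin(2 * np.pi * 10 * t + 0.8)
rng = np.random.default_rng(1)
y_noise = rng.normal(size=t.size)

plv_locked = phase_locking_value(x, y_locked)
plv_noise = phase_locking_value(x, y_noise)
```

Note that PLV alone cannot distinguish genuine coupling from volume conduction, which is why the text compares several phase-synchronisation estimators.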

Relevance:

60.00%

Publisher:

Abstract:

This thesis describes the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with a graphical representation of the data and a pattern recognition discriminant function. It also produces a distances frequency distribution that may be useful in detecting clustering in the data or in estimating parameters for discriminating between the different populations in the data, and it can be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. The thesis offers three distinct contributions to cluster analysis and pattern recognition techniques. The first contribution is the introduction of a transformation function into the technique of nonlinear mapping. The second contribution is the use of the distances frequency distribution instead of the distances time-sequence in nonlinear mapping. The third contribution is the formulation of a new generalised and normalised error function, together with its optimal step size formula for gradient-method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as the origin of the nonlinear mapping technique. The third chapter describes the first development step in the technique of nonlinear mapping: the introduction of the transformation function. The fourth chapter describes the second development step of the nonlinear mapping technique: the use of the distances frequency distribution instead of the distances time-sequence. The chapter also includes the formulation of the new generalised and normalised error function. Finally, the fifth chapter, the conclusion, evaluates all the developments and proposes a new program for cluster analysis and pattern recognition integrating all the new features.
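The distances frequency distribution at the heart of the second contribution can be illustrated on toy data: for two well-separated clusters, the distribution of pairwise distances is bimodal, with an empty gap between within-cluster and between-cluster distances (illustrative data, not from the thesis):

```python
import numpy as np

# Two 3-D clusters; small within-cluster distances, large between-cluster
# ones. The frequency distribution of pairwise distances reveals this
# structure without any embedding.
rng = np.random.default_rng(0)
A = rng.normal(loc=0.0, scale=0.2, size=(20, 3))
B = rng.normal(loc=5.0, scale=0.2, size=(20, 3))
X = np.vstack([A, B])

diffs = X[:, None, :] - X[None, :, :]
D = np.sqrt((diffs ** 2).sum(-1))       # full pairwise distance matrix
iu = np.triu_indices(len(X), k=1)       # each pair counted once
dists = D[iu]

counts, edges = np.histogram(dists, bins=[0.0, 2.0, 6.0, 12.0])
within, gap, between = counts           # expected: bimodal, empty middle bin
```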

Relevance:

60.00%

Publisher:

Abstract:

Microfluidics has recently emerged as a new method of manufacturing liposomes, which allows for reproducible mixing in milliseconds on the nanoliter scale. Here we investigate microfluidics-based manufacturing of liposomes. The aim of these studies was to assess the parameters in a microfluidic process by varying the total flow rate (TFR) and the flow rate ratio (FRR) of the solvent and aqueous phases. Design of experiments and multivariate data analysis were used for increased process understanding and for the development of predictive and correlative models. High FRR led to the bottom-up synthesis of liposomes, with a strong correlation with vesicle size, demonstrating the ability to control liposome size in-process; the resulting liposome size correlated with the FRR in the microfluidics process, with liposomes of 50 nm being reproducibly manufactured. Furthermore, we demonstrate the potential of high-throughput manufacturing of liposomes using microfluidics, with a four-fold increase in the volumetric flow rate while maintaining liposome characteristics. The efficacy of these liposomes was demonstrated in transfection studies and was modelled using predictive modeling. Mathematical modelling identified FRR as the key variable in the microfluidic process, with the highest impact on liposome size, polydispersity and transfection efficiency. This study demonstrates microfluidics as a robust and high-throughput method for the scalable and highly reproducible manufacture of size-controlled liposomes. Furthermore, the application of statistically based process control increases understanding and allows for the generation of a design space for controlled particle characteristics.
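The reported FRR-size correlation can be sketched as a simple linear calibration model; the calibration points below are invented to mimic the reported trend (higher FRR, smaller vesicles) and are not the study's measurements:

```python
import numpy as np

# Hypothetical calibration points: flow rate ratio (aqueous:solvent)
# versus measured liposome size. Invented numbers, illustrative only.
frr = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
size_nm = np.array([120.0, 95.0, 75.0, 60.0, 50.0])

# Linear least-squares fit: size = slope * FRR + intercept.
slope, intercept = np.polyfit(frr, size_nm, 1)
predicted_at_frr3 = slope * 3.0 + intercept
```

A model of this kind (in the study, built via design of experiments over both TFR and FRR) is what enables in-process control: choose the FRR that the model maps to the target size.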

Relevance:

60.00%

Publisher:

Abstract:

Purpose ‐ This study provides empirical evidence for the contextuality of marketing performance assessment (MPA) systems. It aims to introduce a taxonomical classification of MPA profiles based on the relative emphasis placed on different dimensions of marketing performance in different companies and business contexts. Design/methodology/approach ‐ The data used in this study (n=1,157) were collected using a web-based questionnaire targeted to top managers in Finnish companies. Two multivariate data analysis techniques were used to address the research questions. First, dimensions of marketing performance underlying the current MPA systems were identified through factor analysis. Second, a taxonomy of different profiles of marketing performance measurement was created by clustering respondents based on the relative emphasis placed on the dimensions and characterizing them vis-à-vis contextual factors. Findings ‐ The study identifies nine broad dimensions of marketing performance that underlie the MPA systems in use, and five MPA profiles typical of companies of varying sizes in varying industries, market life cycle stages, and competitive positions, associated with varying levels of market orientation and business performance. The findings support the previously conceptual notion of contextuality in MPA and provide empirical evidence for the factors that affect MPA systems in practice. Originality/value ‐ The paper presents the first field study of current MPA systems focusing on the combinations of metrics in use. The findings provide empirical support for the contextuality of MPA and form a classification of existing contextual systems suitable for benchmarking purposes. Limited evidence of performance differences between MPA profiles is also provided.
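The two-step analysis described (factor analysis to recover performance dimensions, then clustering respondents into profiles) can be sketched with scikit-learn on synthetic survey scores; nothing below uses the study's data, its nine dimensions, or its five profiles:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Synthetic questionnaire: 60 respondents answer 4 correlated metric items
# driven by one latent emphasis, with two clearly different groups.
rng = np.random.default_rng(0)
latent_emphasis = np.repeat([1.0, 5.0], 30)
items = np.column_stack([latent_emphasis + rng.normal(0, 0.3, 60)
                         for _ in range(4)])

# Step 1: factor analysis recovers the underlying dimension(s).
scores = FactorAnalysis(n_components=1, random_state=0).fit_transform(items)
# Step 2: cluster respondents on factor scores into measurement profiles.
profiles = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```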

Relevance:

60.00%

Publisher:

Abstract:

Nanoparticles offer an ideal platform for the delivery of small molecule drugs, subunit vaccines and genetic constructs. Besides the need for a homogeneous size distribution, defined loading efficiencies and reasonable production and development costs, one of the major bottlenecks in translating nanoparticles into clinical application is the need for rapid, robust and reproducible development techniques. Within this thesis, microfluidic methods were investigated for the manufacturing, drug or protein loading, and purification of pharmaceutically relevant nanoparticles. Initially, methods to prepare small liposomes were evaluated and compared to a microfluidics-directed nanoprecipitation method. To support the implementation of statistical process control, design of experiments models aided the process robustness and validation of the methods investigated, gave an initial overview of the size ranges obtainable with each method, and allowed the advantages and disadvantages of each method to be evaluated. The lab-on-a-chip system resulted in high-throughput vesicle manufacturing, enabling a rapid process and a high degree of process control. To investigate this method further, cationic low transition temperature lipids, cationic bola-amphiphiles with delocalized charge centers, neutral lipids and polymers were used in the microfluidics-directed nanoprecipitation method to formulate vesicles. Whereas the total flow rate (TFR) and the ratio of solvent to aqueous stream (flow rate ratio, FRR) were shown to be influential in controlling vesicle size for high transition temperature lipids, the FRR was found to be the most influential factor controlling the size of vesicles consisting of low transition temperature lipids and of polymer-based nanoparticles. The biological activity of the resulting constructs was confirmed by an in vitro transfection of pDNA constructs using cationic nanoprecipitated vesicles.
Design of experiments and multivariate data analysis revealed the mathematical relationship and the significance of the factors TFR and FRR in the microfluidics process for liposome size, polydispersity and transfection efficiency. Multivariate tools were used to cluster and predict specific in-vivo immune responses dependent on key liposome adjuvant characteristics upon delivery of a tuberculosis antigen in a vaccine candidate. The addition of a low-solubility model drug (propofol) in the nanoprecipitation method resulted in a significantly higher solubilisation of the drug within the liposomal bilayer, compared to the control method. The microfluidics method underwent scale-up work by increasing the channel diameter and parallelising the mixers in a planar way, resulting in an overall 40-fold increase in throughput. Furthermore, microfluidic tools were developed based on microfluidics-directed tangential flow filtration, which allowed for continuous manufacturing, purification and concentration of liposomal drug products.

Relevance:

60.00%

Publisher:

Abstract:

Data visualization is widely used to facilitate the comprehension of information and to find relationships between data. One of the most widely used techniques for visualizing multivariate data (4 or more variables) is the 2D scatterplot. This technique associates each data item with a visual mark in the following way: two variables are mapped to Cartesian coordinates, so that a visual mark can be placed on the Cartesian plane; the other variables are mapped gradually to visual properties of the mark, such as size, color and shape, among others. As the number of variables to be visualized increases, the number of visual properties associated with the mark increases as well, and so does the complexity of the final visualization. However, increasing the complexity of the visualization does not necessarily imply a better visualization; sometimes the opposite occurs, producing a visually polluted and confusing visualization. This problem is called visual properties overload. This work investigates whether it is possible to work around the overload of the visual channel and improve insight into multivariate data visualized through a modification of the 2D scatterplot technique. In this modification, we map the variables from data items to multisensory marks, which are composed not only of visual properties but also of haptic properties, such as vibration, viscosity and elastic resistance. We believed that this approach could ease the insight process through the transposition of properties from the visual channel to the haptic channel. The hypothesis was verified through experiments in which we analyzed (a) the accuracy of the answers; (b) response time; and (c) the degree of personal satisfaction with the proposed approach. However, the hypothesis was not validated.
The results suggest that there is an equivalence between the investigated visual and haptic properties in all analyzed aspects, though in strictly numeric terms the multisensory visualization achieved better results in response time and personal satisfaction.
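The mark-property mapping underlying the modified 2D scatterplot can be sketched as follows: two variables give position, and each remaining variable is min-max normalised before being mapped to a property range (visual here, but the same normalisation would feed a haptic property such as vibration amplitude); toy data only:

```python
import numpy as np

# Toy multivariate data items: columns are x, y, v3, v4.
data = np.array([
    [1.0, 2.0, 10.0, 0.1],
    [2.0, 1.5, 40.0, 0.5],
    [3.0, 3.0, 25.0, 0.9],
])
x, y, v3, v4 = data.T

def to_unit(v):
    """Min-max normalise a variable to [0, 1] before mapping it to a
    visual (or haptic) property range."""
    return (v - v.min()) / (v.max() - v.min())

sizes = 20 + 180 * to_unit(v3)   # v3 -> marker size range [20, 200]
colors = to_unit(v4)             # v4 -> colormap position
# e.g. with matplotlib: plt.scatter(x, y, s=sizes, c=colors, cmap="viridis")
```

Each additional variable needs another such mapping, which is precisely how the visual channel becomes overloaded as dimensionality grows.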