952 resultados para Dynamic data set visualization


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system PhiVis [1] in several directions: bf(1) we allow for em non-linear projection manifolds (the basic building block is the Generative Topographic Mapping -- GTM), bf(2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, bf(3) we describe folding patterns of low-dimensional projection manifold in high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the nonlinear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the visualization system principle of the approach on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A combination of deductive reasoning, clustering, and inductive learning is given as an example of a hybrid system for exploratory data analysis. Visualization is replaced by a dialogue with the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Estuaries are perhaps the most threatened environments in the coastal fringe; the coincidence of high natural value and attractiveness for human use has led to conflicts between conservation and development. These conflicts occur in the Sado Estuary since its location is near the industrialised zone of Peninsula of Setúbal and at the same time, a great part of the Estuary is classified as a Natural Reserve due to its high biodiversity. These facts led us to the need of implementing a model of environmental management and quality assessment, based on methodologies that enable the assessment of the Sado Estuary quality and evaluation of the human pressures in the estuary. These methodologies are based on indicators that can better depict the state of the environment and not necessarily all that could be measured or analysed. Sediments have always been considered as an important temporary source of some compounds or a sink for other type of materials or an interface where a great diversity of biogeochemical transformations occur. For all this they are of great importance in the formulation of coastal management system. Many authors have been using sediments to monitor aquatic contamination, showing great advantages when compared to the sampling of the traditional water column. The main objective of this thesis was to develop an estuary environmental management framework applied to Sado Estuary using the DPSIR Model (EMMSado), including data collection, data processing and data analysis. The support infrastructure of EMMSado were a set of spatially contiguous and homogeneous regions of sediment structure (management units). The environmental quality of the estuary was assessed through the sediment quality assessment and integrated in a preliminary stage with the human pressure for development. Besides the earlier explained advantages, studying the quality of the estuary mainly based on the indicators and indexes of the sediment compartment also turns this methodology easier, faster and human and financial resource saving. These are essential factors to an efficient environmental management of coastal areas. Data management, visualization, processing and analysis was obtained through the combined use of indicators and indices, sampling optimization techniques, Geographical Information Systems, remote sensing, statistics for spatial data, Global Positioning Systems and best expert judgments. As a global conclusion, from the nineteen management units delineated and analyzed three showed no ecological risk (18.5 % of the study area). The areas of more concern (5.6 % of the study area) are located in the North Channel and are under strong human pressure mainly due to industrial activities. These areas have also low hydrodynamics and are, thus associated with high levels of deposition. In particular the areas near Lisnave and Eurominas industries can also accumulate the contamination coming from Águas de Moura Channel, since particles coming from that channel can settle down in that area due to residual flow. In these areas the contaminants of concern, from those analyzed, are the heavy metals and metalloids (Cd, Cu, Zn and As exceeded the PEL guidelines) and the pesticides BHC isomers, heptachlor, isodrin, DDT and metabolits, endosulfan and endrin. In the remain management units (76 % of the study area) there is a moderate impact potential of occurrence of adverse ecological effects and in some of these areas no stress agents could be identified. This emphasizes the need for further research, since unmeasured chemicals may be causing or contributing to these adverse effects. Special attention must be taken to the units with moderate impact potential of occurrence of adverse ecological effects, located inside the natural reserve. Non-point source pollution coming from agriculture and aquaculture activities also seem to contribute with important pollution load into the estuary entering from Águas de Moura Channel. This pressure is expressed in a moderate impact potential for ecological risk existent in the areas near the entrance of this Channel. Pressures may also came from Alcácer Channel although they were not quantified in this study. The management framework presented here, including all the methodological tools may be applied and tested in other estuarine ecosystems, which will also allow a comparison between estuarine ecosystems in other parts of the globe.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the growing complexity and adaptability requirements of real-time systems, which often exhibit unrestricted Quality of Service (QoS) inter-dependencies among supported services and user-imposed quality constraints, it is increasingly difficult to optimise the level of service of a dynamic task set within an useful and bounded time. This is even more difficult when intending to benefit from the full potential of an open distributed cooperating environment, where service characteristics are not known beforehand and tasks may be inter-dependent. This paper focuses on optimising a dynamic local set of inter-dependent tasks that can be executed at varying levels of QoS to achieve an efficient resource usage that is constantly adapted to the specific constraints of devices and users, nature of executing tasks and dynamically changing system conditions. Extensive simulations demonstrate that the proposed anytime algorithms are able to quickly find a good initial solution and effectively optimise the rate at which the quality of the current solution improves as the algorithms are given more time to run, with a minimum overhead when compared against their traditional versions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examines the impact of globalization on cross-country inequality and poverty using a panel data set for 65 developing counties, over the period 1970-2008. With separate modelling for poverty and inequality, explicit control for financial intermediation, and comparative analysis for developing countries, the study attempts to provide a deeper understanding of cross country variations in income inequality and poverty. The major findings of the study are five fold. First, a non-monotonic relationship between income distribution and the level of economic development holds in all samples of countries. Second, both openness to trade and FDI do not have a favourable effect on income distribution in developing countries. Third, high financial liberalization exerts a negative and significant influence on income distribution in developing countries. Fourth, inflation seems to distort income distribution in all sets of countries. Finally, the government emerges as a major player in impacting income distribution in developing countries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Empirical literature on the analysis of the efficiency of measures for reducing persistent government deficits has mainly focused on the direct explanation of deficit. By contrast, this paper aims at modeling government revenue and expenditure within a simultaneous framework and deriving the fiscal balance (surplus or deficit) equation as the difference between the two variables. This setting enables one to not only judge how relevant the explanatory variables are in explaining the fiscal balance but also understand their impact on revenue and/or expenditure. Our empirical results, obtained by using a panel data set on Swiss Cantons for the period 1980-2002, confirm the relevance of the approach followed here, by providing unambiguous evidence of a simultaneous relationship between revenue and expenditure. They also reveal strong dynamic components in revenue, expenditure, and fiscal balance. Among the significant determinants of public fiscal balance we not only find the usual business cycle elements, but also and more importantly institutional factors such as the number of administrative units, and the ease with which people can resort to political (direct democracy) instruments, such as public initiatives and referendum.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper deals with the development and application of the methodology for automatic mapping of pollution/contamination data. General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and an anisotropic GRNN model using cross-validation procedure is presented. Results are compared with k-nearest-neighbours interpolation algorithm using independent validation data set. Quality of mapping is controlled by the analysis of raw data and the residuals using variography. Maps of probabilities of exceeding a given decision level and ?thick? isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. Real case study is based on mapping of radioactively contaminated territories.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In order to compare coronary magnetic resonance angiography (MRA) data obtained with different scanning methodologies, adequate visualization and presentation of the coronary MRA data need to be ensured. Furthermore, an objective quantitative comparison between images acquired with different scanning methods is desirable. To address this need, a software tool ("Soap-Bubble") that facilitates visualization and quantitative comparison of 3D volume targeted coronary MRA data was developed. In the present implementation, the user interactively specifies a curved subvolume (enclosed in the 3D coronary MRA data set) that closely encompasses the coronary arterial segments. With a 3D Delaunay triangulation and a parallel projection, this enables the simultaneous display of multiple coronary segments in one 2D representation. For objective quantitative analysis, frequently explored quantitative parameters such as signal-to-noise ratio (SNR); contrast-to-noise ratio (CNR); and vessel length, sharpness, and diameter can be assessed. The present tool supports visualization and objective, quantitative comparisons of coronary MRA data obtained with different scanning methods. The first results obtained in healthy adults and in patients with coronary artery disease are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The performances of high-speed network communications frequently rest with the distribution of data-stream. In this paper, a dynamic data-stream balancing architecture based on link information is introduced and discussed firstly. Then the algorithms for simultaneously acquiring the passing nodes and links of a path between any two source-destination nodes rapidly, as well as a dynamic data-stream distribution planning are proposed. Some related topics such as data fragment disposal, fair service, etc. are further studied and discussed. Besides, the performance and efficiency of proposed algorithms, especially for fair service and convergence, are evaluated through a demonstration with regard to the rate of bandwidth utilization. Hoping the discussion presented here can be helpful to application developers in selecting an effective strategy for planning the distribution of data-stream.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We explore the influence of the choice of attenuation factor on Katz centrality indices for evolving communication networks. For given snapshots of a network observed over a period of time, recently developed communicability indices aim to identify best broadcasters and listeners in the network. In this article, we looked into the sensitivity of communicability indices on the attenuation factor constraint, in relation to spectral radius (the largest eigenvalue) of the network at any point in time and its computation in the case of large networks. We proposed relaxed communicability measures where the spectral radius bound on attenuation factor is relaxed and the adjacency matrix is normalised in order to maintain the convergence of the measure. Using a vitality based measure of both standard and relaxed communicability indices we looked at the ways of establishing the most important individuals for broadcasting and receiving of messages related to community bridging roles. We illustrated our findings with two examples of real-life networks, MIT reality mining data set of daily communications between 106 individuals during one year and UK Twitter mentions network, direct messages on Twitter between 12.4k individuals during one week.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study the effect of the cultivar on the volatile profile of five different banana varieties was evaluated and determined by dynamic headspace solid-phase microextraction (dHS-SPME) combined with one-dimensional gas chromatography–mass spectrometry (1D-GC–qMS). This approach allowed the definition of a volatile metabolite profile to each banana variety and can be used as pertinent criteria of differentiation. The investigated banana varieties (Dwarf Cavendish, Prata, Maçã, Ouro and Platano) have certified botanical origin and belong to the Musaceae family, the most common genomic group cultivated in Madeira Island (Portugal). The influence of dHS-SPME experimental factors, namely, fibre coating, extraction time and extraction temperature, on the equilibrium headspace analysis was investigated and optimised using univariate optimisation design. A total of 68 volatile organic metabolites (VOMs) were tentatively identified and used to profile the volatile composition in different banana cultivars, thus emphasising the sensitivity and applicability of SPME for establishment of the volatile metabolomic pattern of plant secondary metabolites. Ethyl esters were found to comprise the largest chemical class accounting 80.9%, 86.5%, 51.2%, 90.1% and 6.1% of total peak area for Dwarf Cavendish, Prata, Ouro, Maçã and Platano volatile fraction, respectively. Gas chromatographic peak areas were submitted to multivariate statistical analysis (principal component and stepwise linear discriminant analysis) in order to visualise clusters within samples and to detect the volatile metabolites able to differentiate banana cultivars. The application of the multivariate analysis on the VOMs data set resulted in predictive abilities of 90% as evaluated by the cross-validation procedure.