970 results for Data exploration
Abstract:
Visual exploration of scientific data in the life sciences is a growing research field due to the large amount of available data. Kohonen’s Self-Organizing Map (SOM) is a widely used tool for visualizing multidimensional data. In this paper we present a fast learning algorithm for SOMs that uses a simulated annealing method to adapt the learning parameters. The algorithm has been adopted in a data analysis framework for the generation of similarity maps. Such maps provide an effective tool for the visual exploration of large, multidimensional input spaces. The approach has been applied to data generated during the High Throughput Screening of molecular compounds; the generated maps allow visual exploration of molecules with similar topological properties. The experimental analysis on real-world data from the National Cancer Institute shows the speedup of the proposed SOM training process compared with a traditional approach. The resulting visual landscape groups molecules with similar chemical properties in densely connected regions.
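For orientation, a minimal sketch of conventional SOM training is given here. It uses a fixed exponential decay for the learning rate and neighbourhood radius, which is the kind of schedule the abstract's simulated annealing method is said to adapt; it is not the paper's algorithm. The function names, grid size, and decay constant are illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda k: sum((wi - xi) ** 2 for wi, xi in zip(weights[k], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed)
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = t / steps
            lr = lr0 * math.exp(-3.0 * frac)        # fixed decay, not annealed
            sigma = sigma0 * math.exp(-3.0 * frac)
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights
```

Mapping each input to its best matching unit after training gives the similarity map: inputs that land on the same or neighbouring grid cells are close in the original space.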
Abstract:
This study explored the underlying causes of preventable drug-related hospital admissions from primary care, through semi-structured interviews and a review of patients’ medical records. Analysis of the data revealed that communication failures between different groups of healthcare professionals, and between healthcare professionals and patients, contribute to preventable drug-related admissions, as do knowledge gaps about medication among both healthcare professionals and patients. In addition, working conditions for community pharmacists severely limit their ability to act effectively as a safety barrier against patients receiving inappropriate medication. These limitations include heavy workloads, lack of access to patients’ clinical information, poor relationships with general practitioners, and time restrictions. The results of this study represent an important addition to our understanding of the contribution of human error as an underlying cause of preventable drug-related morbidity, and of the factors that contribute to errors occurring in the primary healthcare setting.
Abstract:
Particle filters are fully non-linear data assimilation techniques that aim to represent the probability distribution of the model state given the observations (the posterior) by a number of particles. In high-dimensional geophysical applications, the number of particles required by the sequential importance resampling (SIR) particle filter to capture the high-probability region of the posterior is too large to make it usable. However, particle filters can be formulated using proposal densities, which give greater freedom in how particles are sampled and allow for a much smaller number of particles. Here a particle filter is presented which uses the proposal density to ensure that all particles end up in the high-probability region of the posterior probability density function. This opens up the possibility of non-linear data assimilation in high-dimensional systems. The particle filter formulation is compared to the optimal proposal density particle filter and the implicit particle filter, both of which also utilise a proposal density. We show that when observations are available every time step, both of these schemes will be degenerate when the number of independent observations is large, unlike the new scheme. The sensitivity of the new scheme to its parameter values is explored theoretically and demonstrated using the Lorenz (1963) model.
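To make the degeneracy issue concrete, the following is a minimal sketch of one analysis step of the baseline SIR filter for a scalar state observed directly with Gaussian noise. It does not implement the proposal-density scheme described in the abstract; the function name and the scalar setup are assumptions for illustration.

```python
import math
import random


def sir_update(particles, observation, obs_std, rng):
    """One SIR step: weight particles by the likelihood, then resample.

    Returns (resampled particles, normalised weights, effective sample size).
    """
    # Log-likelihood of the observation given each particle (Gaussian error).
    log_w = [-0.5 * ((observation - x) / obs_std) ** 2 for x in particles]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Effective sample size: near 1 means the ensemble has collapsed.
    ess = 1.0 / sum(wi * wi for wi in w)
    resampled = rng.choices(particles, weights=w, k=len(particles))
    return resampled, w, ess
```

When the observation error is small relative to the particle spread, nearly all weight falls on one particle and the effective sample size approaches 1, which is the degeneracy the abstract refers to.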
Abstract:
Visualization of high-dimensional data requires a mapping to a visual space. Whenever the goal is to preserve similarity relations, a frequent strategy is to use 2D projections, which afford intuitive interactive exploration, e.g., by users locating and selecting groups and gradually drilling down to individual objects. In this paper, we propose a framework for projecting high-dimensional data to 3D visual spaces, based on a generalization of the Least-Square Projection (LSP). We compare projections to 2D and 3D visual spaces both quantitatively and through a user study considering certain exploration tasks. The quantitative analysis confirms that 3D projections outperform 2D projections in terms of precision. The user study indicates that certain tasks can be more reliably and confidently answered with 3D projections. Nonetheless, as 3D projections are displayed on 2D screens, interaction is more difficult. Therefore, we incorporate suitable interaction functionalities into a framework that supports 3D transformations, predefined optimal 2D views, coordinated 2D and 3D views, and hierarchical 3D cluster definition and exploration. For visually encoding data clusters in a 3D setup, we employ color coding of projected data points as well as four types of surface renderings. A second user study evaluates the suitability of these visual encodings. Several examples illustrate the framework’s applicability to visual exploration of both multidimensional abstract (non-spatial) data and the feature space of multivariate spatial data.
Abstract:
Point placement strategies aim at mapping data points represented in higher dimensions to two-dimensional spaces and are frequently used to visualize relationships amongst data instances. They have been valuable tools for analysis and exploration of data sets of various kinds. Many conventional techniques, however, do not behave well when the number of dimensions is high, as in the case of document collections. Later approaches handle that shortcoming, but may cause too much clutter to allow flexible exploration. In this work we present a novel hierarchical point placement technique that is capable of dealing with these problems. While good grouping and separation of data with high similarity is maintained without increasing computation cost, its hierarchical structure lends itself both to exploration at various levels of detail and to handling data in subsets, improving analysis capability and also allowing manipulation of larger data sets.
Abstract:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Abstract:
Issues related to the reality of lesbian, gay, bisexual and transgender (LGBT) individuals are being incorporated into institutional and social discourses, and show the challenges that must be overcome on the path towards citizenship. The inclusion of gay rights in the domain of institutions like the United Nations and the Brazilian Secretariat of Human Rights is a response to broader movements that place the gay subject as an important topic of debate in the socio-political sphere. In this scenario, some institutions deserve close attention from researchers working on gay issues, the business environment being a good example. In this domain, diversity has become an important topic of debate among scholars, but the question of sexual identity in most cases does not appear. The literature that does focus on the theme explores it through approaches that are not able to break with universalisms and a normatized vocabulary. Therefore, this research explores discursive structures related to sexuality and examines the meanings construed through these structures as described by gay individuals working in business. Furthermore, it investigates patterns of discursive normative structures and the consequent challenges faced by gay people in the working environment, and also complements the current debate on LGBT challenges both in the socio-political sphere and in academia. The Foucauldian notions of discourse, knowledge and power, and the main concepts of queer theory are incorporated into the analysis, as well as concepts related to the politics of post-colonial sexuality, subordination, and hegemonic forces, together with the role of reflexivity in modernity and its impacts on secularized mental structures.
The research design takes a phenomenological approach and bases its knowledge claim on a participatory perspective. The sample chosen for data collection consisted of gay individuals working in the business environment, with the aim of generating categories of meaning through the description of their experiences.
Abstract:
Groundwater and sandstone samples from the Guarany aquifer, Parana sedimentary basin, South America, were analyzed for radon. The dissolved radon ranged between 3 and 3303 pCi/l and was lognormally distributed, with a modal value of 1315 pCi/l and a median value of 330 pCi/l. Rn-222 leakage experiments for sandstones yielded a theoretical value of 1390 pCi/l for Rn-222 in water, showing that theoretical modeling can reliably be used to interpret laboratory and field data. (C) 2002 Elsevier B.V. Ltd. All rights reserved.
Abstract:
To evaluate the use of the shallow seismic technique to delineate geological and geotechnical features down to 40 meters depth in noisy urban areas covered with asphalt pavement, five survey lines were acquired in the metropolitan area of São Paulo City. The data were acquired using a 24-bit, 24-channel seismograph, 30 and 100 Hz geophones, and a sledgehammer-plate system as the seismic source. Seismic reflection data were recorded using a CMP (common midpoint) acquisition method. The processing routine consisted of: prestack band-pass filtering (90-250 Hz); automatic gain control (AGC); muting (digital zeroing) of dead/noisy traces, ground roll, air wave, and refracted wave; CMP sorting; velocity analyses; normal move-out corrections; residual static corrections; f-k filtering; and CMP stacking. The near surface is geologically characterized by unconsolidated fill materials and Quaternary sediments with organic material overlying Tertiary sediments, with the water table 2 to 5 m below the surface. The basement is composed of granite and gneiss. Reflections were observed from 40 ms to 65 ms two-way traveltime and were related to the contact between the silty clay layer and the fine sand layer of the Tertiary sediments, and to the weathered basement. The CMP seismic-reflection technique has been shown to be useful for mapping the sedimentary layers and the bedrock of the São Paulo sedimentary basin for the purposes of shallow investigations related to engineering problems. In spite of the strong cultural noise observed in these urban areas and problems with planting geophones, we verified that, with the proper equipment, field parameters, and particularly great care in data collection and processing, we can overcome the adverse field conditions and image reflections from layers as shallow as 20 meters.
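As a small worked illustration of the normal move-out step in the processing routine, the sketch below applies the standard hyperbolic moveout relation t(x) = sqrt(t0^2 + x^2 / v^2) and the zero-offset depth conversion z = v * t0 / 2. The velocity and offset values are illustrative assumptions, not values reported in the abstract.

```python
import math


def moveout_time(t0, offset, velocity):
    """Reflection traveltime (s) at a given source-receiver offset (m).

    t0 is the zero-offset two-way traveltime (s); velocity is the
    stacking velocity (m/s) picked during velocity analysis.
    """
    return math.sqrt(t0 ** 2 + (offset / velocity) ** 2)


def nmo_shift(t0, offset, velocity):
    """Time (s) removed from a trace sample to flatten the reflection."""
    return moveout_time(t0, offset, velocity) - t0


def reflector_depth(t0, velocity):
    """Depth (m) of a flat reflector from its zero-offset two-way time."""
    return velocity * t0 / 2.0


# Assumed example: a 30 ms reflection, 40 m offset, 1000 m/s velocity.
print(moveout_time(0.030, 40.0, 1000.0))  # traveltime at that offset
print(reflector_depth(0.040, 1000.0))     # depth for a 40 ms reflection
```

After the shift is applied to every trace in a CMP gather, reflections from the same interface line up in time and can be summed in the stacking step.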
Abstract:
Geophysical methods are widely used in mineral exploration. This paper discusses the results of geological and geophysical studies in supergene manganese deposits of southern Brazil. Mineralized zones described in geological surveys were characterized by low resistivity (20 Ω·m) and high chargeability (30 ms), a pattern also found in oxide and sulfide mineral deposits. Pseudo-3D modeling of geophysical data allowed mapping at several depths. A relationship between high chargeability and low resistivity may define a pattern for high-grade gonditic manganese ore. Large areas of high chargeability and high resistivity may result in accumulation of manganese and iron hydroxides, due to weathering of the gonditic ore, dissolution, percolation and precipitation.
Abstract:
In response to the increasing global demand for energy, oil exploration and development are expanding into frontier areas of the Arctic, where slow-growing tundra vegetation and the underlying permafrost soils are very sensitive to disturbance. The creation of vehicle trails on the tundra from seismic exploration for oil has accelerated in the past decade, and the cumulative impact represents a geographic footprint that covers a greater extent of Alaska’s North Slope tundra than all other direct human impacts combined. Seismic exploration for oil and gas was conducted on the coastal plain of the Arctic National Wildlife Refuge, Alaska, USA, in the winters of 1984 and 1985. This study documents recovery of vegetation and permafrost soils over a two-decade period after vehicle traffic on snow-covered tundra. Paired permanent vegetation plots (disturbed vs. reference) were monitored six times from 1984 to 2002. Data were collected on percent vegetative cover by plant species and on soil and ground ice characteristics. We developed Bayesian hierarchical models, with temporally and spatially autocorrelated errors, to analyze the effects of vegetation type and initial disturbance levels on recovery patterns of the different plant growth forms as well as soil thaw depth. Plant community composition was altered on the trails by species-specific responses to initial disturbance and subsequent changes in substrate. Long-term changes included increased cover of graminoids and decreased cover of evergreen shrubs and mosses. Trails with low levels of initial disturbance usually improved well over time, whereas those with medium to high levels of initial disturbance recovered slowly. Trails on ice-poor, gravel substrates of riparian areas recovered better than those on ice-rich loamy soils of the uplands, even after severe initial damage. Recovery to pre-disturbance communities was not possible where trail subsidence occurred due to thawing of ground ice. 
Previous studies of disturbance from winter seismic vehicles in the Arctic predicted short-term and mostly aesthetic impacts, but we found that severe impacts to tundra vegetation persisted for two decades after disturbance under some conditions. We recommend management approaches that should be used to prevent persistent tundra damage.
Abstract:
Active machine learning algorithms are used when large numbers of unlabeled examples are available and obtaining labels for them is costly (e.g., when it requires consulting a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refinement of the decision boundary by dynamically adjusting the probability of exploring at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.
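The trade-off described above can be sketched as a query rule that either explores at random or refines the boundary, with an exploration probability that is adjusted after each exploratory query. The update rule, step size, and bounds below are hypothetical choices made for illustration; the abstract does not specify the paper's actual adjustment scheme.

```python
import random


def choose_query(pool, margin, explore_prob, rng):
    """Pick the next example to label; returns (index, explored).

    With probability explore_prob a random pool item is chosen (exploration);
    otherwise the item closest to the decision boundary is chosen, i.e. the
    one with the smallest absolute margin (boundary refinement).
    """
    if rng.random() < explore_prob:
        return rng.randrange(len(pool)), True
    idx = min(range(len(pool)), key=lambda i: abs(margin(pool[i])))
    return idx, False


def update_explore_prob(explore_prob, explored, surprised,
                        step=0.1, floor=0.05, ceil=0.95):
    """Hypothetical rule: explore more after a surprising exploratory label.

    'surprised' means the current hypothesis misclassified the queried item.
    """
    if not explored:
        return explore_prob
    explore_prob += step if surprised else -step
    return min(ceil, max(floor, explore_prob))
```

In a full loop, `margin` would be the signed score of the current classifier, and the classifier would be retrained after each newly labeled example.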
Abstract:
Despite their importance in the evaluation of petroleum and gas reservoirs, measurements of self-potential data under borehole conditions (well-logging) have found only minor applications in aquifer and waste-site characterization. This can be attributed to lower signals from the diffusion fronts in near-surface environments because measurements are made long after the drilling of the well, when concentration fronts are already disappearing. Proportionally higher signals arise from streaming potentials that prevent using simple interpretation models that assume signals from diffusion only. Our laboratory experiments found that dual-source self-potential signals can be described by a simple linear model, and that contributions (from diffusion and streaming potentials) can be isolated by slightly perturbing the borehole conditions. Perturbations are applied either by changing the concentration of the borehole-filling solution or its column height. Parameters useful for formation evaluation can be estimated from data measured during perturbations, namely, pore water resistivity, pressure drop across the borehole wall, and electrokinetic coupling parameter. These are important parameters to assess, respectively, water quality, aquifer lateral continuity, and interfacial properties of permeable formations.
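One way to picture how perturbations isolate the two contributions is a linear model of the form SP = k_diff * D + k_stream * S + offset, where D is a diffusion driver (for example, a function of the borehole solution concentration) and S is a streaming driver (for example, the column height). This form, the driver definitions, and all names below are assumptions made for illustration; the abstract states only that the model is linear.

```python
def isolate_contributions(baseline, conc_perturbed, height_perturbed):
    """Estimate the two coupling coefficients from three measurements.

    Each argument is a tuple (diffusion_driver, streaming_driver, measured_sp).
    conc_perturbed must share the baseline streaming driver, and
    height_perturbed must share the baseline diffusion driver, so that each
    perturbation changes only one source.
    """
    d0, s0, v0 = baseline
    d1, _, v1 = conc_perturbed
    _, s2, v2 = height_perturbed
    k_diff = (v1 - v0) / (d1 - d0)      # response to the concentration change
    k_stream = (v2 - v0) / (s2 - s0)    # response to the column-height change
    offset = v0 - k_diff * d0 - k_stream * s0
    return k_diff, k_stream, offset
```

With more than three measurements, the same model could be fitted by least squares instead of solved exactly.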
Abstract:
One of the problems in the analysis of nucleus-nucleus collisions is obtaining information on the value of the impact parameter b. This work applies pattern recognition techniques to associate values of b with groups of events. To this end, a support vector machine (SVM) classifier is adopted to analyze multifragmentation reactions. This method allows the values of b to be backtraced through a particular multidimensional analysis. The SVM classification consists of two main phases. In the first, known as the training phase, the classifier learns to discriminate the events generated by two different models, Classical Molecular Dynamics (CMD) and Heavy-Ion Phase-Space Exploration (HIPSE), for the reaction 58Ni + 48Ca at 25 AMeV. In the second, known as the test phase, what has been learned is tested on new events generated by the same models. These new results have been compared to those obtained through other techniques of backtracing the impact parameter. Our tests show that, following this approach, central and peripheral collisions for the CMD events are always better classified than by the other backtracing techniques. Finally, we performed the SVM classification on the experimental data measured by the NUCL-EX collaboration with the CHIMERA apparatus for the same reaction.