945 resultados para Data modeling
Resumo:
The design, development, and use of complex systems models raises a unique class of challenges and potential pitfalls, many of which are commonly recurring problems. Over time, researchers gain experience in this form of modeling, choosing algorithms, techniques, and frameworks that improve the quality, confidence level, and speed of development of their models. This increasing collective experience of complex systems modellers is a resource that should be captured. Fields such as software engineering and architecture have benefited from the development of generic solutions to recurring problems, called patterns. Using pattern development techniques from these fields, insights from communities such as learning and information processing, data mining, bioinformatics, and agent-based modeling can be identified and captured. Collections of such 'pattern languages' would allow knowledge gained through experience to be readily accessible to less-experienced practitioners and to other domains. This paper proposes a methodology for capturing the wisdom of computational modelers by introducing example visualization patterns, and a pattern classification system for analyzing the relationship between micro and macro behaviour in complex systems models. We anticipate that a new field of complex systems patterns will provide an invaluable resource for both practicing and future generations of modelers.
Resumo:
The developments of models in Earth Sciences, e.g. for earthquake prediction and for the simulation of mantel convection, are fare from being finalized. Therefore there is a need for a modelling environment that allows scientist to implement and test new models in an easy but flexible way. After been verified, the models should be easy to apply within its scope, typically by setting input parameters through a GUI or web services. It should be possible to link certain parameters to external data sources, such as databases and other simulation codes. Moreover, as typically large-scale meshes have to be used to achieve appropriate resolutions, the computational efficiency of the underlying numerical methods is important. Conceptional this leads to a software system with three major layers: the application layer, the mathematical layer, and the numerical algorithm layer. The latter is implemented as a C/C++ library to solve a basic, computational intensive linear problem, such as a linear partial differential equation. The mathematical layer allows the model developer to define his model and to implement high level solution algorithms (e.g. Newton-Raphson scheme, Crank-Nicholson scheme) or choose these algorithms form an algorithm library. The kernels of the model are generic, typically linear, solvers provided through the numerical algorithm layer. Finally, to provide an easy-to-use application environment, a web interface is (semi-automatically) built to edit the XML input file for the modelling code. In the talk, we will discuss the advantages and disadvantages of this concept in more details. We will also present the modelling environment escript which is a prototype implementation toward such a software system in Python (see www.python.org). Key components of escript are the Data class and the PDE class. Objects of the Data class allow generating, holding, accessing, and manipulating data, in such a way that the actual, in the particular context best, representation is transparent to the user. They are also the key to establish connections with external data sources. PDE class objects are describing (linear) partial differential equation objects to be solved by a numerical library. The current implementation of escript has been linked to the finite element code Finley to solve general linear partial differential equations. We will give a few simple examples which will illustrate the usage escript. Moreover, we show the usage of escript together with Finley for the modelling of interacting fault systems and for the simulation of mantel convection.
Resumo:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.
Resumo:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.
Resumo:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships on the aggregate data level, the methodology will allow homogenous groups of observations to be created that exhibit distinctive path model estimates. The approach will, thus, provide differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application on a large data set in an example of the American customer satisfaction index (ACSI) substantiates the methodology’s effectiveness in evaluating PLS path modeling results.
Resumo:
Most object-based approaches to Geographical Information Systems (GIS) have concentrated on the representation of geometric properties of objects in terms of fixed geometry. In our road traffic marking application domain we have a requirement to represent the static locations of the road markings but also enforce the associated regulations, which are typically geometric in nature. For example a give way line of a pedestrian crossing in the UK must be within 1100-3000 mm of the edge of the crossing pattern. In previous studies of the application of spatial rules (often called 'business logic') in GIS emphasis has been placed on the representation of topological constraints and data integrity checks. There is very little GIS literature that describes models for geometric rules, although there are some examples in the Computer Aided Design (CAD) literature. This paper introduces some of the ideas from so called variational CAD models to the GIS application domain, and extends these using a Geography Markup Language (GML) based representation. In our application we have an additional requirement; the geometric rules are often changed and vary from country to country so should be represented in a flexible manner. In this paper we describe an elegant solution to the representation of geometric rules, such as requiring lines to be offset from other objects. The method uses a feature-property model embraced in GML 3.1 and extends the possible relationships in feature collections to permit the application of parameterized geometric constraints to sub features. We show the parametric rule model we have developed and discuss the advantage of using simple parametric expressions in the rule base. We discuss the possibilities and limitations of our approach and relate our data model to GML 3.1. © 2006 Springer-Verlag Berlin Heidelberg.
Resumo:
PURPOSE. A methodology for noninvasively characterizing the three-dimensional (3-D) shape of the complete human eye is not currently available for research into ocular diseases that have a structural substrate, such as myopia. A novel application of a magnetic resonance imaging (MRI) acquisition and analysis technique is presented that, for the first time, allows the 3-D shape of the eye to be investigated fully. METHODS. The technique involves the acquisition of a T2-weighted MRI, which is optimized to reveal the fluid-filled chambers of the eye. Automatic segmentation and meshing algorithms generate a 3-D surface model, which can be shaded with morphologic parameters such as distance from the posterior corneal pole and deviation from sphericity. Full details of the method are illustrated with data from 14 eyes of seven individuals. The spatial accuracy of the calculated models is demonstrated by comparing the MRI-derived axial lengths with values measured in the same eyes using interferometry. RESULTS. The color-coded eye models showed substantial variation in the absolute size of the 14 eyes. Variations in the sphericity of the eyes were also evident, with some appearing approximately spherical whereas others were clearly oblate and one was slightly prolate. Nasal-temporal asymmetries were noted in some subjects. CONCLUSIONS. The MRI acquisition and analysis technique allows a novel way of examining 3-D ocular shape. The ability to stratify and analyze eye shape, ocular volume, and sphericity will further extend the understanding of which specific biometric parameters predispose emmetropic children subsequently to develop myopia. Copyright © Association for Research in Vision and Ophthalmology.
Resumo:
A consequence of a loss of coolant accident is the damage of adjacent insulation materials (IM). IM may then be transported to the containment sump strainers where water is drawn into the ECCS (emergency core cooling system). Blockage of the strainers by IM lead to an increased pressure drop acting on the operating ECCS pumps. IM can also penetrate the strainers, enter the reactor coolant system and then accumulate in the reactor pressure vessel. An experimental and theoretical study that concentrates on mineral wool fiber transport in the containment sump and the ECCS is being performed. The study entails fiber generation and the assessment of fiber transport in single and multi-effect experiments. The experiments include measurement of the terminal settling velocity, the strainer pressure drop, fiber sedimentation and resuspension in a channel flow and jet flow in a rectangular tank. An integrated test facility is also operated to assess the compounded effects. Each experimental facility is used to provide data for the validation of equivalent computational fluid dynamic models. The channel flow facility allows the determination of the steady state distribution of the fibers at different flow velocities. The fibers are modeled in the Eulerian-Eulerian reference frame as spherical wetted agglomerates. The fiber agglomerate size, density, the relative viscosity of the fluid-fiber mixture and the turbulent dispersion of the fibers all affect the steady state accumulation of fibers at the channel base. In the current simulations, two fiber phases are separately considered. The particle size is kept constant while the density is modified, which affects both the terminal velocity and volume fraction. The relative viscosity is only significant at higher concentrations. The numerical model finds that the fibers accumulate at the channel base even at high velocities; therefore, modifications to the drag and turbulent dispersion forces can be made to reduce fiber accumulation.
Resumo:
WiMAX has been introduced as a competitive alternative for metropolitan broadband wireless access technologies. It is connection oriented and it can provide very high data rates, large service coverage, and flexible quality of services (QoS). Due to the large number of connections and flexible QoS supported by WiMAX, the uplink access in WiMAX networks is very challenging since the medium access control (MAC) protocol must efficiently manage the bandwidth and related channel allocations. In this paper, we propose and investigate a cost-effective WiMAX bandwidth management scheme, named the WiMAX partial sharing scheme (WPSS), in order to provide good QoS while achieving better bandwidth utilization and network throughput. The proposed bandwidth management scheme is compared with a simple but inefficient scheme, named the WiMAX complete sharing scheme (WCPS). A maximum entropy (ME) based analytical model (MEAM) is proposed for the performance evaluation of the two bandwidth management schemes. The reason for using MEAM for the performance evaluation is that MEAM can efficiently model a large-scale system in which the number of stations or connections is generally very high, while the traditional simulation and analytical (e.g., Markov models) approaches cannot perform well due to the high computation complexity. We model the bandwidth management scheme as a queuing network model (QNM) that consists of interacting multiclass queues for different service classes. Closed form expressions for the state and blocking probability distributions are derived for those schemes. Simulation results verify the MEAM numerical results and show that WPSS can significantly improve the network's performance compared to WCPS.
Resumo:
Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been few work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multidimensional sentiment distributions prediction. It proposes effective mechanisms to incorporate supervised information such as labeled feature constraints and document-level sentiment distributions derived from the training data into model learning. We have evaluated our approach on the datasets collected from the confession section of the Experience Project website where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distributions prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.
Resumo:
Forests play a pivotal role in timber production, maintenance and development of biodiversity and in carbon sequestration and storage in the context of the Kyoto Protocol. Policy makers and forest experts therefore require reliable information on forest extent, type and change for management, planning and modeling purposes. It is becoming increasingly clear that such forest information is frequently inconsistent and unharmonised between countries and continents. This research paper presents a forest information portal that has been developed in line with the GEOSS and INSPIRE frameworks. The web portal provides access to forest resources data at a variety of spatial scales, from global through to regional and local, as well as providing analytical capabilities for monitoring and validating forest change. The system also allows for the utilisation of forest data and processing services within other thematic areas. The web portal has been developed using open standards to facilitate accessibility, interoperability and data transfer.
Resumo:
The number of interoperable research infrastructures has increased significantly with the growing awareness of the efforts made by the Global Earth Observation System of Systems (GEOSS). One of the Societal Benefit Areas (SBA) that is benefiting most from GEOSS is biodiversity, given the costs of monitoring the environment and managing complex information, from space observations to species records including their genetic characteristics. But GEOSS goes beyond simple data sharing to encourage the publishing and combination of models, an approach which can ease the handling of complex multi-disciplinary questions. It is the purpose of this paper to illustrate these concepts by presenting eHabitat, a basic Web Processing Service (WPS) for computing the likelihood of finding ecosystems with equal properties to those specified by a user. When chained with other services providing data on climate change, eHabitat can be used for ecological forecasting and becomes a useful tool for decision-makers assessing different strategies when selecting new areas to protect. eHabitat can use virtually any kind of thematic data that can be considered as useful when defining ecosystems and their future persistence under different climatic or development scenarios. The paper will present the architecture and illustrate the concepts through case studies which forecast the impact of climate change on protected areas or on the ecological niche of an African bird.
Resumo:
Neuroimaging studies have consistently shown that working memory (WM) tasks engage a distributed neural network that primarily includes the dorsolateral prefrontal cortex, the parietal cortex, and the anterior cingulate cortex. The current challenge is to provide a mechanistic account of the changes observed in regional activity. To achieve this, we characterized neuroplastic responses in effective connectivity between these regions at increasing WM loads using dynamic causal modeling of functional magnetic resonance imaging data obtained from healthy individuals during a verbal n-back task. Our data demonstrate that increasing memory load was associated with (a) right-hemisphere dominance, (b) increasing forward (i.e., posterior to anterior) effective connectivity within the WM network, and (c) reduction in individual variability in WM network architecture resulting in the right-hemisphere forward model reaching an exceedance probability of 99% in the most demanding condition. Our results provide direct empirical support that task difficulty, in our case WM load, is a significant moderator of short-term plasticity, complementing existing theories of task-related reduction in variability in neural networks. Hum Brain Mapp, 2013. © 2013 Wiley Periodicals, Inc.
Resumo:
IEEE 802.15.4 standard has been recently developed for low power wireless personal area networks. It can find many applications for smart grid, such as data collection, monitoring and control functions. The performance of 802.15.4 networks has been widely studied in the literature. However the main focus has been on the modeling throughput performance with frame collisions. In this paper we propose an analytic model which can model the impact of frame collisions as well as frame corruptions due to channel bit errors. With this model the frame length can be carefully selected to improve system performance. The analytic model can also be used to study the 802.15.4 networks with interference from other co-located networks, such as IEEE 802.11 and Bluetooth networks. © 2011 Springer-Verlag.
Resumo:
Gene expression is frequently regulated by multiple transcription factors (TFs). Thermostatistical methods allow for a quantitative description of interactions between TFs, RNA polymerase and DNA, and their impact on the transcription rates. We illustrate three different scales of the thermostatistical approach: the microscale of TF molecules, the mesoscale of promoter energy levels and the macroscale of transcriptionally active and inactive cells in a cell population. We demonstrate versatility of combinatorial transcriptional activation by exemplifying logic functions, such as AND and OR gates. We discuss a metric for cell-to-cell transcriptional activation variability known as Fermi entropy. Suitability of thermostatistical modeling is illustrated by describing the experimental data on transcriptional induction of NF?B and the c-Fos protein.