21 resultados para Data modeling
em Aston University Research Archive
Resumo:
The authors propose a new approach to discourse analysis which is based on meta data from social networking behavior of learners who are submerged in a socially constructivist e-learning environment. It is shown that traditional data modeling techniques can be combined with social network analysis - an approach that promises to yield new insights into the largely uncharted domain of network-based discourse analysis. The chapter is treated as a non-technical introduction and is illustrated with real examples, visual representations, and empirical findings. Within the setting of a constructivist statistics course, the chapter provides an illustration of what network-based discourse analysis is about (mainly from a methodological point of view), how it is implemented in practice, and why it is relevant for researchers and educators.
Resumo:
The data available during the drug discovery process is vast in amount and diverse in nature. To gain useful information from such data, an effective visualisation tool is required. To provide better visualisation facilities to the domain experts (screening scientist, biologist, chemist, etc.),we developed a software which is based on recently developed principled visualisation algorithms such as Generative Topographic Mapping (GTM) and Hierarchical Generative Topographic Mapping (HGTM). The software also supports conventional visualisation techniques such as Principal Component Analysis, NeuroScale, PhiVis, and Locally Linear Embedding (LLE). The software also provides global and local regression facilities . It supports regression algorithms such as Multilayer Perceptron (MLP), Radial Basis Functions network (RBF), Generalised Linear Models (GLM), Mixture of Experts (MoE), and newly developed Guided Mixture of Experts (GME). This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install & use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install and use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand and draw meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool facilitates the domain experts by exploration of the projection obtained from the visualization algorithms providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry standard software. © 2006 American Chemical Society.
Resumo:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.
Resumo:
When applying multivariate analysis techniques in information systems and social science disciplines, such as management information systems (MIS) and marketing, the assumption that the empirical data originate from a single homogeneous population is often unrealistic. When applying a causal modeling approach, such as partial least squares (PLS) path modeling, segmentation is a key issue in coping with the problem of heterogeneity in estimated cause-and-effect relationships. This chapter presents a new PLS path modeling approach which classifies units on the basis of the heterogeneity of the estimates in the inner model. If unobserved heterogeneity significantly affects the estimated path model relationships on the aggregate data level, the methodology will allow homogenous groups of observations to be created that exhibit distinctive path model estimates. The approach will, thus, provide differentiated analytical outcomes that permit more precise interpretations of each segment formed. An application on a large data set in an example of the American customer satisfaction index (ACSI) substantiates the methodology’s effectiveness in evaluating PLS path modeling results.
Resumo:
Most object-based approaches to Geographical Information Systems (GIS) have concentrated on the representation of geometric properties of objects in terms of fixed geometry. In our road traffic marking application domain we have a requirement to represent the static locations of the road markings but also enforce the associated regulations, which are typically geometric in nature. For example a give way line of a pedestrian crossing in the UK must be within 1100-3000 mm of the edge of the crossing pattern. In previous studies of the application of spatial rules (often called 'business logic') in GIS emphasis has been placed on the representation of topological constraints and data integrity checks. There is very little GIS literature that describes models for geometric rules, although there are some examples in the Computer Aided Design (CAD) literature. This paper introduces some of the ideas from so called variational CAD models to the GIS application domain, and extends these using a Geography Markup Language (GML) based representation. In our application we have an additional requirement; the geometric rules are often changed and vary from country to country so should be represented in a flexible manner. In this paper we describe an elegant solution to the representation of geometric rules, such as requiring lines to be offset from other objects. The method uses a feature-property model embraced in GML 3.1 and extends the possible relationships in feature collections to permit the application of parameterized geometric constraints to sub features. We show the parametric rule model we have developed and discuss the advantage of using simple parametric expressions in the rule base. We discuss the possibilities and limitations of our approach and relate our data model to GML 3.1. © 2006 Springer-Verlag Berlin Heidelberg.
Resumo:
PURPOSE. A methodology for noninvasively characterizing the three-dimensional (3-D) shape of the complete human eye is not currently available for research into ocular diseases that have a structural substrate, such as myopia. A novel application of a magnetic resonance imaging (MRI) acquisition and analysis technique is presented that, for the first time, allows the 3-D shape of the eye to be investigated fully. METHODS. The technique involves the acquisition of a T2-weighted MRI, which is optimized to reveal the fluid-filled chambers of the eye. Automatic segmentation and meshing algorithms generate a 3-D surface model, which can be shaded with morphologic parameters such as distance from the posterior corneal pole and deviation from sphericity. Full details of the method are illustrated with data from 14 eyes of seven individuals. The spatial accuracy of the calculated models is demonstrated by comparing the MRI-derived axial lengths with values measured in the same eyes using interferometry. RESULTS. The color-coded eye models showed substantial variation in the absolute size of the 14 eyes. Variations in the sphericity of the eyes were also evident, with some appearing approximately spherical whereas others were clearly oblate and one was slightly prolate. Nasal-temporal asymmetries were noted in some subjects. CONCLUSIONS. The MRI acquisition and analysis technique allows a novel way of examining 3-D ocular shape. The ability to stratify and analyze eye shape, ocular volume, and sphericity will further extend the understanding of which specific biometric parameters predispose emmetropic children subsequently to develop myopia. Copyright © Association for Research in Vision and Ophthalmology.
Resumo:
A consequence of a loss of coolant accident is the damage of adjacent insulation materials (IM). IM may then be transported to the containment sump strainers where water is drawn into the ECCS (emergency core cooling system). Blockage of the strainers by IM lead to an increased pressure drop acting on the operating ECCS pumps. IM can also penetrate the strainers, enter the reactor coolant system and then accumulate in the reactor pressure vessel. An experimental and theoretical study that concentrates on mineral wool fiber transport in the containment sump and the ECCS is being performed. The study entails fiber generation and the assessment of fiber transport in single and multi-effect experiments. The experiments include measurement of the terminal settling velocity, the strainer pressure drop, fiber sedimentation and resuspension in a channel flow and jet flow in a rectangular tank. An integrated test facility is also operated to assess the compounded effects. Each experimental facility is used to provide data for the validation of equivalent computational fluid dynamic models. The channel flow facility allows the determination of the steady state distribution of the fibers at different flow velocities. The fibers are modeled in the Eulerian-Eulerian reference frame as spherical wetted agglomerates. The fiber agglomerate size, density, the relative viscosity of the fluid-fiber mixture and the turbulent dispersion of the fibers all affect the steady state accumulation of fibers at the channel base. In the current simulations, two fiber phases are separately considered. The particle size is kept constant while the density is modified, which affects both the terminal velocity and volume fraction. The relative viscosity is only significant at higher concentrations. The numerical model finds that the fibers accumulate at the channel base even at high velocities; therefore, modifications to the drag and turbulent dispersion forces can be made to reduce fiber accumulation.
Resumo:
WiMAX has been introduced as a competitive alternative for metropolitan broadband wireless access technologies. It is connection oriented and it can provide very high data rates, large service coverage, and flexible quality of services (QoS). Due to the large number of connections and flexible QoS supported by WiMAX, the uplink access in WiMAX networks is very challenging since the medium access control (MAC) protocol must efficiently manage the bandwidth and related channel allocations. In this paper, we propose and investigate a cost-effective WiMAX bandwidth management scheme, named the WiMAX partial sharing scheme (WPSS), in order to provide good QoS while achieving better bandwidth utilization and network throughput. The proposed bandwidth management scheme is compared with a simple but inefficient scheme, named the WiMAX complete sharing scheme (WCPS). A maximum entropy (ME) based analytical model (MEAM) is proposed for the performance evaluation of the two bandwidth management schemes. The reason for using MEAM for the performance evaluation is that MEAM can efficiently model a large-scale system in which the number of stations or connections is generally very high, while the traditional simulation and analytical (e.g., Markov models) approaches cannot perform well due to the high computation complexity. We model the bandwidth management scheme as a queuing network model (QNM) that consists of interacting multiclass queues for different service classes. Closed form expressions for the state and blocking probability distributions are derived for those schemes. Simulation results verify the MEAM numerical results and show that WPSS can significantly improve the network's performance compared to WCPS.
Resumo:
Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been few work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multidimensional sentiment distributions prediction. It proposes effective mechanisms to incorporate supervised information such as labeled feature constraints and document-level sentiment distributions derived from the training data into model learning. We have evaluated our approach on the datasets collected from the confession section of the Experience Project website where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distributions prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.
Resumo:
Forests play a pivotal role in timber production, maintenance and development of biodiversity and in carbon sequestration and storage in the context of the Kyoto Protocol. Policy makers and forest experts therefore require reliable information on forest extent, type and change for management, planning and modeling purposes. It is becoming increasingly clear that such forest information is frequently inconsistent and unharmonised between countries and continents. This research paper presents a forest information portal that has been developed in line with the GEOSS and INSPIRE frameworks. The web portal provides access to forest resources data at a variety of spatial scales, from global through to regional and local, as well as providing analytical capabilities for monitoring and validating forest change. The system also allows for the utilisation of forest data and processing services within other thematic areas. The web portal has been developed using open standards to facilitate accessibility, interoperability and data transfer.
Resumo:
The number of interoperable research infrastructures has increased significantly with the growing awareness of the efforts made by the Global Earth Observation System of Systems (GEOSS). One of the Societal Benefit Areas (SBA) that is benefiting most from GEOSS is biodiversity, given the costs of monitoring the environment and managing complex information, from space observations to species records including their genetic characteristics. But GEOSS goes beyond simple data sharing to encourage the publishing and combination of models, an approach which can ease the handling of complex multi-disciplinary questions. It is the purpose of this paper to illustrate these concepts by presenting eHabitat, a basic Web Processing Service (WPS) for computing the likelihood of finding ecosystems with equal properties to those specified by a user. When chained with other services providing data on climate change, eHabitat can be used for ecological forecasting and becomes a useful tool for decision-makers assessing different strategies when selecting new areas to protect. eHabitat can use virtually any kind of thematic data that can be considered as useful when defining ecosystems and their future persistence under different climatic or development scenarios. The paper will present the architecture and illustrate the concepts through case studies which forecast the impact of climate change on protected areas or on the ecological niche of an African bird.
Resumo:
Neuroimaging studies have consistently shown that working memory (WM) tasks engage a distributed neural network that primarily includes the dorsolateral prefrontal cortex, the parietal cortex, and the anterior cingulate cortex. The current challenge is to provide a mechanistic account of the changes observed in regional activity. To achieve this, we characterized neuroplastic responses in effective connectivity between these regions at increasing WM loads using dynamic causal modeling of functional magnetic resonance imaging data obtained from healthy individuals during a verbal n-back task. Our data demonstrate that increasing memory load was associated with (a) right-hemisphere dominance, (b) increasing forward (i.e., posterior to anterior) effective connectivity within the WM network, and (c) reduction in individual variability in WM network architecture resulting in the right-hemisphere forward model reaching an exceedance probability of 99% in the most demanding condition. Our results provide direct empirical support that task difficulty, in our case WM load, is a significant moderator of short-term plasticity, complementing existing theories of task-related reduction in variability in neural networks. Hum Brain Mapp, 2013. © 2013 Wiley Periodicals, Inc.
Resumo:
IEEE 802.15.4 standard has been recently developed for low power wireless personal area networks. It can find many applications for smart grid, such as data collection, monitoring and control functions. The performance of 802.15.4 networks has been widely studied in the literature. However the main focus has been on the modeling throughput performance with frame collisions. In this paper we propose an analytic model which can model the impact of frame collisions as well as frame corruptions due to channel bit errors. With this model the frame length can be carefully selected to improve system performance. The analytic model can also be used to study the 802.15.4 networks with interference from other co-located networks, such as IEEE 802.11 and Bluetooth networks. © 2011 Springer-Verlag.