896 results for computation- and data-intensive applications
Abstract:
Knowledge elicitation is a common technique for producing rules about the operation of a plant from available human expertise. Similarly, data mining is becoming a popular technique for extracting rules from the data available from plant operation. In the work reported here, knowledge was required to enable supervisory control of an aluminium hot strip mill by determining mill set-points. A method was developed that fuses knowledge elicitation and data mining to incorporate the best aspects of each technique whilst avoiding their known problems. The knowledge was used through an expert system, which determined schedules of set-points and provided information to human operators. The results show that the proposed method was effective in producing rules for the on-line control of a complex industrial process.
Abstract:
We propose a bridge between two important parallel programming paradigms: data parallelism and communicating sequential processes (CSP). Data-parallel pipelined architectures obtained with the Alpha language can be embedded in a control-intensive application expressed in the CSP-based Handel formalism. The interface is formally defined from the semantics of the Alpha and Handel languages. This work will ease the design of compute-intensive applications on FPGAs.
Abstract:
It is well known that there is a dynamic relationship between cerebral blood flow (CBF) and cerebral blood volume (CBV). With increasing applications of functional MRI, in which blood-oxygen-level-dependent (BOLD) signals are recorded, understanding and accurately modeling the hemodynamic relationship between CBF and CBV is becoming increasingly important. This study presents an empirical, data-based modeling framework for identifying models from experimental CBF and CBV data. It is shown that the relationship between the changes in CBF and CBV can be described by a parsimonious autoregressive with exogenous input (ARX) model structure. It is observed that neither the ordinary least-squares (LS) method nor the classical total least-squares (TLS) method produces accurate estimates from the original noisy CBF and CBV data. A regularized total least-squares (RTLS) method is therefore introduced and extended to solve this errors-in-variables problem. Quantitative results show that the RTLS method works very well on the noisy CBF and CBV data. Finally, combining RTLS with a filtering method can lead to a parsimonious but very effective model that can characterize the relationship between the changes in CBF and CBV.
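To make the ARX structure concrete, here is a minimal sketch of a first-order ARX fit by ordinary least squares, i.e. the LS baseline that the abstract reports as inaccurate on noisy data. The model orders (1,1), the function name `fit_arx11_ls`, and the mapping of the CBF change to the input `u` and the CBV change to the output `y` are illustrative assumptions; the authors' RTLS estimator is not reproduced.

```python
def fit_arx11_ls(u, y):
    """Ordinary least-squares fit of y[t] = a*y[t-1] + b*u[t-1] (hypothetical ARX(1,1))."""
    # Accumulate the 2x2 normal equations for the regressor phi_t = (y[t-1], u[t-1]).
    s_yy = s_yu = s_uu = r_y = r_u = 0.0
    for t in range(1, len(y)):
        yp, up = y[t - 1], u[t - 1]
        s_yy += yp * yp
        s_yu += yp * up
        s_uu += up * up
        r_y += yp * y[t]
        r_u += up * y[t]
    det = s_yy * s_uu - s_yu * s_yu
    if det == 0.0:
        raise ValueError("regressors are collinear; the LS problem is singular")
    # Solve [[s_yy, s_yu], [s_yu, s_uu]] @ [a, b] = [r_y, r_u] by Cramer's rule.
    a = (r_y * s_uu - s_yu * r_u) / det
    b = (s_yy * r_u - s_yu * r_y) / det
    return a, b

# Noise-free toy data generated from known coefficients a = 0.6, b = 0.3.
u = [1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0, 2.0]
y = [0.0]
for t in range(1, len(u)):
    y.append(0.6 * y[t - 1] + 0.3 * u[t - 1])
print(fit_arx11_ls(u, y))
```

On noise-free data the LS fit recovers the generating coefficients. Once the regressor `y[t-1]` is itself noisy, the problem becomes errors-in-variables and LS estimates are biased, which is the situation that motivates TLS and RTLS.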
Abstract:
In this article, we review state-of-the-art techniques for mining data streams in mobile and ubiquitous environments. We start with a concise background on data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams must be processed on small ubiquitous devices such as smartphones and sensor devices. Mobile and ubiquitous data mining targets these applications with tailored techniques and approaches that address resource scarcity and mobility issues. Two categories of mobile and ubiquitous mining of streaming data can be identified: single-node and distributed. This survey covers both categories. Mining mobile and ubiquitous data requires algorithms that can monitor their working conditions and adapt to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. User mobility encourages the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed under Collaborative Data Stream Mining, in which agents share knowledge to learn adaptive, accurate models.
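Bounded memory is one of the building blocks that resource-constrained stream mining relies on. As a generic illustration (chosen here, not taken from the survey), the classic reservoir sampling Algorithm R keeps a uniform random sample of at most `k` items from a stream of unknown length using O(k) memory; a device could size `k` from its memory budget, which is a hypothetical policy.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniform random sample of up to k items from an iterable of unknown length (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # randint is inclusive: j is uniform over 0..i
            if j < k:
                reservoir[j] = item  # item i is kept with probability k/(i+1)
    return reservoir

# Hypothetical: a device whose memory budget allows only 5 buffered readings.
print(reservoir_sample(range(100_000), k=5, rng=random.Random(42)))
```

Each item is examined once and never stored beyond the reservoir, so the memory footprint is fixed regardless of stream length.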
Abstract:
We present an overview of the MELODIES project, which is developing new data-intensive environmental services based on data from Earth Observation satellites, government databases, national and European agencies and more. We focus here on the capabilities and benefits of the project’s “technical platform”, which applies cloud computing and Linked Data technologies to enable the development of these services, providing flexibility and scalability.
Abstract:
Site-specific meteorological forcing appropriate for applications such as urban outdoor thermal comfort simulations can be obtained using a newly coupled scheme that combines a simple slab convective boundary layer (CBL) model with an urban land surface model (ULSM) (two ULSMs are considered here). The former simulates daytime CBL height, air temperature and humidity; the latter estimates urban surface energy and water balance fluxes, accounting for changes in land surface cover. The coupled models are tested at a suburban site and at two rural grass sites, one irrigated and one unirrigated, in Sacramento, USA. All modelled variables compare well with measurements (e.g. coefficient of determination = 0.97 and root mean square error = 1.5 °C for air temperature). The current version is applicable to daytime conditions and requires the CBL model's initial state to lie within an appropriate range to achieve this performance. The coupled model allows routine observations from distant sites (e.g. rural sites or airports) to be used to predict air temperature and relative humidity in an urban area of interest. This simple model, which can be applied rapidly, could provide urban data for applications such as air quality forecasting and building energy modelling, in addition to outdoor thermal comfort.
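The two skill scores quoted above can be computed as follows. This sketch assumes the coefficient of determination is the squared Pearson correlation between observed and modelled values, a common choice in model evaluation; some studies instead use 1 - SS_res/SS_tot, which penalizes bias and can give a noticeably different value. The temperatures below are made-up illustrative values.

```python
import math

def rmse(observed, modelled):
    """Root mean square error, in the units of the data (e.g. °C)."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(observed, modelled)) / n)

def r_squared(observed, modelled):
    """Squared Pearson correlation between observed and modelled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_m = sum(modelled) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(observed, modelled))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    return cov * cov / (var_o * var_m)

# Hypothetical air temperatures (°C): the model is biased by +1 °C everywhere.
obs = [20.0, 22.0, 25.0, 27.0]
mod = [21.0, 23.0, 26.0, 28.0]
print(rmse(obs, mod), r_squared(obs, mod))  # 1.0 1.0
```

With a constant bias the squared correlation is still 1, while RMSE exposes the 1 °C offset, which is why the two metrics are usually reported together.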
Abstract:
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system used to design and execute scientific workflows and to aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important for supporting data-driven scientific discovery in complex, exploratory bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that clusters high-dimensional biological data and provides a nonlinear, topology-preserving projection for visualizing the input data and their similarities. The core algorithm in the BioDICE plugin is the Fast Learning Self Organizing Map (FLSOM), an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study involving chemical compounds. Conclusions: The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for developing scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool that can be adopted for the exploratory analysis of biological datasets.
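The topology-preserving projection rests on the SOM update rule. The sketch below is a plain online Kohonen SOM on a small rectangular grid, with a Gaussian neighbourhood and a linearly decaying learning rate and radius. FLSOM's speed-ups are not described in the abstract and are not reproduced; the grid size, decay schedule, toy data and function names are assumptions.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the prototype closest to x (squared Euclidean distance)."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Classic online SOM with a Gaussian neighbourhood (not the FLSOM variant)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # One prototype vector per grid node, initialised uniformly in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # visit samples in random order
            frac = step / total
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly towards 0
            sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood radius shrinks over time
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull prototype towards the sample
            step += 1
    return weights

# Two well-separated toy groups in 2D.
data = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05], [1.0, 1.0], [0.95, 1.0], [1.0, 0.95]]
som = train_som(data, 3, 3, epochs=30, seed=1)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

Because neighbouring grid nodes are updated together, similar inputs end up mapped to nearby nodes, which is what makes the resulting 2D map usable for visual exploration.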
Abstract:
In this article, we discuss inferential aspects of measurement error regression models with null intercepts when the unknown quantity x (the latent variable) follows a skew-normal distribution. We first examine the maximum-likelihood approach to estimation via the EM algorithm, exploring the statistical properties of the model considered. Then the marginal likelihood, the score function and the observed information matrix of the observed quantities are presented, allowing direct implementation of inference. To discuss diagnostic techniques for this type of model, we derive the matrices needed to assess local influence on the parameter estimates under different perturbation schemes. The results and methods developed in this paper are illustrated using part of a real data set used by Hadgu and Koch [1999, Application of generalized estimating equations to a dental randomized clinical trial. Journal of Biopharmaceutical Statistics, 9, 161-178].
Abstract:
Data mining is a relatively new field of research whose objective is to acquire knowledge from large amounts of data. In medicine and health care, owing to regulations and the availability of computers, large amounts of data are becoming available [27]. Practitioners are expected to use all these data in their work, yet such large amounts of data cannot be processed by humans quickly enough to support diagnosis, prognosis and treatment scheduling. A major objective of this thesis is to evaluate data mining tools in medical and health care applications in order to develop a tool that can support reasonably accurate decisions. Specifically, the goal is to find a pattern among patients who developed pneumonia by clustering their laboratory values, which were recorded daily. This pattern can then be generalized to patients who have not been diagnosed with the disease but whose laboratory values show the same trend as those of pneumonia patients. For this work, 10 tables were extracted from a large hospital database in Jena. The intensive care unit (ICU) uses COPRA, a patient management system, and all tables and data are stored in a German-language database.
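The clustering step could be prototyped with k-means on fixed-length vectors of daily laboratory values. This sketch assumes each patient is represented by one lab parameter over the same number of days and uses a deterministic initialisation (the first k patients) for reproducibility. The thesis does not state which clustering algorithm or preprocessing it used, so these are illustrative choices, and the toy values are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    # Deterministic initialisation with the first k points (an assumption, for reproducibility).
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical daily values of one lab parameter over three ICU days per patient.
patients = [[1.0, 1.0, 1.0], [9.0, 9.0, 9.0], [1.0, 2.0, 1.0], [8.0, 9.0, 10.0]]
labels, _ = kmeans(patients, 2)
print(labels)  # [0, 1, 0, 1]
```

A new, undiagnosed patient could then be compared against the centroid of the cluster dominated by pneumonia patients using `squared_distance`.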
Abstract:
To evaluate the trans-enamel and trans-dentinal cytotoxic effects of a 35% H2O2 bleaching gel on an odontoblast-like cell line (MDPC-23) after consecutive applications. Fifteen enamel/dentine discs were obtained from bovine central incisors and placed individually in artificial pulp chambers. Three groups (n = 5 discs) were formed according to the following enamel treatments: G1, 35% H2O2 bleaching gel (15 min); G2, 35% H2O2 bleaching gel (15 min) + halogen light (20 s); G3, control (no treatment). After the treatments had been repeated three consecutive times, the extracts (culture medium + gel components that had diffused through the enamel/dentine discs) in contact with the dentine were collected and applied for 24 h to previously cultured MDPC-23 cells (50 000 cells cm⁻²). Cell metabolism was evaluated by the MTT assay, and the data were analysed statistically (Kruskal-Wallis and Mann-Whitney U-tests; α = 5%). Cell morphology was analysed by scanning electron microscopy. Cell metabolism decreased by 92.03% in G1 and 82.47% in G2, and both groups differed significantly (P < 0.05) from G3. Regardless of halogen light activation, applying the bleaching gel to the enamel caused significantly more severe cytotoxic effects on the cultured odontoblast-like cells than those observed in the untreated control group. In addition, significant alterations in cell morphology were observed in G1 and G2. After three consecutive applications of a 35% H2O2 bleaching agent, the diffusion of the gel components through enamel and dentine caused severe toxic effects in cultured pulp cells.
Abstract:
Detecting misbehavior (such as the transmission of false information) in vehicular ad hoc networks (VANETs) is a very important problem with a wide range of implications, including for safety-related and congestion-avoidance applications. We discuss several limitations of existing misbehavior detection schemes (MDS) designed for VANETs. Most MDS are concerned with detecting malicious nodes. In most situations, however, vehicles send false information for selfish reasons of their owners, e.g. to gain access to a particular lane. It is therefore more important to detect false information than to identify misbehaving nodes. We introduce the concept of data-centric misbehavior detection and propose algorithms that detect false alert messages and misbehaving nodes by observing their actions after they send out alert messages. With the data-centric MDS, each node can decide whether received information is correct or false. The decision is based on the consistency of recent messages and new alerts with reported and estimated vehicle positions. No voting or majority decision is needed, which makes our MDS resilient to Sybil attacks. After misbehavior is detected, we do not revoke all the secret credentials of misbehaving nodes, as most schemes do. Instead, we impose fines on misbehaving nodes (administered by the certification authority), discouraging them from acting selfishly. This reduces the computation and communication costs involved in revoking all the secret credentials of misbehaving nodes. © 2011 IEEE.
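The idea of judging an alert by the sender's subsequent behaviour can be illustrated with a simple consistency check. This is a hypothetical rule, not the paper's algorithm: a vehicle that warns of a hazard (e.g. congestion) in a lane and then itself drives through that spot at normal speed contradicts its own alert. The `Alert`/`Beacon` records, the 50 m radius and the 5 m/s speed threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    sender: str
    lane: int
    position_m: float   # claimed hazard location along the road (hypothetical field)

@dataclass
class Beacon:
    sender: str
    lane: int
    position_m: float   # reported position after the alert was sent
    speed_mps: float    # reported speed

def alert_is_consistent(alert, later_beacons, radius_m=50.0, max_speed_mps=5.0):
    """Return False if the sender later passes the claimed hazard faster than max_speed_mps."""
    for b in later_beacons:
        if b.sender != alert.sender or b.lane != alert.lane:
            continue  # only the sender's own behaviour in the alerted lane is checked here
        near_hazard = abs(b.position_m - alert.position_m) <= radius_m
        if near_hazard and b.speed_mps > max_speed_mps:
            return False  # the sender's actions contradict its alert
    return True

alert = Alert(sender="v1", lane=1, position_m=100.0)
print(alert_is_consistent(alert, [Beacon("v1", 1, 110.0, 25.0)]))  # False
```

No voting among neighbours is involved: each receiver evaluates the data itself, which is the property the abstract links to resilience against Sybil attacks.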
Abstract:
Data-intensive Grid applications require huge data transfers between grid computing nodes. These computing nodes, where computing jobs are executed, are usually geographically separated. A grid network that employs optical wavelength division multiplexing (WDM) technology and optical switches to interconnect computing resources with dynamically provisioned, multi-gigabit-rate lightpaths is called a Lambda Grid network. A computing task may be executed on any one of several computing nodes that possess the necessary resources. For job scheduling to reflect reality, the allocation of network resources for data transfer must be taken into account. However, few scheduling methods consider communication contention on Lambda Grids. In this paper, we investigate the joint scheduling problem, considering both optical network and computing resources in a Lambda Grid network. The objective of our work is to maximize the total number of jobs that can be scheduled in a Lambda Grid network. An adaptive routing algorithm is proposed and implemented to accomplish the communication tasks of every job submitted to the network. Four heuristics (FIFO, ESTF, LJF, RS) are implemented for scheduling the computational tasks. Simulation results demonstrate the feasibility and efficiency of the proposed solution.
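How the job-ordering heuristic alone affects the number of scheduled jobs can be seen in a computing-only sketch. It ignores lightpath routing and network contention entirely, sketches only FIFO (first in, first out) and LJF (longest job first) because ESTF and RS are not expanded in the abstract, and assumes identical nodes and per-job deadlines as the acceptance criterion. All of these are assumptions.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    arrival: float   # time the job is submitted
    length: float    # execution time on any node (nodes assumed identical)
    deadline: float  # hypothetical acceptance criterion: must finish by this time

def schedule(jobs, num_nodes, order="FIFO"):
    """Greedy list scheduling; returns names of accepted jobs (network transfers ignored)."""
    if order == "FIFO":
        queue = sorted(jobs, key=lambda j: j.arrival)
    elif order == "LJF":
        queue = sorted(jobs, key=lambda j: j.length, reverse=True)
    else:
        raise ValueError(f"unsupported order: {order}")
    free_at = [0.0] * num_nodes  # min-heap of times at which each node becomes free
    heapq.heapify(free_at)
    accepted = []
    for job in queue:
        node_free = heapq.heappop(free_at)  # earliest-available node
        finish = max(node_free, job.arrival) + job.length
        if finish <= job.deadline:
            accepted.append(job.name)
            heapq.heappush(free_at, finish)     # node busy until the job finishes
        else:
            heapq.heappush(free_at, node_free)  # reject the job; node stays free
    return accepted

jobs = [Job("B", 0.0, 1.0, 2.0), Job("C", 0.0, 1.0, 3.0), Job("A", 0.0, 5.0, 5.0)]
print(schedule(jobs, 1, "FIFO"), schedule(jobs, 1, "LJF"))  # ['B', 'C'] ['A']
```

On this toy input FIFO accepts two jobs and LJF only one, so the choice of heuristic matters even before network resources are considered.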
Abstract:
Item response theory (IRT) comprises a set of statistical models that are useful in many fields, especially when there is interest in studying latent variables (or latent traits). Such latent traits are usually assumed to be random variables and are assigned a convenient distribution, very commonly the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed modelling the latent trait distribution with a skew-normal distribution under the centred parameterization (SNCP), as studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382]. This approach makes it possible to represent asymmetric behaviour of the latent trait distribution. They also developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP and showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another MHWGS algorithm, based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the two steps of the algorithm developed by Azevedo et al. This not only makes implementation easier but also reduces the number of proposal densities required, which can be a problem when implementing MHWGS algorithms, as discussed in [R. J. Patz and B. W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G. O. Roberts, and W. R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. We also study the sensitivity to these priors and the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items and the asymmetry level on parameter recovery. The simulation study indicated that our approach performed as well as that in [3] in terms of parameter recovery, particularly with the Jeffreys prior. It also indicated that the asymmetry level has the greatest impact on parameter recovery, although this impact is relatively small. A real data set is analysed, together with the development of model-fit assessment tools, and the results are compared with those obtained by Azevedo et al. The results indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
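The hierarchical structure rests on Henze's stochastic representation: if U0 and U1 are independent standard normals and δ = λ/√(1+λ²), then Z = δ|U0| + √(1−δ²)U1 is skew-normal with shape λ. Conditional on the half-normal |U0|, Z is normal, which is presumably what lets a Gibbs sampler dispense with one Metropolis-Hastings step. The sketch below only draws from this representation and standardizes with E[Z] = δ√(2/π) and Var[Z] = 1 − 2δ²/π, giving the mean-0, variance-1 latent scale used under the centred parameterization. It does not implement the authors' MHWGS sampler, omits the mapping from the centred skewness parameter to δ, and the function names are hypothetical.

```python
import math
import random

def skew_normal_henze(delta, rng):
    """Draw Z ~ SN(lambda) via Henze's representation, with delta = lambda / sqrt(1 + lambda**2)."""
    if not -1.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [-1, 1]")
    u0 = rng.gauss(0.0, 1.0)
    u1 = rng.gauss(0.0, 1.0)
    # |u0| is half-normal; given |u0|, Z is N(delta*|u0|, 1 - delta**2).
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1

def standardized_skew_normal(delta, rng):
    """Skew-normal draw rescaled to mean 0 and variance 1."""
    mu_z = delta * math.sqrt(2.0 / math.pi)  # E[Z]
    sd_z = math.sqrt(1.0 - mu_z * mu_z)      # sqrt(Var[Z]) = sqrt(1 - 2*delta**2/pi)
    return (skew_normal_henze(delta, rng) - mu_z) / sd_z

rng = random.Random(7)
draws = [standardized_skew_normal(0.9, rng) for _ in range(5)]
print(draws)
```

Large samples from `standardized_skew_normal(0.9, ...)` have mean near 0, variance near 1 and positive skewness, as expected for a right-skewed latent trait.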
Abstract:
For the first time, we introduce a generalized form of the exponentiated generalized gamma distribution [Cordeiro et al., The exponentiated generalized gamma distribution with application to lifetime data, J. Statist. Comput. Simul. 81 (2011), pp. 827-842] that serves as the baseline for the log-exponentiated generalized gamma regression model. The new distribution can accommodate increasing, decreasing, bathtub-shaped and unimodal hazard functions. A second advantage is that it includes, as special cases, classical distributions reported in the lifetime literature. We obtain explicit expressions for the moments of the baseline distribution of the new regression model. The proposed model can be applied to censored data and includes several widely known regression models as sub-models, so it can be used more effectively in the analysis of survival data. We obtain maximum likelihood estimates of the model parameters for censored data. We demonstrate the usefulness of the extended regression model through two applications to real data.
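For right-censored survival data, maximum likelihood estimation typically maximizes a log-likelihood of the generic form below, where $f$ and $S = 1 - F$ are the density and survival function implied by the chosen baseline, $t_i$ is the observed time and $\delta_i$ equals 1 for an observed failure and 0 for a censored observation. The location-scale form of the log-time regression and the parameter vector $\boldsymbol{\theta}$ (regression coefficients $\boldsymbol{\beta}$, scale $\sigma$ and any shape parameters) are the usual construction for log-regression models, shown here for orientation; the specific density of the new distribution is not reproduced.

```latex
\begin{aligned}
\ell(\boldsymbol{\theta}) &= \sum_{i=1}^{n} \Big[ \delta_i \log f(t_i;\boldsymbol{\theta}) + (1-\delta_i)\log S(t_i;\boldsymbol{\theta}) \Big], \\
S(t;\boldsymbol{\theta}) &= 1 - F(t;\boldsymbol{\theta}), \qquad
y_i = \log t_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \sigma z_i .
\end{aligned}
```

Censored observations contribute only the probability of surviving past $t_i$, which is how the likelihood uses incomplete lifetimes without discarding them.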