26 results for Data fusion applications
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Current scientific applications produce large amounts of data, and the processing, handling, and analysis of such data require large-scale computing infrastructures such as clusters and grids. Studies in this area aim at improving the performance of data-intensive applications by optimizing data accesses; to achieve this goal, distributed storage systems have adopted techniques of data replication, migration, distribution, and access parallelism. The main drawback of those studies, however, is that they do not take application behavior into account when optimizing data access. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. To accomplish this goal, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Knowing these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that the new approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
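To make the pipeline concrete, here is a minimal sketch of the idea, not the paper's implementation: observed data-access positions are treated as a time series, a simple model is fitted, and the forecasts become candidates for prefetching. The function names and the AR(1) choice are illustrative assumptions; the paper selects among several modeling techniques according to the properties of each series.

```python
# Minimal sketch: treat observed data-access offsets as a time
# series, fit a simple AR(1) model by least squares, and forecast
# the next accesses so they can be prefetched or replicated.
import numpy as np

def forecast_next_accesses(series, horizon=5):
    """Fit x[t] ~ a*x[t-1] + b and predict `horizon` future steps."""
    x = np.asarray(series, dtype=float)
    a, b = np.polyfit(x[:-1], x[1:], deg=1)  # lag-1 regression
    preds, last = [], x[-1]
    for _ in range(horizon):
        last = a * last + b
        preds.append(last)
    return preds

# Example: a roughly sequential access pattern
offsets = [0, 64, 130, 192, 260, 320, 388]
print(forecast_next_accesses(offsets))  # candidates for prefetching
```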
Abstract:
In this article, we introduce an asymmetric extension to the univariate slash-elliptical family of distributions studied in Gomez et al. (2007a). This new family results from a scale mixture of the epsilon-skew-symmetric family of distributions and the uniform distribution. A general expression is presented for the density, with special cases such as the normal, Cauchy, Student-t, and Pearson type II distributions. Some special properties and moments are also investigated. Results of applications to two real data sets are also reported, illustrating that the new family can be useful in practice.
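As a hedged sketch of the construction, in our notation (the exact parameterization follows the article), a slash-type scale mixture divides an epsilon-skew-symmetric variate by a power of an independent uniform:

```latex
% Z has epsilon-skew-symmetric density g(z; \epsilon),
% U ~ Uniform(0,1) independent of Z, tail parameter q > 0:
X = \frac{Z}{U^{1/q}},
\qquad
f_X(x) = q \int_0^1 u^{q}\, g(ux; \epsilon)\, du,
% which recovers the baseline family as q -> \infty and gives
% heavier tails for small q.
```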
Abstract:
OBJECTIVE: To evaluate tools for the fusion of images generated by tomography and by structural and functional magnetic resonance imaging. METHODS: Structural and functional magnetic resonance imaging were performed in a 3-Tesla scanner while a volunteer who had previously undergone cranial tomography performed motor and somatosensory tasks. Image data were analyzed with different programs, and the results were compared. RESULTS: We constructed a flow chart of computational processes that allowed measurement of the spatial congruence between the methods. No single computational tool contained the entire set of functions necessary to achieve the goal. CONCLUSION: The fusion of the images from the three methods proved feasible with the use of four free-access software programs (OsiriX, Register, MRIcro and FSL). Our results may serve as a basis for building software that will be useful as a virtual tool prior to neurosurgery.
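The abstract does not specify how spatial congruence was quantified; a common choice when comparing co-registered volumes is the Dice overlap between binary masks, sketched below with toy data.

```python
# Dice coefficient between two boolean masks as one possible
# measure of spatial congruence (illustrative, not the paper's
# stated metric).
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two boolean arrays of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

# Toy example with two partially overlapping "activation" masks
a = np.zeros((10, 10), dtype=bool); a[2:7, 2:7] = True
b = np.zeros((10, 10), dtype=bool); b[4:9, 4:9] = True
print(dice_coefficient(a, b))  # 1.0 would mean perfect congruence
```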
Abstract:
Technical evaluation of analytical data is highly relevant, since such data can be compared with environmental quality standards and used for decision-making in the management of dredged-sediment disposal and in the evaluation of salt and brackish water quality in accordance with CONAMA Resolution 357/05. It is therefore essential that the project manager discuss the environmental agency's technical requirements with the contracted laboratory, both to follow up the analyses under way and with a view to possible re-analysis when anomalous data are identified. The main technical requirements are: (1) method quantitation limits (QLs) should fall below environmental standards; (2) analyses should be carried out in laboratories whose analytical scope is accredited by the National Institute of Metrology (INMETRO) or qualified or accepted by a licensing agency; (3) a chain of custody should be provided to ensure sample traceability; (4) control charts should be provided to prove method performance; (5) certified reference material analysis or, if that is not available, matrix spike analysis should be undertaken; and (6) chromatograms should be included in the analytical report. Within this context, and with a view to helping environmental managers evaluate analytical reports, this work discusses the limitations of applying SW-846 US EPA methods to marine samples and the consequences of reporting data based on method detection limits (MDL) rather than sample quantitation limits (SQL), and presents possible modifications of the principal method applied by laboratories in order to comply with environmental quality standards.
Abstract:
In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter complements that framework by showing that it is possible to eliminate a user-defined parameter so that the clustering stage can be implemented more accurately and with reduced computational complexity.
Abstract:
Despite their importance in the evaluation of petroleum and gas reservoirs, measurements of self-potential data under borehole conditions (well logging) have found only minor applications in aquifer and waste-site characterization. This can be attributed to the lower signals from diffusion fronts in near-surface environments, because measurements are made long after the drilling of the well, when concentration fronts are already disappearing. Proportionally higher signals arise from streaming potentials, which prevents the use of simple interpretation models that assume signals from diffusion only. Our laboratory experiments found that dual-source self-potential signals can be described by a simple linear model, and that the contributions (from diffusion and streaming potentials) can be isolated by slightly perturbing the borehole conditions. Perturbations are applied either by changing the concentration of the borehole-filling solution or by changing its column height. Parameters useful for formation evaluation can be estimated from data measured during the perturbations, namely pore-water resistivity, the pressure drop across the borehole wall, and the electrokinetic coupling parameter. These parameters are important to assess, respectively, water quality, aquifer lateral continuity, and the interfacial properties of permeable formations.
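A hedged sketch of the two-source linear model described above, in our own notation (the paper's exact coefficients may differ):

```latex
% Measured self-potential as the sum of a diffusion (electrochemical)
% term and a streaming (electrokinetic) term:
V \;=\; \underbrace{K \ln(\rho_w/\rho_m)}_{\text{diffusion}}
\;+\; \underbrace{C\,\Delta p}_{\text{streaming}},
% where \rho_w and \rho_m are the pore-water and borehole-fluid
% resistivities, \Delta p is the pressure drop across the borehole
% wall, and C is the electrokinetic coupling parameter. Perturbing
% the solution concentration acts on the first term, while changing
% the column height acts on \Delta p, which is what allows the two
% contributions to be separated.
```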
Abstract:
Background: A current challenge in gene annotation is to define gene function in the context of a network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of the network interact with each other and keep their functions stable. In general, however, there are not enough data to accurately recover GNs from expression levels alone, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only expression profiles in the inference process. The use of several types of biological information in inference methods has increased significantly, in order to better recover the connections between genes and to reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes when the expression data are observed. Although several works on data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. Methods: We propose a methodology for including biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain from adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG, and (d) KEGG+GO. Results and conclusions: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering the eukaryotic organism P. falciparum. Our results indicate that information based on direct interaction can produce a higher gain than data about less specific relationships such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of the hubs only. The results indicate that the use of biological information can improve the identification of the most connected proteins.
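As an illustration of the data-integration idea (the function names and the additive bonus are ours, not the paper's method), prior interactions can be folded into an expression-based edge score before thresholding:

```python
# Sketch: combine expression-based evidence (correlation) with prior
# biological evidence (e.g., protein-protein interactions) when
# scoring candidate gene-network edges.
import numpy as np

def score_edges(expr, prior_edges, bonus=0.2):
    """expr: genes x samples matrix; prior_edges: set of (i, j) pairs."""
    scores = np.abs(np.corrcoef(expr))   # expression-based evidence
    for i, j in prior_edges:             # prior biological evidence
        scores[i, j] += bonus
        scores[j, i] += bonus
    np.fill_diagonal(scores, 0.0)        # ignore self-edges
    return scores

expr = np.random.default_rng(0).normal(size=(5, 20))
print(score_edges(expr, {(0, 1), (2, 3)}).round(2))
```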
Abstract:
Each plasma physics laboratory has its own proprietary control and data acquisition system, usually different from one laboratory to another, which means that each laboratory has its own way of controlling the experiment and of retrieving data from the database. Fusion research relies to a great extent on international collaboration, and these private systems make it difficult to follow the work remotely. The TCABR data analysis and acquisition system has been upgraded to support a joint research programme using remote participation technologies. The choice of MDSplus (Model Driven System plus) is justified by the fact that it is widely used, so scientists from different institutions may use the same system in different experiments on different tokamaks without needing to know how each system handles its data acquisition and analysis. Another important point is that MDSplus has a library system that allows communication between different languages (Java, Fortran, C, C++, Python) and programs such as MATLAB, IDL, and OCTAVE. In the case of the TCABR tokamak, interfaces (the subject of this paper) were developed between the system already in use and MDSplus, instead of using MDSplus at all stages from control and data acquisition to data analysis. This was done to preserve a complex system already in operation, whose full migration would otherwise take a long time. The implementation also allows new components to be added that use MDSplus fully at all stages.
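For readers unfamiliar with MDSplus, remote participation typically looks like the following with its Python bindings; the server, shot number, and node names below are hypothetical, since the actual TCABR tree layout is defined by the interfaces described above.

```python
# Remote read of a signal through MDSplus's thin-client interface.
# Server address, tree name, shot number, and node path are assumed
# for illustration.
from MDSplus import Connection

conn = Connection('tcabr.server.example')  # hypothetical mdsip server
conn.openTree('tcabr', 12345)              # tree name / shot number assumed
signal = conn.get(r'\PLASMA_CURRENT')      # node tag assumed
current = signal.data()                    # numpy array of the signal
time = conn.get(r'dim_of(\PLASMA_CURRENT)').data()  # its time base
print(current.shape, time.shape)
```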
Abstract:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarity between pairs of objects, and that the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate.
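A minimal sketch of this pipeline with illustrative parameters: connect objects whose similarity, taken here as the exponential of the negated Minkowski distance, exceeds a threshold, then detect communities by greedy modularity optimization (via networkx, which may differ from the authors' implementation).

```python
# Network-based clustering sketch: similarity graph + community
# detection. Threshold and Minkowski order p are illustrative.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def cluster(points, p=2, threshold=0.5):
    pts = np.asarray(points, dtype=float)
    g = nx.Graph()
    g.add_nodes_from(range(len(pts)))
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d = np.sum(np.abs(pts[i] - pts[j]) ** p) ** (1.0 / p)
            s = np.exp(-d)            # exponential-of-distance similarity
            if s > threshold:         # keep only sufficiently similar pairs
                g.add_edge(i, j, weight=s)
    return list(greedy_modularity_communities(g, weight='weight'))

data = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 5.2]]
print(cluster(data))  # two communities for the two point clouds
```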
Abstract:
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with the enzyme InhA, associated with Mycobacterium tuberculosis. This problem arises in rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that can be validated by a domain specialist. Decision-tree induction algorithms have been used successfully in drug-design applications, especially considering that decision trees are simple to understand, interpret, and validate. Several decision-tree induction algorithms are available for general use, but each has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision-tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate with a flexible receptor.
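The paper's contribution is the automatic design of tree-induction algorithms, which is beyond a short snippet; as context, here is the kind of comprehensible baseline it is compared against: a single shallow regression tree whose rules a specialist can read. The features and data are hypothetical docking descriptors, not the paper's, and scikit-learn's CART stands in for C4.5.

```python
# Baseline sketch: a shallow, human-readable decision tree predicting
# a free-energy-of-binding proxy from hypothetical docking features.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))  # synthetic per-conformation descriptors
y = 1.5 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
# A depth limit keeps the model comprehensible for domain experts:
print(export_text(tree, feature_names=['dist_NADH', 'vdw_energy', 'hbond_count']))
```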
Abstract:
Turbulence is one of the key problems of classical physics, and it has been the object of intense research in recent decades across a large spectrum of problems involving fluids, plasmas, and waves. To review some advances in theoretical and experimental investigations of turbulence, a mini-symposium on this subject was organized at the Dynamics Days South America 2010 conference. The main goal of this mini-symposium was to present recent developments in both fundamental aspects and dynamical analysis of turbulence in nonlinear waves and fusion plasmas. In this paper we present a summary of the works presented at the mini-symposium. Among the questions addressed were the onset and control of turbulence and spatio-temporal chaos.
Abstract:
The objective of this study was to estimate the prevalence of inadequate micronutrient intake and of excess sodium intake among adults aged 19 years and older in the city of São Paulo, Brazil. Twenty-four-hour dietary recall and sociodemographic data were collected from each participant (n=1,663) in a cross-sectional study of a representative sample of the adult population of the city of São Paulo in 2003 (Inquiry of Health of São Paulo, ISA-2003). The variability in intake was measured through two replications of the 24-hour recall in a subsample of this population in 2007 (ISA-2007). Usual intake was estimated with the PC-SIDE program (version 1.0, 2003, Department of Statistics, Iowa State University), which uses an approach developed at Iowa State University. The prevalence of nutrient inadequacy was calculated using the Estimated Average Requirement (EAR) cut-point method for vitamins A and C, thiamin, riboflavin, niacin, copper, phosphorus, and selenium. For vitamin D, pantothenic acid, manganese, and sodium, the proportion of individuals with usual intake equal to or greater than the Adequate Intake value was calculated. For sodium, the percentage of individuals with intake equal to or greater than the Tolerable Upper Intake Level was also calculated. The highest prevalences of inadequacy for males and females, respectively, occurred for vitamin A (67% and 58%), vitamin C (52% and 62%), thiamin (41% and 50%), and riboflavin (29% and 19%). Adjusting for within-person variation yielded lower prevalences of inadequacy, owing to the removal of within-person variability. All adult residents of São Paulo had excess sodium intake, and the rates of nutrient inadequacy were high for certain key micronutrients. J Acad Nutr Diet. 2012;112:1614-1618.
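A minimal sketch of the EAR cut-point method with illustrative numbers (not ISA data): the prevalence of inadequacy is the fraction of the usual-intake distribution falling below the Estimated Average Requirement.

```python
# EAR cut-point method: share of usual intakes below the EAR.
# The simulated intake distribution below is illustrative only.
import numpy as np

def prevalence_of_inadequacy(usual_intakes, ear):
    intakes = np.asarray(usual_intakes, dtype=float)
    return np.mean(intakes < ear)

# Simulated usual vitamin C intakes (mg/day) for 1,000 adults
vitamin_c = np.random.default_rng(2).lognormal(mean=4.0, sigma=0.5, size=1000)
print(prevalence_of_inadequacy(vitamin_c, ear=75.0))  # EAR for adult men
```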
Abstract:
For the first time, we introduce a generalized form of the exponentiated generalized gamma distribution [Cordeiro et al., The exponentiated generalized gamma distribution with application to lifetime data, J. Statist. Comput. Simul. 81 (2011), pp. 827-842] that serves as the baseline for the log-exponentiated generalized gamma regression model. The new distribution can accommodate increasing, decreasing, bathtub-shaped, and unimodal hazard functions. A second advantage is that it includes classical distributions reported in the lifetime literature as special cases. We obtain explicit expressions for the moments of the baseline distribution of the new regression model. The proposed model can be applied to censored data, since it includes several widely known regression models as sub-models, and it can therefore be used more effectively in the analysis of survival data. We obtain maximum likelihood estimates for the model parameters by considering censored data. We show that our extended regression model is very useful by means of two applications to real data.
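The underlying "exponentiated" construction, in our notation (the article's exact parameterization is in the cited Cordeiro et al. paper), raises a baseline cdf to a positive power:

```latex
% Exponentiated-G construction: for a baseline cdf G with density g,
F(x) = \big[G(x)\big]^{a}, \qquad
f(x) = a\, g(x)\, \big[G(x)\big]^{a-1}, \quad a > 0,
% with G taken as the generalized gamma cdf; a log-transform of the
% lifetime then yields the location-scale regression model.
```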
Abstract:
We study a five-parameter lifetime distribution, the McDonald extended exponential model, which generalizes the exponential, generalized exponential, Kumaraswamy exponential, and beta exponential distributions, among others. We obtain explicit expressions for the moments and incomplete moments, quantile and generating functions, mean deviations, Bonferroni and Lorenz curves, and the Gini concentration index. The method of maximum likelihood and a Bayesian procedure are adopted for estimating the model parameters. The applicability of the new model is illustrated by means of a real data set.
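A hedged sketch of the McDonald-G ("Mc-G") construction assumed to underlie the model, in our notation:

```latex
% Mc-G family: for a baseline cdf G with density g,
f(x) = \frac{c}{B(a,b)}\, g(x)\, G(x)^{ac-1}
       \big[1 - G(x)^{c}\big]^{b-1}, \qquad a, b, c > 0;
% taking G as the extended exponential baseline gives the
% five-parameter model, and choices such as c = 1 (beta-G),
% a = 1 (Kumaraswamy-G), or b = c = 1 (exponentiated-G) recover
% the special cases listed above.
```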
Abstract:
This paper reports experiments on the use of a recently introduced bounded upwinding scheme for advection, namely TOPUS (Computers & Fluids 57 (2012) 208-224), for flows of practical interest. The numerical results are compared against analytical, numerical, and experimental data and show good agreement. It is concluded that the TOPUS scheme is a competent, powerful, and generic scheme for complex flow phenomena.