899 resultados para ree software environment for statistical computing and graphics R
Resumo:
The pipeline for macro- and microarray analyses (PMmA) is a set of scripts with a web interface developed to analyze DNA array data generated by array image quantification software. PMmA is designed for use with single- or double-color array data and to work as a pipeline in five classes (data format, normalization, data analysis, clustering, and array maps). It can also be used as a plugin in the BioArray Software Environment, an open-source database for array analysis, or used in a local version of the web service. All scripts in PMmA were developed in the PERL programming language and statistical analysis functions were implemented in the R statistical language. Consequently, our package is a platform-independent software. Our algorithms can correctly select almost 90% of the differentially expressed genes, showing a superior performance compared to other methods of analysis. The pipeline software has been applied to 1536 expressed sequence tags macroarray public data of sugarcane exposed to cold for 3 to 48 h. PMmA identified thirty cold-responsive genes previously unidentified in this public dataset. Fourteen genes were up-regulated, two had a variable expression and the other fourteen were down-regulated in the treatments. These new findings certainly were a consequence of using a superior statistical analysis approach, since the original study did not take into account the dependence of data variability on the average signal intensity of each gene. The web interface, supplementary information, and the package source code are available, free, to non-commercial users at http://ipe.cbmeg.unicamp.br/pub/PMmA.
Resumo:
Land use is a crucial link between human activities and the natural environment and one of the main driving forces of global environmental change. Large parts of the terrestrial land surface are used for agriculture, forestry, settlements and infrastructure. Given the importance of land use, it is essential to understand the multitude of influential factors and resulting land use patterns. An essential methodology to study and quantify such interactions is provided by the adoption of land-use models. By the application of land-use models, it is possible to analyze the complex structure of linkages and feedbacks and to also determine the relevance of driving forces. Modeling land use and land use changes has a long-term tradition. In particular on the regional scale, a variety of models for different regions and research questions has been created. Modeling capabilities grow with steady advances in computer technology, which on the one hand are driven by increasing computing power on the other hand by new methods in software development, e.g. object- and component-oriented architectures. In this thesis, SITE (Simulation of Terrestrial Environments), a novel framework for integrated regional sland-use modeling, will be introduced and discussed. Particular features of SITE are the notably extended capability to integrate models and the strict separation of application and implementation. These features enable efficient development, test and usage of integrated land-use models. On its system side, SITE provides generic data structures (grid, grid cells, attributes etc.) and takes over the responsibility for their administration. By means of a scripting language (Python) that has been extended by language features specific for land-use modeling, these data structures can be utilized and manipulated by modeling applications. The scripting language interpreter is embedded in SITE. The integration of sub models can be achieved via the scripting language or by usage of a generic interface provided by SITE. Furthermore, functionalities important for land-use modeling like model calibration, model tests and analysis support of simulation results have been integrated into the generic framework. During the implementation of SITE, specific emphasis was laid on expandability, maintainability and usability. Along with the modeling framework a land use model for the analysis of the stability of tropical rainforest margins was developed in the context of the collaborative research project STORMA (SFB 552). In a research area in Central Sulawesi, Indonesia, socio-environmental impacts of land-use changes were examined. SITE was used to simulate land-use dynamics in the historical period of 1981 to 2002. Analogous to that, a scenario that did not consider migration in the population dynamics, was analyzed. For the calculation of crop yields and trace gas emissions, the DAYCENT agro-ecosystem model was integrated. In this case study, it could be shown that land-use changes in the Indonesian research area could mainly be characterized by the expansion of agricultural areas at the expense of natural forest. For this reason, the situation had to be interpreted as unsustainable even though increased agricultural use implied economic improvements and higher farmers' incomes. Due to the importance of model calibration, it was explicitly addressed in the SITE architecture through the introduction of a specific component. The calibration functionality can be used by all SITE applications and enables largely automated model calibration. Calibration in SITE is understood as a process that finds an optimal or at least adequate solution for a set of arbitrarily selectable model parameters with respect to an objective function. In SITE, an objective function typically is a map comparison algorithm capable of comparing a simulation result to a reference map. Several map optimization and map comparison methodologies are available and can be combined. The STORMA land-use model was calibrated using a genetic algorithm for optimization and the figure of merit map comparison measure as objective function. The time period for the calibration ranged from 1981 to 2002. For this period, respective reference land-use maps were compiled. It could be shown, that an efficient automated model calibration with SITE is possible. Nevertheless, the selection of the calibration parameters required detailed knowledge about the underlying land-use model and cannot be automated. In another case study decreases in crop yields and resulting losses in income from coffee cultivation were analyzed and quantified under the assumption of four different deforestation scenarios. For this task, an empirical model, describing the dependence of bee pollination and resulting coffee fruit set from the distance to the closest natural forest, was integrated. Land-use simulations showed, that depending on the magnitude and location of ongoing forest conversion, pollination services are expected to decline continuously. This results in a reduction of coffee yields of up to 18% and a loss of net revenues per hectare of up to 14%. However, the study also showed that ecological and economic values can be preserved if patches of natural vegetation are conservated in the agricultural landscape. -----------------------------------------------------------------------
Resumo:
Europe's widely distributed climate modelling expertise, now organized in the European Network for Earth System Modelling (ENES), is both a strength and a challenge. Recognizing this, the European Union's Program for Integrated Earth System Modelling (PRISM) infrastructure project aims at designing a flexible and friendly user environment to assemble, run and post-process Earth System models. PRISM was started in December 2001 with a duration of three years. This paper presents the major stages of PRISM, including: (1) the definition and promotion of scientific and technical standards to increase component modularity; (2) the development of an end-to-end software environment (graphical user interface, coupling and I/O system, diagnostics, visualization) to launch, monitor and analyse complex Earth system models built around state-of-art community component models (atmosphere, ocean, atmospheric chemistry, ocean bio-chemistry, sea-ice, land-surface); and (3) testing and quality standards to ensure high-performance computing performance on a variety of platforms. PRISM is emerging as a core strategic software infrastructure for building the European research area in Earth system sciences. Copyright (c) 2005 John Wiley & Sons, Ltd.
Resumo:
In the past decade, a number of mechanistic, dynamic simulation models of several components of the dairy production system have become available. However their use has been limited due to the detailed technical knowledge and special software required to run them, and the lack of compatibility between models in predicting various metabolic processes in the animal. The first objective of the current study was to integrate the dynamic models of [Brit. J. Nutr. 72 (1994) 679] on rumen function, [J. Anim. Sci. 79 (2001) 1584] on methane production, [J. Anim. Sci. 80 (2002) 2481 on N partition, and a new model of P partition. The second objective was to construct a decision support system to analyse nutrient partition between animal and environment. The integrated model combines key environmental pollutants such as N, P and methane within a nutrient-based feed evaluation system. The model was run under different scenarios and the sensitivity of various parameters analysed. A comparison of predictions from the integrated model with the original simulation models showed an improvement in N excretion since the integrated model uses the dynamic model of [Brit. J. Nutr. 72 (1994) 6791 to predict microbial N, which was not represented in detail in the original model. The integrated model can be used to investigate the degree to which production and environmental objectives are antagonistic, and it may help to explain and understand the complex mechanisms involved at the ruminal and metabolic levels. A part of the integrated model outputs were the forms of N and P in excreta and methane, which can be used as indices of environmental pollution. (C) 2004 Elsevier B.V All rights reserved.
Resumo:
Ranald Roderick Macdonald (1945-2007) was an important contributor to mathematical psychology in the UK, as a referee and action editor for British Journal of Mathematical and Statistical Psychology and as a participant and organizer at the British Psychological Society's Mathematics, statistics and computing section meetings. This appreciation argues that his most important contribution was to the foundations of significance testing, where his concern about what information was relevant in interpreting the results of significance tests led him to be a persuasive advocate for the 'Weak Fisherian' form of hypothesis testing.
The Impact of office productivity cloud computing on energy consumption and greenhouse gas emissions
Resumo:
Cloud computing is usually regarded as being energy efficient and thus emitting less greenhouse gases (GHG) than traditional forms of computing. When the energy consumption of Microsoft’s cloud computing Office 365 (O365) and traditional Office 2010 (O2010) software suites were tested and modeled, some cloud services were found to consume more energy than the traditional form. The developed model in this research took into consideration the energy consumption at the three main stages of data transmission; data center, network, and end user device. Comparable products from each suite were selected and activities were defined for each product to represent a different computing type. Microsoft provided highly confidential data for the data center stage, while the networking and user device stages were measured directly. A new measurement and software apportionment approach was defined and utilized allowing the power consumption of cloud services to be directly measured for the user device stage. Results indicated that cloud computing is more energy efficient for Excel and Outlook which consumed less energy and emitted less GHG than the standalone counterpart. The power consumption of the cloud based Outlook (8%) and Excel (17%) was lower than their traditional counterparts. However, the power consumption of the cloud version of Word was 17% higher than its traditional equivalent. A third mixed access method was also measured for Word which emitted 5% more GHG than the traditional version. It is evident that cloud computing may not provide a unified way forward to reduce energy consumption and GHG. Direct conversion from the standalone package into the cloud provision platform can now consider energy and GHG emissions at the software development and cloud service design stage using the methods described in this research.
Resumo:
Learning from anywhere anytime is a contemporary phenomenon in the field of education that is thought to be flexible, time and cost saving. The phenomenon is evident in the way computer technology mediates knowledge processes among learners. Computer technology is however, in some instances, faulted. There are studies that highlight drawbacks of computer technology use in learning. In this study we aimed at conducting a SWOT analysis on ubiquitous computing and computer-mediated social interaction and their affect on education. Students and teachers were interviewed on the mentioned concepts using focus group interviews. Our contribution in this study is, identifying what teachers and students perceive to be the strength, weaknesses, opportunities and threats of ubiquitous computing and computer-mediated social interaction in education. We also relate the findings with literature and present a common understanding on the SWOT of these concepts. Results show positive perceptions. Respondents revealed that ubiquitous computing and computer-mediated social interaction are important in their education due to advantages such as flexibility, efficiency in terms of cost and time, ability to acquire computer skills. Nevertheless disadvantages where also mentioned for example health effects, privacy and security issues, noise in the learning environment, to mention but a few. This paper gives suggestions on how to overcome threats mentioned.
Resumo:
The computers and network services became presence guaranteed in several places. These characteristics resulted in the growth of illicit events and therefore the computers and networks security has become an essential point in any computing environment. Many methodologies were created to identify these events; however, with increasing of users and services on the Internet, many difficulties are found in trying to monitor a large network environment. This paper proposes a methodology for events detection in large-scale networks. The proposal approaches the anomaly detection using the NetFlow protocol, statistical methods and monitoring the environment in a best time for the application. © 2010 Springer-Verlag Berlin Heidelberg.
Resumo:
The objective of this study was to evaluate the effect of genotype by environment interaction (GEI) on the weight of Tabapuã cattle at 240 (W240), 365 (W365) and 450 (W450) days of age. In total, 35,732 records of 8,458 Tabapuã animalswhich were born in the state of Bahia, Brazil, from 1975 to 2001, from 167 sires and 3,707 dams, were used. Two birth seasons were tested as for the environment effect: the dry (D) and rainy (R) ones. The covariance components were obtainedby a multiple-trait analysis using Bayesian inference, in which each trait was considered as being different in each season. Covariance components were estimated by software gibbs2f90. As for W240, the model was comprised of contemporary groups and cow age (in classes) as fixed effects; animal and maternal genetic additive, maternal permanent environmental and residual were considered as random effects. Concerning W365 and W450, the model included only the contemporary aged cow groups as fixed effects and the genetic additive and residual effects of the animal as the random ones. The GEI was assessed considering the genetic correlation, in which values below 0.80 indicated the presence of GEI. Regarding W365 and W450, the GEI was found in both seasons. As for post-weaning weight (W240), the effect of such interaction was not observed. ©2012 Sociedade Brasileira de Zootecnia.
Resumo:
Abstract Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated to the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heteregeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
Resumo:
In recent years, Deep Learning techniques have shown to perform well on a large variety of problems both in Computer Vision and Natural Language Processing, reaching and often surpassing the state of the art on many tasks. The rise of deep learning is also revolutionizing the entire field of Machine Learning and Pattern Recognition pushing forward the concepts of automatic feature extraction and unsupervised learning in general. However, despite the strong success both in science and business, deep learning has its own limitations. It is often questioned if such techniques are only some kind of brute-force statistical approaches and if they can only work in the context of High Performance Computing with tons of data. Another important question is whether they are really biologically inspired, as claimed in certain cases, and if they can scale well in terms of "intelligence". The dissertation is focused on trying to answer these key questions in the context of Computer Vision and, in particular, Object Recognition, a task that has been heavily revolutionized by recent advances in the field. Practically speaking, these answers are based on an exhaustive comparison between two, very different, deep learning techniques on the aforementioned task: Convolutional Neural Network (CNN) and Hierarchical Temporal memory (HTM). They stand for two different approaches and points of view within the big hat of deep learning and are the best choices to understand and point out strengths and weaknesses of each of them. CNN is considered one of the most classic and powerful supervised methods used today in machine learning and pattern recognition, especially in object recognition. CNNs are well received and accepted by the scientific community and are already deployed in large corporation like Google and Facebook for solving face recognition and image auto-tagging problems. HTM, on the other hand, is known as a new emerging paradigm and a new meanly-unsupervised method, that is more biologically inspired. It tries to gain more insights from the computational neuroscience community in order to incorporate concepts like time, context and attention during the learning process which are typical of the human brain. In the end, the thesis is supposed to prove that in certain cases, with a lower quantity of data, HTM can outperform CNN.
Resumo:
We present a framework for statistical finite element analysis combining shape and material properties, and allowing performing statistical statements of biomechanical performance across a given population. In this paper, we focus on the design of orthopaedic implants that fit a maximum percentage of the target population, both in terms of geometry and biomechanical stability. CT scans of the bone under consideration are registered non-rigidly to obtain correspondences in position and intensity between them. A statistical model of shape and intensity (bone density) is computed by means of principal component analysis. Afterwards, finite element analysis (FEA) is performed to analyse the biomechanical performance of the bones. Realistic forces are applied on the bones and the resulting displacement and bone stress distribution are calculated. The mechanical behaviour of different PCA bone instances is compared.
Resumo:
Stata is a general purpose software package that has become popular among various disciplines such as epidemiology, economics, or social sciences. Users like Stata for its scientific approach, its robustness and reliability, and the ease with which its functionality can be extended by user written programs. In this talk I will first give a brief overview of the functionality of Stata and then discuss two specific features: survey estimation and predictive margins/marginal effects. Most surveys are based on complex samples that contain multiple sampling stages, are stratified or clustered, and feature unequal selection probabilities. Standard estimators can produce misleading results in such samples unless the peculiarities of the sampling plan are taken into account. Stata offers survey statistics for complex samples for a wide variety of estimators and supports several variance estimation procedures such as linearization, jackknife, and balanced repeated replication (see Kreuter and Valliant, 2007, Stata Journal 7: 1-21). In the talk I will illustrate these features using applied examples and I will also show how user written commands can be adapted to support complex samples. Complex can also be the models we fit to our data, making it difficult to interpret them, especially in case of nonlinear or non-additive models (Mood, 2010, European Sociological Review 26: 67-82). Stata provides a number of highly useful commands to make results of such models accessible by computing and displaying predictive margins and marginal effects. In my talk I will discuss these commands provide various examples demonstrating their use.
Resumo:
Spinal image analysis and computer assisted intervention have emerged as new and independent research areas, due to the importance of treatment of spinal diseases, increasing availability of spinal imaging, and advances in analytics and navigation tools. Among others, multiple modality spinal image analysis and spinal navigation tools have emerged as two keys in this new area. We believe that further focused research in these two areas will lead to a much more efficient and accelerated research path, avoiding detours that exist in other applications, such as in brain and heart.