897 results for Very large scale integration
Abstract:
Background: The large-scale production of G-protein coupled receptors (GPCRs) for functional and structural studies remains a challenge. Recent successes have been achieved in the expression of a range of GPCRs using Pichia pastoris as an expression host. P. pastoris has a number of advantages over other expression systems, including the ability to post-translationally modify expressed proteins, relatively low production cost and the ability to grow to very high cell densities. Several previous studies have described the expression of GPCRs in P. pastoris using shaker flasks, which allow culturing of small volumes (500 ml) with moderate cell densities (OD600 ≈ 15). The use of bioreactors, which allow straightforward culturing of large volumes together with optimal control of growth parameters such as pH and dissolved oxygen to maximise cell densities and expression of the target receptors, is an attractive alternative. The aim of this study was to compare the levels of expression of the human Adenosine 2A receptor (A(2A)R) in P. pastoris under control of a methanol-inducible promoter in both flask and bioreactor cultures. Results: Bioreactor cultures yielded an approximately five-fold increase in cell density (OD600 ≈ 75) compared to flask cultures prior to induction and a doubling in functional expression level per mg of membrane protein, representing a significant optimisation. Furthermore, analysis of a C-terminally truncated A(2A)R, terminating at residue V334, yielded the highest levels (200 pmol/mg) so far reported for expression of this receptor in P. pastoris. This truncated form of the receptor was also shown to be resistant to C-terminal degradation, in contrast to the wild-type A(2A)R, and is therefore more suitable for further functional and structural studies. Conclusion: Large-scale expression of the A(2A)R in P. pastoris bioreactor cultures results in significant increases in functional expression compared to traditional flask cultures.
Abstract:
Large-magnitude explosive eruptions are the result of the rapid and large-scale transport of silicic magma stored in the Earth's crust, but the mechanics of erupting teratonnes of silicic magma remain poorly understood. Here, we demonstrate that the combined effect of local crustal extension and magma chamber overpressure can sustain linear dyke-fed explosive eruptions with mass fluxes in excess of 10^10 kg/s from shallow-seated (4–6 km depth) chambers under moderate extensional stresses. Early eruption column collapse is facilitated, with eruption durations of the order of a few days and intensities at least one order of magnitude greater than those of the largest eruptions of the 20th century. The conditions explored in this study are one way in which high mass eruption rates can be achieved to feed large explosive eruptions. Our results corroborate geological and volcanological evidence from volcano-tectonic complexes such as the Sierra Madre Occidental (Mexico) and the Taupo Volcanic Zone (New Zealand).
Abstract:
The high complexity of the cloud parameterizations now included in models puts more pressure on observational studies to provide useful means to evaluate them. One approach to the problem put forth in the modelling community is to evaluate under what atmospheric conditions the parameterizations fail to simulate the cloud properties and under what conditions they perform well. It is the ambition of this paper to characterize the variability of the statistical properties of tropical ice clouds in different tropical "regimes" recently identified in the literature, in order to aid the development of better process-oriented parameterizations in models. For this purpose, the statistical properties of non-precipitating tropical ice clouds over Darwin, Australia, are characterized using ground-based radar-lidar observations from the Atmospheric Radiation Measurement (ARM) Program. The ice cloud properties analysed are the frequency of ice cloud occurrence, the morphological properties (cloud top height and thickness), and the microphysical and radiative properties (ice water content, visible extinction, effective radius, and total concentration). The variability of these tropical ice cloud properties is then studied as a function of the large-scale cloud regimes derived from the International Satellite Cloud Climatology Project (ISCCP), the amplitude and phase of the Madden-Julian Oscillation (MJO), and the large-scale atmospheric regime as derived from a long-term record of radiosonde observations over Darwin. The vertical variability of ice cloud occurrence and microphysical properties is largest in all regimes (typically 1.5 orders of magnitude for ice water content and extinction, a factor of 3 in effective radius, and three orders of magnitude in concentration). 98 % of the ice clouds in our dataset are characterized by either a small cloud fraction (smaller than 0.3) or a very large cloud fraction (larger than 0.9). In the ice part of the troposphere, three distinct layers characterized by different statistically dominant microphysical processes are identified. The variability of the ice cloud properties as a function of the large-scale atmospheric regime, cloud regime, and MJO phase is large, producing mean differences of up to a factor of 8 in the frequency of ice cloud occurrence between large-scale atmospheric regimes and mean differences of typically a factor of 2 in all microphysical properties. Finally, the diurnal cycle of the frequency of occurrence of ice clouds is also very different between regimes and MJO phases, with diurnal amplitudes of the vertically-integrated frequency of ice cloud occurrence ranging from as low as 0.2 (weak diurnal amplitude) to values in excess of 2.0 (very large diurnal amplitude). Modellers should now use these results to check whether their model cloud parameterizations are capable of translating a given atmospheric forcing into the correct statistical ice cloud properties.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or by high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
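To make the epidemic aggregation idea concrete, the following is a minimal sketch of a gossip-averaged K-Means in the spirit of the algorithm described above; the pairwise push-pull schedule, the fixed number of gossip rounds and all function names are illustrative assumptions, not the actual EpidemicK-Means protocol.

```python
import numpy as np

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts for one node's local shard."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k, dim = centroids.shape
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for j in range(k):
        sums[j] = points[labels == j].sum(axis=0)
        counts[j] = (labels == j).sum()
    return np.concatenate([sums.ravel(), counts])

def gossip_average(states, rounds, rng):
    """Pairwise push-pull averaging: every node's state converges to the global mean."""
    states = [s.copy() for s in states]
    n = len(states)
    for _ in range(rounds):
        for i in range(n):
            j = int(rng.integers(n))
            avg = (states[i] + states[j]) / 2.0
            states[i], states[j] = avg, avg.copy()
    return states

def epidemic_kmeans(node_data, k, iters=10, gossip_rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    dim = node_data[0].shape[1]
    centroids = node_data[0][rng.choice(len(node_data[0]), k, replace=False)].copy()
    for _ in range(iters):
        states = [local_stats(d, centroids) for d in node_data]
        states = gossip_average(states, gossip_rounds, rng)
        # Any node can now recover (approximately) global centroids on its own:
        avg = states[0]
        sums, counts = avg[:k * dim].reshape(k, dim), avg[k * dim:]
        centroids = sums / np.maximum(counts, 1e-12)[:, None]  # guard empty clusters
    return centroids

# Toy usage: 8 "nodes", each holding a shard of a 2-D dataset.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.3, (200, 2)) for c in [(0, 0), (3, 3), (0, 4)]])
shards = np.array_split(rng.permutation(data), 8)
print(epidemic_kmeans(shards, k=3))
```

Because each node gossips (sum, count) statistics rather than raw points, every node ends the averaging phase holding an estimate of the global centroids without any global collective, which is what makes this style of scheme tolerant to message loss and node churn.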
Abstract:
High-resolution simulations over a large tropical domain (∼20°S–20°N and 42°E–180°E) using both explicit and parameterized convection are analyzed and compared to observations during a 10-day case study of an active Madden-Julian Oscillation (MJO) event. The parameterized convection model simulations at both 40 km and 12 km grid spacing have a very weak MJO signal and little eastward propagation. A 4 km explicit convection simulation using Smagorinsky subgrid mixing in the vertical and horizontal dimensions exhibits the best MJO strength and propagation speed. 12 km explicit convection simulations also perform much better than the 12 km parameterized convection run, suggesting that the convection scheme, rather than horizontal resolution, is key for these MJO simulations. Interestingly, a 4 km explicit convection simulation using the conventional boundary layer scheme for vertical subgrid mixing (but still using Smagorinsky horizontal mixing) completely loses the large-scale MJO organization, showing that relatively high resolution with explicit convection does not guarantee a good MJO simulation. Models with a good MJO representation have a more realistic relationship between lower-free-tropospheric moisture and precipitation, supporting the idea that moisture-convection feedback is a key process for MJO propagation. There is also increased generation of available potential energy and conversion of that energy into kinetic energy in models with a more realistic MJO, which is related to larger zonal variance in convective heating and vertical velocity, larger zonal temperature variance around 200 hPa, and larger correlations between temperature and ascent (and between temperature and diabatic heating) between 500 and 400 hPa.
Large-scale atmospheric dynamics of the wet winter 2009–2010 and its impact on hydrology in Portugal
Abstract:
The anomalously wet winter of 2010 had a very important impact on the Portuguese hydrological system. Given the detrimental effects of reduced precipitation in Portugal on environmental and socio-economic systems, the 2010 winter was predominantly beneficial, reversing the precipitation deficits accumulated during the previous hydrological years. The recorded anomalously high precipitation amounts contributed to an overall increase in river runoff and dam recharge in the four major river basins. In synoptic terms, the 2010 winter was characterised by an anomalously strong westerly flow component over the North Atlantic that triggered high precipitation amounts. A dynamically coherent enhancement in the frequency of mid-latitude cyclones close to Portugal, accompanied by significant increases in the occurrence of cyclonic, southerly and south-westerly circulation weather types, is noteworthy. Furthermore, the prevalence of a strongly negative phase of the North Atlantic Oscillation (NAO) also emphasises the main dynamical features of the 2010 winter. A comparison of the hydrological and atmospheric conditions between the 2010 winter and the two previous anomalously wet winters (1996 and 2001) was also carried out to isolate not only their similarities but also their contrasting conditions, highlighting the limitations of estimating winter precipitation amounts in Portugal using the NAO phase as the sole predictor.
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can either be found in real-world distributed applications or be induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication cost. The effectiveness of the exact and approximate methods has been tested on a parallel computing system with 64 processors and in simulations with 1024 processing elements.
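As a rough illustration of how reductions can be confined to dynamic groups of processes instead of a global collective, the sketch below uses communicator splitting in mpi4py; the Split-based per-cluster grouping and the deferred reconciliation of centroid copies are simplifying assumptions for illustration, not the dynamic group communication protocol proposed in the work above.

```python
# Minimal sketch (assumptions: mpi4py available, data already partitioned per rank).
from mpi4py import MPI
import numpy as np

def parallel_kmeans_step(comm, local_points, centroids):
    """One assignment/update step where each cluster reduction involves only
    the ranks that actually own points of that cluster."""
    k, dim = centroids.shape
    labels = np.linalg.norm(
        local_points[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    new_centroids = centroids.copy()
    for j in range(k):
        mine = local_points[labels == j]
        # Ranks with no points of cluster j opt out of its group (colour UNDEFINED).
        colour = 1 if len(mine) > 0 else MPI.UNDEFINED
        group = comm.Split(colour, comm.rank)
        if group != MPI.COMM_NULL:
            total = group.allreduce(mine.sum(axis=0), op=MPI.SUM)
            count = group.allreduce(len(mine), op=MPI.SUM)
            new_centroids[j] = total / count
            group.Free()
    # Ranks outside a cluster's group still hold the old centroid here; a cheap
    # reconciliation step (or the next assignment pass) would align the copies.
    return new_centroids
```

Only the ranks holding points of a given cluster participate in that cluster's reduction, which is the locality that non-uniform data distributions (for example those induced by multidimensional binary search trees) make exploitable.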
Abstract:
Variational data assimilation is commonly used in environmental forecasting to estimate the current state of the system from a model forecast and observational data. The assimilation problem can be written simply in the form of a nonlinear least-squares optimization problem. However, the practical solution of the problem in large systems requires many careful choices to be made in the implementation. In this article we present the theory of variational data assimilation and then discuss in detail how it is implemented in practice. Current solutions and open questions are discussed.
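For reference, the nonlinear least-squares problem referred to above is conventionally written as the variational cost function

J(x) = (1/2) (x − x_b)^T B^{-1} (x − x_b) + (1/2) (y − H(x))^T R^{-1} (y − H(x)),

where x_b is the background state (the model forecast), B the background-error covariance matrix, y the vector of observations, H the (possibly nonlinear) observation operator and R the observation-error covariance matrix; the analysis is the state x that minimises J.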
Abstract:
Running hydrodynamic models interactively allows both visual exploration and change of model state during simulation. One of the main characteristics of an interactive model is that it should provide immediate feedback to the user, for example responding to changes in model state or view settings. For this reason, such features are usually only available for models with a relatively small number of computational cells, which are used mainly for demonstration and educational purposes. It would be useful if interactive modelling also worked for the models typically used in consultancy projects involving large-scale simulations. This poses a number of technical challenges related to the combination of the model itself and the visualisation tools (scalability, implementation of an appropriate API for control of and access to the internal state). While model parallelisation is increasingly addressed by the environmental modelling community, little effort has been spent on developing a high-performance interactive environment. What can we learn from other high-end visualisation domains, such as 3D animation, gaming and virtual globes (Autodesk 3ds Max, Second Life, Google Earth), that also focus on efficient interaction with 3D environments? In these domains high efficiency is usually achieved through computer graphics algorithms such as surface simplification depending on the current view and distance to objects, and efficient caching of aggregated representations of object meshes. We investigate how these algorithms can be re-used in the context of interactive hydrodynamic modelling without significant changes to the model code, allowing model operation on both multi-core CPU personal computers and high-performance computer clusters.
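To make the borrowed graphics technique concrete, here is a minimal sketch of view-dependent, distance-based level-of-detail selection; the thresholds, the function name select_lod and the idea of pre-cached simplified meshes per level are illustrative assumptions rather than part of the system described above.

```python
import numpy as np

def select_lod(cell_centres, camera_pos, thresholds=(100.0, 1000.0)):
    """Return a level-of-detail index per cell: 0 = full detail, higher = coarser."""
    dist = np.linalg.norm(cell_centres - camera_pos, axis=1)
    return np.digitize(dist, thresholds)

# Cells close to the camera are rendered from the full computational mesh;
# distant cells use cached, aggregated (simplified) representations.
centres = np.array([[10.0, 0.0, 0.0], [500.0, 0.0, 0.0], [5000.0, 0.0, 0.0]])
print(select_lod(centres, camera_pos=np.array([0.0, 0.0, 0.0])))  # -> [0 1 2]
```

The appeal of this kind of scheme is that the expensive model state is only translated into fine visual geometry where the user is actually looking, keeping the feedback loop interactive even for large meshes.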
Abstract:
Coupled-cluster theory provides one of the most successful concepts in electronic-structure theory. This work covers the parallelization of coupled-cluster energies, gradients, and second derivatives and its application to selected large-scale chemical problems, besides more practical aspects such as the publication and support of the quantum-chemistry package ACES II MAB and the design and development of a computational environment optimized for coupled-cluster calculations. The main objective of this thesis was to extend the range of applicability of coupled-cluster models to larger molecular systems and their properties and therefore to bring large-scale coupled-cluster calculations into the day-to-day routine of computational chemistry. A straightforward strategy for the parallelization of CCSD and CCSD(T) energies, gradients, and second derivatives has been outlined and implemented for closed-shell and open-shell references. Starting from the highly efficient serial implementation of the ACES II MAB computer code, an adaptation for affordable workstation clusters has been obtained by parallelizing the most time-consuming steps of the algorithms. Benchmark calculations for systems with up to 1300 basis functions and the presented applications show that the resulting algorithm for energies, gradients and second derivatives at the CCSD and CCSD(T) levels of theory exhibits good scaling with the number of processors and substantially extends the range of applicability. Within the framework of the 'High-accuracy Extrapolated Ab initio Thermochemistry' (HEAT) protocols, the effects of increased basis-set size and higher excitations in the coupled-cluster expansion were investigated. The HEAT scheme was generalized to molecules containing second-row atoms in the case of vinyl chloride, which allowed the different experimentally reported values to be discriminated. In the case of the benzene molecule it was shown that even for molecules of this size chemical accuracy can be achieved. Near-quantitative agreement with experiment (about 2 ppm deviation) for the prediction of fluorine-19 nuclear magnetic shielding constants can be achieved by employing the CCSD(T) model together with large basis sets at accurate equilibrium geometries, provided that vibrational averaging and temperature corrections via second-order vibrational perturbation theory are considered. Applying a very similar level of theory to the calculation of the carbon-13 NMR chemical shifts of benzene resulted in quantitative agreement with experimental gas-phase data. The NMR chemical shift study of the bridgehead 1-adamantyl cation at the CCSD(T) level resolved earlier discrepancies of lower-level theoretical treatments. The equilibrium structure of diacetylene has been determined from the combination of experimental rotational constants of thirteen isotopic species and zero-point vibrational corrections calculated at various quantum-chemical levels; these empirical equilibrium structures agree to within 0.1 pm irrespective of the theoretical level employed. High-level quantum-chemical calculations of the hyperfine structure parameters of the cyanopolyynes were found to be in excellent agreement with experiment. Finally, the most accurate theoretical determination to date of the molecular equilibrium structure of ferrocene is presented.
Abstract:
Bioinformatics has, in the last few decades, played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called "The Bologna Annotation Resource Plus" (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is done by way of cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+-based applications developed during my doctorate include the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
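As a rough illustration of transfer-by-homology inside a cluster, the sketch below propagates GO terms from annotated to unannotated cluster members when a simple enrichment test passes; the Fisher exact test, the thresholds and the toy numbers are illustrative assumptions, not BAR+'s actual statistical validation.

```python
from collections import Counter
from scipy.stats import fisher_exact

def transfer_terms(cluster_annotations, background_counts, background_total,
                   alpha=0.01, min_fraction=0.5):
    """cluster_annotations: one set of GO terms per annotated cluster member.
    Returns the terms considered safe to transfer to unannotated members."""
    n = len(cluster_annotations)
    term_counts = Counter(t for terms in cluster_annotations for t in terms)
    transferred = set()
    for term, k in term_counts.items():
        if k / n < min_fraction:          # term must be common inside the cluster
            continue
        bg = background_counts.get(term, 0)
        # 2x2 table: term present/absent, inside the cluster vs. in the background.
        _, p = fisher_exact([[k, n - k], [bg, background_total - bg]],
                            alternative="greater")
        if p < alpha:                     # enriched with respect to the background
            transferred.add(term)
    return transferred

# Toy usage with three annotated members of a hypothetical cluster.
annotated = [{"GO:0005524", "GO:0004672"}, {"GO:0005524"}, {"GO:0005524", "GO:0016301"}]
print(transfer_terms(annotated, {"GO:0005524": 900}, background_total=13_000_000))
```

The design point this sketch tries to capture is that annotation is transferred per cluster, and only for terms whose within-cluster frequency cannot plausibly be explained by their background frequency across the whole sequence collection.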
Abstract:
The mass estimation of galaxy clusters is a crucial point for modern cosmology and can be obtained with several different techniques. In this work we discuss a new method to measure the mass of galaxy clusters by connecting the gravitational potential of the cluster with the kinematical properties of its surroundings. We explore the dynamics of the structures located in the region outside the virialized cluster: we identify groups of galaxies, such as sheets or filaments, in the cluster outer region, and model how the cluster gravitational potential perturbs the motion of these structures away from the Hubble flow. This identification is done in redshift space, where we look for overdensities with a filamentary shape. We then use a mean radial velocity profile that has been found to be a fairly universal trend in simulations, and fit it to the radial infall velocity profile of the overdensities found. The method has been tested on several cluster-size haloes from cosmological N-body simulations, giving results in very good agreement with the true virial masses of the haloes and the orientations of the sheets. We then applied the method to the Coma cluster and, also in this case, found a good correspondence with previous estimates. A mass discrepancy is noticeable between sheets with different alignments with respect to the center of the cluster. This difference can be used to reconstruct the shape of the cluster and to demonstrate that spherical symmetry is not always a valid assumption: if the cluster is not spherical, sheets oriented along different axes feel a slightly different gravitational potential and therefore give different masses as the result of the analysis described above. This estimate has also been tested on cosmological simulations and then applied to Coma, showing the actual non-sphericity of this cluster.
Abstract:
Biotic and abiotic phenological observations can be collected from continental down to local spatial scales. Plant phenological observations can only be recorded wherever there is vegetation, whereas fog, snow and ice are available as phenological parameters wherever they appear. The distinctive strength of phenological observations is the possibility of spatial intensification down to a microclimatic scale, where meteorological measurement equipment is too expensive for intensive campaigns. The omnipresence of region-specific phenological parameters allows monitoring for a spatially much more detailed assessment of climate change than is possible with weather data. We demonstrate this concept with phenological observations from a special network in the Canton of Berne, Switzerland, with up to 600 observation sites (more than one site per 1 to 10 km² of the inhabited area). Classic cartography, gridding, integration into a Geographic Information System (GIS) and large-scale analysis are the steps towards a detailed knowledge of the topoclimatic conditions of a mountainous area. Examples of urban phenology provide other types of spatially detailed applications. Large potential for phenological mapping in future analyses lies in combining traditionally observed species-specific phenology with remotely sensed and modelled phenology, which provide strong spatial information. This is a long progression from cartographic intuition to algorithm-based representations of phenology.
Abstract:
A wealth of genetic associations for cardiovascular and metabolic phenotypes in humans has accumulated over the last decade, in particular a large number of loci derived from recent genome-wide association studies (GWAS). True complex-disease-associated loci often exert modest effects, so their delineation currently requires the integration of diverse phenotypic data from large studies to ensure robust meta-analyses. We have designed a gene-centric 50 K single nucleotide polymorphism (SNP) array to assess potentially relevant loci across a range of cardiovascular, metabolic and inflammatory syndromes. The array utilizes a "cosmopolitan" tagging approach to capture the genetic diversity across approximately 2,000 loci in populations represented in the HapMap and SeattleSNPs projects. The array content is informed by GWAS of vascular and inflammatory disease, expression quantitative trait loci implicated in atherosclerosis, pathway-based approaches and comprehensive literature searching. The custom flexibility of the array platform facilitated interrogation of loci at differing stringencies, according to a gene prioritization strategy that allows saturation of high-priority loci with a greater density of markers than existing GWAS tools, particularly in African HapMap samples. We also demonstrate that the IBC array can be used to complement GWAS, increasing coverage of high-priority CVD-related loci across all major HapMap populations. DNA from over 200,000 extensively phenotyped individuals will be genotyped with this array, with a significant portion of the generated data being released into the academic domain, facilitating in silico replication attempts, analyses of rare variants and cross-cohort meta-analyses in diverse populations. These datasets will also facilitate more robust secondary analyses, such as explorations with alternative genetic models, epistasis and gene-environment interactions.
Abstract:
Due to the ongoing trend towards increased product variety, fast-moving consumer goods such as food and beverages, pharmaceuticals, and chemicals are typically manufactured through so-called make-and-pack processes. These processes consist of a make stage, a pack stage, and intermediate storage facilities that decouple these two stages. In operations scheduling, complex technological constraints must be considered, e.g., non-identical parallel processing units, sequence-dependent changeovers, batch splitting, no-wait restrictions, material transfer times, minimum storage times, and finite storage capacity. The short-term scheduling problem is to compute a production schedule such that a given demand for products is fulfilled, all technological constraints are met, and the production makespan is minimised. A production schedule typically comprises 500–1500 operations. Due to the problem size and complexity of the technological constraints, the performance of known mixed-integer linear programming (MILP) formulations and heuristic approaches is often insufficient. We present a hybrid method consisting of three phases. First, the set of operations is divided into several subsets. Second, these subsets are iteratively scheduled using a generic and flexible MILP formulation. Third, a novel critical path-based improvement procedure is applied to the resulting schedule. We develop several strategies for the integration of the MILP model into this heuristic framework. Using these strategies, high-quality feasible solutions to large-scale instances can be obtained within reasonable CPU times using standard optimisation software. We have applied the proposed hybrid method to a set of industrial problem instances and found that the method outperforms state-of-the-art methods.
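To give a flavour of the kind of MILP building block such a hybrid method solves iteratively, here is a minimal single-unit subproblem with sequence-dependent changeovers and makespan minimisation; PuLP, the toy data and the big-M disjunctive formulation are illustrative assumptions, and the industrial formulation additionally handles batch splitting, no-wait restrictions, transfer times, minimum storage times and finite storage capacity.

```python
import pulp

ops = ["A", "B", "C"]                       # toy operations on one processing unit
dur = {"A": 4, "B": 3, "C": 5}              # processing times
change = {(i, j): 1 if i != j else 0        # sequence-dependent changeover times
          for i in ops for j in ops}
M = sum(dur.values()) + sum(change.values())  # big-M large enough for any schedule

prob = pulp.LpProblem("make_and_pack_unit", pulp.LpMinimize)
start = {i: pulp.LpVariable(f"start_{i}", lowBound=0) for i in ops}
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan                            # objective: minimise the makespan

# y[i, j] = 1 if operation i is scheduled before operation j on the unit.
y = {(i, j): pulp.LpVariable(f"y_{i}_{j}", cat="Binary")
     for i in ops for j in ops if i < j}

for (i, j), var in y.items():
    # Disjunctive (either-or) sequencing constraints with changeovers.
    prob += start[j] >= start[i] + dur[i] + change[i, j] - M * (1 - var)
    prob += start[i] >= start[j] + dur[j] + change[j, i] - M * var
for i in ops:
    prob += makespan >= start[i] + dur[i]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({i: start[i].value() for i in ops}, "makespan:", makespan.value())
```

In the hybrid scheme described above, models of roughly this shape would be solved repeatedly for subsets of the 500–1500 operations, with the critical path-based procedure then improving the assembled schedule.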