23 results for Crete (Greece)
at Indian Institute of Science - Bangalore - India
Abstract:
Loads that miss in the L1 or L2 cache and wait for their data at the head of the ROB cause significant slowdown in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track the commit stalls suffered by loads to identify this small set. We study an application of these classifiers to prefetching: the classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This approach, referred to as focused prefetching, results in a 9.8% gain in IPC over a naive GHB-based delta correlation prefetcher, along with a 20.3% reduction in memory traffic, for a set of 17 memory-intensive SPEC2000 benchmarks. Another important effect of focused prefetching is a 61% improvement in prefetch accuracy. We demonstrate that the proposed classification criterion performs better than existing criteria such as criticality and delinquent loads. We also show that the criterion of focusing on commit stalls is robust across cache levels and can be applied to any prefetcher without modifying the prefetcher.
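The idea of a history-based classifier that tracks commit stalls per load can be sketched as follows. This is only an illustrative software model, not the design from the paper: the class name `CommitStallClassifier`, the per-PC table, and the `coverage` threshold are all assumptions made for the example.

```python
from collections import defaultdict


class CommitStallClassifier:
    """Hypothetical per-PC history table (an assumption, not the paper's hardware design)."""

    def __init__(self, coverage=0.9):
        # Fraction of all commit-stall cycles that the selected loads must cover.
        self.coverage = coverage
        self.stall_cycles = defaultdict(int)

    def record(self, load_pc, cycles_at_rob_head):
        """Accumulate the cycles a load spent blocking commit at the head of the ROB."""
        self.stall_cycles[load_pc] += cycles_at_rob_head

    def limcos(self):
        """Return the smallest set of load PCs, taken in descending stall order,
        that accounts for at least `coverage` of all recorded stall cycles."""
        total = sum(self.stall_cycles.values())
        if total == 0:
            return set()
        selected, covered = set(), 0
        ranked = sorted(self.stall_cycles.items(), key=lambda kv: kv[1], reverse=True)
        for pc, cycles in ranked:
            if covered >= self.coverage * total:
                break
            selected.add(pc)
            covered += cycles
        return selected

    def should_train_prefetcher(self, load_pc):
        """Focused prefetching: only misses from selected loads train the prefetcher."""
        return load_pc in self.limcos()
```

With a made-up trace in which two load PCs account for almost all stall cycles, `limcos()` returns just those two PCs, and `should_train_prefetcher` filters out misses from every other load.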
Abstract:
This paper deals with the evaluation of the component-laminate load-carrying capacity, i.e., the calculation of the loads that cause the failure of the individual layers and of the component-laminate as a whole in a four-bar mechanism. The component-laminate load-carrying capacity is evaluated using the Tsai-Wu-Hahn failure criterion for various layups. The reserve factor of each ply in the component-laminate is calculated using the maximum resultant force and the maximum resultant moment occurring at different time steps at the joints of the mechanism. Here, all component bars of the mechanism are made of fiber-reinforced laminates and have thin rectangular cross-sections. They could, in general, be pre-twisted and/or possess initial curvature, either by design or by defect. They are linked to each other by means of revolute joints. We restrict ourselves to linear materials with small strains within each elastic body (beam). Each component of the mechanism is modeled as a beam based on geometrically nonlinear 3-D elasticity theory. The component problems are thus split into 2-D analyses of reference beam cross-sections and nonlinear 1-D analyses along the three beam reference curves. For the thin rectangular cross-sections considered here, the 2-D cross-sectional nonlinearity is also overwhelming. This can be perceived from the fact that such sections constitute a limiting case between thin-walled open and closed sections, thus inviting the nonlinear phenomena observed in both. The strong elastic couplings of anisotropic composite laminates complicate the model further. However, a powerful mathematical tool called the Variational Asymptotic Method (VAM) not only enables such a dimensional reduction, but also provides asymptotically correct analytical solutions to the nonlinear cross-sectional analysis.
Such closed-form solutions are used here in conjunction with numerical techniques for the rest of the problem to obtain predictions more quickly and accurately than would otherwise be possible. Local 3-D stress, strain and displacement fields for representative sections in the component bars are recovered, based on the stress resultants from the 1-D global beam analysis. A numerical example is presented which illustrates the failure of each component-laminate and of the mechanism as a whole.
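As a rough illustration of how a ply reserve factor follows from a quadratic failure criterion, the sketch below evaluates the plane-stress Tsai-Wu criterion for a single ply. It assumes the commonly used Tsai-Hahn form of the interaction term, F12 = -0.5 * sqrt(F11 * F22), which may differ in detail from the paper's formulation. The function name and the strength values in the usage note are made up for the example.

```python
import math


def tsai_wu_reserve_factor(s1, s2, t12, Xt, Xc, Yt, Yc, S):
    """Reserve factor R for one ply under plane stress (s1, s2, t12).

    Xt, Xc: longitudinal tensile / compressive strengths (positive magnitudes)
    Yt, Yc: transverse tensile / compressive strengths (positive magnitudes)
    S:      in-plane shear strength
    R is the factor by which the applied stress state must be scaled to reach
    failure; R < 1 means the ply has already failed.
    """
    F1 = 1.0 / Xt - 1.0 / Xc
    F2 = 1.0 / Yt - 1.0 / Yc
    F11 = 1.0 / (Xt * Xc)
    F22 = 1.0 / (Yt * Yc)
    F66 = 1.0 / S**2
    F12 = -0.5 * math.sqrt(F11 * F22)  # assumed Tsai-Hahn interaction term

    # Scaling the stresses by R turns the criterion into a*R^2 + b*R - 1 = 0.
    a = F11 * s1**2 + F22 * s2**2 + F66 * t12**2 + 2.0 * F12 * s1 * s2
    b = F1 * s1 + F2 * s2
    if a == 0.0:
        return math.inf  # unloaded ply
    return (-b + math.sqrt(b * b + 4.0 * a)) / (2.0 * a)
```

For example, with illustrative strengths Xt = 1500, Xc = 1200, Yt = 50, Yc = 250, S = 70 (MPa) and a pure longitudinal stress of Xt / 2, the function returns a reserve factor of 2, since doubling the load reaches the tensile strength exactly. In a laminate analysis this would be evaluated ply by ply, with the smallest value governing first-ply failure.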
Abstract:
Memory models of shared-memory concurrent programs define the values a read of a shared memory location is allowed to see. Such memory models are typically weaker than the intuitive sequential consistency semantics, to allow efficient execution. In this paper, we present WOMM (Weak Operational Memory Model), which formally unifies two sources of weak behavior in hardware memory models: reordering of instructions and weakly consistent memory. We show that a large number of optimizations are allowed by WOMM. We also show that WOMM is weaker than a number of hardware memory models. Consequently, if a program behaves correctly under WOMM, it will be correct with respect to those hardware memory models. Hence, WOMM can be used as a formally specified abstraction of the hardware memory models. Moreover, unlike most weak memory models, WOMM is described using operational semantics, making it easy to integrate into a model checker for concurrent programs. We further show that WOMM has an important property: it has sequential consistency semantics for data-race-free programs.
Abstract:
Wireless sensor networks can often be viewed in terms of a uniform deployment of a large number of nodes on a region in Euclidean space, e.g., the unit square. After deployment, the nodes self-organise into a mesh topology. In a dense, homogeneous deployment, a frequently used approximation is to take the hop distance between nodes to be proportional to the Euclidean distance between them. In this paper, we analyse the performance of this approximation. We show that nodes with a certain hop distance from a fixed anchor node lie within a certain annulus with probability approaching unity as the number of nodes n → ∞. We take a uniform, i.i.d. deployment of n nodes on a unit square, and consider the geometric graph on these nodes with radius r(n) = c√(ln n / n). We show that, for a given hop distance h of a node from a fixed anchor on the unit square, the Euclidean distance lies within [(1−ε)(h−1)r(n), h r(n)], for ε > 0, with probability approaching unity as n → ∞. This result shows that it is more likely to expect a node, with hop distance h from the anchor, to lie within this annulus centred at the anchor location, and of width roughly r(n), rather than close to a circle whose radius is exactly proportional to h. We show that if the radius r of the geometric graph is fixed, the convergence of the probability is exponentially fast. Similar results hold for a randomised lattice deployment. We provide simulation results that illustrate the theory, and serve to show how large n needs to be for the asymptotics to be useful.
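The annulus result can be explored with a small simulation. The sketch below is not the authors' simulation code: the function names, the default values of n and c, and the choice of node 0 as the anchor are assumptions for illustration. It deploys n uniform points on the unit square, builds the geometric graph with radius r(n) = c√(ln n / n), computes hop distances from the anchor by breadth-first search, and measures how many reachable nodes fall inside [(1−ε)(h−1)r, h r].

```python
import math
import random
from collections import deque


def hop_vs_euclid(n=2000, c=1.5, seed=0):
    """Return (r, [(hop_distance, euclidean_distance), ...]) for nodes reachable from the anchor."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    r = c * math.sqrt(math.log(n) / n)

    # Bucket points into cells of side r so neighbours lie in adjacent cells only.
    grid = {}
    for i, (x, y) in enumerate(pts):
        grid.setdefault((int(x / r), int(y / r)), []).append(i)

    def neighbours(i):
        cx, cy = int(pts[i][0] / r), int(pts[i][1] / r)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j != i and math.dist(pts[i], pts[j]) <= r:
                        yield j

    anchor = 0  # arbitrary choice of anchor node
    hops = {anchor: 0}
    queue = deque([anchor])
    while queue:  # breadth-first search gives shortest hop counts
        u = queue.popleft()
        for v in neighbours(u):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)

    pairs = [(h, math.dist(pts[anchor], pts[v])) for v, h in hops.items() if v != anchor]
    return r, pairs


def fraction_in_annulus(r, pairs, eps=0.2):
    """Fraction of nodes whose distance d satisfies (1-eps)(h-1)r <= d <= h*r."""
    if not pairs:
        return 0.0
    inside = sum(1 for h, d in pairs if (1 - eps) * (h - 1) * r <= d <= h * r + 1e-12)
    return inside / len(pairs)
```

Calling `r, pairs = hop_vs_euclid()` followed by `fraction_in_annulus(r, pairs, eps=0.2)` gives the empirical fraction for one deployment. The upper bound d ≤ h r always holds by the triangle inequality, so any node outside the annulus violates the lower bound; repeating the run for growing n shows how quickly that fraction approaches one.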
Abstract:
Mobile ad-hoc networks (MANETs) have recently drawn significant research attention since they offer unique benefits and versatility with respect to bandwidth spatial reuse, intrinsic fault tolerance, and low-cost rapid deployment. This paper addresses the issue of delay-sensitive real-time data transport in this type of network. An effective QoS mechanism is required for the speedy transport of real-time data. QoS in MANETs is an open-ended problem. Various QoS measures are incorporated in the upper layers of the network, but few techniques address QoS in the MAC layer. There are quite a few QoS techniques in the MAC layer for infrastructure-based wireless networks. The goal and the challenge is to achieve QoS delivery and priority access for real-time traffic in an ad-hoc wireless environment, while maintaining democracy in the resource allocation. We propose a MAC layer protocol called "FCP based FAMA protocol", which allocates the channel resources to the needy in a more democratic way, by examining the requirements, malicious behavior and genuineness of the request. We have simulated both FAMA and FCP based FAMA and tested them under various MANET conditions. The simulation results clearly show an improvement in channel utilization and a decrease in the delay parameters in the latter case. Our new protocol outperforms the other QoS-aware MAC layer protocols.
Abstract:
Due to the importance of collective communications in scientific parallel applications, many strategies have been devised for optimizing collective communications for different kinds of parallel environments. There has been increasing interest in developing efficient broadcast algorithms for computational grids. In this paper, we present application-oriented adaptive techniques that take into account resource characteristics as well as the application's usage of broadcasts for deriving efficient broadcast trees. In particular, we consider two broadcast parameters used in the application, namely, the broadcast message sizes and the time interval between the broadcasts. The results indicate that our adaptive strategies can provide a 20% average improvement in performance over the popular MPICH-G2 implementation of MPI_Bcast under loaded network conditions.
Abstract:
Solar dynamo models based on differential rotation inferred from helioseismology tend to produce rather strong magnetic activity at high solar latitudes, in contrast to the observed fact that sunspots appear at low latitudes. We show that a meridional circulation penetrating below the tachocline can solve this problem.
Abstract:
In the present work, we experimentally study the flow of water over textured hydrophobic surfaces in a micro-channel. Shear stress measurements are made along with direct visualization of trapped air pockets on the hydrophobic surface. The trapped air pockets on such surfaces are known to be responsible for apparent slip at these surfaces and hence for significant drag reduction. In typical circumstances, the apparent slip reduces over time as seen, for example, from our shear stress measurements. This implies that the drag reduction will not be sustained. We have performed extensive visualizations of the trapped air pockets while varying flow parameters such as the flow rate and the pressure. We present here direct visualizations that show that under some conditions, the air pockets can grow with time. The variation of the air pocket size with time is found to change qualitatively and quantitatively as the flow rate is varied. These measured changes in the air pocket size with time have a direct bearing on the sustainability of apparent slip in micro-channel flows.
Abstract:
Air can be trapped in the crevices of specially textured hydrophobic surfaces immersed in water. This heterogeneous state of wetting, in which the water is in contact with both the solid surface and the entrapped air, is not stable. Diffusion of air into the surrounding water leads to a gradual reduction in the size and number of the air bubbles. The sustainability of the entrapped air on such surfaces is important for many underwater applications in which the surfaces have to remain submerged for long periods. In this paper we explore the suitability of different classes of surface textures for drag reduction applications by evaluating the time required for the disappearance of the air bubbles under hydrostatic conditions. Different repetitive textures consisting of holes, pillars and ridges of different sizes have been generated in silicon, aluminium and brass by isotropic etching, wire EDM and chemical etching, respectively. These surfaces were rendered hydrophobic with a self-assembled layer of fluorooctyl trichlorosilane for the silicon and aluminium surfaces and 1-dodecanethiol for the brass surfaces. Using total internal reflection, the air bubbles are visualized with the help of a microscope and time-lapse photography. Irrespective of the texture, both the size and the number of air pockets were found to decrease gradually with time and eventually disappear. In an attempt to reverse the diffusion, we explore the possibility of using electrolysis to generate gases at the textured surfaces. The gas bubbles nucleate everywhere on the surface, and as they grow they coalesce with each other and get pinned at the texture edges.
Abstract:
Rapid advancements in multi-core processor architectures coupled with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they are yet to be adopted widely for performance-critical code, mainly due to the relatively immature software frameworks and the effort involved in rewriting existing code in the new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel and the growing popularity, support and stability of the CUDA software stack collectively make CUDA a good candidate to be considered as a programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime and a light-weight software distributed shared memory to manage parallel executions. Initial results from running several standard CUDA benchmark programs show speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.
Abstract:
The (1-x)BiFeO3-(x)PbTiO3 solid solution exhibiting a Morphotropic Phase Boundary (MPB) has attracted considerable attention recently because of its unique features such as multiferroicity, a high Curie point (T_C ~ 700 °C) and giant tetragonality (c/a − 1 ~ 0.19). Different research groups have reported different composition ranges of the MPB for this system. In this work we have conclusively proved that the wide composition range of the MPB reported in the literature is due to kinetic arrest of the metastable rhombohedral phase, and that if sufficient temperature and time are allowed the metastable phase disappears. The genuine MPB was found to be at x=0.27, for which the tetragonal and the rhombohedral phases are in thermodynamic equilibrium. An in-situ high-temperature structural study of x=0.27 revealed the sluggish kinetics associated with the temperature-induced structural transformation. A neutron powder diffraction study revealed that the magnetic ordering at room temperature occurs in the rhombohedral phase. The magnetic structure was found to be commensurate G-type antiferromagnetic with magnetic moments parallel to the c-direction (of the hexagonal cell). The present study suggests that the equilibrium properties in this solid solution series should be sought for x=0.27.