611 results for SCIL processor
Abstract:
Embedded systems are usually designed for a single task or a specified set of tasks. This specificity means that the system design as well as its hardware/software development can be highly optimized. Embedded software must meet requirements such as highly reliable operation on resource-constrained platforms, real-time constraints and rapid development. This motivates the adoption of static machine-code analysis tools running on a host machine for the validation and optimization of embedded system code, which can help meet all of these goals. Such tools can significantly improve software quality, and this remains a challenging field. This dissertation contributes an architecture-oriented code validation, error localization and optimization technique that assists the embedded system designer in software debugging, making it more effective at early detection of software bugs that are otherwise hard to detect, using static analysis of machine code. The focus of this work is to develop methods that automatically localize faults as well as optimize the code, thereby improving both the debugging process and the quality of the code. Validation is done with the help of rules of inference formulated for the target processor. The rules govern the occurrence of illegitimate or out-of-place instructions and code sequences for executing the computational and integrated peripheral functions. The stipulated rules are encoded in propositional logic formulae and their compliance is tested individually in all possible execution paths of the application program.
Incorrect sequences of machine code patterns are identified using slicing techniques on the control flow graph generated from the machine code. An algorithm is proposed to assist the compiler in eliminating redundant bank switching code and deciding on an optimum data allocation to banked memory, resulting in the minimum number of bank switching instructions in embedded system software. A relation matrix and a state transition diagram formed for the active memory bank state transitions corresponding to each bank selection instruction are used for the detection of redundant code. Instances of code redundancy based on the stipulated rules for the target processor are identified. This validation and optimization tool can be integrated into the system development environment. It is a novel approach, independent of the compiler/assembler, and applicable to a wide range of processors once appropriate rules are formulated. Program states are identified mainly from machine code patterns, which drastically reduces state space creation and contributes to improved model checking. Though the technique described is general, the implementation is architecture oriented, and hence the feasibility study is conducted on PIC16F87X microcontrollers. The proposed tool will be very useful in steering novices towards correct use of difficult microcontroller features in developing embedded systems.
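To make the rule-checking idea concrete, here is a minimal, hypothetical sketch (not the dissertation's tool): a single inference rule for a PIC16-style target, encoded directly as a check rather than as a propositional formula, is tested on every execution path of a small control flow graph. The instruction mnemonics, the CFG encoding and the helper functions are assumptions introduced only for illustration.

```python
# Toy path-wise rule check on a control flow graph of machine instructions.
# Hypothetical rule for a PIC16-style core: an access to a bank-1 register
# (e.g. TRISB) must see bank 1 selected (BSF STATUS,RP0) on every path,
# with no intervening switch back to bank 0 (BCF STATUS,RP0).

# CFG: node -> (list of instructions, list of successor nodes)
cfg = {
    "entry": (["MOVLW 0x10", "BSF STATUS,RP0"], ["a", "b"]),
    "a":     (["BCF STATUS,RP0"], ["join"]),      # switches back to bank 0
    "b":     (["NOP"], ["join"]),
    "join":  (["MOVWF TRISB"], []),               # TRISB lives in bank 1
}

def paths(node, prefix=()):
    """Enumerate all instruction sequences from `node` to a sink (acyclic CFG assumed)."""
    instrs, succs = cfg[node]
    seq = prefix + tuple(instrs)
    if not succs:
        yield seq
        return
    for s in succs:
        yield from paths(s, seq)

def bank1_rule_holds(seq):
    """True if every bank-1 access in `seq` occurs while bank 1 is selected."""
    bank = 0
    for ins in seq:
        if ins == "BSF STATUS,RP0":
            bank = 1
        elif ins == "BCF STATUS,RP0":
            bank = 0
        elif ins.startswith("MOVWF TRIS") and bank != 1:
            return False
    return True

for p in paths("entry"):
    if not bank1_rule_holds(p):
        print("rule violated on path:", " ; ".join(p))
```

Running the sketch flags the path through block "a", where the bank is switched back to 0 before the bank-1 register is written, which is exactly the kind of out-of-place code sequence the stipulated rules are meant to catch.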
Abstract:
India is the largest producer and processor of cashew in the world. The export value of cashew was about Rupees 2,600 crore during 2004-05. Kerala is the main processing and exporting centre of cashew, and within Kerala most of the cashew processing factories are located in Kollam district. The industry provides a livelihood for about 6-7 lakh employees and farmers, and hence has national importance. In Kollam district alone there are more than 2.5 lakh employees directly involved in the industry, which comes to about 10 per cent of the population of the district, of whom 95 per cent are women workers. Any amount received by a woman worker is generally utilized directly for the benefit of the family, so the link to family welfare is quite clear. Even though the Government of Kerala has incorporated the Kerala State Cashew Development Corporation (KSCDC) and the Kerala State Cashew Workers Apex Industrial Co-operative Society (CAPEX) to develop the cashew industry, the industry and its ancillary industries did not grow as expected. In this context, an attempt has been made to analyze the problems and potential of the industry so as to make it viable and sustainable for perpetual employment and income generation as well as for the overall development of Kollam district.
Abstract:
We have investigated the effects of swift heavy ion irradiation on thermally evaporated, 44 nm thick, amorphous Co77Fe23 thin films on silicon substrates using 100 MeV Ag⁷⁺ ions at fluences of 1 × 10¹¹, 1 × 10¹², 1 × 10¹³ and 3 × 10¹³ ions/cm². The structural modifications upon swift heavy ion irradiation were investigated using glancing angle X-ray diffraction. The surface morphological evolution of the thin films with irradiation was studied using atomic force microscopy. Power spectral density analysis was used to correlate the roughness variation with the structural modifications observed by X-ray diffraction. Magnetic measurements were carried out using vibrating sample magnetometry, and the observed variation in coercivity of the irradiated films is explained on the basis of stress relaxation. Magnetic force microscopy images were analysed using the Scanning Probe Image Processor software; these results are in agreement with those obtained using vibrating sample magnetometry. The magnetic and structural properties are correlated.
Abstract:
Bank switching in embedded processors with a partitioned memory architecture results in code size as well as run-time overhead. An algorithm, and its application, to assist the compiler in eliminating the redundant bank switching code introduced and in deciding the optimum data allocation to banked memory is presented in this work. A relation matrix formed for the memory bank state transitions corresponding to each bank selection instruction is used for the detection of redundant code. Data allocation to memory is done by considering all possible permutations of memory banks and combinations of data. The compiler output corresponding to each data mapping scheme is subjected to a static machine code analysis which identifies the one with the minimum number of bank switching instructions. Even though the method is compiler independent, the algorithm utilizes certain architectural features of the target processor. A prototype based on PIC16F87X microcontrollers is described. The method scales well to larger numbers of memory banks and other architectures, so that high-performance compilers can integrate this technique for efficient code generation. The technique is illustrated with an example.
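As a rough illustration of the two steps described above, the following sketch (an assumption-laden toy, not the thesis's algorithm; the instruction encoding and function names are invented) tracks the active-bank state to drop bank-select instructions that do not change it, and exhaustively tries data-to-bank allocations to keep the one whose generated code needs the fewest switches.

```python
# Toy: (1) strip bank-select instructions that leave the active bank unchanged,
# (2) brute-force all data-to-bank allocations and keep the cheapest one.

from itertools import permutations

def strip_redundant_switches(code):
    """code: list of ('select', bank) or ('access', var).
    Drop 'select' instructions whose target bank is already active."""
    active, out = None, []
    for op in code:
        if op[0] == "select" and op[1] == active:
            continue                      # redundant: bank already selected
        if op[0] == "select":
            active = op[1]
        out.append(op)
    return out

def codegen(accesses, bank_of):
    """Naively emit a bank-select before every access, then strip redundancy."""
    raw = []
    for var in accesses:
        raw += [("select", bank_of[var]), ("access", var)]
    return strip_redundant_switches(raw)

def best_allocation(accesses, variables):
    """Try every split of the variables over two banks (toy exhaustive search)."""
    best, half = None, len(variables) // 2
    for perm in permutations(variables):
        bank_of = {v: (0 if i < half else 1) for i, v in enumerate(perm)}
        cost = sum(1 for op in codegen(accesses, bank_of) if op[0] == "select")
        if best is None or cost < best[0]:
            best = (cost, bank_of)
    return best

accesses = ["x", "y", "x", "z", "y", "y"]
cost, alloc = best_allocation(accesses, ["x", "y", "z"])
print("bank switches:", cost, "allocation:", alloc)
```

A real compiler pass would of course replace the exhaustive search with a heuristic driven by the relation matrix, but the toy shows why the choice of data allocation directly determines how many bank-select instructions survive redundancy elimination.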
Abstract:
Research in the area of geopolymers has been gaining momentum over the past 20 years. Studies confirm that geopolymer concrete has good compressive strength, tensile strength, flexural strength, modulus of elasticity and durability, and that these properties are comparable with those of OPC concrete. There are many occasions where concrete is exposed to elevated temperatures, such as fire, heat from thermal processors and furnaces, nuclear exposure, etc. In such cases, an understanding of the behaviour of concrete and structural members exposed to elevated temperatures is vital. Even though many research reports are available on the behaviour of OPC concrete at elevated temperatures, there is limited information on the behaviour of geopolymer concrete after exposure to elevated temperatures. A preliminary study was carried out for the selection of a mix proportion. The important variables considered in the present study include the alkali/fly ash ratio, the percentage of total aggregate content, the fine aggregate to total aggregate ratio, the molarity of sodium hydroxide, the sodium silicate to sodium hydroxide ratio, the curing temperature and the curing period. The influence of these variables on the engineering properties of geopolymer concrete was investigated. A study on the interface shear strength of reinforced and unreinforced geopolymer concrete, as well as of OPC concrete, was also carried out. The engineering properties of fly ash based geopolymer concrete after exposure to elevated temperatures (ambient to 800 °C) were studied and the results were compared with those of conventional concrete. Scanning Electron Microscope analysis, Fourier Transform Infrared analysis, X-ray powder Diffractometer analysis and Thermogravimetric analysis of geopolymer mortar or paste at ambient temperature and after exposure to elevated temperature were also carried out in the present research work. An experimental study was conducted on geopolymer concrete beams after exposure to elevated temperatures (ambient to 800 °C), and the load-deflection characteristics, ductility and moment-curvature behaviour of the beams were investigated. Based on the present study, the major conclusions can be summarized as follows. There is a definite proportion of the various ingredients that achieves maximum strength properties: geopolymer concrete with a total aggregate content of 70% by volume, a fine aggregate to total aggregate ratio of 0.35, NaOH molarity of 10, a Na2SiO3/NaOH ratio of 2.5 and an alkali to fly ash ratio of 0.55 gave the maximum compressive strength in the present study. Early strength development in geopolymer concrete can be achieved by proper selection of the curing temperature and the curing period; with 24 hours of curing at 100 °C, 96.4% of the 28-day cube compressive strength could be achieved in 7 days. The interface shear strength of geopolymer concrete is lower than that of OPC concrete: compared to OPC concrete, a reduction in interface shear strength of 33% and 29% was observed for unreinforced and reinforced geopolymer specimens respectively. The interface shear strength of geopolymer concrete can be approximately estimated as 50% of the value obtained from the available equations for the interface shear strength of ordinary Portland cement concrete (the methods of Mattock and of ACI).
Fly ash based geopolymer concrete undergoes a high rate of strength loss (compressive strength, tensile strength and modulus of elasticity) during the early heating period (up to 200 °C) compared to OPC concrete. At temperature exposures beyond 600 °C, the unreacted crystalline materials in geopolymer concrete transform into an amorphous state and undergo polymerization; as a result, there is no further strength loss (compressive strength, tensile strength and modulus of elasticity) in geopolymer concrete, whereas OPC concrete continues to lose its strength properties at a faster rate beyond a temperature exposure of 600 °C. At present no equation is available to predict the strength properties of geopolymer concrete after exposure to elevated temperatures. Based on the study carried out, new equations have been proposed to predict the residual strengths (cube compressive strength, split tensile strength and modulus of elasticity) of geopolymer concrete after exposure to elevated temperatures (up to 800 °C); these equations could be used for material modelling until better refined equations are available. Compared to OPC concrete, geopolymer concrete shows better resistance against surface cracking when exposed to elevated temperatures: in the present study, while OPC concrete started developing cracks at 400 °C, geopolymer concrete did not show any visible cracks up to 600 °C and developed only minor cracks at an exposure temperature of 800 °C. Geopolymer concrete beams develop cracks at earlier load stages if they are exposed to elevated temperatures. Even though the material strength of geopolymer concrete does not decrease beyond 600 °C, the flexural strength of the corresponding beams reduces rapidly after exposure to 600 °C, primarily due to the rapid loss of the strength of the steel. With an increase in temperature, the curvature at the yield point of geopolymer concrete beams increases and the ductility thereby reduces; in the present study, compared to the ductility at ambient temperature, the ductility of geopolymer concrete beams reduced by 63.8% after exposure to 800 °C. Appropriate equations have been proposed to predict the service load crack width of geopolymer concrete beams exposed to elevated temperatures; these equations could be used to limit the service load on geopolymer concrete beams exposed to elevated temperatures (up to 800 °C) for a predefined crack width (between 0.1 mm and 0.3 mm), or vice versa. The moment-curvature relationship of geopolymer concrete beams at ambient temperature is similar to that of RCC beams and can be predicted using a strain compatibility approach. Once exposed to an elevated temperature, however, the strain compatibility approach underestimates the curvature of geopolymer concrete beams between the first cracking and yielding points.
Abstract:
For the theoretical investigation of local phenomena (adsorption at surfaces, defects or impurities within a crystal, etc.) one can assume that the effects caused by the local disturbance are limited to the neighbouring particles. With this model, well known as the cluster approximation, an infinite system can be simulated by a much smaller segment of the surface (the cluster). The size of this segment varies strongly for different systems. Calculations of the convergence of the bond distance and binding energy of an aluminium atom adsorbed on an Al(100) surface showed that more than 100 atoms are necessary to obtain a sufficient description of the surface properties. With a full quantum-mechanical approach, however, systems of this size cannot be calculated because of the required computer memory and processor time. We therefore developed an embedding procedure for the simulation of surfaces and solids, in which the whole system is partitioned into several parts that are treated differently: the internal part (the cluster), located near the site of the adsorbate, is calculated fully self-consistently and is embedded into an environment, while the influence of the environment on the cluster enters as an additional, external potential in the relativistic Kohn-Sham equations. The procedure is based on density functional theory, which means that the choice of the electronic density of the environment determines the quality of the embedding. The environment density was modelled in three different ways: from atomic densities; from densities transferred from a large preceding calculation without embedding; and from copied bulk densities. The embedding procedure was tested on the atomic adsorption of Al on Al(100) and of Cu on Cu(100). The result was that, if the environment is chosen appropriately for the Al system, only 9 embedded atoms are needed to reproduce the results of exact slab calculations. For the Cu system, calculations without the embedding procedure were carried out first, showing that about 60 atoms suffice as a surface cluster; using the embedding procedure, the same values were obtained with only 25 atoms. This is a substantial improvement, considering that the calculation time increases cubically with the number of atoms. With the embedding method, infinite systems can be treated by molecular methods. In addition, the program code was extended with the possibility of performing molecular-dynamics simulations, so that, apart from the previous fixed-core calculations, the structures of small clusters and surfaces can now also be investigated. As a first application we studied the adsorption of Cu on Cu(100): we calculated the relaxed positions of the atoms located close to the adsorption site and afterwards performed the full quantum-mechanical calculation of this system, repeating the procedure for different distances from the surface. Thus a realistic adsorption process could be examined for the first time. It should be noted that for the Cu reference calculations (without embedding) we began to parallelize the entire program code; only because of this were the investigations of the 100-atom Cu surface clusters possible. Thanks to the good efficiency of both the parallelization and the developed embedding procedure, we will be able to apply the combination in future work. In these areas it will then be possible to bring in the results of fully relativistic molecular calculations, which will be very interesting, especially for the regime of heavy systems.
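In schematic form, the embedding described above amounts to solving Kohn-Sham-type equations for the cluster in the presence of a fixed potential generated by the model environment density; the display below is a generic sketch of such a frozen-environment construction, with the notation and the precise form of the embedding potential assumed rather than taken from the thesis:

$$\big[\hat{T} + v_{\mathrm{eff}}[\rho_{\mathrm{cl}}](\mathbf{r}) + v_{\mathrm{emb}}[\rho_{\mathrm{env}}](\mathbf{r})\big]\,\psi_i(\mathbf{r}) = \varepsilon_i\,\psi_i(\mathbf{r}),$$

where $v_{\mathrm{eff}}$ is the usual Kohn-Sham effective potential of the self-consistently updated cluster density $\rho_{\mathrm{cl}}$, and $v_{\mathrm{emb}}$ is the fixed, external potential generated by the frozen environment density $\rho_{\mathrm{env}}$ (atomic, transferred, or bulk, as above).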
Abstract:
The Scheme86 and HP Precision architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low-latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high-performance computer.
Abstract:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and show how thread prioritization can both maintain high processor utilization and limit increases in critical-path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth-limited applications, thread prioritization must be extended to prioritize memory requests. We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
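The following toy model (a sketch of the mechanism described above, not the thesis's simulator; the class and method names are invented for illustration) shows how per-thread priorities can direct both which threads are loaded into the hardware contexts and which context runs after a long-latency stall.

```python
# Toy model of priority-directed scheduling on a multiple-context processor.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Thread:
    priority: int                      # lower value = more critical (heap order)
    name: str = field(compare=False)
    ready: bool = field(default=True, compare=False)

class MultiContextCore:
    def __init__(self, n_contexts, threads):
        self.pool = list(threads)      # runnable threads not loaded in a context
        heapq.heapify(self.pool)
        # Load the most critical threads into the hardware contexts.
        self.contexts = [heapq.heappop(self.pool) for _ in range(n_contexts)]
        self.running = 0               # index of the active context

    def on_long_latency(self):
        """Active thread stalls (e.g. a remote miss): switch to the most
        critical ready thread among the loaded contexts."""
        self.contexts[self.running].ready = False
        candidates = [i for i, t in enumerate(self.contexts) if t.ready]
        if candidates:                 # if none is ready, the processor idles
            self.running = min(candidates, key=lambda i: self.contexts[i].priority)
        return self.contexts[self.running].name

    def on_completion(self, name):
        """A stalled thread's long-latency operation completed."""
        for t in self.contexts:
            if t.name == name:
                t.ready = True

core = MultiContextCore(2, [Thread(0, "critical"), Thread(5, "background"),
                            Thread(9, "prefetcher")])
print("switch to:", core.on_long_latency())   # critical stalls -> background runs
core.on_completion("critical")
print("switch to:", core.on_long_latency())   # background stalls -> critical runs
```

The same priority value would also have to tag the memory requests the thread issues, as the abstract notes, for the scheme to help in bandwidth-limited cases.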
Abstract:
As the number of processors in distributed-memory multiprocessors grows, efficiently supporting a shared-memory programming model becomes difficult. We have designed the Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems containing massive numbers of processors. PHD eliminates bandwidth problems by using a scalable network, decreases hot-spots by not relying on a single point to distribute blocks, and uses a scalable amount of space for its directories. PHD provides a shared-memory model by synthesizing a global shared memory from the local memories of processors. PHD supports sequentially consistent read, write, and test-and-set operations. This thesis also introduces a method of describing locality for hierarchical protocols and employs this method in the derivation of an abstract model of the protocol behavior. An embedded model, based on the work of Johnson [ISCA19], describes the protocol behavior when mapped to a k-ary n-cube. The thesis uses these two models to study the average height in the hierarchy that operations reach, the longest path messages travel, the number of messages that operations generate, the inter-transaction issue time, and the protocol overhead for different locality parameters, degrees of multithreading, and machine sizes. We determine that multithreading is only useful for approximately two to four threads; any additional interleaving does not decrease the overall latency. For small machines and high locality applications, this limitation is due mainly to the length of the running threads. For large machines with medium to low locality, this limitation is due mainly to the protocol overhead being too large. Our study using the embedded model shows that in situations where the run length between references to shared memory is at least an order of magnitude longer than the time to process a single state transition in the protocol, applications exhibit good performance. If separate controllers for processing protocol requests are included, the protocol scales to 32k-processor machines as long as the application exhibits hierarchical locality: at least 22% of the global references must be able to be satisfied locally; at most 35% of the global references are allowed to reach the top level of the hierarchy.
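As a toy illustration of the height-reached metric discussed above (a simplified model for illustration, not PHD itself), the sketch below arranges processors as leaves of a binary hierarchy and reports the level at which a request finds a cached copy; level 0 corresponds to a locally satisfied reference and the maximum level to the top of the hierarchy.

```python
# Toy model of a hierarchical directory over a binary tree of processors.

def height_reached(requester, sharers, n_levels):
    """Climb the hierarchy from `requester` until the current subtree contains
    some processor that already holds the block; return the level reached
    (0 = satisfied locally, n_levels = root of the hierarchy)."""
    for level in range(n_levels + 1):
        subtree = requester >> level            # id of the ancestor at this level
        if any((p >> level) == subtree for p in sharers):
            return level
    return n_levels

# 16 processors arranged as a 4-level binary hierarchy; the block is cached at 5 and 9.
sharers = {5, 9}
for req in (5, 4, 0, 15):
    print(f"processor {req:2d}: request resolved at level",
          height_reached(req, sharers, n_levels=4))
```

In this simplified picture, "hierarchical locality" corresponds to most requests resolving at low levels, with only a small fraction climbing all the way to the root.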
Abstract:
Research on autonomous intelligent systems has focused on how robots can robustly carry out missions in uncertain and harsh environments with very little or no human intervention. Robotic execution languages such as RAPs, ESL, and TDL improve robustness by managing functionally redundant procedures for achieving goals. The model-based programming approach extends this by guaranteeing correctness of execution through pre-planning of non-deterministic timed threads of activities. Executing model-based programs effectively on distributed autonomous platforms requires distributing this pre-planning process. This thesis presents a distributed planner for model-based programs whose planning and execution is distributed among agents with widely varying levels of processor power and memory resources. We make two key contributions. First, we reformulate a model-based program, which describes cooperative activities, into a hierarchical dynamic simple temporal network. This enables efficient distributed coordination of robots and supports deployment on heterogeneous robots. Second, we introduce a distributed temporal planner, called DTP, which solves hierarchical dynamic simple temporal networks with the assistance of the distributed Bellman-Ford shortest path algorithm. The implementation of DTP has been demonstrated successfully on a wide range of randomly generated examples and on a pursuer-evader challenge problem in simulation.
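The planner above relies on shortest-path computation over simple temporal networks; the following centralized sketch (not the distributed DTP implementation; the events and constraints are hypothetical) shows the underlying idea: an STN is consistent exactly when Bellman-Ford finds no negative cycle in its distance graph.

```python
# Minimal (centralized) sketch of the STN machinery mentioned above: a simple
# temporal network is consistent iff its distance graph has no negative cycle,
# which Bellman-Ford detects.

def bellman_ford(n_events, edges, source=0):
    """edges: list of (u, v, w) meaning time(v) - time(u) <= w.
    Returns shortest distances from `source`, or None if a negative cycle
    exists (i.e. the temporal constraints are inconsistent)."""
    INF = float("inf")
    dist = [INF] * n_events
    dist[source] = 0
    for _ in range(n_events - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                     # one more pass: any improvement
        if dist[u] + w < dist[v]:             # means a negative cycle
            return None
    return dist

# Events: 0 = start, 1 = robot A at rendezvous, 2 = robot B at rendezvous.
# Hypothetical constraints: A arrives within [5, 10] of start,
# B within [3, 8] of start, and B no more than 1 time unit after A.
edges = [(0, 1, 10), (1, 0, -5),   # 5 <= t1 - t0 <= 10
         (0, 2, 8),  (2, 0, -3),   # 3 <= t2 - t0 <= 8
         (1, 2, 1)]                #      t2 - t1 <= 1
print("latest feasible times:", bellman_ford(3, edges))
```

Distributing this relaxation across agents, each owning a subset of the events and edges, is essentially what the distributed Bellman-Ford step in DTP has to coordinate.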
Abstract:
We describe the key role played by partial evaluation in the Supercomputing Toolkit, a parallel computing system for scientific applications that effectively exploits the vast amount of parallelism exposed by partial evaluation. The Supercomputing Toolkit parallel processor and its associated partial evaluation-based compiler have been used extensively by scientists at MIT, and have made possible recent results in astrophysics showing that the motion of the planets in our solar system is chaotically unstable.
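As a small, self-contained illustration of what partial evaluation does (not the Toolkit's compiler; the function names are invented), specializing a generic dot product to a known constant vector unrolls the loop into straight-line code whose multiplications are mutually independent, which is the kind of fine-grained parallelism a parallelizing backend can exploit.

```python
# Toy partial evaluation: specialize a dot product with respect to a vector
# that is known at specialization time.

def specialize_dot(a):
    """Partially evaluate a dot product for the known vector `a`: the loop is
    unrolled at specialization time, leaving straight-line residual code whose
    multiplies are mutually independent."""
    terms = " + ".join(f"{ai!r} * b[{i}]" for i, ai in enumerate(a))
    src = f"def dot(b):\n    return {terms}\n"
    namespace = {}
    exec(src, namespace)                  # compile the residual program
    return src, namespace["dot"]

src, dot = specialize_dot([1.0, 2.0, 3.0])
print(src)                                           # the residual program
print("dot([4, 5, 6]) =", dot([4.0, 5.0, 6.0]))      # 1*4 + 2*5 + 3*6 = 32.0
```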
Abstract:
We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms.
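The sketch below illustrates the exact-splitting idea in a sequential setting (an illustration only, not the paper's parallel algorithm; a full sort stands in for parallel selection, and the names are invented): splitters of exact global rank n/p, 2n/p, ... partition the keys so that every destination processor receives exactly n/p of them.

```python
# Sequential sketch of exact splitting for a perfectly load-balanced output.

import random
from bisect import bisect_left

def exact_split(local_arrays):
    """Return p buckets with exactly n/p sorted keys each (distinct keys assumed)."""
    p = len(local_arrays)
    everything = sorted(x for a in local_arrays for x in a)   # stand-in for parallel selection
    n = len(everything)
    assert n % p == 0, "toy version assumes n divisible by p"
    # Splitters are the keys of exact global rank n/p, 2n/p, ..., (p-1)n/p.
    splitters = [everything[(i + 1) * n // p - 1] for i in range(p - 1)]
    buckets = [[] for _ in range(p)]
    for a in local_arrays:                # route every key to its destination
        for x in a:
            buckets[bisect_left(splitters, x)].append(x)
    return [sorted(b) for b in buckets]

random.seed(1)
keys = random.sample(range(1000), 32)                 # 32 distinct keys
data = [keys[i * 8:(i + 1) * 8] for i in range(4)]    # 4 "processors", 8 keys each
out = exact_split(data)
print([len(b) for b in out])                          # exactly n/p = 8 per bucket
```

The point of exact splitting is visible in the final print: every bucket holds exactly n/p keys, rather than the approximately balanced Θ(n/p) produced by sample-sort-style splitter estimation.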
Abstract:
The memory hierarchy is the main bottleneck in modern computer systems, as the gap between processor and memory speed continues to grow. The situation in embedded systems is even worse: the memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives, such as performance, energy consumption, and area. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources and maximize performance. However, traditional custom memory hierarchy design methodologies are phase-ordered: they separate application optimization from memory hierarchy architecture design, which tends to result in locally optimal solutions. In traditional hardware-software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation, whereas utilizing reconfigurable logic in the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing the memory hierarchy of embedded systems. The framework takes advantage of flexible reconfigurable logic to customize the memory hierarchy for specific applications, and it combines application optimization and memory hierarchy design to obtain a globally optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.
Abstract:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan's functional inverse of Ackermann's function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson's serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T∞), but the work is increased by a factor of O(lg n).
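To illustrate the English-Hebrew labeling scheme mentioned above, here is a static toy on a fixed SP parse tree (an illustration of the labeling idea only, not the on-the-fly SP-order algorithm; the tree encoding is an assumption): two threads are logically in series exactly when one precedes the other in both the English and the Hebrew orders, and logically in parallel when the two orders disagree.

```python
# Static toy of English-Hebrew labeling on an SP parse tree.
# SP parse tree: ("S", children...) composes children in series,
# ("P", children...) composes them in parallel, a string is a thread.

tree = ("S", "a", ("P", ("S", "b", "c"), "d"), "e")

def label(node, order, labels, counter):
    """Assign consecutive indices to threads; the Hebrew order reverses the
    children of P-nodes, the English order does not."""
    if isinstance(node, str):
        labels[node] = counter[0]
        counter[0] += 1
        return
    kind, *children = node
    if kind == "P" and order == "hebrew":
        children = list(reversed(children))
    for child in children:
        label(child, order, labels, counter)

english, hebrew = {}, {}
label(tree, "english", english, [0])
label(tree, "hebrew", hebrew, [0])

def logically_parallel(x, y):
    """Parallel iff the English and Hebrew orders disagree on x vs. y."""
    return (english[x] < english[y]) != (hebrew[x] < hebrew[y])

for pair in [("b", "c"), ("b", "d"), ("a", "d"), ("d", "e")]:
    print(pair, "parallel" if logically_parallel(*pair) else "series")
```

The race detectors described above maintain equivalent labels incrementally as the program forks and joins, which is what the order-maintenance data structure makes cheap.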