863 results for critical path methods
Abstract:
Static timing analysis provides the basis for setting the clock period of a microprocessor core, based on its worst-case critical path. However, depending on the design, this critical path is not always excited and therefore dynamic timing margins exist that can theoretically be exploited for the benefit of better speed or lower power consumption (through voltage scaling). This paper introduces predictive instruction-based dynamic clock adjustment as a technique to trim dynamic timing margins in pipelined microprocessors. To this end, we exploit the different timing requirements for individual instructions during the dynamically varying program execution flow without the need for complex circuit-level measures to detect and correct timing violations. We provide a design flow to extract the dynamic timing information for the design using post-layout dynamic timing analysis and we integrate the results into a custom cycle-accurate simulator. This simulator allows annotation of individual instructions with their impact on timing (in each pipeline stage) and rapidly derives the overall code execution time for complex benchmarks. The design methodology is illustrated at the microarchitecture level, demonstrating the performance and power gains possible on a 6-stage OpenRISC in-order general purpose processor core in a 28nm CMOS technology. We show that employing instruction-dependent dynamic clock adjustment leads on average to an increase in operating speed by 38% or to a reduction in power consumption by 24%, compared to traditional synchronous clocking, which at all times has to respect the worst-case timing identified through static timing analysis.
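The idea above can be illustrated with a toy cycle-accurate model. This is a minimal sketch, not the paper's simulator: the instruction classes and per-stage delays below are hypothetical, and an ideal stall-free 6-stage pipeline is assumed. Dynamic clocking sets each cycle's period to the slowest (stage, instruction) pair currently in flight, while conventional clocking always pays the static worst case.

```python
# Sketch of instruction-based dynamic clock adjustment.
# Hypothetical per-stage delays (ns) for a few instruction classes;
# the real values would come from post-layout dynamic timing analysis.
STAGES = 6

DELAYS = {
    "add":  [0.6, 0.7, 0.9, 0.5, 0.6, 0.5],
    "load": [0.6, 0.7, 0.8, 1.2, 1.0, 0.5],
    "mul":  [0.6, 0.7, 1.4, 0.8, 0.6, 0.5],
}

def runtime(program, dynamic=True):
    """Total time (ns) for an ideal in-order pipeline (no stalls assumed)."""
    n = len(program)
    worst = max(d for ds in DELAYS.values() for d in ds)  # static worst case
    total = 0.0
    for t in range(n + STAGES - 1):          # one iteration per clock cycle
        if dynamic:
            # Instruction i occupies stage (t - i) during cycle t.
            in_flight = [DELAYS[program[i]][t - i]
                         for i in range(n) if 0 <= t - i < STAGES]
            total += max(in_flight)          # period = slowest stage in flight
        else:
            total += worst                   # conventional worst-case clocking
    return total

prog = ["add"] * 8 + ["load", "mul"] + ["add"] * 8
assert runtime(prog, dynamic=True) < runtime(prog, dynamic=False)
```

With mostly fast instructions in flight, the clock can run well below the static worst-case period, which is where the reported speed or voltage-scaling headroom comes from.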
Abstract:
Critical illness is characterised by nutritional and metabolic disorders, resulting in increased muscle catabolism, fat-free mass loss, and hyperglycaemia. The objective of nutritional support is to limit fat-free mass loss, which has negative consequences on clinical outcome and recovery. Early enteral nutrition is recommended by current guidelines as the first-choice feeding route in ICU patients. However, enteral nutrition alone frequently fails to cover energy requirements, and the resulting energy deficit is correlated with worse clinical outcome. Controlled trials have demonstrated that, in case of failure of or contraindications to full enteral nutrition, parenteral nutrition administered on top of insufficient enteral nutrition within the first four days after admission could improve clinical outcome and may attenuate fat-free mass loss. Parenteral nutrition is safe if all-in-one solutions are used, glycaemia is controlled, and overnutrition is avoided. Conversely, the systematic use of parenteral nutrition in ICU patients without a clear indication is not recommended during the first 48 hours. Specific methods, such as thigh ultrasound imaging, computerised tomography targeted at the 3rd lumbar vertebra, and bioelectrical impedance analysis, may be helpful in the future to monitor fat-free mass during the ICU stay. Clinical studies are warranted to demonstrate whether optimal nutritional management during the ICU stay promotes muscle mass and function, improves recovery after critical illness, and reduces overall costs.
Abstract:
This qualitative study explores Thomas Green's (1999) treatise, Voices: The Educational Formation of Conscience, for the purpose of reconstruing the transformative usefulness of conscience in moral education. Conscience is "reflexive judgment about things that matter" (Green, 1999, p. 21). Paul Lehmann (1963) suggested that we must "do the conscience over or do the conscience in" (p. 327). Thomas Green "does the conscience over", arguing that a philosophy of moral education, and not a moral philosophy, provides the only framework from which governance of moral behaviour can be understood. Narratives from four one-to-one interviews and a focus group are analysed and interpreted in search of: (a) awareness and understanding of conscience, (b) voices of conscience, (c) normation, (d) reflexive emotions, and (e) the idea of the sacred. Participants in this study (ages 16-21) demonstrated an active awareness of their conscience and a willingness to engage in a reflective process about their moral behaviour. They understood their conscience to be a process of self-judgment about what is right and wrong, and that its authority comes from within themselves. Narrative accounts from childhood indicated that conscience is there "from the beginning", with evidence of self-correcting behaviour. A maturing conscience is accompanied by an increased cognitive capacity, more complicated life experiences, and individualization. Moral motivation was grounded in "a desire to connect with things that are most important." A model for conscience formation is proposed, which visualizes a critical path of reflexive emotions. It is argued that schools, striving to shape good citizens, can promote conscience formation through a "curriculum of moral skills": a curriculum that embraces complexity, diversity, social criticism, and selfhood.
Abstract:
Decimal multiplication is an integral part of financial, commercial, and internet-based computations. The basic building block of a decimal multiplier is a single-digit multiplier. It accepts two Binary Coded Decimal (BCD) inputs and gives a product in the range [0, 81] represented by two BCD digits. A novel design for single-digit decimal multiplication that reduces the critical path delay and area is proposed in this research. Out of the 256 possible combinations of the 8-bit input, only one hundred are valid BCD inputs. Among these hundred valid combinations, only four require a full 4 x 4 multiplication; the remaining combinations can be computed with narrower (4 x 3, 3 x 4, or 3 x 3) multiplications, since the digits 0-7 fit in 3 bits. The proposed design makes use of this property. This design leads to a more regular VLSI implementation and does not require special registers for storing easy multiples. It is a fully parallel multiplier utilizing only combinational logic, and is extended to a Hex/Decimal multiplier that gives either a decimal output or a binary output. The accumulation of partial products generated using single-digit multipliers is done by an array of multi-operand BCD adders for an (n-digit x n-digit) multiplication.
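The digit-width property the abstract exploits can be checked directly. This is an illustrative sketch, not the paper's circuit: it classifies the 100 valid BCD digit pairs by the multiplier width each needs and splits every product into its two BCD output digits.

```python
# Sketch: classify the 100 valid BCD digit pairs by required multiplier
# width, and split each single-digit product into two BCD digits.

def bcd_digit_mul(a, b):
    """Multiply two BCD digits; return the (tens, ones) BCD output digits."""
    assert 0 <= a <= 9 and 0 <= b <= 9, "valid BCD digits only"
    p = a * b                      # product lies in [0, 81]
    return p // 10, p % 10         # two BCD output digits

# Digits 8 and 9 need 4 significant bits; digits 0-7 fit in 3 bits.
width = lambda d: 4 if d >= 8 else 3

counts = {}
for a in range(10):
    for b in range(10):
        key = (width(a), width(b))
        counts[key] = counts.get(key, 0) + 1

assert counts[(4, 4)] == 4         # only four pairs need a full 4 x 4 multiply
assert counts[(3, 3)] == 64        # most pairs need only 3 x 3
assert bcd_digit_mul(9, 9) == (8, 1)
```

A hardware design can therefore dedicate the wide datapath to the four rare cases and keep the common path narrow, which is the source of the delay and area savings claimed.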
Abstract:
Recent trends envisage multi-standard architectures as a promising solution for future wireless transceivers to attain higher system capacities and data rates. The computationally intensive decimation filter plays an important role in channel selection for multi-mode systems, and an efficient reconfigurable implementation is key to achieving low power consumption. To this end, this paper presents a dual-mode Residue Number System (RNS) based decimation filter which can be programmed for the WCDMA and 802.16e standards. Decimation is done using multistage, multirate finite impulse response (FIR) filters. These FIR filters, implemented in the RNS domain, offer high speed because of their carry-free operation on smaller residues in parallel channels. The FIR filters are also programmable to a selected standard by reconfiguring the hardware architecture. The total area is increased by only 24% to include WiMAX compared to a single-mode WCDMA transceiver. In each mode, the unused parts of the overall architecture are powered down and bypassed to save power. The performance of the proposed decimation filter in terms of critical path delay and area is tabulated.
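The carry-free RNS computation at the heart of this filter can be sketched in a few lines. This is a minimal illustration, not the paper's design: the moduli set below is hypothetical (the abstract does not give one), and each FIR output is computed independently in every residue channel and then reconstructed with the Chinese Remainder Theorem.

```python
# Sketch of an FIR filter computed in a Residue Number System.
# The moduli set is a hypothetical example (pairwise coprime).
from math import prod

MODULI = (7, 11, 13, 17)
M = prod(MODULI)                          # dynamic range: results must be < M

def crt(residues):
    """Reconstruct x (mod M) from its residues via the Chinese Remainder Theorem."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)      # modular inverse of Mi mod m
    return x % M

def fir_rns(taps, samples):
    """One FIR output: each residue channel works independently (carry-free)."""
    residues = [sum(h * s for h, s in zip(taps, samples)) % m for m in MODULI]
    return crt(residues)

taps, samples = [3, 1, 4, 1], [5, 9, 2, 6]
assert fir_rns(taps, samples) == sum(h * s for h, s in zip(taps, samples))
```

Because the channels never exchange carries, each can be clocked fast on small residues; the CRT conversion is only needed at the filter output.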
Abstract:
Decimal multiplication is an integral part of financial, commercial, and internet-based computations. A novel design for single-digit decimal multiplication that reduces the critical path delay and area of an iterative multiplier is proposed in this research. The partial products are generated using single-digit multipliers and are accumulated based on a novel RPS algorithm. This design uses n single-digit multipliers for an n × n multiplication. The latency for the multiplication of two n-digit Binary Coded Decimal (BCD) operands is (n + 1) cycles, and a new multiplication can begin every n cycles. The accumulation of the final partial products and the first iteration of partial product generation for the next set of inputs are done simultaneously. This iterative decimal multiplier offers low latency and high throughput, and can be extended for decimal floating-point multiplication.
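The iteration structure can be sketched behaviourally. This is an illustrative model only: each loop iteration stands for one cycle in which one multiplier digit is fed to all n single-digit multipliers in parallel, and since the abstract does not detail the RPS accumulation scheme, a plain shift-and-add accumulator stands in for it here.

```python
# Behavioural sketch of iterative n-digit BCD multiplication:
# one multiplier digit per cycle, n single-digit multiplies in parallel.

def to_digits(x, n):
    """Little-endian list of the n BCD digits of x."""
    return [(x // 10**i) % 10 for i in range(n)]

def iterative_bcd_mul(a, b, n):
    A, B = to_digits(a, n), to_digits(b, n)
    acc = 0
    for i, bd in enumerate(B):               # one iteration per cycle
        # n single-digit multiplies, each yielding two BCD digits
        partial = sum((A[j] * bd) * 10**j for j in range(n))
        acc += partial * 10**i               # shift and accumulate
    return acc

assert iterative_bcd_mul(1234, 5678, 4) == 1234 * 5678
```

In hardware, overlapping the final accumulation with the first iteration of the next operand pair is what lets a new multiplication start every n cycles despite the (n + 1)-cycle latency.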
Abstract:
Recent trends envisage multi-standard architectures as a promising solution for future wireless transceivers. The computationally intensive decimation filter plays an important role in channel selection for multi-mode systems, and an efficient reconfigurable implementation is key to achieving low power consumption. To this end, this paper presents a dual-mode Residue Number System (RNS) based decimation filter which can be programmed for the WCDMA and 802.11a standards. Decimation is done using multistage, multirate finite impulse response (FIR) filters. These FIR filters, implemented in the RNS domain, offer high speed because of their carry-free operation on smaller residues in parallel channels. The FIR filters are also programmable to a selected standard by reconfiguring the hardware architecture. The total area is increased by only 33% to include 802.11a WLAN compared to a single-mode WCDMA transceiver. In each mode, the unused parts of the overall architecture are powered down and bypassed to save power. The performance of the proposed decimation filter in terms of critical path delay and area is tabulated.
Abstract:
Nanoparticles are of immense importance from both the fundamental and application points of view. They exhibit quantum size effects which are manifested in their improved magnetic and electric properties. Mechanical attrition by high energy ball milling (HEBM) is a top-down process for producing fine particles. However, fineness is associated with high surface area, and the particles are hence prone to oxidation, which has a detrimental effect on the useful properties of these materials. Passivation of nanoparticles is known to inhibit surface oxidation. At the same time, coating a polymer film on inorganic materials modifies the surface properties drastically. In this work, a modified set-up consisting of an RF plasma polymerization technique is employed to coat a thin polymer film on Fe nanoparticles produced by HEBM. Ball-milled particles having different particle size ranges are coated with polyaniline. Their electrical properties are investigated by measuring the dc conductivity in the temperature range 10-300 K. The low-temperature I-V characteristics exhibited nonlinearity, which is explained on the basis of the critical path model. There is clear-cut evidence for the occurrence of intergranular tunnelling. The results are presented in this paper.
Abstract:
Scheduling tasks to efficiently use the available processor resources is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long latency operation occurs, thus increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference effects among the multiple contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. This thesis proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread either statically or dynamically and is used by the thread scheduler to decide which threads to load in the contexts, and to decide which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects, and shows how thread prioritization can both maintain high processor utilization, and limit increases in critical path runtime caused by multithreading. The model also shows that in order to be effective in bandwidth limited applications, thread prioritization must be extended to prioritize memory requests. 
We show how simple hardware can prioritize the running of threads in the multiple contexts, and the issuing of requests to both the local memory and the network. Simulation experiments show how thread prioritization is used in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the number of processor cycles wasted in spinning and devoting more cycles to critical threads. Thread prioritization can be used in combination with other techniques to improve cache performance and minimize cache interference between different working sets in the cache. For applications that are critical path limited, thread prioritization can improve performance by allowing processor resources to be devoted preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can be used to implement a wide range of scheduling policies.
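The core scheduling mechanism described above can be sketched abstractly. This is a toy software model, not the thesis's hardware: thread priorities and work units are hypothetical, and "run until the next long-latency operation" is modelled as consuming one work unit before the scheduler picks the highest-priority ready thread again.

```python
# Sketch of priority-directed context switching on a multiple-context
# processor: on every (modelled) long-latency operation, the scheduler
# switches to the highest-priority ready thread.
import heapq

def schedule(threads):
    """threads: {tid: (priority, work_units)}; higher priority runs first.
    Returns the order in which threads finish."""
    ready = [(-prio, tid, work) for tid, (prio, work) in threads.items()]
    heapq.heapify(ready)                 # max-heap on priority
    finished = []
    while ready:
        neg_prio, tid, work = heapq.heappop(ready)
        work -= 1                        # run until the next long-latency op
        if work == 0:
            finished.append(tid)         # thread retires
        else:
            heapq.heappush(ready, (neg_prio, tid, work))
    return finished

# The critical thread monopolises the contexts until it completes,
# keeping the critical path short.
assert schedule({"critical": (10, 3), "background": (1, 1)}) == [
    "critical", "background"]
```

The thesis's point is that the same priority must also steer memory and network requests, not just context selection; this sketch models only the context-selection half.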
Abstract:
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√(T₁/T∞)), but the work is increased by a factor of O(lg n).
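The "English-Hebrew" idea can be illustrated on a static SP parse tree. This is a simplified sketch, not the paper's on-the-fly SP-order structure: the tree encoding and thread names are invented for illustration. Leaves are threads; `("S", l, r)` composes two subcomputations in series and `("P", l, r)` in parallel. Thread a logically precedes b iff a comes before b in BOTH the English (left-to-right) and Hebrew (parallel-children-reversed) orders; if the two orders disagree, the threads run logically in parallel.

```python
# Sketch of English-Hebrew labeling on an SP parse tree of a fork-join
# program (static version; the paper maintains the labels on the fly).

def leaves(node, flip_parallel):
    """Threads of an SP tree, left-to-right; Hebrew order flips P-children."""
    if isinstance(node, str):
        return [node]
    kind, l, r = node
    if kind == "P" and flip_parallel:
        l, r = r, l                      # Hebrew order reverses parallel pairs
    return leaves(l, flip_parallel) + leaves(r, flip_parallel)

def labels(tree):
    eng = {t: i for i, t in enumerate(leaves(tree, False))}
    heb = {t: i for i, t in enumerate(leaves(tree, True))}
    return eng, heb

def precedes(a, b, eng, heb):
    """True iff thread a is logically in series before thread b."""
    return eng[a] < eng[b] and heb[a] < heb[b]

# Program: a ; (b || c) ; d
tree = ("S", ("S", "a", ("P", "b", "c")), "d")
eng, heb = labels(tree)
assert precedes("a", "b", eng, heb) and precedes("c", "d", eng, heb)
assert not precedes("b", "c", eng, heb) and not precedes("c", "b", eng, heb)
```

A race detector uses exactly this query: two conflicting memory accesses form a determinacy race only when neither access's thread precedes the other's.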
Abstract:
Exam questions and solutions in LaTeX
Abstract:
Exam questions and solutions in PDF
Abstract:
Exam questions and solutions in PDF