871 results for NETWORK-ON-CHIP
Abstract:
Embedded many-core architectures contain dozens to hundreds of CPU cores connected via a highly scalable NoC interconnect. Our Multiprocessor System-on-Chip CoreVAMPSoC combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them with hierarchical NoC approaches in terms of resource requirements and performance. In 28 nm FD-SOI technology, a design with 32 CPUs and an AXI crossbar requires 5.59 mm² at a clock frequency of 830 MHz, of which 23.61% is the interconnect. In comparison, a hierarchical MPSoC with four CPU clusters of eight CPUs each requires only 4.83 mm², of which 11.61% is the interconnect. To evaluate performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.
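The reported percentages can be turned into absolute areas with simple arithmetic. The values below are derived from the figures above and rounded to two decimals; they are not stated in the abstract itself. They suggest that the non-interconnect area is nearly identical in both designs, so the saving comes almost entirely from the interconnect.

```latex
\begin{aligned}
A_{\text{IC,crossbar}} &= 0.2361 \times 5.59\,\text{mm}^2 \approx 1.32\,\text{mm}^2, &
A_{\text{rest,crossbar}} &\approx 5.59 - 1.32 = 4.27\,\text{mm}^2,\\
A_{\text{IC,hier}} &= 0.1161 \times 4.83\,\text{mm}^2 \approx 0.56\,\text{mm}^2, &
A_{\text{rest,hier}} &\approx 4.83 - 0.56 = 4.27\,\text{mm}^2,\\
\frac{5.59 - 4.83}{5.59} &\approx 13.6\,\% \text{ less total area for the hierarchical MPSoC.}
\end{aligned}
```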
Abstract:
Introduction. The venous drainage system within vertebral bodies (VBs) has been well documented in cadaveric specimens. Advances in 3D imaging and image processing now allow in vivo quantification of larger venous vessels, such as the basivertebral vein. Differences between healthy and scoliotic VB veins can therefore be investigated. Methods. Twenty healthy adolescent controls and 21 patients with adolescent idiopathic scoliosis (AIS) were recruited (with ethics approval) to undergo 3D MRI using a 3 Tesla, T1-weighted 3D gradient echo sequence, resulting in 512 slices across the thoraco-lumbar spine with a voxel size of 0.5 × 0.5 × 0.5 mm. Using the Amira Filament Editor, five transverse slices through each VB were examined simultaneously and the observable vascular network was traced. Each VB was assessed, and a vascular network was recorded when observable. A local coordinate system was created at the centre of each VB and the vascular networks were aligned to it. The length of the vascular network on the left and right sides of the VB (with a small central region) was calculated, and the spatial patterning of the networks was assessed level by level within each subject. Results. An average of 6 (range 4-10) vascular networks consistent with descriptions of the basivertebral vein were identifiable within each subject, most commonly between T10 and L1. The left/right distribution of vessels differed between the control and AIS subjects: healthy controls showed a percentage distribution of 29:18:53 across the left:centre:right regions, respectively, whereas the AIS subjects had a slightly shifted distribution of 33:25:42. The control group showed consistent spatial patterning of the vascular networks across most levels, but this was not seen in the AIS group. Conclusion. Observation and quantification of the basivertebral vein in vivo is possible using 3D MRI.
The AIS group lacked the spatial pattern repetition seen in the control group and minor differences were seen in the left/right distribution of vessels.
Abstract:
We report a circuit technique to measure the on-chip delay of an individual logic gate (both inverting and non-inverting) in its unmodified form using a digitally reconfigurable ring oscillator (RO). Solving a system of linear equations formed from different configuration settings of the RO gives the delay of an individual gate. Experimental results from a test chip in a 65 nm process node show the feasibility of measuring the delay of an individual inverter to within 1 ps accuracy. Delay measurements of different nominally identical inverters in close physical proximity show variations of up to 26%, indicating the large impact of local or within-die variations.
Abstract:
We report the design and characterization of a circuit technique to measure the on-chip delay of an individual logic gate (both inverting and noninverting) in its unmodified form. The test circuit comprises a digitally reconfigurable ring oscillator (RO), with the gate under test embedded in each stage. A system of linear equations relating the individual gate delays to the measured period of the RO is then formed from different configuration settings of the RO; its solution gives the delay of the individual gates. Experimental results from a test chip in a 65-nm process node show the feasibility of measuring the delay of an individual inverter to within 1 ps accuracy. Delay measurements of different nominally identical inverters in close physical proximity show variations of up to 28%, indicating the large impact of local variations. As a demonstration of this technique, we have studied delay variation with poly pitch, length of diffusion (LOD), and different layout orientations in silicon. The proposed technique is well suited to early process characterization, monitoring of mature processes in manufacturing, and model-to-hardware correlation.
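The linear-system step can be illustrated with a hypothetical model. The sketch below assumes that each RO configuration places a known subset of the gates under test in the loop, so that half the measured period (one RO period is two traversals of the loop) equals a fixed overhead delay plus the sum of the selected gate delays. The 0/1 configuration matrix, the overhead term, and the names `solve_linear` and `gate_delays_from_periods` are illustrative assumptions, not the authors' circuit or equations.

```python
def solve_linear(a, b):
    """Solve a square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] in floating point.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("configuration matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gate_delays_from_periods(configs, periods):
    """Hypothetical model: period_c / 2 = overhead + sum_i configs[c][i] * d_i.

    Each row of `configs` is [1, a_1, ..., a_N]: the leading 1 multiplies the
    unknown overhead, and a_i = 1 if gate i is in the loop for that setting.
    Returns [overhead, d_1, ..., d_N].
    """
    half_periods = [p / 2.0 for p in periods]  # one RO period = two loop traversals
    return solve_linear(configs, half_periods)


# Example with made-up delays in ps: overhead, then three gates under test.
true_delays = [50.0, 12.0, 11.5, 12.8]
configs = [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
periods = [2 * sum(a * d for a, d in zip(row, true_delays)) for row in configs]
print(gate_delays_from_periods(configs, periods))
```

Any set of configurations works in this model as long as the resulting matrix is nonsingular; measuring more configurations than unknowns would call for a least-squares solve instead.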
Abstract:
The continuous production of blood cells, a process termed hematopoiesis, is sustained throughout the lifetime of an individual by a relatively small population of cells known as hematopoietic stem cells (HSCs). HSCs are unique cells characterized by their ability to self-renew and give rise to all types of mature blood cells. Given their high proliferative potential, HSCs need to be tightly regulated on the cellular and molecular levels, as they could otherwise become malignant. On the other hand, the tight regulatory control of HSC function also translates into difficulties in culturing and expanding HSCs in vitro. In fact, it is currently not possible to maintain or expand HSCs ex vivo without rapid loss of self-renewal. Increased knowledge of the unique features of important HSC niches and of key transcriptional regulatory programs that govern HSC behavior is thus needed. Additional insight into the mechanisms of stem cell formation could enable us to recapitulate the processes of HSC formation and self-renewal/expansion ex vivo, with the ultimate goal of creating an unlimited supply of HSCs from, e.g., human embryonic stem cells (hESCs) or induced pluripotent stem cells (iPS) for use in therapy. We thus asked: How are hematopoietic stem cells formed, and in what cellular niches does this happen (Papers I, II)? What are the molecular mechanisms that govern hematopoietic stem cell development and differentiation (Papers III, IV)? Importantly, we showed that the placenta is a major fetal hematopoietic niche that harbors a large number of HSCs during midgestation (Paper I) (Gekas et al., 2005). To address whether the HSCs found in the placenta were formed there, we used the Runx1-LacZ knock-in and Ncx1 knockout mouse models (Paper II). We showed that HSCs emerge de novo in the placental vasculature in the absence of circulation (Rhodes et al., 2008).
Furthermore, we identified defined microenvironmental niches within the placenta with distinct roles in hematopoiesis: the large vessels of the chorioallantoic mesenchyme serve as sites of HSC generation, whereas the placental labyrinth is a niche supporting HSC expansion (Rhodes et al., 2008). Overall, these studies illustrate the importance of distinct milieus in the emergence and subsequent maturation of HSCs. To ensure proper function of HSCs, several regulatory mechanisms are in place. The microenvironment in which HSCs reside provides soluble factors and cell-cell interactions. In the cell nucleus, these cell-extrinsic cues are interpreted in the context of cell-intrinsic developmental programs, which are governed by transcription factors. An essential transcription factor for the initiation of hematopoiesis is Scl/Tal1 (stem cell leukemia gene/T-cell acute leukemia gene 1). Loss of Scl results in early embryonic death and a total lack of all blood cells, yet deactivation of Scl in the adult does not affect HSC function (Mikkola et al., 2003b). To define the temporal window of Scl requirement during fetal hematopoietic development, we deactivated Scl in all hematopoietic lineages shortly after hematopoietic specification in the embryo. Interestingly, maturation, expansion, and function of fetal HSCs were unaffected, and, as in the adult, red blood cell and platelet differentiation was impaired (Paper III) (Schlaeger et al., 2005). These findings highlight that, once specified, the hematopoietic fate is stable even in the absence of Scl and is maintained through mechanisms distinct from those required for the initial fate choice. As the critical downstream targets of Scl remain unknown, we sought to identify and characterize target genes of Scl (Paper IV).
We identified the transcription factor Mef2C (myocyte enhancer factor 2C) as a novel direct target gene of Scl specifically in the megakaryocyte lineage, which largely explains the megakaryocyte defect observed in Scl-deficient mice. In addition, we observed an Scl-independent requirement for Mef2C in the B-cell compartment, as loss of Mef2C leads to accelerated B-cell aging (Gekas et al., submitted). Taken together, these studies identify key extracellular microenvironments and intracellular transcriptional regulators that dictate different stages of HSC development, from emergence to lineage choice to aging.
Abstract:
This dissertation deals with the design, fabrication, and applications of microscale electrospray ionization chips for mass spectrometry. The microchip consists of a microchannel that leads to a sharp electrospray tip. The microchannel contains micropillars that produce strong capillary action. The capillary action delivers the liquid sample to the electrospray tip, which sprays it into gas-phase ions that can be analyzed by mass spectrometry. The high voltage applied to the microchip can also be used as a valve between the microchip and the mass spectrometer. The microchips can be used in various applications, such as analyses of drugs, proteins, peptides, or metabolites. The microchip works without pumps for liquid transfer, enables rapid analyses, and is sensitive. The performance characteristics of single microchips are studied, and a rotating multitip version of the microchip is designed and fabricated. The microchip can also be used as a microreactor, with reaction products detected online by mass spectrometry. This property can be used, for example, for protein identification: proteins can be digested enzymatically on-chip, and the reaction products, in this case peptides, can be detected by mass spectrometry. Because reactions occur faster at the microscale due to shorter diffusion lengths, the amount of protein can be very low, which is a benefit of the method. The microchip is well suited to surface-activated reactions because its dense micropillar array gives it a high surface-to-volume ratio. For example, a titanium dioxide nanolayer on the micropillar array combined with UV radiation produces photocatalytic reactions that can be used to mimic drug metabolism biotransformation reactions. Rapid mimicking with the microchip eases the detection of possibly toxic compounds in preclinical research and could therefore speed up research into new drugs.
A micropillar array chip can also be used to fabricate liquid chromatography columns. Precisely ordered micropillar arrays offer a very homogeneous column, in which separation of compounds has been demonstrated using both laser-induced fluorescence and mass spectrometry. Because of its small dimensions, the integrated liquid chromatography/electrospray microchip is especially well suited to low sample concentrations. Overall, this work demonstrates that the designed and fabricated silicon/glass three-dimensionally sharp electrospray tip is unique and facilitates stable ion spray for mass spectrometry.
Abstract:
The increasing variability in device leakage has made the design of keepers for wide OR structures a challenging task. Conventional feedback keepers (CONV) can no longer improve the performance of wide dynamic gates in future technologies. In this paper, we propose an adaptive keeper technique called the rate sensing keeper (RSK) that enables faster switching and tracks variation across process corners. It switches up to 1.9× faster than CONV (for 20 legs) and scales up to 32 legs, compared with 20 legs for CONV, in a 130-nm 1.2-V process. The delay tracking is within 8% across the different process corners. We demonstrate the circuit operation of RSK using a 32 × 8 register file implemented in an industrial 130-nm 1.2-V CMOS process. The performance of individual dynamic logic gates is also evaluated on chip for various keeper techniques. We show that the RSK technique gives superior performance compared with other alternatives such as the conditional keeper (CKP) and the current mirror-based keeper (LCR).
Abstract:
This paper studies the problem of designing a logical topology over a wavelength-routed all-optical network (AON) physical topology. The physical topology consists of the nodes and fiber links in the network. On an AON physical topology, we can set up lightpaths between pairs of nodes, where a lightpath represents a direct optical connection without any intermediate electronics. The set of lightpaths along with the nodes constitutes the logical topology. For a given network physical topology and traffic pattern (relative traffic distribution among the source-destination pairs), our objective is to design the logical topology and the routing algorithm on that topology so as to minimize the network congestion while constraining the average delay seen by a source-destination pair and the amount of processing required at the nodes (degree of the logical topology). We will see that ignoring the delay constraints can result in fairly convoluted logical topologies with very long delays. On the other hand, in all our examples, imposing them results in a minimal increase in congestion. While the number of wavelengths required to embed the resulting logical topology on the physical all-optical topology is also a constraint in general, we find that in many cases of interest this number can be quite small. We formulate the combined logical topology design and routing problem described above (ignoring the constraint on the number of available wavelengths) as a mixed integer linear programming problem, which we then solve for a number of cases of a six-node network. Since this programming problem is computationally intractable for larger networks, we split it into two subproblems: logical topology design, which is computationally hard and will probably require heuristic algorithms, and routing, which can be solved by a linear program. We then compare the performance of several heuristic topology design algorithms (that do take wavelength assignment constraints into account) against that of randomly generated topologies, as well as against lower bounds derived in the paper.
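To make the congestion and delay objectives concrete, the sketch below routes every source-destination demand over a given logical topology and reports the congestion (the largest load carried by any lightpath) together with the traffic-weighted average hop count, used here as a rough stand-in for delay. It deliberately routes each demand on a single minimum-hop path found by breadth-first search, which is a simpler scheme than the linear-programming routing described above; the function names, the traffic format, and the assumption of at most one lightpath per ordered node pair are illustrative.

```python
from collections import deque


def min_hop_path(adj, src, dst):
    """Breadth-first search for a minimum-hop path of lightpaths from src to dst."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        raise ValueError(f"no logical path from {src} to {dst}")
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def congestion_and_mean_hops(lightpaths, traffic):
    """lightpaths: directed (u, v) pairs, at most one per ordered pair.
    traffic: dict {(src, dst): demand}. Returns (congestion, mean hop count)."""
    adj = {}
    for u, v in lightpaths:
        adj.setdefault(u, []).append(v)
    load = {lp: 0.0 for lp in lightpaths}
    hop_demand = total = 0.0
    for (src, dst), demand in traffic.items():
        path = min_hop_path(adj, src, dst)
        for hop in zip(path, path[1:]):  # every lightpath on the route carries the demand
            load[hop] += demand
        hop_demand += demand * (len(path) - 1)
        total += demand
    return max(load.values()), hop_demand / total


# Example: unidirectional 4-node ring with one unit of traffic per ordered pair.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
uniform = {(s, d): 1.0 for s in range(4) for d in range(4) if s != d}
print(congestion_and_mean_hops(ring, uniform))
```

Adding lightpaths (raising the logical degree) would shorten routes and lower both numbers in this model, which is the trade-off the degree constraint controls.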
Abstract:
In recent years, parallel computers have been attracting attention for simulating artificial neural networks (ANNs) because of the inherent parallelism in ANNs. This work studies ways of parallelizing adaptive resonance theory (ART), a popular neural network algorithm. The core computations of ART are separated, and different strategies for parallelizing ART are discussed. We present mapping strategies for the ART 2-A neural network onto ring and mesh architectures. The required parallel architecture is simulated using the parallel architecture simulator PROTEUS, and parallel programs for the algorithms presented are written in a superset of C. A simulation-based scalability study of the algorithm-architecture match is carried out. The various overheads are identified in order to suggest ways of improving performance. Our main objective is to determine the performance of the ART 2-A network on different parallel architectures. © 1999 Elsevier Science B.V. All rights reserved.
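One natural way to separate the core computations is to partition the category (prototype) nodes across processors: each processor computes activations for its own categories, and a global max-reduction picks the winner. The sketch below simulates that decomposition sequentially for a simplified ART 2-A-style clusterer. It omits ART 2-A's input feature thresholding, and the names, the block partitioning, and the parameter values are illustrative assumptions rather than the mapping strategies evaluated in the paper.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def partitioned_winner(prototypes, x, n_procs):
    """Category choice split across n_procs hypothetical processors.

    Each processor owns a contiguous block of categories and reports its local
    best (activation, index); a global max-reduction then picks the winner.
    Returns None when no category has been committed yet.
    """
    chunk = max(1, math.ceil(len(prototypes) / n_procs))
    local = []
    for p in range(n_procs):
        block = range(p * chunk, min((p + 1) * chunk, len(prototypes)))
        if len(block) == 0:
            continue
        j = max(block, key=lambda k: dot(prototypes[k], x))
        local.append((dot(prototypes[j], x), j))
    return max(local, key=lambda t: t[0]) if local else None


def art2a_like_train(inputs, rho, beta, n_procs=4):
    """Simplified ART 2-A-style clustering (no feature thresholding)."""
    prototypes, labels = [], []
    for raw in inputs:
        x = normalize(raw)
        win = partitioned_winner(prototypes, x, n_procs)
        if win is not None and win[0] >= rho:  # resonance: vigilance test passed
            _, j = win
            prototypes[j] = normalize(
                [beta * xi + (1 - beta) * wi for xi, wi in zip(x, prototypes[j])])
            labels.append(j)
        else:  # mismatch: commit a new category initialised to the input
            prototypes.append(x)
            labels.append(len(prototypes) - 1)
    return prototypes, labels


protos, labels = art2a_like_train(
    [[1, 0.1], [0.9, 0.2], [0.1, 1], [0.2, 0.9], [1, 0]], rho=0.95, beta=0.5)
print(labels)
```

On a ring or mesh, the reduction step is where the communication overhead appears: local winners must be combined across processors before the vigilance test and the weight update can proceed.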
Abstract:
In this work, we evaluate the performance of a real-world image processing application that uses a cross-correlation algorithm to compare a given image with a reference one. The algorithm processes individual images represented as two-dimensional matrices of single-precision floating-point values using O(n^4) operations involving dot products and additions. We implement this algorithm on an NVIDIA GTX 285 GPU using CUDA, and also parallelize it for the Intel Xeon (Nehalem) and IBM Power7 processors using both manual and automatic techniques. Pthreads and OpenMP with SSE and VSX vector intrinsics are used for the manually parallelized version, while a state-of-the-art optimization framework based on the polyhedral model is used for automatic compiler parallelization and optimization. The performance of this algorithm on the NVIDIA GPU suffers from (1) a smaller shared memory, (2) unaligned device memory access patterns, (3) expensive atomic operations, and (4) weaker single-thread performance. On commodity multi-core processors, the application dataset is small enough to fit in caches, and when parallelized using a combination of task and short-vector data parallelism (via SSE/VSX) or through fully automatic optimization from the compiler, the application matches or beats the performance of the GPU version. The primary reasons for better multi-core performance include larger and faster caches, higher clock frequency, higher on-chip memory bandwidth, and better compiler optimization and support for parallelization. The best-performing versions on the Power7, Nehalem, and GTX 285 run in 1.02 s, 1.82 s, and 1.75 s, respectively. These results conclusively demonstrate that, under certain conditions, a FLOP-intensive structured application running on a multi-core processor can match or even beat the performance of an equivalent GPU version.
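The O(n^4) cost follows directly from a full 2-D cross-correlation: there are (2n - 1)^2 relative shifts, and each shift needs a dot product over up to n^2 overlapping elements. The abstract does not give the application's exact formulation, so the sketch below is the textbook full cross-correlation in plain Python, intended only to show where the four nested loops come from; the function name is illustrative.

```python
def cross_correlate(image, reference):
    """Full 2-D cross-correlation of two n x n matrices.

    out[du + n - 1][dv + n - 1] = sum over i, j of image[i][j] * reference[i + du][j + dv],
    restricted to the overlapping region, for every shift du, dv in [-(n-1), n-1].
    (2n - 1)^2 shifts times up to n^2 products each gives O(n^4) work.
    """
    n = len(image)
    size = 2 * n - 1
    out = [[0.0] * size for _ in range(size)]
    for du in range(-(n - 1), n):
        for dv in range(-(n - 1), n):
            acc = 0.0
            # Dot product over the rows/columns where both matrices overlap.
            for i in range(max(0, -du), min(n, n - du)):
                for j in range(max(0, -dv), min(n, n - dv)):
                    acc += image[i][j] * reference[i + du][j + dv]
            out[du + n - 1][dv + n - 1] = acc
    return out


corr = cross_correlate([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
print(corr)
```

The shifts are independent of one another, which is what makes the computation amenable to both task parallelism across shifts and short-vector parallelism within each dot product.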
Abstract:
Today's feature-rich multimedia products require embedded system solutions with complex Systems-on-Chip (SoCs) to meet market expectations of high performance at low cost and low energy consumption. The memory architecture of the embedded system strongly influences critical system design objectives such as area, power, and performance. Hence, the embedded system designer performs a complete memory architecture exploration to custom design a memory architecture for a given set of applications. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforce short design cycles. In this paper, we address the multi-level, multi-objective memory architecture exploration problem through a combination of exhaustive-search-based memory exploration at the outer level and a two-step integrated data layout for SPRAM-cache-based architectures at the inner level. In this two-step integrated approach to data layout for SPRAM-cache hybrid architectures, the first step is data partitioning, which divides data between SPRAM and cache, and the second step is the cache-conscious data layout. We formulate the cache-conscious data layout as a graph partitioning problem and show that our approach gives up to 34% improvement over an existing approach and also optimizes the off-chip memory address space. We evaluated our approach on three embedded multimedia applications; for each application it explores several hundred memory configurations, yielding several optimal design points in a few hours of computation on a standard desktop.
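To illustrate what casting data layout as graph partitioning can look like, the sketch below treats data objects as nodes of a conflict graph, where an edge weight stands for how often two objects' accesses interleave, and greedily assigns each object to one of k cache partitions so as to minimize conflict weight inside a partition. The conflict weights, the greedy heuristic, and the function names are hypothetical; the sketch ignores object sizes and cache capacity and is not the paper's formulation.

```python
from collections import defaultdict


def greedy_partition(conflicts, k):
    """conflicts: dict {(obj_a, obj_b): weight}; k: number of cache partitions.

    Objects are placed heaviest-first; each goes to the partition where it adds
    the least conflict weight with objects already placed there.
    Returns {obj: partition_index}.
    """
    weight = defaultdict(float)
    neighbors = defaultdict(dict)
    for (a, b), w in conflicts.items():
        weight[a] += w
        weight[b] += w
        neighbors[a][b] = neighbors[a].get(b, 0.0) + w
        neighbors[b][a] = neighbors[b].get(a, 0.0) + w
    order = sorted(weight, key=lambda o: (-weight[o], o))
    assignment = {}
    for obj in order:
        costs = [sum(w for other, w in neighbors[obj].items() if assignment.get(other) == p)
                 for p in range(k)]
        assignment[obj] = costs.index(min(costs))
    return assignment


def intra_partition_conflict(conflicts, assignment):
    """Total conflict weight between objects that share a partition (lower is better)."""
    return sum(w for (a, b), w in conflicts.items() if assignment[a] == assignment[b])


conflicts = {("A", "B"): 10.0, ("C", "D"): 10.0, ("A", "C"): 1.0, ("B", "D"): 1.0}
layout = greedy_partition(conflicts, k=2)
print(layout, intra_partition_conflict(conflicts, layout))
```

A real layout step would also have to respect partition capacities and map partitions to concrete off-chip addresses, which is where the address-space optimization mentioned above would come in.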
Abstract:
A low-power frequency multiplication technique developed for ZigBee-like (IEEE 802.15.4) applications is presented. We provide an estimate of the power consumption for a given output voltage swing using our technique. The advantages and disadvantages that determine the application areas of the technique are discussed. Issues related to design, layout, and process variation are also addressed. Finally, a design is presented for operation in the 2.405-2.485 GHz band of a ZigBee receiver. SpectreRF simulations show a 30% improvement over an LC-VCO in the efficiency of converting DC bias current to output amplitude. To establish its low-power credentials, we compare our circuit with an existing technique: our circuit performs better while drawing just one third of the total supply current, and uses one inductor instead of three. A test chip was implemented in the UMC 0.13-µm RF process with spiral on-chip inductors and the MIM (metal-insulator-metal) capacitor option.
Abstract:
Denial-of-service (DoS) attacks form a very important category of security threats that are prevalent in MIPv6 (Mobile Internet Protocol version 6) today. Many schemes have been proposed to alleviate such threats, including one of our own [9]. However, reasoning about the correctness of such protocols is not trivial. In addition, new solutions to mitigate attacks may need to be deployed in the network frequently, as and when attacks are detected, since it is practically impossible to anticipate all attacks and provide solutions in advance. This makes it necessary to validate solutions in a timely manner before deployment in the real network. However, the threshold schemes needed in group protocols make analysis complex, and model checking of threshold-based group protocols that employ cryptography has not been successful so far. Here, we propose a new simulation-based approach to validation using a tool called FRAMOGR that supports executable specification of group protocols that use cryptography. FRAMOGR allows one to specify attackers and track probability distributions of values or paths. We believe that infrastructure such as FRAMOGR will be required in the future for validating new group-based threshold protocols that may be needed to make MIPv6 more robust.
Abstract:
An all-digital on-chip clock skew measurement system based on subsampling is presented. The clock nodes are subsampled with a near-frequency asynchronous sampling clock to produce beat signals, which are skewed in the same proportion but on a larger time scale. The beat signals are then masked to extract only the skews of the rising edges of the clock signals. We propose a histogram of the arithmetic difference of the beat signals, which decouples the minimum measurable skew from clock jitter and, unlike the conventional XOR-based histogram approach, allows skews arbitrarily close to zero to be measured with a precision limited largely by measurement time. We also show analytically that the proposed approach yields an unbiased estimate of skew. Measured results from a 65 nm delay measurement front end indicate that, for an input skew range of ±1 fan-out-of-4 (FO4) delay, a ±3σ resolution of 0.84 ps can be obtained with an integral error of 0.65 ps. Experiments further demonstrate that frequency modulation of the sampling clock maintains precision, indicating the robustness of the technique to jitter. Finally, we show how frequency modulation helps restore precision when the clocks are rationally related.
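The idea of histogramming the signed difference of masked beat signals can be sketched in an idealized, jitter-free model. Here clock B lags clock A by `skew`, both are sampled by a clock whose period exceeds theirs by a factor (1 + 1/M), so successive samples step through the clock phase in increments of period/M and a skew appears in the beat signals magnified M times. The mask window (a quarter period around A's rising edge), the parameter names, and the estimator below are assumptions for illustration, not the paper's hardware or its analysis. Because the difference a - b keeps its sign, negative skews are recovered as negative values, whereas an XOR of the two beat signals would only give the magnitude.

```python
from collections import Counter


def estimate_skew(period, skew, beat_factor=10_000):
    """Idealized, jitter-free model of subsampled skew measurement.

    Clock A has its rising edge at phase 0 and clock B lags it by `skew`
    (|skew| < period / 4). Both are sampled with a clock of period
    period * (1 + 1 / beat_factor), so successive samples walk through the
    clock phase in steps of period / beat_factor. Returns the estimated skew
    in the same units as `period`.
    """
    ts = period * (1 + 1 / beat_factor)
    hist = Counter()  # histogram of the signed difference a - b
    for k in range(beat_factor):  # one full beat period
        phase = (k * ts) % period
        a = 1 if phase < period / 2 else 0
        b = 1 if (phase - skew) % period < period / 2 else 0
        # Mask: keep only samples within a quarter period of A's rising edge.
        if phase < period / 4 or phase >= 3 * period / 4:
            hist[a - b] += 1
    # Signed fraction of all samples caught between the two rising edges.
    return period * (hist[1] - hist[-1]) / beat_factor


# Example: 1 GHz clocks (period in ps), B lagging by 12.5 ps, then leading by 7.3 ps.
print(estimate_skew(1000.0, 12.5), estimate_skew(1000.0, -7.3))
```

In this model the resolution is period / beat_factor, so a larger beat factor (a closer sampling frequency, i.e., a longer measurement) gives finer resolution, consistent with precision being limited largely by measurement time.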
Abstract:
The success of an assertion-based verification (ABV) IP depends highly on the associated debugging environment. An efficient debugging environment helps the user find the exact location of a failure. Moreover, it presents information to the user at a suitable level of abstraction and permits adequate interaction. It has also been recognized that adequate visualization support helps in tracking the behavioral aspects of the Design Under Test (DUT). Current debugging tools provide information at the signal level and do not provide any information about the high-level behavior of the DUT. We present a debugging framework that takes the design specification, assertions, and the user intent in a simple format and provides detailed information by processing the design trace online or offline. We also present a visualization framework to ease the debugging procedure. Our experiments with industry-standard on-chip bus protocols indicate that this utility can be incorporated successfully into the present functional verification flow.
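As a small illustration of offline trace processing that reports where an assertion fails, the sketch below checks a hypothetical request/acknowledge handshake property over a cycle-by-cycle trace and returns the cycle at which each failing request was raised. The property, the trace format of (req, ack) bit pairs, and the function name are assumptions for illustration; they are not the framework's input format or a specific bus protocol rule.

```python
def check_req_ack(trace, max_latency):
    """Offline check of a hypothetical handshake property on a cycle trace.

    trace: list of (req, ack) bit pairs, one per clock cycle.
    Property: once req is raised, ack must arrive within max_latency cycles
    (only one outstanding request is tracked at a time).
    Returns a list of (request_cycle, message) failures.
    """
    failures = []
    pending = None  # cycle at which the outstanding request was raised
    for cycle, (req, ack) in enumerate(trace):
        if pending is None and req:
            pending = cycle
        if pending is None:
            continue
        if ack:
            pending = None  # request acknowledged in time
        elif cycle - pending >= max_latency:
            failures.append((pending, f"no ack within {max_latency} cycles"))
            pending = None
    if pending is not None:
        failures.append((pending, "trace ended before ack"))
    return failures


trace = [(0, 0), (1, 0), (1, 0), (0, 1), (0, 0), (1, 0), (1, 0), (1, 0), (0, 0)]
print(check_req_ack(trace, max_latency=2))
```

Reporting the request cycle rather than only a pass/fail flag is what lets a debugging environment point the user at the transaction that went wrong instead of at raw signal waveforms.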