995 results for Memory architecture
Abstract:
JNK1 is a MAP kinase that has proven to be a significant player in the central nervous system. It regulates brain development and the maintenance of dendrites and axons. Several novel phosphorylation targets of JNK1 were identified in a screen performed in the Coffey lab. These proteins were mainly involved in the regulation of the neuronal cytoskeleton, influencing the dynamics and stability of microtubules and actin. These structural proteins form the dynamic backbone for the elaborate architecture of the dendritic tree of a neuron. The initiation and branching of dendrites requires a dynamic interplay between the cytoskeletal building blocks. Both microtubules and actin are decorated by associated proteins which regulate their dynamics. The dendrite-specific, high-molecular-weight microtubule-associated protein 2 (MAP2) is an abundant protein in the brain, the binding of which stabilizes microtubules and influences their bundling. Its expression in non-neuronal cells induces the formation of neurite-like processes from the cell body, and its function is highly regulated by phosphorylation. JNK1 was shown to phosphorylate the proline-rich domain of MAP2 in vivo in a previous study performed in the group. Here we verify three threonine residues (T1619, T1622 and T1625) as JNK1 targets, the phosphorylation of which increases the binding of MAP2 to microtubules. This binding stabilizes the microtubules and increases process formation in non-neuronal cells. Phosphorylation-site mutants were engineered in the lab. The non-phosphorylatable mutant of these residues (MAP2-T1619A/T1622A/T1625A) fails to bind microtubules, while the pseudo-phosphorylated form, MAP2-T1619D/T1622D/T1625D, binds efficiently and induces process formation even in the absence of active JNK1. Ectopic expression of MAP2-T1619D/T1622D/T1625D in vivo in mouse brain led to a striking increase in the branching of cortical layer 2/3 (L2/3) pyramidal neurons compared to MAP2-WT. Dendritic complexity defines the receptive field of a neuron and dictates the output to postsynaptic cells. Previous studies in the group indicated altered dendrite architecture of the pyramidal neurons in the Jnk1-/- mouse motor cortex. Here, we used Lucifer Yellow loading and Sholl analysis of neurons to study the dendritic branching in more detail. We report a striking, opposing effect in the absence of Jnk1 in cortical layers 2/3 and 5 of the primary motor cortex. The basal dendrites of pyramidal neurons close to the pial surface in L2/3 show reduced complexity. In contrast, the L5 neurons, which receive massive input from the L2/3 neurons, show greatly increased branching. Another novel substrate identified for JNK1 was MARCKSL1, a protein that regulates actin dynamics. It is highly expressed in neurons, but also in various cancer tissues. Three phosphorylation target residues for JNK1 were identified, and it was demonstrated that their phosphorylation reduces actin turnover and retards the migration of these cells. Actin is the main cytoskeletal component in dendritic spines, the site of most excitatory synapses in pyramidal neurons. The density and gross morphology of the Lucifer Yellow-filled dendrites were characterized, and we show reduced density and altered morphology of spines in the motor cortex and in the hippocampal area CA3. Dynamic dendritic spines are widely considered to be a cellular correlate of learning. We used a Morris water maze to test spatial memory.
Here, the wild-type mice outperformed the knockout mice during the acquisition phase of the experiment, indicating impaired spatial memory in the knockouts. The L5 pyramidal neurons of the motor cortex project to the spinal cord and regulate the movement of distinct muscle groups. Thus, the altered dendrite morphology in the motor cortex was expected to affect the input-output balance in the signaling from the cortex to the lower motor circuits. A battery of behavioral tests was conducted on the wild-type and Jnk1-/- mice, and the knockouts performed poorly compared to the wild-type mice in tests assessing balance and fine motor movements. This study expands our knowledge of JNK1 as an important regulator of the dendritic fields of neurons and their manifestations in behavior.
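The Sholl analysis mentioned above reduces a dendritic tracing to intersection counts on concentric circles of growing radius around the soma. A minimal sketch in Python, with invented coordinates and radii rather than the study's tracings:

```python
import numpy as np

def sholl_intersections(segments, soma, radii):
    """Count how many dendrite segments cross each concentric circle.

    segments : list of ((x1, y1), (x2, y2)) dendrite traces
    soma     : (x, y) centre of the cell body
    radii    : iterable of circle radii (same units as the trace)
    """
    soma = np.asarray(soma, dtype=float)
    counts = []
    for r in radii:
        n = 0
        for p1, p2 in segments:
            d1 = np.linalg.norm(np.asarray(p1, dtype=float) - soma)
            d2 = np.linalg.norm(np.asarray(p2, dtype=float) - soma)
            # A segment crosses the circle when its endpoints fall on
            # opposite sides of radius r.
            if (d1 - r) * (d2 - r) < 0:
                n += 1
        counts.append(n)
    return counts

# Toy tracing: three straight dendrite segments.
segments = [((0, 0), (120, 0)), ((0, 0), (0, 60)), ((60, 0), (60, 90))]
print(sholl_intersections(segments, soma=(0, 0), radii=[25, 50, 75, 100]))
```

A branching-complexity difference such as the one reported between L2/3 and L5 neurons shows up as a shift in these counts across radii.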
Abstract:
Embedded systems are usually designed for a single task or a specified set of tasks. This specificity means that the system design, as well as its hardware/software development, can be highly optimized. Embedded software must meet requirements such as highly reliable operation on resource-constrained platforms, real-time constraints and rapid development. This necessitates the adoption of static machine-code analysis tools running on a host machine for the validation and optimization of embedded system code, which can help meet all of these goals. Such tools could significantly improve software quality, and this is still a challenging field. This dissertation contributes an architecture-oriented code validation, error localization and optimization technique that assists the embedded system designer in software debugging, making the early detection of otherwise hard-to-detect software bugs more effective through static analysis of machine code. The focus of this work is to develop methods that automatically localize faults as well as optimize the code, and thus improve both the debugging process and the quality of the code. Validation is done with the help of rules of inference formulated for the target processor. The rules govern the occurrence of illegitimate or out-of-place instructions and code sequences for executing the computational and integrated peripheral functions. The stipulated rules are encoded in propositional logic formulae, and their compliance is tested individually in all possible execution paths of the application programs.
Incorrect machine-code sequences are identified using slicing techniques on the control flow graph generated from the machine code. An algorithm is proposed that assists the compiler in eliminating redundant bank-switching code and in deciding on an optimal allocation of data to banked memory, resulting in a minimum number of bank-switching instructions in embedded system software. A relation matrix and a state transition diagram, formed for the active memory bank state transitions corresponding to each bank selection instruction, are used for the detection of redundant code. Instances of code redundancy based on the stipulated rules for the target processor are identified. This validation and optimization tool can be integrated into the system development environment. It is a novel approach, independent of the compiler/assembler and applicable to a wide range of processors once appropriate rules are formulated. Program states are identified mainly by machine-code patterns, which drastically reduces the state space and thus improves on state-of-the-art model checking. Though the technique described is general, the implementation is architecture-oriented, and hence the feasibility study was conducted on PIC16F87X microcontrollers. The proposed tool will be very useful in steering novices towards correct use of difficult microcontroller features in developing embedded systems.
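As an illustration of the rule-checking idea (not the dissertation's tool), the sketch below scans one linear execution path, as would be extracted from the control flow graph, against two invented rules in the spirit of PIC16F87X programming: a peripheral write that precedes its configuration, and a redundant repeated bank-select instruction.

```python
def violations(path):
    """Scan one execution path for rule violations.

    Two illustrative rules, loosely styled on PIC16F87X conventions:
      1. a write to PORTA before TRISA has been configured is out of place;
      2. two identical consecutive bank-select instructions are redundant.
    """
    found, seen, prev = [], set(), None
    for instr in path:
        if instr == "MOVWF PORTA" and "MOVWF TRISA" not in seen:
            found.append("PORTA written before TRISA is configured")
        if instr.startswith("BSF STATUS,RP0") and prev == instr:
            found.append("redundant bank-switch: " + instr)
        seen.add(instr)
        prev = instr
    return found

# One linear path extracted from a control-flow graph of the machine code.
path = ["BSF STATUS,RP0", "BSF STATUS,RP0", "MOVLW 0x0F", "MOVWF PORTA"]
print(violations(path))
```

The real technique encodes such rules as propositional formulae and checks every feasible path; the linear scan above only conveys the flavour of a single-path check.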
Abstract:
The Scheme86 and the HP Precision Architectures represent different trends in computer processor design. The former uses wide micro-instructions, parallel hardware, and a low latency memory interface. The latter encourages pipelined implementation and visible interlocks. To compare the merits of these approaches, algorithms frequently encountered in numerical and symbolic computation were hand-coded for each architecture. Timings were done in simulators and the results were evaluated to determine the speed of each design. Based on these measurements, conclusions were drawn as to which aspects of each architecture are suitable for a high-performance computer.
Abstract:
The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles required to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency-hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform, fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high-performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- more than a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
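A back-of-envelope cost model built from the figures quoted above; the 66-cycle null-thread cost and the 4-5x typical-thread factor are reported, while the per-word coefficient for data migration is an assumed placeholder standing in for the "scales linearly" claim.

```python
NULL_THREAD_CYCLES = 66          # reported cost of migrating a null thread

def thread_migration_cycles(scale=4.5):
    """Typical thread: roughly 4-5x a null thread (the abstract's figure)."""
    return scale * NULL_THREAD_CYCLES

def data_migration_cycles(block_words, per_word=2, setup=66):
    """Linear-in-size model; per_word and setup are assumed coefficients."""
    return setup + per_word * block_words

print(thread_migration_cycles())     # ~297 cycles for a typical thread
print(data_migration_cycles(128))    # grows linearly with block size
```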
Abstract:
If we are to understand how we can build machines capable of broad-purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven-dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better-known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner, which sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, which can learn from sentences such as "A blue bird flew to the tree" and "The small bird flew to the cage" that birds can fly. One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader-purpose commonsense artificial learners and reasoners.
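The "birds can fly" example can be caricatured in a few lines: the learner lifts a property to a category once enough independent observations support it. A toy sketch, with the threshold and the (category, property) encoding invented for illustration:

```python
from collections import defaultdict

# Toy version of the abstract's example: after seeing enough sentences in
# which some bird flies, the learner lifts "can fly" to the category "bird".
observations = [
    ("bird", "fly"),   # "A blue bird flew to the tree"
    ("bird", "fly"),   # "The small bird flew to the cage"
]

def lift_properties(observations, threshold=2):
    """Attach a property to a category once it has enough support."""
    support = defaultdict(int)
    for category, prop in observations:
        support[(category, prop)] += 1
    return {pair for pair, n in support.items() if n >= threshold}

print(lift_properties(observations))   # {('bird', 'fly')}
```

The actual architecture layers context, fallback reasoning methods and similarity-based inference on top of this kind of category-level generalization.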
Abstract:
The memory hierarchy is the main bottleneck in modern computer systems, as the gap between the speed of the processor and that of memory continues to grow. The situation in embedded systems is even worse: the memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives, such as performance, energy consumption and area. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources and maximize performance. However, traditional custom memory hierarchy design methodologies are phase-ordered: they separate application optimization from memory hierarchy architecture design, which tends to result in locally optimal solutions. In traditional hardware-software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation; utilizing reconfigurable logic for the memory hierarchy design itself, however, is seldom addressed. In this paper, we propose a new framework for designing the memory hierarchy of embedded systems. The framework takes advantage of flexible reconfigurable logic to customize the memory hierarchy for specific applications, combining application optimization and memory hierarchy design to obtain a globally optimal solution. Using the framework, we performed a case study designing a new software-controlled instruction memory that showed promising potential.
Abstract:
Emerging evidence suggests that a group of dietary-derived phytochemicals known as flavonoids are able to induce improvements in memory acquisition, consolidation, storage and retrieval. These low-molecular-weight polyphenols are widespread in the human diet, are absorbed to only a limited degree and localise in the brain at low concentration. However, they have been found to be highly effective in reversing age-related declines in memory via their ability to interact with the cellular and molecular architecture of the brain responsible for memory. These interactions include an ability to activate signalling pathways critical in controlling synaptic plasticity, and a potential to induce vascular effects capable of causing new nerve cell growth in the hippocampus. Their ability to activate the extracellular signal-regulated kinase (ERK1/2) and protein kinase B (PKB/Akt) signalling pathways, leading to the activation of the cAMP response element-binding protein (CREB), a transcription factor responsible for increasing the expression of a number of neurotrophins important in defining memory, will be discussed. How these effects lead to improvements in memory through the induction of synapse growth and connectivity, increases in dendritic spine density and the functional integration of old and new neurons will be illustrated. The overall goal of this critical review is to emphasize future areas of investigation as well as to highlight these dietary agents as promising candidates for the design of memory-enhancing drugs with relevance to normal and pathological brain ageing (161 references).
Abstract:
There is intense interest in studies of the potential of phytochemical-rich foods to prevent age-related neurodegeneration and cognitive decline. Recent evidence has indicated that a group of plant-derived compounds known as flavonoids may exert particularly powerful actions on mammalian cognition and may reverse age-related declines in memory and learning. In particular, evidence suggests that foods rich in three specific flavonoid sub-groups, the flavanols, anthocyanins and/or flavanones, possess the greatest potential to act on the cognitive processes. This review will highlight the evidence for the actions of such flavonoids, found most commonly in fruits such as apples, berries and citrus, on cognitive behaviour and the underlying cellular architecture. Although the precise mechanisms by which these flavonoids act within the brain remain unresolved, the present review focuses on their ability to protect vulnerable neurons and enhance the function of existing neuronal structures, two processes known to be influenced by flavonoids and also known to underpin neuro-cognitive function. Most notably, we discuss their selective interactions with protein kinase and lipid kinase signalling cascades (i.e. phosphoinositide-3 kinase/Akt and mitogen-activated protein kinase pathways), which regulate transcription factors and gene expression involved in both synaptic plasticity and cerebrovascular blood flow. Overall, the review attempts to provide an initial insight into the potential impact of regular flavonoid-rich fruit consumption on normal or abnormal deteriorations in cognitive performance.
A benchmark-driven modelling approach for evaluating deployment choices on a multi-core architecture
Abstract:
The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often include caches, memory, network controllers and, in some cases, floating-point units (as in the AMD Bulldozer), which means that access time depends on the mapping of application tasks and on a core's location within the system. Heterogeneity increases further with the use of hardware accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend towards shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that various runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and non-standard task-to-core mappings can dramatically alter performance. Finding this out, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work: loop-based array updates and nearest-neighbour halo exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application-specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, with interpolation between results as necessary.
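The modelling methodology amounts to measuring each component (array updates and halo exchanges) at a few problem sizes and interpolating to unmeasured configurations. A minimal sketch with invented timings, not measurements from the Cray XE6 study:

```python
import numpy as np

# Measured kernel times (seconds) at a few problem sizes under one
# deployment scenario; the numbers are invented for illustration.
sizes   = np.array([128, 256, 512, 1024])
compute = np.array([0.8, 3.1, 12.5, 50.2])   # array-update benchmark
halo    = np.array([0.1, 0.2, 0.4, 0.8])     # halo-exchange benchmark

def predict_runtime(n, steps):
    """Interpolate benchmark results to an unmeasured problem size n."""
    t_compute = np.interp(n, sizes, compute)
    t_halo = np.interp(n, sizes, halo)
    return steps * (t_compute + t_halo)

print(predict_runtime(384, steps=100))   # prediction for a 384^2 domain
```

In the study, one such table exists per deployment scenario (decomposition, affinity, node population), so comparing scenarios reduces to comparing interpolated predictions.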
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
This paper addresses the problem of processing biological signals such as cardiac beats and signals in the audio and ultrasonic ranges, calculating wavelet coefficients in real time with the processor clock running at the frequencies of present-day ASICs and FPGAs. The Parallel Filter Architecture for the DWT has been improved, calculating wavelet coefficients in real time with the hardware reduced to 60% of the original. The new architecture, which also processes the IDWT, is implemented with Radix-2 or Booth-Wallace constant multipliers. Together with series memory register banks, a single integrated-circuit signal analyzer for the ultrasonic range is presented.
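For reference, the filter-bank computation that such hardware implements can be stated in a few lines of software. A sketch of one level of the Haar DWT and its inverse (the simplest wavelet pair, used here for illustration; not the paper's Parallel Filter Architecture):

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    x = np.asarray(signal, dtype=float)
    x = x[: len(x) // 2 * 2]                     # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)    # low-pass branch
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)    # high-pass branch
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform; recovers the original signal exactly."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

sig = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt(sig)
print(np.allclose(haar_idwt(a, d), sig))         # True
```

The hardware contribution lies in evaluating these filter banks within a sample period using reduced multiplier logic, which the software form above makes explicit.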
Abstract:
Transactional memory (TM) is a new synchronization mechanism devised to simplify parallel programming, thereby helping programmers to unleash the power of current multicore processors. Although software implementations of TM (STM) have been extensively analyzed in terms of runtime performance, little attention has been paid to an equally important constraint faced by nearly all computer systems: energy consumption. In this work we conduct a comprehensive study of energy and runtime tradeoffs in software transactional memory systems. We characterize the behavior of three state-of-the-art lock-based STM algorithms, along with three different conflict resolution schemes. As a result of this characterization, we propose a DVFS-based technique that can be integrated into the resolution policies so as to improve the energy-delay product (EDP). Experimental results show that our DVFS-enhanced policies are indeed beneficial for applications with high contention levels. Improvements of up to 59% in EDP can be observed in this scenario, with an average EDP reduction of 16% across the STAMP workloads.
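The EDP metric behind these policies trades energy against delay multiplicatively, which is why slowing down an abort-prone transaction can pay off under contention. A toy illustration with invented energy and delay figures, not the STAMP measurements:

```python
# Sketch of the EDP reasoning behind DVFS-enhanced conflict resolution
# (not the paper's implementation): under high contention, a transaction
# likely to abort can run at a lower frequency, trading delay for energy.

def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds

# Illustrative numbers only. Dynamic power scales roughly with f * V^2,
# and a frequency step usually permits a voltage step as well, so energy
# drops superlinearly when frequency is halved while delay doubles.
full_speed  = edp(energy_joules=10.0, delay_seconds=2.0)   # EDP = 20
scaled_down = edp(energy_joules=4.0,  delay_seconds=4.0)   # EDP = 16
print(full_speed, scaled_down)
```

When aborted work would be thrown away anyway, the extra delay costs little, which matches the paper's finding that the benefit appears at high contention levels.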
Abstract:
Due to the lack of optical random access memory, optical fiber delay line (FDL) is currently the only way to implement optical buffering. Feed-forward and feedback are two kinds of FDL structures in optical buffering. Both have advantages and disadvantages. In this paper, we propose a more effective hybrid FDL architecture that combines the merits of both schemes. The core of this switch is the arrayed waveguide grating (AWG) and the tunable wavelength converter (TWC). It requires smaller optical device sizes and fewer wavelengths and has less noise than feedback architecture. At the same time, it can facilitate preemptive priority routing which feed-forward architecture cannot support. Our numerical results show that the new switch architecture significantly reduces packet loss probability.
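Packet loss in an FDL buffer arises because a packet can only be delayed by one of a fixed set of fibre lengths. A Monte-Carlo toy model of that effect for a single output port (invented parameters; not the paper's AWG/TWC switch):

```python
import random

def fdl_loss_probability(load=0.8, n_delays=8, slots=100_000, seed=1):
    """Monte-Carlo estimate of packet loss for a single-port FDL buffer.

    An arriving packet must wait until the output is free, and it can
    only be delayed by one of n_delays fixed fibre lengths (1..n_delays
    time slots); a packet needing a longer delay is dropped.
    """
    rng = random.Random(seed)
    busy_until, dropped, arrived = 0, 0, 0
    for t in range(slots):
        if rng.random() < load:              # one packet arrives this slot
            arrived += 1
            wait = max(0, busy_until - t)
            if wait > n_delays:
                dropped += 1                 # no fibre is long enough
            else:
                busy_until = t + wait + 1    # occupy the output for 1 slot
    return dropped / max(arrived, 1)

print(fdl_loss_probability(load=0.9))
```

Architectural choices such as the hybrid scheme effectively enlarge or reuse the set of available delays, which is what pushes the loss probability down.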
Abstract:
Constructing ontology networks typically occurs at design time, at the hands of knowledge engineers who assemble their components statically. There are, however, use cases where ontology networks need to be assembled upon request and processed at runtime, without altering the stored ontologies and without tampering with one another. These are what we call "virtual [ontology] networks", and keeping track of how an ontology changes in each virtual network is called "multiplexing". Issues may arise from the connectivity of ontology networks. In many cases, simple flat import schemes will not work, because many ontology managers can cause property assertions to be erroneously interpreted as annotations and ignored by reasoners. Also, multiple virtual networks should optimize their cumulative memory footprint, and where they cannot, this should occur only for very limited periods of time. We claim that these problems should be handled by the software that serves these ontology networks, rather than by ontology engineering methodologies. We propose a method that spreads multiple virtual networks across a 3-tier structure and can reduce the amount of erroneously interpreted axioms, under certain raw statement distributions across the ontologies. We assumed OWL as the core language handled by semantic applications in the framework at hand, due to the greater availability of reasoners and rule engines. We also verified that, in common OWL ontology management software, OWL axiom interpretation occurs in the worst-case scenario of a pre-order visit. To measure the effectiveness and space-efficiency of our solution, a Java and RESTful implementation was produced within an Apache project. We verified that a 3-tier structure can accommodate reasonably complex ontology networks better, in terms of the expressivity of OWL axiom interpretation, than flat-tree import schemes can. We measured both the memory overhead of the additional components we put on top of traditional ontology networks, and the framework's caching capabilities.
Abstract:
The efficient emulation of a many-core architecture is a challenging task: each core could be emulated through a dedicated thread, and such threads would be interleaved on either a single-core or a multi-core processor. The high number of context switches would result in unacceptable performance. To support this kind of application, the computational power of the GPU is exploited to schedule the emulation threads on the GPU cores. This presents a non-trivial divergence issue, since GPU computational power is offered through SIMD processing elements, which are forced to synchronously execute the same instruction on different memory portions. Thus, a new emulation technique is introduced to overcome this limitation: instead of providing a routine for each ISA opcode, the emulator mimics the behavior of the micro-architecture level, where instructions are data that a unique routine takes as input. Our new technique has been implemented and compared with the classic emulation approach, in order to investigate the viability of a hybrid solution.
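The dispatch idea can be caricatured in scalar Python: the instruction is plain data, and one routine mimics the micro-architecture by decoding fields, indexing an ALU table and writing back, instead of branching to a per-opcode routine whose control flow would diverge across SIMD lanes. The three-field encoding below is invented for illustration.

```python
import operator

# ALU functions selected by index rather than by branching per opcode.
ALU = [operator.add, operator.sub, operator.and_, operator.or_]

def step(regs, instr):
    """The unique routine: decode fields, execute, write back."""
    op, dst, src = instr             # the instruction is just a data tuple
    regs[dst] = ALU[op](regs[dst], regs[src])
    return regs

# add r0,r1 ; sub r2,r0 ; and r1,r2
program = [(0, 0, 1), (1, 2, 0), (2, 1, 2)]
regs = [1, 2, 7, 0]
for instr in program:
    step(regs, instr)
print(regs)                          # [3, 0, 4, 0]
```

On real SIMD hardware the single uniform routine keeps all lanes on the same control path, which is the property the abstract's technique exploits.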