972 results for "Performance improvements"
Abstract:
The miniaturization race in the hardware industry, aimed at a continuous increase of transistor density on a die, no longer brings corresponding application performance improvements. One of the most promising alternatives is to exploit the heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data-intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide a well-organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end user in order to support feasible high-level programmability of the system. This thesis explores heterogeneous reconfigurable hardware architectures and presents possible solutions to the problem of memory organization and data structure. Using the MORPHEUS heterogeneous platform as an example, the discussion follows the complete design cycle, from decision making and justification to hardware realization. Particular emphasis is placed on methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which accomplishes its task by separating computation from communication, providing the reconfigurable engines with computation and configuration data, and unifying the heterogeneous computational devices through local storage buffers. It is distinguished from related solutions by its distributed data-flow organization, mechanisms specifically engineered to operate on data in local domains, a communication infrastructure based on a Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel technique to accelerate memory access was developed and implemented.
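The separation of computation from communication through local storage buffers is, at its core, a double-buffering (ping-pong) discipline: while an engine computes on one buffer, the communication infrastructure fills the other. A minimal Python sketch of that overlap, with hypothetical fetch/compute callbacks standing in for the NoC transfers and the reconfigurable engines (this is an illustration, not MORPHEUS code):

```python
import threading
import queue

def double_buffered(tiles, fetch, compute):
    """Overlap communication (fetch) with computation using two buffers:
    while the engine computes on one buffer, the next transfer fills
    the other."""
    buffers = queue.Queue(maxsize=2)   # at most two buffers in flight

    def producer():                    # stands in for the NoC/DMA side
        for t in tiles:
            buffers.put(fetch(t))
        buffers.put(None)              # end-of-stream marker

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (data := buffers.get()) is not None:
        results.append(compute(data))  # stands in for the engine side
    return results

# Toy usage: "fetch" builds a tile, "compute" reduces it.
print(double_buffered(range(4), lambda t: [t, t + 1], sum))
```

The bounded queue is what prevents the stalls the abstract mentions: the producer blocks only when both buffers are full, so transfer and computation proceed concurrently whenever there is work on both sides.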
Abstract:
Tennis is a very popular sport that has undergone many changes over the last thirty years. With the advent of new, lighter, and more manageable materials, ball speed has increased considerably, making a technical modification of the fundamental strokes necessary. The literature review yielded interesting indications on the ideal joint angles and body positions to maintain during the various phases of the strokes, based on comparisons of top-level players. There are, however, no indications for tennis coaches as to which parameters are most important to train depending on the playing level of their athlete. The aim of this thesis is to identify the technical variables that influence the forehand and the serve by comparing athletes of different gender, players of different skill levels (expert, intermediate, beginner), and the same players after one year of structured training. When comparing adult players of different gender, the main differences emerged in the performance variables (ball and racket speed) for both strokes. These data are similar to those found in the ball-throwing test, a movement not influenced by stroke technique. Gender differences in technique are minor and attributable to individual interpretation of the stroke. When comparing athletes of different skill levels, the performance variables show clear differences, which can be related to some technical differences observed in the specific movements. In the serve, beginners tend to direct the dominant upper limb toward the target area, abducting the shoulder more and keeping the center of the racket further to the right of the wrist. They also load the lower limbs, trunk, and elbow less. For the forehand, the following differences can be highlighted: the upper limb is consistently more extended in the beginner group; in the more skilled players the trunk is used more markedly during the loading phase, in torsion and lateral-inclination movements. The other two groups have greater difficulty performing these preparatory actions, the beginners in particular. After one year of structured training, performance improvements were observed. Technical differences that may explain the improved stroke performance were also noted. In the serve, the upper limb extends more in order to strike the ball as high as possible. In the forehand, the improvements in trunk torsion and lateral inclination are especially noteworthy. The athlete thus progressively approaches a technically correct execution. In conclusion, from the technical point of view, no major differences were found between the two genders that could explain the differences in performance. Performance is therefore tied more to a strength factor, which should be trained with a specific program. In the comparison across skill levels and over one year of practice, technical variables showing significant differences between the experimental groups can be identified. Expert players use the whole body to produce more powerful strokes, employing the lower limbs, trunk, and upper limb in a technically sounder way. Beginners rely mainly on the upper limb, with less evident contributions from the other segments.
After one year of training, the subjects examined showed better use of the trunk and upper limb, which may explain the performance improvement. It can be hypothesized that correct use of the lower limbs requires a longer learning period or more specific training.
Abstract:
This thesis deals with the design of advanced OFDM systems; both waveform and receiver design are treated. The main scope of the thesis is to study, create, and propose ideas and novel design solutions able to cope with the weaknesses and crucial aspects of modern OFDM systems. Starting from the transmitter side, the problem represented by low resilience to non-linear distortion is addressed. A novel technique is proposed that considerably reduces the Peak-to-Average Power Ratio (PAPR), yielding a quasi-constant signal envelope in the time domain (PAPR close to 1 dB). The proposed technique, named Rotation Invariant Subcarrier Mapping (RISM), is a novel scheme for subcarrier data mapping in which the symbols belonging to the modulation alphabet are not anchored, but maintain some degrees of freedom. In other words, a bit tuple is not mapped onto a single point; rather, it is mapped onto a geometrical locus which is totally or partially rotation invariant. The final positions of the transmitted complex symbols are chosen by an iterative optimization process that minimizes the PAPR of the resulting OFDM symbol. Numerical results confirm that RISM makes OFDM usable even in severely non-linear channels. Another well-known problem tackled here is vulnerability to synchronization errors. In an OFDM system, accurate recovery of carrier frequency and symbol timing is crucial for proper demodulation of the received packets. In general, timing and frequency synchronization is performed in two separate phases, called PRE-FFT and POST-FFT synchronization. For the PRE-FFT phase, a novel joint symbol-timing and carrier-frequency synchronization algorithm is presented. The proposed algorithm is characterized by very low hardware complexity and, at the same time, guarantees very good performance in both AWGN and multipath channels. For the POST-FFT phase, a novel approach to both pilot structure and receiver design is presented. In particular, a novel pilot pattern is introduced in order to minimize the occurrence of overlaps between two shifted replicas of the pattern. This makes it possible to replace conventional pilots with nulls in the frequency domain, introducing the so-called Silent Pilots. As a result, the optimal receiver turns out to be very robust against severe Rayleigh-fading multipath and is characterized by low complexity. The performance of this approach has been evaluated both analytically and numerically. Compared with state-of-the-art alternatives, in both AWGN and multipath fading channels, considerable performance improvements have been obtained. The crucial problem of channel estimation has been thoroughly investigated, with particular emphasis on the decimation of the Channel Impulse Response (CIR) through the selection of the Most Significant Samples (MSSs). In this context the contribution is twofold: on the theoretical side, lower bounds on the mean-square error (MSE) of the estimate are derived for any MSS selection strategy; on the receiver-design side, novel MSS selection strategies are proposed which are shown to approach these MSE lower bounds and to outperform state-of-the-art alternatives. Finally, the possibility of using Single Carrier Frequency Division Multiple Access (SC-FDMA) in the broadband satellite return channel is assessed.
Notably, SC-FDMA can improve physical-layer spectral efficiency with respect to the single-carrier systems used so far in the Return Channel Satellite (RCS) standards. However, it requires strict synchronization and is also sensitive to the phase noise of local radio-frequency oscillators. For this reason, an effective pilot-tone arrangement within the SC-FDMA frame and a novel Joint Multi-User (JMU) estimation method for SC-FDMA are proposed. As shown by numerical results, the proposed scheme satisfies the strict synchronization requirements and guarantees proper demodulation of the received signal.
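The rotation-invariance idea can be made concrete with a toy experiment (this is an illustration of the principle, not the RISM algorithm from the thesis): if a bit tuple were mapped to a fully rotation-invariant locus, a circle, the transmitter would be free to choose the phase of each subcarrier symbol, and even a simple coordinate-descent search over those phases drives the PAPR of the IFFT output down.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a time-domain block, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def rotate_to_low_papr(radii, iters=20, grid=16):
    """Toy RISM-style optimization: each data symbol is constrained only
    to a circle of given radius (a fully rotation-invariant locus); a
    coordinate-descent search picks per-subcarrier phases minimizing the
    PAPR of the IFFT output."""
    n = len(radii)
    phases = np.zeros(n)
    candidates = 2 * np.pi * np.arange(grid) / grid
    for _ in range(iters):
        for k in range(n):
            phases[k] = min(candidates, key=lambda ph: papr_db(
                np.fft.ifft(radii * np.exp(1j * np.where(
                    np.arange(n) == k, ph, phases)))))
    return radii * np.exp(1j * phases)

rng = np.random.default_rng(0)
radii = np.ones(64)                         # unit-radius loci, 64 subcarriers
x0 = np.fft.ifft(radii * np.exp(1j * rng.uniform(0, 2 * np.pi, 64)))
x1 = np.fft.ifft(rotate_to_low_papr(radii))
print(f"random phases: {papr_db(x0):.2f} dB -> optimized: {papr_db(x1):.2f} dB")
```

The actual scheme additionally has to keep the loci decodable (hence "totally or partially" rotation invariant); the sketch only shows why freeing the symbol positions gives the optimizer room to flatten the envelope.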
Abstract:
It is common to hear the curious short sentence: «Random is better than...». Why is randomness a good solution to a certain engineering problem? There are many possible answers, and all of them depend on the topic considered. In this thesis I discuss two crucial topics that benefit from randomizing some of the waveforms involved in signal manipulation. In particular, the advantages are obtained by shaping the second-order statistics of antipodal sequences used in an intermediate signal-processing stage. The first topic is in the area of analog-to-digital conversion and is named Compressive Sensing (CS). CS is a novel paradigm in signal processing that merges signal acquisition and compression, allowing a signal to be acquired directly in compressed form. In this thesis, after an ample description of the CS methodology and its related architectures, I present a new approach that achieves high compression by designing the second-order statistics of a set of additional waveforms involved in the signal acquisition/compression stage. The second topic addressed in this thesis is in the area of communication systems; in particular, I focus on ultra-wideband (UWB) systems. One option for producing and decoding UWB signals is direct-sequence spreading with multiple access based on code division (DS-CDMA). Focusing on this methodology, I address the coexistence of a DS-CDMA system with a narrowband interferer. To do so, I minimize the joint effect of both multiple-access interference (MAI) and narrowband interference (NBI) on a simple matched-filter receiver. I show that, when the statistical properties of the spreading sequences are suitably designed, performance improvements are possible with respect to a system exploiting chaos-based sequences that minimize MAI only.
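A minimal sketch of the acquire-in-compressed-form idea with antipodal (±1) sensing waveforms; here the sequences are plain i.i.d. Bernoulli rather than the statistically shaped ones the thesis designs, and recovery uses textbook Orthogonal Matching Pursuit:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: recover a k-sparse x from y = A @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
n, m, k = 256, 64, 5                   # ambient dim, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # antipodal waveforms
y = A @ x                              # acquisition IS the compression
x_hat = omp(A, y, k)
print(f"max recovery error: {np.max(np.abs(x_hat - x)):.2e}")
```

The 4:1 ratio between n and m is what "merging acquisition and compression" buys; shaping the correlation of the ±1 rows, as the thesis proposes, is a way of pushing that ratio further for a given signal class.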
Abstract:
MultiProcessor Systems-on-Chip (MPSoC) are the core of today's and next-generation computing platforms. Their relevance in the global market continuously increases, as they play an important role both in everyday products (e.g. smartphones, tablets, laptops, cars) and in strategic market sectors such as aviation, defense, robotics, and medicine. Despite the remarkable performance improvements of recent years, processor manufacturers have had to deal with issues, commonly called "walls", that have hindered processor development. After the famous "Power Wall", which limited the maximum frequency of a single core and marked the birth of the modern multiprocessor system-on-chip, the "Thermal Wall" and the "Utilization Wall" are now the key limiters of performance improvements. The former concerns the damaging effects of high on-chip temperatures caused by large power densities; the latter refers to the impossibility of fully exploiting the computing power of the processor due to limits on the power and temperature budgets. In this thesis we face these challenges by developing efficient and reliable solutions able to maximize performance while keeping the maximum temperature below a fixed critical threshold and saving energy. This is made possible by the Model Predictive Control (MPC) paradigm, which solves a constrained optimization problem to find the optimal control decisions over a future interval. A fully distributed MPC-based thermal controller with far lower complexity than a centralized one has been developed. Control feasibility, together with properties useful for simplifying the control design, has been proved by studying a partial-differential-equation thermal model. Finally, the controller has been efficiently integrated into more complex control schemes able to minimize energy consumption and handle mixed-criticality tasks.
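The control idea can be shown on a toy one-node thermal model (an assumed first-order model with made-up constants, not the thesis's PDE model or its distributed solver): at every step the controller enumerates power sequences over a short horizon, keeps only those whose predicted temperature never crosses the critical threshold, maximizes a performance proxy, and applies the first input of the winner.

```python
import itertools

A, B, T_AMB = 0.9, 0.5, 40.0            # assumed first-order thermal model
T_CRIT, LEVELS, H = 80.0, (0.0, 10.0, 20.0), 3

def step(T, p):
    """Next chip temperature from current temperature T and power p."""
    return A * T + B * p + (1 - A) * T_AMB

def mpc_power(T):
    """Enumerate power sequences over horizon H, discard any whose
    predicted temperature ever exceeds T_CRIT, maximize delivered power
    (the performance proxy), and return only the first input."""
    best_first, best_perf = 0.0, -1.0
    for seq in itertools.product(LEVELS, repeat=H):
        t, feasible = T, True
        for p in seq:
            t = step(t, p)
            if t > T_CRIT:
                feasible = False
                break
        if feasible and sum(seq) > best_perf:
            best_first, best_perf = seq[0], sum(seq)
    return best_first                    # receding-horizon principle

T = 45.0
for k in range(6):
    p = mpc_power(T)
    T = step(T, p)
    print(f"step {k}: power {p:4.1f} W -> temperature {T:5.1f} degC")
```

Exhaustive search stands in for the real optimizer; what carries over is the structure: predict under constraints, commit only to the first move, re-solve at the next step.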
Abstract:
The reserves of gasoline and diesel fuels are ever decreasing, which plays an important role in the technological development of automobiles. Numerous countries, especially the United States, wish to gradually reduce their fuel dependence on other countries by producing renewable fuels like biodiesel or ethanol domestically. Therefore, new automobile engines have to run successfully on a variety of fuels without significant changes to their designs. The current study focuses on assessing the potential of ethanol fuels to improve the performance of 'flex-fuel SI engines,' which literally means 'engines that are flexible in their fuel requirement.' Another important area within spark-ignition (SI) engine research is the implementation of new technologies like Variable Valve Timing (VVT) or Variable Compression Ratio (VCR) to improve engine performance. These technologies add complexity to the original system by adding extra degrees of freedom, so their potential has to be evaluated before they are installed in any SI engine. The current study evaluates the advantages and drawbacks of these technologies, primarily from an engine brake-efficiency perspective. The results show a significant improvement in engine efficiency when VVT and VCR are used together. Spark-ignition engines always operate at a lower compression ratio than compression-ignition (CI) engines, primarily due to knock constraints. Therefore, even though a higher compression ratio would yield a significant improvement in SI engine efficiency, the engine may still have to operate at a lower compression ratio because of knock limitations. Ethanol fuels extend the knock limit, making higher compression ratios usable. Hence, the current study uses VVT, VCR, and ethanol-gasoline blends to improve overall engine performance. The results show that these technologies promise definite engine performance improvements, provided both their positive and negative potentials are evaluated prior to installation.
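The knock argument rests on the ideal Otto-cycle relation eta = 1 - r**(1 - gamma): efficiency grows monotonically with compression ratio r, so any fuel that extends the knock limit unlocks efficiency. A quick illustration (gamma of roughly 1.35 assumed for a fuel-air charge; real brake efficiencies are lower but follow the same trend):

```python
# Ideal Otto-cycle efficiency versus compression ratio, showing why a
# higher knock-limited compression ratio pays off.
GAMMA = 1.35          # assumed specific-heat ratio of the fuel-air charge
for r in (8, 10, 12, 14):
    eta = 1 - r ** (1 - GAMMA)
    print(f"compression ratio {r:2d}: ideal efficiency {eta:.1%}")
```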
Abstract:
This study examines passenger air bag (PAB) performance in a fixed vehicle environment using Partial Low Risk Deployment (PLRD) as a strategy. The development follows test methods against actual baseline vehicle data and Federal Motor Vehicle Safety Standard 208 (FMVSS 208). FMVSS 208 states that PAB compliance in vehicle crash testing can be met using one of three deployment methods. The primary method suppresses PAB deployment, using a seat weight sensor or occupant classification sensor (OCS), for three-year-old and six-year-old occupants, including the presence of a child seat. A second method, PLRD, allows deployment for occupants of all sizes, suppressing only in the presence of a child seat. A third method, Low Risk Deployment (LRD), allows PAB deployment in all conditions and all statures, including any and all child seats. This study outlines a PLRD development solution for achieving FMVSS 208 performance. The results should provide an option for system implementation, including opportunities for system efficiency and other considerations. The objective is to achieve performance levels similar to, or incrementally better than, the baseline vehicle's New Car Assessment Program (NCAP) Star rating, and in addition to define systemic flexibility whereby restraint features can be added or removed while keeping occupant performance consistent with the baseline. A certified vehicle's air bag system will typically remain in production until the vehicle platform is redesigned. The strategy for testing the PLRD hypothesis was, first, to match the baseline out-of-position (OOP) occupant performance for the three- and six-year-old requirements; second, to improve the 35 mph belted 5th-percentile-female NCAP Star rating over the baseline vehicle; and third, to establish an equivalent FMVSS 208 certification for the 25 mph unbelted 50th-percentile male. The FMVSS 208 high-speed requirement defines the federal minimum crash performance required for frontal vehicle crash-test compliance, while the intent of the NCAP 5-Star rating is to provide the consumer with information about crash protection beyond what is required by federal law. In this study, vehicles from two segments were tested and compared against their baseline vehicles. Case Study 1 (CS1) used a crossover vehicle platform and Case Study 2 (CS2) used a small-segment vehicle platform as their baselines. In each case study the restraint systems came from a different restraint supplier, and each case reflected that supplier's approach to PLRD. CS1 incorporated a downsized twin-shaped bag, a carryover inflator, standard vents, and a strategically positioned bag diffuser to help disperse the flow of gas and improve OOP performance. The twin-shaped bag, with two segregated sections (lobes), enabled high-speed baseline performance correlation on the HYGE sled. CS2 used an asymmetric (square-shaped) PAB with standard-size vents, including a passive vent, to obtain OOP performance similar to the baseline. The asymmetric bag shape also helped enable high-speed baseline performance improvements in HYGE sled testing. The anticipated CS1 baseline vehicle-pulse-index (VPI) target was in the range of 65-67; however, actual dynamic vehicle (barrier) testing produced the highest crash pulse of the previously tested vehicles, with a VPI of 71. The result of the 35 mph NCAP barrier test was nonetheless a solid 4-Star (4.7-Star) rating.
In CS2, the vehicle HYGE sled development VPI range from the baseline was 61-62. The actual NCAP test produced a chest-deflection result of 26 mm versus the anticipated baseline target of 12 mm. This was initially attributed to the vehicle's significant VPI increase to 67, but a subsequent root-cause investigation confirmed a data-integrity issue due to the instrumentation. To establish a true vehicle test data point, a second NCAP test was performed, but it faced similar instrumentation issues: chest deflection hit the target of 12.1 mm, yet a femur load spike similar to the baseline now skewed the results. Given the noted improvement in chest deflection, the result was assessed as directionally capable of 5-Star performance; the actual rating was 3-Star due to instrumentation, and data extrapolation raised it to 5-Star. In both cases, no structural changes were made to the surrogate vehicle, and the results in each case matched the respective baseline vehicle platforms. These results show that PLRD is viable for further development and production implementation.
Abstract:
Everyday routines in general, and school settings in particular, make high demands on children's ability to sustain their focus of attention over longer periods of time. School tasks require the child to perform at an appropriate level while maintaining the focus of attention even under repetitive or distracting conditions. However, sustained attention (SA) may be a more heterogeneous construct than commonly assumed, as it requires the individual not only to sustain attentional capacity but also to store and maintain the task rule (working memory), to inhibit inappropriate responses (inhibition), and to switch according to requirements (switching). It may thus involve processes counted among the executive functions (EF). In the present study, performance in EF tasks (covering the core components inhibition, switching, and working memory) and in an SA task was assessed in 118 children aged between 5;0 and 8;11 years. Similar age-dependent performance trajectories were found in the EF components and in SA, indicating ongoing performance improvements from 5 until at least 8 years of age in both. Interrelations between single EF components and SA proved to be small to moderate. Finally, different patterns of SA performance prediction were found in age-homogeneous subgroups, with inhibition being crucial for SA performance in the youngest age group and switching in the oldest. Taken as a whole, even though similarities in the assumed developmental trajectories and substantial interrelations point to common underlying processes in EF and SA, the age-dependent patterns of explained variance indicate clear discriminability.
Abstract:
Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and that the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down or at least limit the speedup if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work as possible is done at compile time. A number of run-time optimizations are also proposed. Moreover, a static analysis of the overhead associated with the granularity control process itself is performed in order to decide whether it is worthwhile. The performance improvements resulting from the incorporation of grain-size control are shown to be quite good, especially for systems with medium to large parallel execution overheads.
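At run time the mechanism reduces to a cheap compiled-in test: compare a compile-time-derived cost estimate against the known spawning overhead, and only then create a task. A schematic Python rendering (the threshold, the quadratic cost function, and the thread pool are illustrative stand-ins for the paper's compile-time machinery, which targets logic programs):

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor()
SPAWN_THRESHOLD = 50_000   # illustrative: estimated work that pays for a task

def cost(n):
    """Illustrative compile-time-derived cost function for the goal below
    (a quadratic-work computation), evaluated cheaply at run time."""
    return n * n

def schedule(goal, n):
    """Granularity control: spawn a parallel task only when the estimated
    work under the goal dwarfs task-creation/scheduling overhead."""
    if cost(n) > SPAWN_THRESHOLD:
        return pool.submit(goal, n)     # worth a separate task
    f = Future()                        # otherwise run sequentially,
    f.set_result(goal(n))               # behind the same interface
    return f

work = lambda n: sum(i * j for i in range(n) for j in range(n))
print(schedule(work, 50).result())      # cost 2500: stays sequential
print(schedule(work, 400).result())     # cost 160000: spawned on the pool
```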
Abstract:
Several models for context-sensitive analysis of modular programs have been proposed, each with different characteristics and representing different trade-offs. The advantage of these context-sensitive analyses is that they provide information which is potentially more accurate than that provided by context-free analyses. Such information can then be applied to validating/debugging the program and/or to specializing the program in order to obtain important performance improvements. Some very preliminary experimental results have also been reported for some of these models, providing initial evidence of their potential. However, further experimentation, needed in order to understand the many issues left open and to show that the proposed models scale and are usable in the context of large, real-life modular programs, was left as future work. The aim of this paper is two-fold. On one hand we provide an empirical comparison of the different models proposed in previous work, as well as experimental data on the different choices left open in those designs. On the other hand we explore the scalability of these models by using larger modular programs as benchmarks. The results have been obtained from a realistic implementation of the models, integrated in a production-quality compiler (CiaoPP/Ciao). Our experimental results shed light on the practical implications of the different design choices and of the models themselves. We also show that context-sensitive analysis of modular programs is indeed feasible in practice, and that in certain critical cases it provides better performance results than those achievable by analyzing the whole program at once, especially in terms of memory consumption and when reanalyzing after making changes to a program, as is often the case during program development.
Abstract:
While logic programming languages offer a great deal of scope for parallelism, there is usually some overhead associated with the execution of goals in parallel because of the work involved in task creation and scheduling. In practice, therefore, the "granularity" of a goal, i.e. an estimate of the work available under it, should be taken into account when deciding whether or not to execute the goal concurrently as a separate task. This paper describes a method for estimating the granularity of a goal at compile time. The run-time overhead associated with our approach is usually quite small, and the performance improvements resulting from the incorporation of grain-size control can be quite good, as shown by experimental results.
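A sketch of what "estimating granularity at compile time" amounts to (the recurrence and numbers below are illustrative, not taken from the paper): the analysis solves a cost recurrence for a predicate once, to a closed form, and the compiled program merely evaluates that closed form on the actual input size.

```python
# Illustrative (simplified) compile-time derivation for a classic goal:
# the work under append/3 on a first argument of length n satisfies
#   C(0) = 1,   C(n) = 1 + C(n - 1)
# which the analysis solves, once, to the closed form C(n) = n + 1.
def cost_append(n):
    return n + 1                       # closed form; evaluated at run time

# At run time only the input size is measured and plugged in:
def worth_spawning(n, overhead=500):   # overhead value is hypothetical
    return cost_append(n) > overhead   # the grain-size test

print(worth_spawning(100), worth_spawning(1000))   # False True
```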
Abstract:
Data grid services have been used to deal with the increasing needs of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make management and tuning of these data services very complex. Furthermore, current high-performance I/O approaches are characterized by their high complexity and specific features that usually require specialized administrator skills. Autonomic computing can help manage this complexity. The present paper describes an autonomic subsystem intended to provide self-management features aimed at efficiently reducing the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system is not usually immediately required. Therefore, performance improvements are related not only to current but also to any future I/O access, as the actual data access usually occurs later on. Nevertheless, the exact time of the next I/O operations is unknown. Thus, our approach proposes a long-term prediction designed to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.
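Any trend-following predictor can play the long-term forecasting role described here. As a stand-in for the subsystem's actual predictor (which the abstract does not specify), a few lines of Holt's double exponential smoothing show how a forecast of future I/O load could drive data placement before the accesses arrive:

```python
def holt_forecast(history, horizon, alpha=0.5, beta=0.3):
    """Long-term workload prediction via Holt's double exponential
    smoothing: track level and trend of an observed I/O load series,
    then project `horizon` steps ahead."""
    level, trend = history[0], history[1] - history[0]
    for x in history[1:]:
        new_level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level + horizon * trend

# I/O requests per interval seen by a storage node; a rising forecast
# would trigger placing hot data on faster replicas ahead of time.
load = [120, 135, 150, 170, 185, 210]
print(f"predicted load 5 intervals ahead: {holt_forecast(load, 5):.0f}")
```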
Abstract:
Neuroimaging studies provide evidence for organized intrinsic activity under task-free conditions. This activity serves functionally relevant brain systems supporting cognition. Here, we analyze changes in resting-state functional connectivity after videogame practice, applying a test–retest design. Twenty young females were selected from a group of 100 participants tested on four standardized cognitive ability tests. The practice and control groups were carefully matched on their ability scores. The practice group played two sessions per week for 4 weeks (16 h in total) under strict supervision in the laboratory, showing systematic performance improvements in the game. A group independent component analysis (GICA) with multisession temporal concatenation was computed on the test–retest resting-state fMRI data, jointly with a dual-regression approach. Supporting the main hypothesis, the key finding is an increase in correlated activity during rest in certain predefined resting-state networks (albeit using uncorrected statistics) attributable to practice with the cognitively demanding tasks of the videogame. The observed changes were mainly concentrated in parietofrontal networks involved in heterogeneous cognitive functions.
Abstract:
Modern object-oriented languages like C# and Java enable developers to build complex applications in less time. These languages rely on heap-allocated, pass-by-reference objects for user-defined data structures, which simplifies programming by automatically managing memory allocation and deallocation in conjunction with garbage collection. This simplification comes at a performance cost: using pass-by-reference objects instead of lighter-weight pass-by-value structs can have a substantial memory impact in some cases. These costs can be critical when applications run in resource-limited environments such as mobile devices and cloud computing systems. We explore how to retain this simple and uniform memory model while improving performance. In this work we address the problem by providing an automated and sound static conversion analysis which identifies whether a by-reference type can be safely converted to a by-value type where the conversion may result in performance improvements. The work focuses on C# programs. Our approach is based on a combination of syntactic and semantic checks to identify classes that are safe to convert. We evaluate the effectiveness of the analysis in identifying convertible types and the impact of the transformation. The results show that transforming reference types to value types can have a substantial performance impact in practice. In our case studies we optimize the Barnes-Hut program, whose total memory allocation decreased by 93% and execution time by 15%.