338 results for HDFS bottleneck
Abstract:
The Strait of Melaka is the longest strait in the world, stretching for about 800 km from the northern tip of Sumatra to Singapore. It exhibits a dual character like no other, being simultaneously a privileged linking passage between two seas and two nodes of human civilization – India and China – and a "bottleneck" that constrains the maritime connections between them. Today, the latter aspect is globally dominant. The strait is considered and analysed mostly as an obstacle rather than a linking point: how to reach China from the West or elsewhere is no longer an issue, but securing the vital flows that pass through the strait on a daily basis undoubtedly is. Accidents, natural catastrophes, local political crises and terrorist attacks are permanent dangers that could cut this umbilical cord of world trade and jeopardize a particularly sensitive and vulnerable area; piracy and pollution are the most common local threats and vulnerabilities.
Abstract:
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or a few megabytes, to including several multicore processors, offering in the order of several tens of concurrent execution contexts, and main memory in the order of several tens to hundreds of gigabytes. This allows all the data of many applications to be kept in main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general purpose IMDBs on multicore systems. The results show that current general purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present a solution that requires no modification to either the database systems or the applications, called MacroDB. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on the slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer from performance loss when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers performance improvements under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads. Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improves performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to intensive-read and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how the order of operations inside transactions influences database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
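As a purely illustrative aside, the sketch below shows the two ideas summarized above in miniature: routing update transactions to a single master replica while spreading read-only transactions over slaves (the MacroDB-style scheme), and ordering every read before the first write inside a transaction (the RbW pattern). All names and the dict-backed "engines" are hypothetical stand-ins, not the dissertation's implementation.

```python
# Hypothetical sketch of master-slave transaction routing and the RbW pattern.
# The "engines" are plain dicts standing in for in-memory database replicas.

import itertools


class ReplicatedDB:
    """Routes update transactions to the master and read-only ones to slaves."""

    def __init__(self, num_slaves=3):
        self.master = {}                                  # authoritative copy
        self.slaves = [{} for _ in range(num_slaves)]
        self._rr = itertools.cycle(range(num_slaves))     # round-robin slave choice

    def execute(self, txn, read_only):
        if read_only:
            replica = self.slaves[next(self._rr)]         # no contention on the master
            return txn(replica)
        result = txn(self.master)                         # updates go to the master
        for s in self.slaves:                             # naive synchronous propagation
            s.clear()
            s.update(self.master)
        return result


def transfer_rbw(db, src, dst, amount):
    """RbW pattern: perform *all* reads before the first write."""
    def txn(store):
        # read phase
        src_balance = store.get(src, 0)
        dst_balance = store.get(dst, 0)
        # write phase
        store[src] = src_balance - amount
        store[dst] = dst_balance + amount
        return store[src], store[dst]
    return db.execute(txn, read_only=False)


if __name__ == "__main__":
    db = ReplicatedDB()
    db.execute(lambda s: s.update(alice=100, bob=50), read_only=False)
    print(transfer_rbw(db, "alice", "bob", 30))            # (70, 80)
    print(db.execute(lambda s: s["bob"], read_only=True))  # 80, served by a slave
```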
Abstract:
Integrated master's dissertation in Industrial Engineering and Management
Abstract:
Integrated master's dissertation in Industrial Engineering and Management
Abstract:
The modern computer systems in use nowadays are mostly processor-dominant, which means that their memory is treated as a slave element with one major task – to serve the data requirements of the execution units. This organization is based on the classical von Neumann computer model, proposed seven decades ago, in the 1950s. This model suffers from a substantial processor-memory bottleneck, because of the huge disparity between processor and memory working speeds. In order to solve this problem, in this paper we propose a novel architecture and organization of processors and computers that attempts to provide a stronger match between the processing and memory elements in the system. The proposed model utilizes a memory-centric architecture, wherein execution hardware is added to the memory code blocks, allowing them to perform instruction scheduling and execution, management of data requests and responses, and direct communication with the data memory blocks without using registers. This organization allows concurrent execution of all threads, processes or program segments that fit in memory at a given time. Therefore, in this paper we describe several possibilities for organizing the proposed memory-centric system with multiple data and logic-memory merged blocks, by utilizing a high-speed interconnection switching network.
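Purely as an illustration of the organization described above (not the paper's design), the toy model below gives each logic-memory block its own instruction stream that it executes directly against locally held data, fetching remote operands over an interconnect instead of through central processor registers.

```python
# Illustrative toy model of the memory-centric idea: each logic-memory block
# executes its own instructions on the data it holds and reaches other blocks
# over an interconnect, with no central register file in between.

class MemoryBlock:
    def __init__(self, name, data, program):
        self.name = name
        self.data = dict(data)        # locally held data words
        self.program = list(program)  # (op, dst, src1, src2) tuples

    def run(self, interconnect):
        for op, dst, a, b in self.program:
            if op == "add":
                self.data[dst] = self.load(a, interconnect) + self.load(b, interconnect)
            elif op == "mul":
                self.data[dst] = self.load(a, interconnect) * self.load(b, interconnect)

    def load(self, key, interconnect):
        if key in self.data:              # local access: no register traffic
            return self.data[key]
        owner = interconnect[key]         # remote access: ask the owning block
        return owner.data[key]


if __name__ == "__main__":
    b0 = MemoryBlock("b0", {"x": 2, "y": 3}, [("add", "s", "x", "y")])
    b1 = MemoryBlock("b1", {"z": 10}, [("mul", "p", "z", "s")])
    interconnect = {"x": b0, "y": b0, "s": b0, "z": b1}
    b0.run(interconnect)   # blocks could run concurrently; shown sequentially here
    b1.run(interconnect)
    print(b1.data["p"])    # 50
```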
Quantitative comparison of reconstruction methods for intra-voxel fiber recovery from diffusion MRI.
Abstract:
Validation is arguably the bottleneck in the diffusion magnetic resonance imaging (MRI) community. This paper evaluates and compares 20 algorithms for recovering the local intra-voxel fiber structure from diffusion MRI data and is based on the results of the "HARDI reconstruction challenge" organized in the context of the "ISBI 2012" conference. The evaluated methods encompass a mixture of classical techniques well known in the literature, such as diffusion tensor, Q-Ball and diffusion spectrum imaging, algorithms inspired by the recent theory of compressed sensing, and brand-new approaches proposed for the first time at this contest. To quantitatively compare the methods under controlled conditions, two datasets with known ground truth were synthetically generated, and two main criteria were used to evaluate the quality of the reconstructions in every voxel: correct assessment of the number of fiber populations and angular accuracy in their orientation. This comparative study investigates the behavior of every algorithm under varying experimental conditions and highlights the strengths and weaknesses of each approach. This information can be useful not only for enhancing current algorithms and developing the next generation of reconstruction methods, but also for assisting physicians in the choice of the most appropriate technique for their studies.
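To make the second criterion concrete, the sketch below computes a common form of angular error between an estimated fiber direction and the ground-truth one, treating opposite vectors as the same orientation; this is an assumed illustration of the kind of measure involved, not the challenge's actual scoring code.

```python
# Illustrative angular-accuracy measure: the angle between an estimated fiber
# direction and the ground-truth one, ignoring sign (fiber orientations are
# axial). An assumption about the metric's form, not the challenge's scoring.

import numpy as np

def angular_error_deg(estimated, ground_truth):
    """Angle in degrees between two fiber orientations (sign-invariant)."""
    e = np.asarray(estimated, dtype=float)
    g = np.asarray(ground_truth, dtype=float)
    e /= np.linalg.norm(e)
    g /= np.linalg.norm(g)
    cos_angle = abs(np.dot(e, g))   # abs(): v and -v describe the same fiber
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

if __name__ == "__main__":
    # two fiber populations in one voxel, with slightly off estimates
    print(angular_error_deg([1, 0, 0], [0.99, 0.05, 0.0]))   # a few degrees
    print(angular_error_deg([0, 1, 0], [0, -1, 0]))          # 0.0: sign ignored
```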
Abstract:
During the fellowship period, from 9 March 2007 to 8 March 2010, different types of experiments were carried out with two-dimensional systems such as Langmuir monolayers. We initially began with the study and characterization of these experimental systems, both at rest and under dynamic conditions, including the study of the collective molecular response of domains of a photosensitive azo-derivative when the polarization plane is rotated under constant illumination, and studies of two-dimensional systems at collapse, which can be related to the viscoplastic properties of solids. Another line of study is the rheology of these two-dimensional systems when they flow through channels. Starting from the simplest experimental system, a monolayer flowing through a channel, the bottleneck effect was observed and studied. Once this simplest system was established and studied, more complex photolithographic fabrication techniques were applied to make Langmuir monolayers flow through circuits with a large wetting contrast. Once these circuits were successfully implemented in a system for the control of two-dimensional flows, the possible future applications of such systems for the study and development of two-dimensional microfluidics became apparent.
Abstract:
Parallel I/O is a research area of growing importance in High Performance Computing. Although for years I/O has been the bottleneck of parallel computers, nowadays, due to the great increase in computing power, the I/O problem has grown, and the High Performance Computing community considers that work is needed to improve the I/O system of parallel computers in order to meet the demands of the scientific applications that use HPC. The configuration of parallel Input/Output (I/O) has a great influence on performance and availability; it is therefore important to "analyse parallel I/O configurations to identify the key factors that influence the I/O performance and availability of scientific applications running on a cluster". To analyse I/O configurations, we propose a methodology that allows the I/O factors to be identified and their influence evaluated for different I/O configurations, comprising three phases: Characterization, Configuration and Evaluation. The methodology allows the parallel computer to be analysed at the level of the scientific application, the I/O libraries and the I/O architecture, but from the point of view of I/O. The experiments carried out for different I/O configurations and the results obtained indicate the complexity of analysing the I/O factors and their different degrees of influence on the performance of the I/O system. Finally, we describe future work: the design of a model that supports the process of configuring the parallel I/O system for scientific applications. In addition, to identify and evaluate the I/O factors associated with availability at the data level, we intend to use the RADIC fault-tolerant architecture.
Abstract:
The undisputed, worldwide success of chemotherapy notwithstanding, schistosomiasis continues to defy control efforts, inasmuch as rapid reinfection demands repeated treatment, sometimes as often as once a year. There is thus a need for a complementary tool with a longer-term effect, notably a vaccine. International efforts in this direction have been ongoing for several decades but, until recombinant DNA techniques were introduced, antigen production remained an insurmountable bottleneck. Although animal experiments have been highly productive and are still much needed, they probably do not reflect the human situation adequately, and real progress cannot be expected until more is known about human immune responses to schistosome infection. It is well known that irradiated cercariae consistently produce high levels of protection in experimental animals but, for various reasons, this proof of principle cannot be directly exploited. Research has instead been focused on the identification and testing of specific schistosome antigens. This work has been quite successful and is already at the stage where clinical trials are called for. Preliminary results from coordinated in vitro laboratory and field epidemiological studies regarding the protective potential of several antigens support the initiation of such trials. A series of meetings, organized earlier this year in Cairo, Egypt, reviewed recent progress, selected suitable vaccine candidates and made firm recommendations for future action, including pledging support for large-scale production according to good manufacturing practice (GMP) and Phase I trials. Scientists at the American Centers for Disease Control and Prevention (CDC) have drawn up a detailed research plan. The major financial support will come from USAID, Cairo, which has established a scientific advisory group of Egyptian scientists and representatives from current and previous international donors such as WHO, NIAID, the European Union and the Edna McConnell Clark Foundation.
Abstract:
We investigated the role of the number of loci coding for a neutral trait on the release of additive variance for this trait after population bottlenecks. Different bottleneck sizes and durations were tested for various matrices of genotypic values, with initial conditions covering the allele frequency space. We used three different types of matrices. First, we extended Cheverud and Routman's model by defining matrices of "pure" epistasis for three and four independent loci; second, we used genotypic values drawn randomly from uniform, normal, and exponential distributions; and third, we used two models of simple metabolic pathways leading to physiological epistasis. For all these matrices of genotypic values except the dominant metabolic pathway, we find that the release of additive variance increases as the number of loci increases from two to three and four. The amount of additive variance released for a given set of genotypic values is a function of the inbreeding coefficient, independently of the size and duration of the bottleneck. The level of inbreeding necessary to achieve the maximum release of additive variance increases with the number of loci. We find that additive-by-additive epistasis is the type of epistasis most easily converted into additive variance. For a wide range of models, our results show that epistasis, rather than dominance, plays a significant role in the increase of additive variance following bottlenecks.
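For context on the statement that the released additive variance depends only on the inbreeding coefficient, the snippet below evaluates the standard drift relation F = 1 - (1 - 1/(2N))^t for a bottleneck of constant size N lasting t generations; the formula is textbook population genetics, not code from the study itself.

```python
# Standard drift relation: different (N, t) bottlenecks that yield the same
# inbreeding coefficient F are expected, per the abstract, to release the same
# amount of additive variance. Textbook formula, not the study's own code.

def inbreeding_coefficient(n_individuals, n_generations):
    """Expected inbreeding coefficient after a bottleneck of constant size."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_individuals)) ** n_generations

if __name__ == "__main__":
    # a short, severe bottleneck and a longer, milder one with similar F
    print(round(inbreeding_coefficient(2, 1), 3))   # N=2 for 1 generation -> 0.25
    print(round(inbreeding_coefficient(4, 2), 3))   # N=4 for 2 generations -> 0.234
```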
Abstract:
TCP flows from applications such as the web or FTP are well supported by a Guaranteed Minimum Throughput Service (GMTS), which provides a minimum network throughput to the flow and, if possible, an extra throughput. We propose a scheme for a GMTS using Admission Control (AC) that is able to provide different minimum throughputs to different users and that is suitable for "standard" TCP flows. Moreover, we consider a multidomain scenario where the scheme is used in one of the domains, and we propose some mechanisms for the interconnection with neighbor domains. The whole scheme uses a small set of packet classes in a core-stateless network, where each class is assigned a different discarding priority in the queues. The AC method involves only edge nodes and uses a special probing packet flow (marked as the highest discarding priority class) that is sent continuously from ingress to egress through a path. The available throughput in the path is obtained at the egress using measurements of flow aggregates, and then it is sent back to the ingress. At the ingress, each flow is detected implicitly and then admission controlled. If it is accepted, it receives the GMTS and its packets are marked with the lowest discarding priority classes; otherwise, it receives a best-effort service. The scheme is evaluated through simulation in a simple "bottleneck" topology using different traffic loads consisting of "standard" TCP flows that carry files of varying sizes.
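A minimal sketch of the ingress-side admission decision outlined above, assuming hypothetical names: the egress reports the probe-based available throughput back to the ingress, which grants the guaranteed service to a new flow only if its requested minimum still fits, and otherwise lets the flow fall back to best effort.

```python
# Hypothetical sketch of the ingress-side admission decision: the egress
# measures the available throughput along the path (from the probe flow sent at
# the highest discarding priority) and reports it back; the ingress admits a
# new TCP flow only if its requested minimum throughput still fits. Names and
# the simple reservation accounting are illustrative, not the paper's design.

class IngressAdmissionControl:
    def __init__(self):
        self.available_mbps = 0.0   # last value reported back by the egress
        self.reserved_mbps = 0.0    # sum of minimum throughputs already granted

    def update_available(self, measured_mbps):
        """Called when the egress reports a new probe-based measurement."""
        self.available_mbps = measured_mbps

    def admit(self, requested_min_mbps):
        """Accept the flow (guaranteed service) or fall back to best effort."""
        if self.reserved_mbps + requested_min_mbps <= self.available_mbps:
            self.reserved_mbps += requested_min_mbps
            return "guaranteed"     # packets marked with the lowest discard priority
        return "best-effort"


if __name__ == "__main__":
    ac = IngressAdmissionControl()
    ac.update_available(10.0)      # egress reports 10 Mb/s available
    print(ac.admit(4.0))           # guaranteed
    print(ac.admit(4.0))           # guaranteed
    print(ac.admit(4.0))           # best-effort: would exceed the 10 Mb/s reported
```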
Abstract:
Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing, borrowed from photogrammetry, is an alternative technique that enables the 3-D shape of the terrain to be taken into account. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach.
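As an illustration of the piecewise-planar surface representation mentioned above, the sketch below fits a least-squares plane to a cluster of 3-D points via SVD; this is a generic building block chosen for clarity, not the paper's implementation.

```python
# Generic building block behind a piecewise-planar terrain representation:
# fitting a plane to a cluster of reconstructed 3-D points by least squares
# (SVD of the centered points). Not the paper's implementation.

import numpy as np

def fit_plane(points):
    """Return (centroid, unit normal) of the best-fit plane through 3-D points."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # the normal is the right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1]

if __name__ == "__main__":
    # noisy samples of the plane z = 0.1 x + 0.2 y
    rng = np.random.default_rng(0)
    xy = rng.uniform(-1, 1, size=(100, 2))
    z = 0.1 * xy[:, 0] + 0.2 * xy[:, 1] + rng.normal(0, 0.01, 100)
    centroid, normal = fit_plane(np.column_stack([xy, z]))
    print(normal / normal[2])   # approx. [-0.1, -0.2, 1]: recovers the plane
```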
Abstract:
Habitat restoration measures may result in artificially high breeding density, for instance when nest-boxes saturate the environment, which can negatively impact a species' demography. Potential risks include changes in mating and reproductive behaviour such as increased extra-pair paternity, conspecific brood parasitism, and polygyny. Under particular circumstances, these mechanisms may disrupt reproduction, with populations dragged into an extinction vortex. Using nuclear microsatellite markers, we investigated the occurrence of these potentially negative effects in a recovered population of a rare secondary cavity-nesting farmland bird of Central Europe, the hoopoe (Upupa epops). High intensity farming in the study area has resulted in a total eradication of cavity trees, depriving hoopoes of breeding sites. An intensive nest-box campaign rectified this problem, resulting in a spectacular population recovery within only a few years. There was some concern, however, that the new, artificially induced high breeding density might alter hoopoe mating and reproductive behaviour. As the species underwent a serious demographic bottleneck in the 1970s-1990s, we also used the microsatellite markers to reconstitute the demo-genetic history of the population, looking in particular for signs of genetic erosion. We found i) a low occurrence of extra-pair paternity, polygyny and conspecific brood parasitism, ii) a high level of neutral genetic diversity (mean number of alleles and expected heterozygosity per locus: 13.8 and 83%, respectively) and, iii) evidence for genetic connectivity through recent immigration of individuals from well differentiated populations. The recent increase in breeding density has thus far not induced any noticeable detrimental changes in mating and reproductive behaviour. Furthermore, the demographic bottleneck undergone by the population in the 1970s-1990s was not accompanied by any significant drop in neutral genetic diversity. Finally, the genetic data converged with a concomitant demographic study to show that immigration strongly contributed to local population recovery.
Abstract:
Genetic diversity is essential for population survival and adaptation to changing environments. Demographic processes (e.g., bottlenecks and expansions) and spatial structure (e.g., migration, and the number and size of populations) are known to shape the patterns of genetic diversity in populations. However, the impact of temporal changes in migration on genetic diversity has seldom been considered, although such events might be the norm. Indeed, during the millions of years of a species' lifetime, repeated isolation and reconnection of populations occur, as geological and climatic events alternately isolate and reconnect habitats. We analytically document the dynamics of genetic diversity after an abrupt change in migration, given the mutation rate and the number and sizes of the populations. We demonstrate that during the transient dynamics, genetic diversity can reach unexpectedly high values that can be maintained over thousands of generations. We discuss the consequences of such processes for the evolution of species based on standing genetic variation, and how they can affect the reconstruction of a population's demographic and evolutionary history from genetic data. Our results also provide guidelines for the use of genetic data in the conservation of natural populations.
Abstract:
Language resources are a critical component of Natural Language Processing applications. Over the years, many resources have been manually created for the same task, but with different granularity and coverage. To create richer resources for a broad range of potential reuses, the information from all resources has to be joined into one. The high cost of comparing and merging different resources by hand has been a bottleneck for merging existing resources. With the objective of reducing human intervention, we present a new method for automating the merging of resources. We have addressed the merging of two verb subcategorization frame (SCF) lexica for Spanish. The results achieved, a new lexicon with enriched information and with conflicting information signalled, reinforce our idea that this approach can be applied to other NLP tasks.
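A toy sketch of the merging idea, under assumed names and a simplified data format: entries from two verb-SCF lexica are joined by lemma, their frame sets are unioned into a richer entry, and lemmas on which the sources disagree are signalled for review. This is illustrative only, not the lexica or the method used in the paper.

```python
# Toy sketch of merging two verb-SCF lexica with conflict signalling.
# The {verb: set_of_frames} format and the example entries are assumptions.

def merge_scf_lexica(lexicon_a, lexicon_b):
    """Merge two {verb: set_of_frames} lexica, signalling conflicts."""
    merged, conflicts = {}, {}
    for verb in sorted(set(lexicon_a) | set(lexicon_b)):
        frames_a = lexicon_a.get(verb, set())
        frames_b = lexicon_b.get(verb, set())
        merged[verb] = frames_a | frames_b          # richer, joined entry
        if frames_a and frames_b and frames_a != frames_b:
            conflicts[verb] = (frames_a, frames_b)  # disagreement to be reviewed
    return merged, conflicts

if __name__ == "__main__":
    lex_a = {"dar": {"NP NP", "NP PP_a"}, "comer": {"NP"}}
    lex_b = {"dar": {"NP PP_a"}, "vivir": {"PP_en"}}
    merged, conflicts = merge_scf_lexica(lex_a, lex_b)
    print(merged["dar"])   # union of frames from both sources
    print(conflicts)       # {'dar': (...)}: the two lexica disagree on 'dar'
```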