338 results for HDFS bottleneck
Abstract:
Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to having richer resources with a broad range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method for automating the merging of resources, with special emphasis on what we call the mapping step. This mapping step, which converts the resources into a common format that later allows merging, is usually performed with huge manual effort and thus makes the whole process very costly. We therefore propose a method to perform this mapping fully automatically. To test our method, we have addressed the merging of two verb subcategorization frame lexica for Spanish. The results achieved, which almost replicate human work, demonstrate the feasibility of the approach.
Abstract:
Lexical Resources are a critical component for Natural Language Processing applications. However, the high cost of comparing and merging different resources has been a bottleneck to obtaining richer resources and a broader range of potential uses for a significant number of languages. With the objective of reducing cost by eliminating human intervention, we present a new method towards the automatic merging of resources. This method includes both the automatic mapping of the resources involved to a common format and their merging once in this format. This paper presents how we have addressed the merging of two verb subcategorization frame lexica for Spanish, but our method will be extended to cover other types of Lexical Resources. The results achieved, which almost replicate human work, demonstrate the feasibility of the approach.
Abstract:
We sequenced 998 base pairs (bp) of mitochondrial DNA cytochrome b and 799 bp of nuclear gene BRCA1 in the Lesser white-toothed shrew (Crocidura suaveolens group) over its geographic range from Portugal to Japan. The aims of the study were to identify the main clades within the group and respective refugia resulting from Pleistocene glaciations. Analyses revealed the Asian lesser white-toothed shrew (C. shantungensis) as the basal clade, followed by a major branch of C. suaveolens, subdivided sensu stricto into six clades, which split-up in the Upper Pliocene and Lower Pleistocene (1.9-0.9 Myr). The largest clade, occurring over a huge range from east Europe to Mongolia, shows evidence of population expansion after a bottleneck. West European clades originated from Iberian and Italo-Balkanic refugia. In the Near East, three clades evolved in an apparent hotspot of refugia (west Turkey, south-west and south-east of the Caucasus). Most clades include specimens of different morphotypes and the validity of many taxa in the C. suaveolens group has to be re-evaluated.
Abstract:
The characteristics of Internet traffic are increasingly complex owing to the growing diversity of applications, drastic differences in user behaviour, the mobility of users and equipment, the complexity of traffic generation and control mechanisms, and the growing diversity of access types and their capacities. In this scenario, network management will inevitably rely more and more on real-time traffic measurements. Because of the large amount of information that must be processed and stored, there is also a growing need for traffic measurement platforms to adopt a distributed architecture that allows efficient distributed storage, replication and search of the measured data, possibly mimicking the Peer-to-Peer (P2P) paradigm. This dissertation describes the specification, implementation and testing of a traffic measurement system with a distributed P2P architecture, which gives network managers a tool to remotely configure monitoring systems installed at various points in the network to carry out traffic measurements. The system can also be used in community-oriented networks, where users can share resources of their machines to let others carry out measurements and share the data obtained. The system is based on an overlay network with a hierarchical structure organized into measurement areas. The overlay network is composed of two types of nodes, called probes and super-probes, which perform the measurements and store their results. Super-probes are also responsible for ensuring connectivity between measurement areas and for managing the exchange of messages between the network and the probes connected to them. The topology of the overlay network can change dynamically, with the insertion of new nodes and the removal of others, and with the promotion of probes to super-probes and vice versa, in response to changes in available resources.
The nodes store two types of measurement results: Light Data Files (LDFs) and Heavy Data Files (HDFs). LDFs hold the average round-trip delay from each super-probe to all elements connected to it and are replicated on all super-probes, providing a simple but easily accessible view of the network state. HDFs hold the detailed results of measurements made at the packet or flow level and can be replicated on some nodes of the network. Replicas are distributed across the network taking into account the resources available on the nodes, so as to ensure fault tolerance. Users can configure measurements and search the results through an element called the client. Several evaluation tests of the system were carried out, which showed that it operates correctly and efficiently.
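As an illustration only, resource-aware placement of HDF replicas could be sketched as below. The abstract does not give the actual placement policy, so everything here is an assumption: the `Node` fields, the use of free storage as the only resource, and the "most free storage first" ranking are hypothetical choices, not the dissertation's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of an overlay node (probe or super-probe)."""
    name: str
    free_storage_mb: int
    is_super_probe: bool = False


def choose_replica_holders(nodes, owner, file_size_mb, n_replicas):
    """Pick up to n_replicas nodes, other than the owner, to hold an HDF copy.

    Assumed policy: only nodes with enough free storage qualify, and nodes
    with the most free storage are preferred.
    """
    candidates = [
        n for n in nodes
        if n.name != owner and n.free_storage_mb >= file_size_mb
    ]
    # Rank by available resources, largest first.
    candidates.sort(key=lambda n: n.free_storage_mb, reverse=True)
    return [n.name for n in candidates[:n_replicas]]


if __name__ == "__main__":
    overlay = [
        Node("sp1", 500, is_super_probe=True),
        Node("p1", 200),
        Node("p2", 50),
        Node("p3", 800),
    ]
    print(choose_replica_holders(overlay, owner="p1", file_size_mb=100, n_replicas=2))
```

Keeping replicas on nodes other than the one that produced the file is what gives the fault tolerance the abstract mentions; a real policy would probably also weigh bandwidth and node uptime.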
Abstract:
The study of organisms and their resources is critical to further understanding population dynamics in space and time. Although drosophilids have been widely used as biological models, their relationship with breeding and feeding sites has received little attention. Here, we investigate drosophilids breeding in fruits in the Brazilian Savanna, in two contrasting vegetation types, throughout 16 months. Specifically, larval assemblages were compared between savannas and forests, as well as between rainy and dry seasons. The relationships between resource availability and drosophilid abundance and richness were also tested. The community (4,022 drosophilids of 23 species and 2,496 fruits of 57 plant taxa) varied widely in space and time. Drosophilid assemblages experienced a strong bottleneck during the dry season, decreasing to only 0.5% of the abundance of the rainy season. Additionally, savannas displayed lower richness and higher abundance than the forests, and were dominated by exotic species. Both differences in larval assemblages throughout the year and between savannas and gallery forests are consistent with those previously seen in adults. Although the causes of this dynamic are clearly multifactorial, resource availability (richness and abundance of rotten fruits) was a good predictor of the fly assemblage structure.
Abstract:
This PhD thesis addresses the issue of scalable media streaming in large-scale networking environments. Multimedia streaming is one of the largest sinks of network resources and this trend is still growing, as testified by the success of services like Skype, Netflix, Spotify and Popcorn Time (BitTorrent-based). In traditional client-server solutions, when the number of consumers increases, the server becomes the bottleneck. To overcome this problem, the Content-Delivery Network (CDN) model was invented. In the CDN model, the server copies the media content to some CDN servers, which are located in different strategic locations on the network. However, CDNs require heavy infrastructure investment around the world, which is too expensive. Peer-to-peer (P2P) solutions are another way to achieve the same result. These solutions are naturally scalable, since each peer can act as both a receiver and a forwarder. Most of the proposed streaming solutions in P2P networks focus on routing scenarios to achieve scalability. However, these solutions cannot work properly in video-on-demand (VoD) streaming when the resources of the media server are not sufficient. Replication is a solution that can be used in these situations. This thesis specifically provides a family of replication-based media streaming protocols, which are scalable, efficient and reliable in P2P networks. First, it provides SCALESTREAM, a replication-based streaming protocol that adaptively replicates media content in different peers to increase the number of consumers that can be served in parallel. The adaptiveness aspect of this solution relies on the fact that it takes into account different constraints, like the bandwidth capacity of peers, to decide when to add or remove replicas. SCALESTREAM routes media blocks to consumers over a tree topology, assuming a reliable network composed of peers that are homogeneous in terms of bandwidth. 
Second, this thesis proposes RESTREAM, an extended version of SCALESTREAM that addresses the issues raised by unreliable networks composed of heterogeneous peers. Third, this thesis proposes EAGLEMACAW, a multiple-tree replication streaming protocol in which two distinct trees, named EAGLETREE and MACAWTREE, are built in a decentralized manner on top of an underlying mesh network. These two trees collaborate to serve consumers in an efficient and reliable manner. The EAGLETREE is in charge of improving efficiency, while the MACAWTREE guarantees reliability. Finally, this thesis provides TURBOSTREAM, a hybrid replication-based streaming protocol in which a tree overlay is built on top of a mesh overlay network. Both these overlays cover all peers of the system and collaborate to improve efficiency and low-latency in streaming media to consumers. This protocol is implemented and tested in a real networking environment using PlanetLab Europe testbed composed of peers distributed in different places in Europe.
Abstract:
Ten microsatellite loci and a partial sequence of the COII mitochondrial gene were used to investigate genetic differentiation in B. terrestris, a bumble bee of interest for its high-value crop pollination. The analysis included eight populations from the European continent, five from Mediterranean islands (six subspecies altogether) and one from Tenerife (initially described as a colour form of B. terrestris but recently considered as a separate species, B. canariensis). Eight of the 10 microsatellite loci displayed high levels of polymorphism in most populations. In B. terrestris populations, the total number of alleles detected per polymorphic locus ranged from 3 to 16, with observed allelic diversity from 3.8 +/- 0.5 to 6.5 +/- 1.4 and average calculated heterozygosities from 0.41 +/- 0.09 to 0.65 +/- 0.07. B. canariensis showed a significantly lower average calculated heterozygosity (0.12 +/- 0.08) and observed allelic diversity (1.5 +/- 0.04) as compared to both continental and island populations of B. terrestris. No significant differentiation was found among populations of B. terrestris from the European continent. In contrast, island populations were all significantly and most of them strongly differentiated from continental populations. B. terrestris mitochondrial DNA is characterized by a low nucleotide diversity: 0.18% +/- 0.07%, 0.20% +/- 0.04% and 0.27% +/- 0.04% for the continental populations, the island populations and all populations together, respectively. The only haplotype found in the Tenerife population differs by a single nucleotide substitution from the most common continental haplotype of B. terrestris. This situation, identical to that of Tyrrhenian islands populations and quite different from that of B. lucorum (15 substitutions between terrestris and lucorum mtDNA) casts doubts on the species status of B. canariensis. The large genetic distance between the Tenerife and B. 
terrestris populations estimated from microsatellite data results, most probably, from a severe bottleneck in the Canary Island population. Microsatellite and mitochondrial DNA data call for the protection of the island populations of B. terrestris against importation of bumble bees of foreign origin which are used as crop pollinators.
Abstract:
Wolves in Italy declined strongly in the past and have been confined south of the Alps since the turn of the last century; by the 1970s they were reduced to approximately 100 individuals surviving in two fragmented subpopulations in the central-southern Apennines. The Italian wolves are presently expanding in the Apennines, and started to recolonize the western Alps in Italy, France and Switzerland about 16 years ago. In this study, we used a population genetic approach to elucidate some aspects of the wolf recolonization process. DNA extracted from 3068 tissue and scat samples collected in the Apennines (the source populations) and in the Alps (the colony) was genotyped at 12 microsatellite loci, aiming to assess (i) the strength of the bottleneck and founder effects during the onset of colonization; (ii) the rates of gene flow between source and colony; and (iii) the minimum number of colonizers that are needed to explain the genetic variability observed in the colony. We identified a total of 435 distinct wolf genotypes, which showed that wolves in the Alps: (i) have significantly lower genetic diversity (heterozygosity, allelic richness, number of private alleles) than wolves in the Apennines; (ii) are genetically distinct using pairwise F(ST) values, population assignment test and Bayesian clustering; (iii) are not in genetic equilibrium (significant bottleneck test). Spatial autocorrelations are significant among samples separated up to c. 230 km, roughly corresponding to the apparent gap in permanent wolf presence between the Alps and north Apennines. The estimated number of first-generation migrants indicates that migration has been unidirectional and male-biased, from the Apennines to the Alps, and that wolves in southern Italy did not contribute to the Alpine population. 
These results suggest that: (i) the Alps were colonized by a few long-range migrating wolves originating in the north Apennine subpopulation; (ii) during the colonization process there has been a moderate bottleneck; and (iii) gene flow between sources and colonies was moderate (corresponding to 1.25-2.50 wolves per generation), despite high potential for dispersal. Bottleneck simulations showed that a total of c. 8-16 effective founders are needed to explain the genetic diversity observed in the Alps. Levels of genetic diversity in the expanding Alpine wolf population, and the permanence of genetic structuring, will depend on the future rates of gene flow among distinct wolf subpopulation fragments.
Abstract:
We sequenced 1077 bp of the mitochondrial cytochrome b gene and 511 bp of the nuclear Apolipoprotein B gene in bicoloured shrew (Crocidura leucodon, Soricidae) populations ranging from France to Georgia. The aims of the study were to identify the main genetic clades within this species and the influence of Pleistocene climatic variations on the respective clades. The mitochondrial analyses revealed a European clade distributed from France eastwards to north-western Turkey and a Near East clade distributed from Georgia to Romania; the two clades separated during the Middle Pleistocene. We clearly identified a population expansion after a bottleneck for the European clade based on mitochondrial and nuclear sequencing data; this expansion was not observed for the eastern clade. We hypothesize that the western population was confined to a small Italo-Balkanic refugium, whereas the eastern population subsisted in several refugia along the southern coast of the Black Sea.
Abstract:
African clawed frogs of the widespread polytypic species Xenopus laevis Daudin, 1802 (ranging across large parts of sub-Saharan Africa) have been spreading since the 1940s, and have established reproductive populations in Europe, Asia and the Americas, where they can have a negative impact as competitors of native amphibians and as disease vectors for chytridomycosis or ranaviruses. Here we use two mitochondrial (cytochrome b, 16S rDNA) and one nuclear (RAG 1: Recombination Activating Gene 1) DNA markers to infer the potential origin of invasive clawed frogs from Sicily that represent the largest invasive population in Europe. Identical mtDNA haplotypes match with those of Xenopus laevis, and Sicilian clawed frogs very probably belong to a lineage from the Cape Region of South Africa, most likely originating from a laboratory stock. Nuclear data support this conclusion. Identical mtDNA sequences (cyt b, 16S) of frogs sampled across their range in Sicily suggest the occurrence of a single source population and a potential bottleneck at their release, but faster evolving multilocus nuclear data (microsatellites, SNPs) on the population genetics would be important in the future to better support this hypothesis.
Abstract:
Rapid rebound of plasma viremia in patients after interruption of long-term combination antiretroviral therapy (cART) suggests persistence of low-level replicating cells or rapid reactivation of latently infected cells. To further characterize rebounding virus, we performed extensive longitudinal clonal evolutionary studies of HIV env C2-V3-C3 regions and exploited the temporal relationships of rebounding plasma viruses with regard to pretreatment sequences in 20 chronically HIV-1-infected patients having undergone multiple 2-week structured treatment interruptions (STI). Rebounding virus during the short STI was homogeneous, suggesting mono- or oligoclonal origin during reactivation. No evidence for a temporal structure of rebounding virus in regard to pretreatment sequences was found. Furthermore, expansion of distinct lineages at different STI cycles emerged. Together, these findings imply stochastic reactivation of different clones from long-lived latently infected cells rather than expansion of viral populations replicating at low levels. After treatment was stopped, diversity increased steadily, but pretreatment diversity was, on average, achieved only >2.5 years after the start of STI when marked divergence from preexisting quasispecies also emerged. In summary, our results argue against persistence of ongoing low-level replication in patients on suppressive cART. Furthermore, a prolonged delay in restoration of pretreatment viral diversity after treatment interruption demonstrates a surprisingly sustained evolutionary bottleneck induced by punctuated antiretroviral therapy.
Abstract:
Understanding levels of population differentiation and inbreeding are important issues in conservation biology, especially for social Hymenoptera with fragmented and small population sizes. Isolated populations are more vulnerable to genetic loss and extinction than those with extended continuous distributions. However, small populations are not always a consequence of a recent reduction of their habitat. Thus, determining the history of population isolation and current patterns of genetic variation of a species is crucial for its conservation. Rossomyrmex minuchae is a slave-making ant with patchy distribution in South Eastern Spain and is classified as vulnerable by the IUCN. In contrast, the other three known species of the genus are presumed to show more uniform distributions. Here we investigate the genetic diversity and population structure of R. minuchae and compare it with that found in two other species of the genus: R. anatolicus and R. quandratinodum. We conclude that although genetic diversity of R. minuchae is low, there is no evidence of a recent bottleneck, suggesting a gradual and natural fragmentation process. We also show extreme population differentiation at nuclear and mitochondrial markers, and isolation by distance at a local scale. Despite some evidence for inbreeding and low genetic variation within populations, we found almost no diploid males, a finding which contrasts with that expected in inbred Hymenoptera with single locus complementary sex determination. This could mean that sex is determined by another mechanism. We argue that continued low population size means that detrimental effects of inbreeding and low genetic variation are likely in the future. We suggest that a policy of artificial gene flow aimed at increasing within population variation is considered as a management option.
Abstract:
The subject of this PhD thesis can be summarized by one famous paradox of evolutionary biology, the maintenance of polymorphism in the face of selection, and one classical equation of theoretical population genetics, which gives the changes in gametic frequencies due to selection and recombination. The frequency of gamete xi at generation (t + 1) is given by: [equation truncated in the source]. This equation is used to generate data on selection at two, three, and four diallelic loci for the different parts of this work. The first part focuses on the potential of heterozygote advantage to maintain genetic polymorphism. Results of previous studies are used to (re)define heterozygote advantage for multilocus systems, since the classical definition applies only to one diallelic locus. Using five different definitions of heterozygote advantage, I show that it is not a general mechanism for the maintenance of polymorphism. The study of the influence of undetected loci on evolutionary processes (second part of this work) is motivated by molecular studies that aim to discover the loci coding for a trait. In most of these studies, some coding loci remain undetected. I show that undetected loci increase the probability of maintaining polymorphism under selection. In addition, conclusions about the factors that maintain polymorphism can be misleading if not all loci are considered. It is therefore only when all loci are detected that exact conclusions on the level of maintained polymorphism or on the factor(s) that maintain polymorphism can be drawn. In the third part, the focus is on the expected release of additive genetic variance after a bottleneck for selected traits. A previous study shows that the expected release of additive variance increases with the number of loci. I show that the expected release of additive variance after a bottleneck increases for selected traits (compared with neutral ones), but that this increase is a function of the recombination rate rather than of the number of loci; among other things, recombination regenerates gametes lost during the bottleneck. Finally, the last part of this PhD thesis describes a package for the statistical software R that implements the equation given above. It generates data for different scenarios regarding selection, recombination, and population size. This package opens perspectives for theoretical population genetics, which mainly focuses on one locus, while this work shows that increasing the number of loci does not necessarily lead to straightforward results.
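The equation itself is truncated in this listing and is not reconstructed here. As a hedged illustration of the kind of recursion the abstract refers to, the sketch below iterates the standard textbook two-locus, two-allele model with selection and recombination. It is written in Python rather than R, it is not the thesis's package, and the function name and argument layout are assumptions made for this example.

```python
def next_gamete_freqs(x, w, r):
    """One generation of the standard two-locus, two-allele recursion.

    x : frequencies of gametes (AB, Ab, aB, ab), summing to 1
    w : symmetric 4x4 matrix, w[i][j] = fitness of the genotype formed by
        gametes i and j (assumes w[0][3] == w[1][2], i.e. both double
        heterozygotes have the same fitness)
    r : recombination rate between the two loci
    """
    # Marginal fitness of each gamete and mean population fitness.
    marginal = [sum(w[i][j] * x[j] for j in range(4)) for i in range(4)]
    mean_w = sum(x[i] * marginal[i] for i in range(4))
    # Linkage disequilibrium among gametes.
    d = x[0] * x[3] - x[1] * x[2]
    # Recombination in double heterozygotes removes AB and ab, adds Ab and aB.
    signs = (-1, 1, 1, -1)
    w_double_het = w[0][3]
    return [
        (x[i] * marginal[i] + signs[i] * r * w_double_het * d) / mean_w
        for i in range(4)
    ]


if __name__ == "__main__":
    neutral = [[1.0] * 4 for _ in range(4)]
    freqs = [0.5, 0.0, 0.0, 0.5]
    for generation in range(5):
        freqs = next_gamete_freqs(freqs, neutral, r=0.5)
        print(generation + 1, [round(f, 4) for f in freqs])
```

With all fitnesses equal, the recursion reduces to pure recombination and the linkage disequilibrium shrinks by a factor of (1 - r) each generation, which is a convenient sanity check before adding selection.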
Abstract:
Connectivity among populations plays a crucial role in maintaining genetic variation at a local scale, especially in small populations affected strongly by genetic drift. The negative consequences of population disconnection on allelic richness and gene diversity (heterozygosity) are well recognized and empirically established. It is not well recognized, however, that a sudden drop in local effective population size induced by such disconnection produces a temporary disequilibrium in allelic frequency distributions that is akin to the genetic signature of a demographic bottleneck. To document this effect, we used individual-based simulations and empirical data on allelic richness and gene diversity in six pairs of isolated versus well-connected (core) populations of European tree frogs. In our simulations, population disconnection depressed allelic richness more than heterozygosity and thus resulted in a temporary excess in gene diversity relative to mutation drift equilibrium (i.e., signature of a genetic bottleneck). We observed a similar excess in gene diversity in isolated populations of tree frogs. Our results show that population disconnection can create a genetic bottleneck in the absence of demographic collapse.
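The two diversity measures contrasted in this abstract can be tracked in a toy drift simulation. The sketch below is a minimal Wright-Fisher resampling loop at a single locus, assuming no mutation and no migration; it is not the authors' individual-based model, and the function names, starting frequencies and parameter values are hypothetical.

```python
import random
from collections import Counter


def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def richness(freqs):
    """Allelic richness, taken here simply as the number of alleles still present."""
    return sum(1 for p in freqs if p > 0)


def drift(freqs, n_copies, generations, seed=0):
    """Resample n_copies gene copies per generation; return the frequency history."""
    rng = random.Random(seed)
    history = [list(freqs)]
    for _ in range(generations):
        draws = rng.choices(range(len(freqs)), weights=history[-1], k=n_copies)
        counts = Counter(draws)
        history.append([counts[i] / n_copies for i in range(len(freqs))])
    return history


if __name__ == "__main__":
    # One common allele and several rare ones, in a small isolated population.
    start = [0.82, 0.06, 0.04, 0.04, 0.02, 0.02]
    for gen, freqs in enumerate(drift(start, n_copies=40, generations=30, seed=1)):
        if gen % 10 == 0:
            print(gen, richness(freqs), round(heterozygosity(freqs), 3))
```

Rare alleles contribute little to heterozygosity but count fully toward richness, so losing them to drift lowers richness much more than gene diversity; that asymmetry is the mechanism the abstract describes.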