953 results for Molecular sequence data


Relevance:

80.00%

Abstract:

Sequences of timestamped events are now generated in nearly every domain of data analytics, from e-commerce web logging to the electronic health records used by doctors and medical researchers. Every day, people review this type of data and apply statistical tests, hoping to learn as much as they can about how these processes work, why they break, and how they can be improved. To uncover why these processes behave as they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because single events and sequences of events can differ between the two cohorts in many ways: in the structure of the event sequences (e.g., event order, co-occurring events, or event frequencies), in the attributes of the events and records (e.g., a patient's gender), or in metrics derived from the timestamps themselves (e.g., the duration of an event). Running statistical tests to cover all of these cases and determining which results are significant quickly becomes cumbersome. Current visual analytics tools for comparing groups of event sequences take either a purely statistical or a purely visual approach. Visual analytics tools leverage humans' ability to easily spot patterns and anomalies they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question in mind and do not support more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. 
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and on the backend (e.g., the scalability challenges of running various metrics on multidimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis using a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that help users understand and parse the results, and (4) a user study, five long-term case studies, and five short-term case studies that demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and of the difficulties of applying these metrics and parsing their results. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research in comparing two or more groups of temporal event sequences, in opening traditional machine learning and data mining techniques to user interaction, and in extending the principles found in this dissertation to data types beyond temporal event sequences.
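The abstract does not describe how HVHT is implemented. As a hedged illustration of the general idea it names (running many hypothesis tests at once, then deciding which results are significant), the sketch below runs one permutation test per cohort metric and controls the false discovery rate with the Benjamini-Hochberg procedure. The choice of test, the FDR correction, and all function and metric names are assumptions for illustration, not the dissertation's method.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def permutation_pvalue(a: Sequence[float], b: Sequence[float],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in cohort means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (q-values), keyed like the input."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def compare_cohorts(cohort_a: Dict[str, List[float]],
                    cohort_b: Dict[str, List[float]],
                    alpha: float = 0.05) -> List[str]:
    """Test every shared metric; return those significant after FDR control, by q-value."""
    raw = {metric: permutation_pvalue(cohort_a[metric], cohort_b[metric])
           for metric in cohort_a if metric in cohort_b}
    q = benjamini_hochberg(raw)
    return sorted((m for m, qv in q.items() if qv <= alpha), key=q.get)
```

Each cohort is a mapping from a metric name (for example, a hypothetical per-record event duration) to one value per record. Correcting for the number of tests matters here because testing many metrics at once inflates the chance of spurious "significant" differences.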

Relevance:

80.00%

Abstract:

In this study, the partial molar volumes of L-serine and L-threonine in aqueous solutions of ammonium sulfate at (0.0, 0.1, 0.3, 0.7, and 1.0) mol·kg⁻¹ are reported between 278.15 and 308.15 K. Transfer volumes and hydration numbers were obtained; these are larger for L-serine than for L-threonine. Dehydration of the amino acids is observed, increasing with temperature and salt molality. The data suggest that interactions between the ions and the charged/hydrophilic groups are predominant, and applying the McMillan and Mayer formalism led to the conclusion that these interactions are mainly pairwise. Combining the data presented in this study with solubility and molecular dynamics data suggests a stronger interaction of the ammonium cation with the zwitterionic centers of the amino acids than of the sulfate anion with those centers.
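The transfer volume of an amino acid is the difference between its limiting partial molar volume in the aqueous salt solution and in pure water. Within the McMillan-Mayer approach, it is commonly expanded as ΔtrV = 2·v_AB·m_B + 3·v_ABB·m_B² + ..., where m_B is the salt molality and v_AB and v_ABB are pair and triplet interaction coefficients; "mainly pairwise" corresponds to the pair term dominating. The sketch below fits this truncated expansion by least squares. It assumes this two-term form; the authors' exact fitting procedure is not given in the abstract, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_pair_triplet(molalities: Sequence[float],
                     transfer_volumes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of dV = 2*v_AB*m + 3*v_ABB*m**2 with no intercept.

    molalities: salt molality m_B in mol/kg (the m_B = 0 reference is excluded).
    transfer_volumes: transfer volumes in cm^3/mol at the same molalities.
    Returns (v_AB, v_ABB): pair and triplet interaction coefficients.
    """
    if len(molalities) != len(transfer_volumes) or len(molalities) < 2:
        raise ValueError("need at least two matching (m_B, dV) points")
    x1 = [2.0 * m for m in molalities]          # regressor for the pair term
    x2 = [3.0 * m * m for m in molalities]      # regressor for the triplet term
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * y for a, y in zip(x1, transfer_volumes))
    t2 = sum(b * y for b, y in zip(x2, transfer_volumes))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("molalities do not determine both coefficients")
    # Solve the 2x2 normal equations by Cramer's rule.
    v_ab = (t1 * s22 - s12 * t2) / det
    v_abb = (s11 * t2 - s12 * t1) / det
    return v_ab, v_abb
```

Omitting the intercept reflects the definition: the transfer volume is zero when no salt is present.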

Relevance:

80.00%

Abstract:

Thesis (doctorate), Universidade de Brasília, Instituto de Ciências Biológicas, Programa de Pós-Graduação em Biologia Animal, 2016.

Relevance:

80.00%

Abstract:

Aedes notoscriptus (Skuse), a mosquito from the southwest Pacific region including Australia, has been implicated as a vector of arboviruses, but its status as a species is unclear. To investigate the taxonomic situation, we assessed genetic variation and phylogenetic relationships among Ae. notoscriptus from the east coast of Australia, Western Australia and New Zealand. Phylogenetic analyses of DNA sequence data from mitochondrial markers indicate that Ae. notoscriptus is a complex of divergent genetic lineages, some of which appear geographically restricted, while others are widespread in eastern Australia. Samples from New Zealand and Western Australia were related to populations from one southern Australian lineage. Nuclear markers show no evidence of genetic isolation by geographic distance in the overall sample of mosquitoes, but strong isolation by distance is obvious within two of the lineages, supporting their status as isolated gene pools. The morphological character of wing centroid size variation is also associated with genetic lineage. These findings point to the possibility that Ae. notoscriptus is a complex of species, highlighting the need to understand physiological and ecological differences that may influence future control strategies.
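Isolation by distance is commonly assessed by correlating a matrix of pairwise genetic distances with a matrix of geographic distances and judging significance by permuting sample labels (a Mantel test). The abstract does not name the test used, so this is a hedged sketch of that standard technique. The function names are hypothetical, and no data from the study are included.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _upper_triangle(matrix: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal upper triangle with rows and columns reordered by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic: Matrix, geographic: Matrix,
                n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-sided Mantel test for a positive genetic-vs-geographic correlation."""
    n = len(genetic)
    identity = list(range(n))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        # Permute sample labels of the genetic matrix (rows and columns together).
        rng.shuffle(order)
        if _pearson(_upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Only whole rows and columns are permuted, because the entries of a distance matrix are not independent of one another.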

Relevance:

80.00%

Abstract:

Habitat fragmentation is a major threat to biodiversity, as it can alter ecological processes at various spatial and trophic scales. At the species level, fragmentation that isolates populations can reduce genetic diversity, with potentially detrimental effects on population fitness, adaptability and, ultimately, population persistence. Leptomyrmex pallens is a widespread rainforest ant endemic to New Caledonia but now confined to habitat patches that have been fragmented by anthropogenic fire regimes over the last 200 years. We investigated the social structure of L. pallens in the Aoupinié region (ca. 4900 ha) and assessed the impacts of habitat fragmentation on its population genetic structure. Allele frequencies at 13 polymorphic microsatellite loci were compared among 411 worker ants from 21 nests distributed across the region. High within-nest relatedness (r = 0.70 ± 0.02) and the detection of a single queen in 38% of the nests by pedigree analysis indicate that the species is monogynous to weakly polygynous. Estimates of gene flow and genetic structure across the region were subsequently determined using a combined dataset of single workers per nest and of unrelated foraging workers. These estimates, coupled with a comprehensive landscape genetic analysis, revealed no evidence of significant population structure or habitat effects, suggesting that the Aoupinié region harbours a single panmictic population. In contrast, analyses of mitochondrial DNA sequence data revealed a high degree of genetic structuring, indicating limited maternal gene flow and suggesting that gene flow among nests is driven primarily by winged males. Overall, these findings suggest that fire-induced habitat fragmentation has had little impact on the population dynamics of L. pallens. 
Additional studies of less mobile species should therefore be conducted to gain further insight into the effects of fire-related disturbance on the unique biodiversity and function of New Caledonian ecosystems.
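The abstract does not name the statistic behind "no evidence of significant population structure". As a hedged illustration of how microsatellite allele frequencies can be summarised into a single structure measure, the sketch below computes Nei's G_ST = (H_T − H_S)/H_T for one locus. Values near 0 are consistent with a single panmictic population, and values near 1 indicate subpopulations fixed for different alleles. Subpopulations are weighted equally and no sample-size correction is applied; both are simplifying assumptions.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Nei's gene diversity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(subpop_freqs: Sequence[Sequence[float]]) -> float:
    """G_ST = (H_T - H_S) / H_T for one locus, weighting subpopulations equally.

    subpop_freqs: one allele-frequency list per subpopulation, same allele order.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # H_S: mean within-subpopulation expected heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / k
    # H_T: expected heterozygosity of the pooled allele frequencies.
    pooled = [sum(f[a] for f in subpop_freqs) / k for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```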

Relevance:

80.00%

Abstract:

Aim: Comparative phylogeographic analyses of alpine biota from the Northern Hemisphere have linked patterns of genetic diversification to glacial expansion and contraction events in the Pliocene and Pleistocene. Furthermore, the extent of diversification across species groups appears to be associated with vagility. In this study, we test whether these patterns apply to a geologically stable system from eastern Australia with comparatively shallow elevational gradients and minimal influence from historical glacial activity. Location: The Australian Alps, Victoria, eastern Australia. Methods: We considered phylogeographic patterns across five alpine invertebrate species based on mitochondrial and nuclear DNA sequence data. Bayesian inference methods were used to estimate species phylogenies and divergence times among lineages. GIS tools were used to map interpopulation genetic divergence and intrapopulation genetic diversity estimates and to visualize spatial patterns across species, providing insights into patterns of endemism and demographic history. Results: Phylogeographic patterns and the timing of lineage diversification were consistent across taxonomic groups. Mountain summits harbour highly differentiated haplogroups, including summits connected by high-elevation plateaus, indicating that diversification has been maintained since the early to mid-Pleistocene. These findings are consistent with previous studies of alpine mammals and reptiles, demonstrating a high degree of endemism in this region, regardless of species vagility. Main conclusions: The fine spatial scales at which deep genetic differentiation among alpine communities was observed in this study are unprecedented. This suggests that glacial periods have had less of an impact on species distributions and genetic diversity than they have had in alpine systems in the Northern Hemisphere. 
Historical gene flow among sky-island populations has been limited despite connecting snowlines during glacial periods, suggesting that factors other than snow cover have influenced patterns of gene flow in this region. These findings emphasize the unique phylogeographic history affecting Victorian alpine biodiversity, and the importance of conserving biodiversity from multiple mountain summits in this region of high endemism.

Relevance:

80.00%

Abstract:

The Glenelg spiny freshwater crayfish Euastacus bispinosus is a large, endangered freshwater invertebrate of southeastern Australia that has suffered major population declines over the last century. Disjunct populations in the state of South Australia are in a particularly critical condition, restricted to a few isolated rising-spring habitats and in ongoing decline. We assessed genetic diversity and gene flow within E. bispinosus across its current range using allele frequencies from 11 nuclear microsatellite loci and DNA sequence data from a single mitochondrial locus (cytochrome oxidase subunit I). Populations were characterized by low levels of genetic diversity and found to be highly structured, with gene flow restricted both within and across catchments, highlighting the species' vulnerability to further habitat fragmentation and the importance of managing environmental threats on local scales across its current natural range. South Australian populations generally had critically low levels of genetic diversity, highlighting their potential vulnerability to localized extinction. Holistic conservation efforts are necessary to conserve populations, including local habitat management and, potentially, translocations to increase genetic diversity and evolutionary potential, reduce possible inbreeding effects, and lower the threat of extinction.
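The abstract does not state which diversity index was used. When populations are compared on the basis of unequal sample sizes, allelic richness is often standardised by rarefaction. The sketch below gives the expected number of distinct alleles at one locus in a random subsample of g gene copies, Σᵢ [1 − C(N − Nᵢ, g)/C(N, g)]. It is offered as an assumption-labelled illustration, not the study's method, and the function name is hypothetical.

```python
from math import comb
from typing import Sequence


def rarefied_allelic_richness(allele_counts: Sequence[int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: copies of each allele observed at one locus in one population.
    """
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the total number of gene copies")
    total = comb(n, g)
    # An allele is missed only if all g draws come from the other n - N_i copies.
    return sum(1.0 - comb(n - n_i, g) / total for n_i in allele_counts)
```

Choosing g as the smallest sample size across populations makes the resulting values comparable between populations.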

Relevance:

80.00%

Abstract:

The gammacoronavirus Infectious Bronchitis Virus (IBV) is a respiratory pathogen of chickens. IBV is a constant threat to poultry production because established vaccines are often ineffective against emerging strains. This requires constant and rapid vaccine production through viral attenuation by egg passage, but the essential forces driving attenuation in the virus have not yet been characterised. Knowledge of these factors will lead to the development of more effective, rationally attenuated live vaccines and to a reduction in the mortality and morbidity caused by this pathogen. The M41-CK strain was egg-passaged four times many years ago at Houghton Poultry Research Station and stored as M41-CK EP4 (stock virus at The Pirbright Institute since 1992). It was the first egg passage to have its genome pyrosequenced and was therefore used as the baseline reference. The overall aim of this project was to analyse deep sequence data obtained from four IBV isolates (called A, A1, C and D), each originating from the common M41-CK EP4 (ep4) and independently passaged multiple times in embryonated chicken eggs (figure 1.1). Highly polymorphic encoding regions of the IBV genome, likely involved in the attenuation process through the formation of independent SNPs and/or SNP clusters, were then identified. These results were then used to direct a targeted investigation of SNPs during the attenuation process of the four IBV passages. A previously generated deep sequence dataset was used as a preliminary map of attenuation for one virulent strain of IBV. This investigation identified the nucleocapsid and spike as two highly polymorphic encoding regions within the IBV genome, with the highest proportion of SNPs relative to encoding region size. This analysis then led to more focussed studies of the nucleocapsid and spike encoding regions, with the ultimate aim of mapping key attenuating regions and nucleotide positions. 
The 454 pyrosequencing data and further investigation of the nucleocapsid and spike encoding regions identified SNPs present at the same nucleotide positions in the analysed A, A1, C and D isolates. These SNPs probably play a crucial role in viral attenuation and universal vaccine production, but it is not clear whether independent SNPs are also involved in loss of virulence. The majority of SNPs accumulated at different nucleotide positions and did not persist in the Sanger-sequenced egg passages, pointing to the S2 subunit (spike) and the nucleocapsid, which remain highly conserved in nature, as polymorphic encoding regions.
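Two of the comparisons described above are simple to state precisely. The "proportion of SNPs compared to encoding region size" is a SNP density (SNPs per nucleotide of the region), and "SNPs present at the same nucleotide positions" in isolates A, A1, C and D is a set intersection. The sketch below shows both. The region coordinates and SNP positions in any usage are hypothetical placeholders, not IBV genome coordinates, and the function names are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def snp_density(snp_positions: Iterable[int],
                regions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """SNPs per nucleotide in each encoding region (1-based, inclusive coordinates)."""
    positions = list(snp_positions)
    density = {}
    for name, (start, end) in regions.items():
        length = end - start + 1
        count = sum(1 for p in positions if start <= p <= end)
        density[name] = count / length
    return density


def shared_snp_positions(isolates: Dict[str, Set[int]]) -> Set[int]:
    """Positions at which every isolate carries a SNP."""
    position_sets = list(isolates.values())
    return set.intersection(*position_sets) if position_sets else set()
```

Normalising by region length prevents a long region such as the spike from ranking as highly polymorphic merely because it contains more nucleotides.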

Relevance:

80.00%

Abstract:

The diagnosis of mixed-genotype hepatitis C virus (HCV) infection is rare, and information on its incidence in the UK, where genotypes 1a and 3 are the most prevalent, is sparse. Considerable variation in the efficacy of direct-acting antivirals (DAAs) across HCV genotypes has been documented, and the ability of DAAs to treat mixed-genotype HCV infections remains unclear, with the possibility that genotype switching may occur. To estimate the prevalence of mixed genotype 1a/3 infections in Scotland, a cohort of 512 samples was compiled and then screened using a genotype-specific nested PCR assay. Mixed genotype 1a/3 infections were found in 3.8% of samples tested, with a significantly higher prevalence (6.7%, p<0.05) in individuals diagnosed with genotype 3 infections than in those diagnosed with genotype 1a (0.8%). Analysis of the samples using genotype-specific qPCR assays found that in two-thirds of samples tested, the minor strain contributed <1% of the total viral load. The potential of deep sequencing for diagnosing mixed-genotype infections was assessed using two pan-genotypic PCR assays, compatible with the Illumina MiSeq platform, that were developed to target the E1-E2 and NS5B regions of the virus. The E1-E2 assay detected 75% of the mixed-genotype infections, proving more sensitive than the NS5B assay, which identified only 25% of them. Studies of sequence data and linked patient records also identified significantly more neurological disorders in genotype 3 patients. Evidence of distinctive dinucleotide expression within the genotypes was also uncovered. Taken together, these findings raise interesting questions about the evolutionary history of the virus and indicate that there is still more to understand about the different genotypes. In an era when clinical medicine is frequently personalised, developing diagnostic methods for HCV that provide better patient stratification is increasingly important. 
This project has shown that sequence-based genotyping methods can be highly discriminatory and informative, and their use should be encouraged in diagnostic laboratories. Mixed genotype infections were challenging to identify and current deep sequencing methods were not as sensitive or cost-effective as Sanger-based approaches in this study. More research is needed to evaluate the clinical prognosis of patients with mixed genotype infection and to develop clinical guidelines on their treatment.

Relevance:

80.00%

Abstract:

The myogenic differentiation 1 gene (MYOD1) has a key role in skeletal muscle differentiation and composition through its regulation of the expression of several muscle-specific genes. We first used a general linear mixed model approach to evaluate the association of MYOD1 expression levels with individual beef tenderness phenotypes. MYOD1 mRNA levels, measured by quantitative polymerase chain reaction in 136 Nelore steers, were significantly associated (P ≤ 0.01) with Warner-Bratzler shear force measured on the longissimus dorsi muscle after 7 and 14 days of beef aging. Transcript abundance of the muscle regulatory gene MYOD1 was lower in animals with more tender beef. We also performed a coexpression network analysis using whole-transcriptome sequence data generated from 30 samples of longissimus muscle tissue to identify genes that are potentially regulated by MYOD1. The effect of MYOD1 expression on beef tenderness may arise from its function as an activator of muscle-specific gene transcription, for example of the serum response factor (C-fos serum response element-binding transcription factor) gene (SRF), which determines muscle tissue development, composition, growth and maturation.
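Coexpression networks are often built from pairwise correlations between gene expression profiles across samples, keeping gene pairs whose correlation exceeds a threshold. The abstract does not specify the network method, so the sketch below is only a hedged illustration: it lists genes whose Pearson correlation with a hub gene (here MYOD1) meets an arbitrary cutoff. The threshold, the function names, and any expression values are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def coexpression_partners(expression: Dict[str, Sequence[float]], hub: str,
                          threshold: float = 0.8) -> Dict[str, float]:
    """Genes whose |r| with the hub gene is at least `threshold`, strongest first."""
    hub_profile = expression[hub]
    correlations = {gene: pearson(hub_profile, profile)
                    for gene, profile in expression.items() if gene != hub}
    kept = {g: r for g, r in correlations.items() if abs(r) >= threshold}
    return dict(sorted(kept.items(), key=lambda kv: -abs(kv[1])))
```

A correlation-based edge suggests co-regulation but does not by itself show that MYOD1 regulates the partner gene.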

Relevance:

50.00%

Abstract:

Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data hold promise for answering various biologically relevant questions in more detail than has previously been possible. For example, one may wish to identify regions of an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of these questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main focus is on modeling evolutionary relationships based on observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach, we use a partition model, which describes the structure of the data by dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the modeling task at hand but, at the same time, simple enough to make inference feasible in practice. The partition model fulfills these two requirements. Problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. 
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
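To make the last point concrete, the sketch below shows a generic partition model for aligned DNA sequences. Under a symmetric Dirichlet prior, the per-site base frequencies of each cluster integrate out analytically (a Dirichlet-multinomial), so any partition can be scored in closed form and then explored by stochastic search. This is an assumption-labelled illustration, not the authors' model: the prior, the uniform prior over partitions, the single-item move, and the function names are all hypothetical simplifications.

```python
import math
import random
from typing import List, Sequence, Tuple


def cluster_log_marginal(seqs: Sequence[str], alphabet: str = "ACGT",
                         alpha: float = 1.0) -> float:
    """log p(data in one cluster), per-site base frequencies integrated out
    under a symmetric Dirichlet(alpha) prior (Dirichlet-multinomial)."""
    if not seqs:
        return 0.0
    k = len(alphabet)
    total = 0.0
    for site in range(len(seqs[0])):
        counts = [sum(1 for s in seqs if s[site] == c) for c in alphabet]
        n = sum(counts)  # symbols outside the alphabet (e.g. gaps) are ignored
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
        total += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts)
    return total


def partition_score(seqs: Sequence[str], labels: Sequence[int], n_clusters: int) -> float:
    """Sum of cluster log marginal likelihoods (uniform prior over partitions assumed)."""
    return sum(cluster_log_marginal([s for s, l in zip(seqs, labels) if l == c])
               for c in range(n_clusters))


def stochastic_search(seqs: Sequence[str], n_clusters: int, n_iter: int = 500,
                      n_restarts: int = 5, seed: int = 0) -> Tuple[List[int], float]:
    """Random single-item moves kept when they do not lower the score; best over restarts."""
    rng = random.Random(seed)
    best_labels: List[int] = []
    best_score = -math.inf
    for _ in range(n_restarts):
        labels = [rng.randrange(n_clusters) for _ in seqs]
        score = partition_score(seqs, labels, n_clusters)
        for _ in range(n_iter):
            i = rng.randrange(len(seqs))
            old, new = labels[i], rng.randrange(n_clusters)
            if new == old:
                continue
            labels[i] = new
            candidate = partition_score(seqs, labels, n_clusters)
            if candidate >= score:
                score = candidate
            else:
                labels[i] = old  # reject the move
        if score > best_score:
            best_labels, best_score = labels[:], score
    return best_labels, best_score
```

A plain single-item hill climb can stall: moving one sequence out of a merged cluster into an empty one can lower the score even when the fully separated partition is better. Hence the restarts; practical samplers typically add split and merge moves as well.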

Relevance:

50.00%

Abstract:

Scytalidium thermophilum plays an important role in determining the selectivity of compost produced for growing Agaricus bisporus. The objective of this study was to characterise S. thermophilum isolates by random amplified polymorphic DNA (RAPD) analysis and by sequence analysis of the internal transcribed spacer (ITS) regions of the rDNA, to assess the genetic variation exhibited by this species complex, and to compare it with existing morphological and thermogravimetric data. RAPD analysis of 34 isolates from various parts of the world revealed two distinct groups, which could be separated on the basis of differences in the banding patterns produced with five random primers. Nucleotide sequence analysis of the ITS region, which was ca. 536 bp in length, revealed only very minor variation among the S. thermophilum isolates examined, consisting of several nucleotide base changes within this region. Genetic distance values between type 1 and type 2 S. thermophilum isolates, as determined by ITS sequence analysis, varied by 0.005%. The molecular analyses carried out in the present study suggest that isolates within this species complex exhibit genetic differences that correlate well with previously determined morphological variation and thermogravimetric data.
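The abstract reports genetic distances from aligned ITS sequences but not the distance model. A hedged sketch of two common choices follows: the uncorrected p-distance (the proportion of differing sites) and the Jukes-Cantor correction d = −(3/4)·ln(1 − 4p/3). Sites with a gap or an ambiguous base in either sequence are skipped pairwise, which is an assumption. Multiplying by 100 gives a percentage.

```python
import math

NUCLEOTIDES = set("ACGT")


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, ignoring sites with gaps or ambiguity codes."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in NUCLEOTIDES and b in NUCLEOTIDES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor_distance(s1: str, s2: str) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```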

Relevance:

50.00%

Abstract:

Sequence-specific binding is demonstrated between pyrene-based tweezer molecules and soluble, high-molar-mass copolyimides. The binding involves complementary π-π stacking interactions, polymer chain-folding, and hydrogen bonding, and is extremely sensitive to the steric environment around the pyromellitimide binding site. A detailed picture of the intermolecular interactions involved has been obtained through single-crystal X-ray studies of tweezer complexes with model diimides. Ring-current magnetic shielding of polyimide protons by the pyrene "arms" of the tweezer molecule induces large complexation shifts of the corresponding ¹H NMR resonances, enabling specific triplet sequences to be identified by their complexation shifts. Extended comonomer sequences (triplets of triplets in which the monomer residues differ only by the presence or absence of a methyl group) can be "read" by a mechanism that involves multiple binding of tweezer molecules to adjacent diimide residues within the copolymer chain. The adjacent-binding model for sequence recognition has been validated by two conceptually different sets of tweezer-binding experiments. One approach compares sequence-recognition events for copolyimides having either restricted or unrestricted triple-triplet sequences, and the other uses copolymers containing both strongly binding and completely nonbinding diimide residues. In all cases, the nature and relative proportions of triple-triplet sequences predicted by the adjacent-binding model are fully consistent with the observed ¹H NMR data.

Relevance:

50.00%

Abstract:

Coleodactylus amazonicus, a small, diurnal leaf-litter gecko widely distributed in the Amazon Basin, has been considered a single species with no significant morphological differences between populations across its range. A recent molecular study, however, detected large genetic differences between populations in central Amazonia and those in the easternmost part of the Amazon Basin, suggesting the presence of taxonomically unrecognised diversity. In this study, DNA sequences of three mitochondrial genes (16S, cytb, and ND4) and two nuclear genes (RAG-1, c-mos) were used to investigate whether the species currently identified as C. amazonicus contains morphologically cryptic species lineages. The present phylogenetic analysis reveals further genetic subdivision, including at least five potential species lineages restricted to the northeastern (lineage A), southeastern (lineage B), central-northern (lineage E) and central-southern (lineages C and D) parts of the Amazon Basin. All clades are characterized by exclusive groups of alleles for both nuclear genes and by highly divergent mitochondrial haplotype clades, with corrected pairwise net sequence divergence between sister lineages ranging from 9.1% to 20.7% for the entire mtDNA dataset. The results of this study suggest that the real diversity of 'C. amazonicus' has been underestimated because of its apparent cryptic diversification. © 2009 Elsevier Inc. All rights reserved.
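Net sequence divergence between two lineages is commonly computed with Nei's formula d_A = d_XY − (d_X + d_Y)/2. Here d_XY is the mean pairwise distance between lineages, and d_X and d_Y are the mean pairwise distances within each lineage. Subtracting the within-lineage terms removes the ancestral polymorphism that inflates raw between-lineage distances. The abstract reports corrected values; for simplicity, this hedged sketch uses uncorrected p-distances on gap-free aligned sequences, and a model-corrected distance could be substituted. The function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(1 for a, b in zip(s1, s2) if a != b) / len(s1)


def mean_between(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    dists = [p_distance(x, y) for x in group_x for y in group_y]
    return sum(dists) / len(dists)


def mean_within(group: Sequence[str]) -> float:
    dists = [p_distance(x, y) for x, y in combinations(group, 2)]
    if not dists:
        raise ValueError("need at least two sequences per lineage")
    return sum(dists) / len(dists)


def net_divergence(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    """Nei's net divergence d_A = d_XY - (d_X + d_Y) / 2."""
    return mean_between(group_x, group_y) - (mean_within(group_x) + mean_within(group_y)) / 2
```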