911 results for Molecular Sequence Data


Relevance: 80.00%

Abstract:

The potential role of viruses in coral disease has only recently begun to receive attention. Here we describe our attempts to determine whether viruses are present in thermally stressed corals Pavona danai, Acropora formosa and Stylophora pistillata and zoanthids Zoanthus sp., and their zooxanthellae. Heat-shocked P. danai, A. formosa and Zoanthus sp. all produced numerous virus-like particles (VLPs) that were evident in the animal tissue, zooxanthellae and the surrounding seawater; VLPs were also seen around heat-shocked freshly isolated zooxanthellae (FIZ) from P. danai and S. pistillata. The most commonly seen VLPs were tail-less, hexagonal and about 40 to 50 nm in diameter, though a diverse range of other VLP morphotypes (e.g. rounded, rod-shaped, droplet-shaped, filamentous) were also present around corals. When VLPs around heat-shocked FIZ from S. pistillata were added to non-stressed FIZ from this coral, they resulted in cell lysis, suggesting that an infectious agent was present; however, analysis with transmission electron microscopy provided no clear evidence of viral infection. The release of diverse VLPs was again apparent when flow cytometry was used to enumerate release by heat-stressed A. formosa nubbins. Our data support the infection of reef corals by viruses, though we cannot yet determine the precise origin (i.e. coral, zooxanthellae and/or surface microbes) of the VLPs seen. Furthermore, genome sequence data are required to establish the presence of viruses unequivocally.

Relevance: 80.00%

Abstract:

We present a machine learning model that predicts a structural disruption score from a protein’s primary structure. SCHEMA was introduced by Frances Arnold and colleagues as a method for determining putative recombination sites of a protein on the basis of the full (PDB) description of its structure. The present method provides an alternative to SCHEMA that is able to determine the same score from sequence data only. Circumventing the need for resolving the full structure enables the exploration of yet unresolved and even hypothetical sequences for protein design efforts. Deriving the SCHEMA score from a primary structure is achieved using a two-step approach: first predicting a secondary structure from the sequence, and then predicting the SCHEMA score from the predicted secondary structure. The correlation coefficient for the prediction is 0.88, which indicates the feasibility of replacing SCHEMA with little loss of precision. ©2005 IEEE

Relevance: 80.00%

Abstract:

A major task of traditional temporal event sequence mining is to find all frequent event patterns from a long temporal sequence. In many real applications, however, events are often grouped into different types, and not all types are of equal importance. In this paper, we consider the problem of efficient mining of temporal event sequences which lead to an instance of a specific type of event. Temporal constraints are used to ensure sensibility of the mining results. We will first generalise and formalise the problem of event-oriented temporal sequence data mining. After discussing some unique issues in this new problem, we give a set of criteria, which are adapted from traditional data mining techniques, to measure the quality of patterns to be discovered. Finally we present an algorithm to discover potentially interesting patterns.
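The abstract above does not give its algorithm, so the following is only a minimal sketch of the general idea of event-oriented mining under a temporal constraint. It assumes events are `(timestamp, event_type)` pairs, uses a fixed look-back window before each instance of the target event type, and scores single event types and unordered pairs by the fraction of target instances they precede. The function name `mine_preceding_patterns` and the parameters `window` and `min_support` are hypothetical, chosen for illustration.

```python
from collections import Counter


def mine_preceding_patterns(events, target, window, min_support):
    """Find event types (and unordered pairs of types) that often precede `target`.

    events: list of (timestamp, event_type) tuples.
    target: the event type of interest.
    window: look-back width; only events in [t_target - window, t_target) count.
    min_support: minimum fraction of target instances a pattern must precede.
    Returns a dict mapping each pattern (a tuple of event types) to its support.
    """
    target_times = [t for t, e in events if e == target]
    if not target_times:
        return {}

    counts = Counter()
    for t_target in target_times:
        # Distinct non-target event types inside the look-back window.
        seen = sorted(
            {e for t, e in events
             if t_target - window <= t < t_target and e != target}
        )
        # Count each single type and each unordered pair once per target instance.
        for i, first in enumerate(seen):
            counts[(first,)] += 1
            for second in seen[i + 1:]:
                counts[(first, second)] += 1

    n = len(target_times)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    log = [(1, "a"), (2, "b"), (3, "X"), (10, "a"), (12, "X"), (20, "b"), (21, "X")]
    print(mine_preceding_patterns(log, target="X", window=5, min_support=0.6))
```

A real miner would also consider event order and longer patterns; this sketch only illustrates how a target event type and a time window restrict which parts of the sequence are examined.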

Relevance: 80.00%

Abstract:

In this study, I determined the identity, taxonomic placement, and distribution of digenetic trematodes parasitizing the snails Pomacea paludosa and Planorbella duryi at Pa-hay-okee, Everglades National Park. I also characterized temporal and geographic variation in the probability of parasite infection for these snails based on two years of sampling. Although studies indicate that digenean parasites may have important effects both on individual species and the structure of communities, there have been no studies of digenean parasitism on snails within the Everglades ecosystem. For example, the endangered Everglade Snail Kite, a specialist that feeds almost exclusively on Pomacea paludosa and is known to be a definitive host of digenean parasites, may suffer direct and indirect effects from consumption of parasitized apple snails. Therefore, information on the diversity and abundance of parasites harbored in snail populations in the Everglades should be of considerable interest for management and conservation of wildlife. Juvenile digeneans (cercariae) representing 20 species were isolated from these two snails, representing a quadrupling of the number of species known. Species were characterized based on morphological, morphometric, and sequence data (18S rDNA, COI, and ITS). Species richness of shed cercariae was greater from P. duryi than from P. paludosa, with 13 and 7 species respectively. These species represented 14 families. P. paludosa and P. duryi had no digenean species in common. Probability of digenean infection was higher for P. duryi than P. paludosa, and adults showed a greater risk of infection than juveniles for both of these snails. Planorbella duryi showed variation in probability of infection between sampling sites and hydrological seasons. The number of unique combinations of multi-species infections was greatest among P. duryi individuals, while the overall percentage of multi-species infections was greatest in P. paludosa.
Analyses of six frequently-observed multiple infections from P. duryi suggest the presence of negative interactions, positive interactions, and neutral associations between larval digeneans. These results should contribute to an understanding of the factors controlling the abundance and distribution of key species in the Everglades ecosystem and may in particular help in the management and recovery planning for the Everglade Snail Kite.

Relevance: 80.00%

Abstract:

Background: The HIV virus is known for its ability to exploit numerous genetic and evolutionary mechanisms to ensure its proliferation, among them high replication, mutation and recombination rates. Sliding MinPD, a recently introduced computational method [1], was used to investigate the patterns of evolution of serially-sampled HIV-1 sequence data from eight patients, with a special focus on the emergence of X4 strains. Unlike other phylogenetic methods, Sliding MinPD combines distance-based inference with a nonparametric bootstrap procedure and automated recombination detection to reconstruct the evolutionary history of longitudinal sequence data. We present serial evolutionary networks as a longitudinal representation of the mutational pathways of a viral population in a within-host environment. The longitudinal representation of the evolutionary networks was complemented with charts of clinical markers to facilitate correlation analysis between pertinent clinical information and the evolutionary relationships. Results: Analysis based on the predicted networks suggests the following: significantly stronger recombination signals (p = 0.003) for the inferred ancestors of the X4 strains; recombination events between different lineages and recombination events between putative reservoir virus and those from a later population; and an early star-like topology observed for four of the patients who died of AIDS. A significantly higher number of recombinants were predicted at sampling points that corresponded to peaks in the viral load levels (p = 0.0042). Conclusion: Our results indicate that serial evolutionary networks of HIV sequences enable systematic statistical analysis of the implicit relations embedded in the topology of the structure and can greatly facilitate identification of patterns of evolution that can lead to specific hypotheses and new insights.
The conclusions of applying our method to empirical HIV data support the conventional wisdom of the new generation HIV treatments, that in order to keep the virus in check, viral loads need to be suppressed to almost undetectable levels.

Relevance: 80.00%

Abstract:

Mammalian C3 is a complement protein which consists of an α chain (125 kDa) and a β chain (75 kDa) held together by a disulfide bond. The α chain contains a conserved thiolester site which provides the molecule with opsonic properties. The protein is synthesized as a single pro-C3 molecule which is post-translationally modified. C3 genes have been identified in organisms from different phyla; however, the shark C3 gene remains to be cloned. Sequence data from the shark will contribute to understanding further the evolution of this key protein. To obtain additional sequence data for shark C3 genes, a cDNA library was constructed and screened with a DIG-labeled C3 probe. Fifty clones were isolated and sequenced. Analysis identified four sequences that yielded positive alignments with C3 of a variety of organisms, including human C3. Deduced amino acid sequence analysis confirmed a β/α cut site (RRRR), the CR3 and properdin binding sites, the catalytic histidine, and the reactive thiolester sequence. In the shark there are at least two C3-like genes, as the gene sequence obtained is distinct from that previously described.

Relevance: 80.00%

Abstract:

A uniform chronology for foraminifera-based sea surface temperature records has been established in more than 120 sediment cores obtained from the equatorial and eastern Atlantic up to the Arctic Ocean. The chronostratigraphy of the last 30,000 years is mainly based on published δ18O records and 14C ages from accelerator mass spectrometry, converted into calendar-year ages. The high-precision age control provides the database necessary for the uniform reconstruction of the climate interval of the Last Glacial Maximum within the GLAMAP-2000 project.

Relevance: 80.00%

Abstract:

Olfactory sensory neurons (OSNs), which detect a myriad of odorants, are known to express one allele of one olfactory receptor (OR) gene (Olfr) from the largest gene family in the mammalian genome. The OSNs expressing the same OR project their axons to the main olfactory bulb, where they converge to form glomeruli. This “One neuron-one receptor rule” makes the olfactory epithelium (OE), which consists of a vast number of OSNs expressing unique ORs, one of the most heterogeneous cell populations. However, the mechanism by which the single OR allele is chosen remains unclear, along with the question of whether one OSN expresses only a single OR gene, a hypothesis that had not been rigorously verified when we performed the experiments. Moreover, failure of axonal targeting to a single glomerulus was observed in MeCP2-deficient OSNs, where delayed development was proposed as an explanation for the phenotype. How Mecp2 mutation causes this aberrant targeting is not entirely understood.

In this dissertation, we explored the transcriptomes of single and mature OSNs by single-cell RNA-Seq to reveal their heterogeneity and further studied the OR gene expression from these isolated OSNs. The singularity of sequenced OSNs was ensured by the observation of monoallelic expression of X-linked genes from the hybrid samples from crosses between mice of different strains, where strain-specific polymorphisms could be used to track the allelic origins of SNP-containing reads. The clustering of expression profiles from triplicates that originated from the same cell assured that the transcriptomic identities of OSNs were maintained through the experimental process. The average gene expression profiles of sequenced OSNs correlated well with the conventional transcriptome data of FACS-sorted Omp-positive cells, and the top-ranked expression of OR was confirmed in the single-OSN transcriptomes. While exploring cellular diversity, in addition to OR genes, we revealed nearly 200 differentially expressed genes among the sequenced OSNs in this study. Among the 36 sequenced OSNs, eight cells (22.2%) showed multiple OR gene expression, and the presence of additional ORs was not restricted to the neighboring loci that shared the transcriptional effect of the primary OR expression, suggesting that the “One neuron-one receptor rule” might not be strictly true at the transcription level. All of the inferable ORs, including additional co-expressed ORs, were shown to be monoallelic. Our sequencing of 21 Mecp2308 mutant OSNs, of which 62% expressed more than one OR gene and in which the expression levels of the additional ORs were significantly higher than those in the wild type, suggested that MeCP2 plays a role in the regulation of singular OR gene expression.
Dual-label in situ hybridization, along with the sequence data, revealed that dorsal and ventral ORs were co-expressed in the same Mecp2 mutant OSN, further implying that MeCP2 might be involved in regulation of OR territories in the OE. Our results suggested a new role of MeCP2 in OR gene choice and confirmed that the multiple-OR expression caused by Mecp2 mutation was not accompanied by the delayed OSN development observed in previous studies of Mecp2 mutants.

Relevance: 80.00%

Abstract:

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD) [1, 2]. These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case–control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer’s disease in seven independent case–control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer’s disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer’s disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer’s disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing.
This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.

Relevance: 80.00%

Abstract:

Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer.

Relevance: 80.00%

Abstract:

A specimen of downy mildew on leaves of Sphagneticola trilobata found in northern Queensland was identified by a systematic approach as a novel species of Plasmopara. A new species, Plasmopara sphagneticolae, is proposed for this specimen, which differs from other species of Plasmopara by morphology, host range, and sequence data from nuclear-ribosomal DNA and mitochondrial DNA. Plasmopara sphagneticolae, together with P. halstedii, are downy mildews found on host species in the tribe Heliantheae (Asteraceae). Plasmopara halstedii causes downy mildew on Helianthus annuus, and is not present on sunflower in Australia. Phylogenetic analysis of the large subunit region of ribosomal DNA showed that P. sphagneticolae was sister to P. halstedii on sunflower.

Relevance: 80.00%

Abstract:

Effective pest management relies on accurate delimitation of species and, beyond this, on accurate species identification. Mitochondrial COI sequences are useful for providing initial indications in delimiting species but, despite acknowledged limitations in the method, many studies involving COI sequences and species problems remain unresolved. Here we illustrate how such impasses can be resolved with microsatellite and nuclear sequence data, to assess more directly the amount of gene flow between divergent lineages. We use a population genetics approach to test for random mating between two 8 ± 2% divergent COI lineages of the rusty grain beetle, Cryptolestes ferrugineus (Stephens). This species has become strongly resistant to phosphine, a fumigant used worldwide for disinfesting grain. The possibility of cryptic species would have significant consequences for resistance management, especially if resistance was confined to one mitochondrial lineage. We find no evidence of restricted gene flow or nonrandom mating across the two COI lineages of these beetles; rather, we hypothesize that historic population structure associated with early Pleistocene climate changes likely contributed to divergent lineages within this species.

Relevance: 80.00%

Abstract:

Data files to accompany the article in Nature Communications, in press.

Relevance: 80.00%

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery.
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
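The abstract above notes that running many statistical tests and deciding which results are significant becomes cumbersome. It does not say how its HVHT framework handles this, so the following is only an illustrative sketch of one standard way to control the false discovery rate across many tests, the Benjamini-Hochberg procedure. Its use here is an assumption, not the dissertation's stated method, and the function name `benjamini_hochberg` is chosen for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: sort the m p-values, find the
    largest rank k such that p_(k) <= (k / m) * alpha, and reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff = 0  # number of hypotheses to reject
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank

    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per metric compared between two cohorts.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In a cohort comparison, each p-value would come from one metric (for example, the frequency of an event type in each cohort), and only the metrics flagged `True` would be surfaced to the analyst.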

Relevance: 80.00%

Abstract:

In this study, the partial molar volumes of L-serine and L-threonine in aqueous solutions of ammonium sulfate at (0.0, 0.1, 0.3, 0.7, and 1.0) mol·kg⁻¹ are reported between 278.15 and 308.15 K. Transfer volumes and hydration numbers were obtained, which are larger in L-serine than in L-threonine. Dehydration of the amino acids is observed, rising with the temperature and salt molality. The data suggest that interactions between ions and charged/hydrophilic groups are predominant, and by applying the McMillan and Mayer formalism, it was concluded that they are mainly pairwise. The combination of the data presented in this study with solubility and molecular dynamics data suggests a stronger interaction of the ammonium cation with the zwitterionic centers of the amino acids when compared to the interactions of those centers with the sulfate anion.
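The abstract does not state its equations. As a hedged sketch, the transfer volume and the McMillan-Mayer expansion are commonly written as below. The notation is an assumption made for illustration: subscript A denotes the amino acid, S the salt, m_S the salt molality, and V_AS and V_ASS the pair and triplet volumetric interaction coefficients.

```latex
% Transfer volume: partial molar volume at infinite dilution in the salt
% solution minus the value in pure water.
\Delta_{\mathrm{tr}} V^{\circ}
  = V^{\circ}_{A}(\text{aqueous salt solution}) - V^{\circ}_{A}(\text{water})

% McMillan-Mayer expansion in the salt molality m_S.
\Delta_{\mathrm{tr}} V^{\circ}
  = 2\, V_{AS}\, m_{S} + 3\, V_{ASS}\, m_{S}^{2} + \cdots
```

Under this form, a pair term that dominates the triplet term over the studied molality range is what "mainly pairwise" would mean.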