194 results for sequence components
Abstract:
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via the Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, posterior parameter estimates and allocation probabilities given a sufficiently large sample size. The results reflect uncertainty in the final model and report the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with a computationally lightweight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
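As a rough illustration of how candidate models and their estimated probabilities might be read off a single run of an overfitted sampler, the sketch below tabulates the number of non-empty components per MCMC draw. This is an assumption about the general idea, not the Zmix package's code: the function name `component_count_probabilities` and the toy allocation draws are hypothetical.

```python
from collections import Counter


def component_count_probabilities(allocation_draws):
    """Share of MCMC draws in which exactly k components are non-empty.

    allocation_draws: one allocation vector per retained draw, where each
    entry is the component label assigned to an observation (assumed input).
    """
    counts = Counter(len(set(draw)) for draw in allocation_draws)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}


# Hypothetical toy draws: 5 observations, 4 retained iterations.
draws = [
    [0, 0, 1, 1, 0],  # 2 occupied components
    [0, 0, 1, 1, 1],  # 2 occupied components
    [0, 2, 1, 1, 0],  # 3 occupied components
    [1, 1, 0, 0, 0],  # 2 occupied components (labels switched)
]
probs = component_count_probabilities(draws)
print(probs)
```

Counting occupied components is invariant to label switching, which is why the last toy draw still contributes to the two-component tally.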
Abstract:
Much interest has been expressed in the construct metacognition, the individual's knowledge and control of his own cognitive processes. Recent educational proposals have suggested the training of general metacognitive principles in schools. The exact nature of the construct has, however, remained vague. The aim of the present study was to provide some clarity. In a study of the metacognitive responses of 144 primary school children (aged 7‐11 years) four measures commonly used to assess metacognitive function were examined. First, the content of each measure was examined. Secondly, in an attempt to identify a metacognitive factor, commonality among the measures, both of developmental patterns and statistical relationship, was sought. Whilst a common pattern of development in the children's responses to the four measures was identified, factor analysis failed to provide evidence for a common metacognitive factor and unified construct.
Abstract:
This paper describes a vision-only system for place recognition in environments that are traversed at different times of day, when changing conditions drastically affect visual appearance, and at different speeds, where places aren’t visited at a consistent linear rate. The major contribution is the removal of wheel-based odometry from the previously presented algorithm (SMART), allowing the technique to operate on any camera-based device; in our case a mobile phone. While we show that the direct application of visual odometry (VO) to our night-time datasets does not achieve a level of performance typically needed, the VO requirements of SMART are orthogonal to typical usage: firstly only the magnitude of the velocity is required, and secondly the calculated velocity signal only needs to be repeatable in any one part of the environment over day and night cycles, but not necessarily globally consistent. Our results show that the smoothing effect of motion constraints is highly beneficial for achieving a locally consistent, lighting-independent velocity estimate. We also show that the advantage of our patch-based technique used previously for frame recognition, surprisingly, does not transfer to VO, where SIFT demonstrates equally good performance. Nevertheless, we present the SMART system using only vision, which performs sequence-based place recognition in extreme low-light conditions where standard 6-DOF VO fails and that improves place recognition performance over odometry-less benchmarks, approaching that of wheel odometry.
Abstract:
This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case, and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. Meanwhile, an outcome refers to a label associated with completed cases, for example a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating whether a given case led to a customer complaint. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.
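The two-phase idea can be sketched in a few lines, under heavy simplifications that are assumptions rather than the paper's method: prefixes are taken to be already encoded as numeric vectors, cluster centres are given instead of being learned by hierarchical clustering or k-medoids, and a majority-vote rule stands in for the per-cluster random forest. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centres, vector):
    """Index of the cluster centre closest to the encoded trace prefix."""
    return min(range(len(centres)), key=lambda i: euclidean(centres[i], vector))


def fit_per_cluster(centres, encoded_prefixes, outcomes):
    """Phase 2 stand-in: one majority-label 'classifier' per cluster."""
    by_cluster = defaultdict(list)
    for vec, label in zip(encoded_prefixes, outcomes):
        by_cluster[nearest(centres, vec)].append(label)
    return {c: Counter(labels).most_common(1)[0][0]
            for c, labels in by_cluster.items()}


def predict(centres, models, vector):
    """Runtime step: pick the closest cluster, apply its classifier."""
    return models[nearest(centres, vector)]


# Hypothetical toy data: two assumed cluster centres, five historical prefixes.
centres = [(0.0, 0.0), (5.0, 5.0)]
prefixes = [(0, 1), (1, 0), (5, 4), (6, 5), (4, 5)]
outcomes = ["on time", "on time", "late", "late", "on time"]

models = fit_per_cluster(centres, prefixes, outcomes)
print(predict(centres, models, (4.5, 4.5)))
print(predict(centres, models, (0.5, 0.2)))
```

In a fuller version, each entry of `models` would be a trained classifier fitted only on the prefixes assigned to that cluster, and several nearby clusters could be combined with distance-based weights.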
Abstract:
Blood cells participate in vital physiological processes, and their numbers are tightly regulated so that homeostasis is maintained. Disruption of key regulatory mechanisms underlies many blood-related Mendelian diseases but also contributes to more common disorders, including atherosclerosis. We searched for quantitative trait loci (QTL) for hematology traits through a whole-genome association study, because these could provide new insights into both hemopoietic and disease mechanisms. We tested 1.8 million variants for association with 13 hematology traits measured in 6015 individuals from the Australian and Dutch populations. These traits included hemoglobin composition, platelet counts, and red blood cell and white blood cell indices. We identified three regions of strong association that, to our knowledge, have not been previously reported in the literature. The first was located in an intergenic region of chromosome 9q31 near LPAR1, explaining 1.5% of the variation in monocyte counts (best SNP rs7023923, p = 8.9 × 10⁻¹⁴). The second locus was located on chromosome 6p21 and associated with mean cell erythrocyte volume (rs12661667, p = 1.2 × 10⁻⁹, 0.7% variance explained) in a region that spanned five genes, including CCND3, a member of the D-cyclin gene family that is involved in hematopoietic stem cell expansion. The third region was also associated with erythrocyte volume and was located in an intergenic region on chromosome 6q24 (rs592423, p = 5.3 × 10⁻⁹, 0.6% variance explained). All three loci replicated in an independent panel of 1543 individuals (p values = 0.001, 9.9 × 10⁻⁵, and 7 × 10⁻⁵, respectively). The identification of these QTL provides new opportunities for furthering our understanding of the mechanisms regulating hemopoietic cell fate.
Abstract:
The commonly used "end diagnosis" phenotype that is adopted in linkage and association studies of complex traits is likely to represent an oversimplified model of the genetic background of a disease. This is also likely to be the case for common types of migraine, for which no convincingly associated genetic variants have been reported. In headache disorders, most genetic studies have used end diagnoses of the International Headache Society (IHS) classification as phenotypes. Here, we introduce an alternative strategy: we use trait components (individual clinical symptoms of migraine) to determine affection status in genomewide linkage analyses of migraine-affected families. We identified linkage between several traits and markers on chromosome 4q24 (highest LOD score under locus heterogeneity [HLOD] 4.52), a locus we previously reported to be linked to the end diagnosis migraine with aura. The pulsation trait identified a novel locus on 17p13 (HLOD 4.65). Additionally, a trait combination phenotype (IHS full criteria) revealed a locus on 18q12 (HLOD 3.29), and the age at onset trait revealed a locus on 4q28 (HLOD 2.99). Furthermore, suggestive or nearly suggestive evidence of linkage to four additional loci was observed with the traits phonophobia (10q22) and aggravation by physical exercise (12q21, 15q14, and Xp21), and, interestingly, these loci have been linked to migraine in previous studies. Our findings suggest that the use of symptom components of migraine instead of the end diagnosis provides a useful tool in stratifying the sample for genetic studies.
Abstract:
The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.
Abstract:
Expressed sequence tag (EST) databases provide a primary source of nuclear DNA sequences for genetic marker development in non-model organisms. To date, the process has been relatively inefficient for several reasons: (1) priming site polymorphism in the template leads to inferior or erratic amplification; (2) introns in the target amplicon are too large and/or numerous to allow effective amplification under standard screening conditions; and (3) at least occasionally, a PCR primer straddles an exon–intron junction and is unable to bind to genomic DNA template. The first is only a minor issue for species or strains with low heterozygosity but becomes a significant problem for species with high genomic variation, such as marine organisms with extremely large effective population sizes. Problems arising from unanticipated introns are unavoidable but are most pronounced in intron-rich species, such as vertebrates and lophotrochozoans. We present an approach to marker development in the Pacific oyster Crassostrea gigas, a highly polymorphic and intron-rich species, which minimizes these problems and should be applicable to other non-model species for which EST databases are available. Placement of PCR primers in the 3′ end of coding sequence and 3′ UTR improved PCR success rate from 51% to 97%. Almost all (37 of 39) markers developed for the Pacific oyster were polymorphic in a small test panel of wild and domesticated oysters.
Abstract:
The incidence of human infections by the fungal pathogen Candida species has been increasing in recent years. Enolase is an essential protein in fungal metabolism. Sequence data are available for humans and a number of medically important fungal species. An understanding of the structural and functional features of fungal enolases may provide the structural basis for their use as a target for the development of new anti-fungal drugs. We have obtained the sequence of the enolase of Candida krusei (C. krusei), as it is a medically important fungal pathogen. We then used multiple sequence alignments with various enolase isoforms in order to identify C. krusei-specific amino acid residues. The phylogenetic tree of enolases shows that the C. krusei enolase assembles on the tree with the fungal genes. Importantly, C. krusei lacks four amino acids in the active site compared to human enolase, as revealed by multiple sequence alignments. These differences in the substrate binding site may be exploited for the design of new anti-fungal drugs to selectively block this enzyme. The lack of the important amino acids in the active site also indicates that C. krusei enolase might have evolved as a member of a mechanistically diverse enolase superfamily catalysing somewhat different reactions.
Abstract:
The genome sequence of Caloramator mitchellensis strain VF08, a rod-shaped, heterotrophic, strictly anaerobic bacterium isolated from the free-flowing waters of a Great Artesian Basin (GAB) bore well located in Mitchell, an outback Queensland town in Australia, is reported here. The analysis of the 2.42-Mb genome sequence indicates that the attributes of the genome are consistent with its physiological and phenotypic traits.
Abstract:
The complete genome of an Australian isolate of zantedeschia mild mosaic virus (ZaMMV) causing mosaic symptoms on Alocasia sp. (designated ZaMMV-AU) was cloned and sequenced. The genome comprises 9942 nucleotides (excluding the poly-A tail) and encodes a polyprotein of 3167 amino acids. The sequence is most closely related to a previously reported ZaMMV isolate from Taiwan (ZaMMV-TW), with 82% and 86% identity at the nucleotide and amino acid level, respectively. Unlike the amino acid sequence of ZaMMV-TW, however, ZaMMV-AU does not contain a polyglutamine stretch at the N-terminus of the coat-protein-coding region upstream of the DAG motif. This is the first report of ZaMMV from Australia and from Alocasia sp.
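For readers unfamiliar with identity figures such as these, the sketch below shows one common way to compute percent identity from two pre-aligned sequences. The abstract does not state which alignment tool or identity definition was used, so the convention here (columns where both sequences have a gap are skipped, a gap against a residue counts as a mismatch) is an assumption, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns with identical symbols.

    Both inputs must come from the same pairwise alignment, with '-'
    marking gaps. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # uninformative column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 7 of 8 columns match.
toy_a = "ATGCATGC"
toy_b = "ATGAATGC"
identity = percent_identity(toy_a, toy_b)
print(identity)
```

The same function works for amino acid alignments, since it only compares symbols column by column.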
Abstract:
We report the first genome sequence of a Colocasia bobone disease-associated virus (CBDaV) derived from bobone-affected taro [Colocasia esculenta L. Schott] from Solomon Islands. The negative-strand RNA genome is 12,193 nt long, with six major open reading frames (ORFs) with the arrangement 3′-N-P-P3-M-G-L-5′. Typical of all rhabdoviruses, the 3′ leader and 5′ trailer sequences show complementarity to each other. Phylogenetic analysis indicated that CBDaV is a member of the genus Cytorhabdovirus, supporting previous reports of virus particles within the cytoplasm of bobone-infected taro cells. The availability of the CBDaV genome sequence now makes it possible to assess the role of this virus in bobone, and possibly alomae, disease of taro, and to confirm that this sequence is that of Colocasia bobone disease virus (CBDV).
Abstract:
Laskowski inhibitors regulate serine proteases by an intriguing mode of action that involves deceiving the protease into synthesizing a peptide bond. Studies exploring naturally occurring Laskowski inhibitors have uncovered several structural features that convey the inhibitor's resistance to hydrolysis and exceptional binding affinity. However, in the context of Laskowski inhibitor engineering, the way that various modifications intended to fine-tune an inhibitor's potency and selectivity impact on its association and dissociation rates remains unclear. This information is important as Laskowski inhibitors are becoming increasingly used as design templates to develop new protease inhibitors for pharmaceutical applications. In this study, we used the cyclic peptide, sunflower trypsin inhibitor-1 (SFTI-1), as a model system to explore how the inhibitor's sequence and structure relate to its binding kinetics and function. Using enzyme assays, MD simulations and NMR spectroscopy to study SFTI variants with diverse sequence and backbone modifications, we show that the geometry of the binding loop mainly influences the inhibitor's potency by modulating the association rate, such that variants lacking a favourable conformation show dramatic losses in activity. Additionally, we show that the inhibitor's sequence (including both the binding loop and its scaffolding) influences its potency and selectivity by modulating both the association and the dissociation rates. These findings provide new insights into protease inhibitor function and design that we apply by engineering novel inhibitors for classical serine proteases, trypsin and chymotrypsin and two kallikrein-related peptidases (KLK5 and KLK14) that are implicated in various cancers and skin diseases.
Abstract:
This chapter defines food literacy and its components using the empirical data collected in two studies undertaken in 2010 and 2011 as part of the author’s PhD thesis. The first was a Delphi study of Australian food experts and the second was a study of young adults across a spectrum of disadvantage. Defining food literacy and identifying its components was an iterative process. At different times throughout the research, each study informed the other. This chapter will describe the components of food literacy, the data used to identify them and how they combined to produce a definition of food literacy.