46 results for Computational biology and bioinformatics
Abstract:
We present a method of estimating HIV incidence rates in epidemic situations from data on age-specific prevalence and changes in the overall prevalence over time. The method is applied to women attending antenatal clinics in Hlabisa, a rural district of KwaZulu/Natal, South Africa, where transmission of HIV is overwhelmingly through heterosexual contact. A model which gives age-specific prevalence rates in the presence of a progressing epidemic is fitted to prevalence data for 1998 using maximum likelihood methods and used to derive the age-specific incidence. Error estimates are obtained using a Monte Carlo procedure. Although the method is quite general, some simplifying assumptions are made concerning the form of the risk function, and sensitivity analyses are performed to explore the importance of these assumptions. The analysis shows that in 1998 the annual incidence of infection per susceptible woman increased from 5.4 per cent (3.3-8.5 per cent; here and elsewhere ranges give 95 per cent confidence limits) at age 15 years to 24.5 per cent (20.6-29.1 per cent) at age 22 years and declined to 1.3 per cent (0.5-2.9 per cent) at age 50 years; standardized to a uniform age distribution, the overall incidence per susceptible woman aged 15 to 59 was 11.4 per cent (10.0-13.1 per cent); per woman in the population it was 8.4 per cent (7.3-9.5 per cent). Standardized to the age distribution of the female population the average incidence per woman was 9.6 per cent (8.4-11.0 per cent); standardized to the age distribution of women attending antenatal clinics, it was 11.3 per cent (9.8-13.3 per cent). The estimated incidence depends on the values used for the epidemic growth rate and the AIDS-related mortality.
To ensure that, for this population, errors in these two parameters change the age-specific estimates of the annual incidence by less than the standard deviation of the estimates of the age-specific incidence, the AIDS-related mortality should be known to within +/-50 per cent and the epidemic growth rate to within +/-25 per cent, both of which conditions are met. In the absence of cohort studies to measure the incidence of HIV infection directly, useful estimates of the age-specific incidence can be obtained from cross-sectional, age-specific prevalence data and repeat cross-sectional data on the overall prevalence of HIV infection. Several assumptions were made because of the lack of data, but sensitivity analyses show that they are unlikely to affect the overall estimates significantly. These estimates are important in assessing the magnitude of the public health problem, for designing vaccine trials and for evaluating the impact of interventions. Copyright (C) 2001 John Wiley & Sons, Ltd.
Abstract:
Our current, still limited, understanding of the comparative biology and evolution of polydnaviruses (PDVs) is reviewed, especially in the context of the possible origins of these parasitoid viruses and of their coevolution with carrier wasps. A hypothetical scenario of evolution of PDVs from ascovirus (or ascovirus-like) ancestors is presented, with examples of apparent extant transitional forms. PDVs appear, in the case of bracoviruses, to show phylogenetic relationships that mirror those of their wasp carriers; with ichnoviruses, the picture is less clear. Ongoing sequencing studies of entire PDV genomes from diverse wasp species are likely to greatly contribute to our understanding of PDV evolution. (C) 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
A two-component survival mixture model is proposed to analyse a set of ischaemic stroke-specific mortality data. The survival experience of stroke patients after index stroke may be described by a subpopulation of patients in the acute condition and another subpopulation of patients in the chronic phase. To adjust for the inherent correlation of observations due to random hospital effects, a mixture model of two survival functions with random effects is formulated. Assuming a Weibull hazard in both components, an EM algorithm is developed for the estimation of fixed effect parameters and variance components. A simulation study is conducted to assess the performance of the two-component survival mixture model estimators. Simulation results confirm the applicability of the proposed model in a small sample setting. Copyright (C) 2004 John Wiley & Sons, Ltd.
Abstract:
Keratins are the major structural proteins of keratinocytes, which are the most abundant cell type in the mammalian epidermis. Mutations in epidermal keratin genes have been shown to cause severe blistering skin abnormalities. One such disease, epidermolytic hyperkeratosis (EHK), also known as bullous congenital ichthyosiform erythroderma, occurs as a result of mutations in highly conserved regions of keratins K1 and K10. Patients with EHK first exhibit erythroderma with severe blistering, which later is replaced by thick patches of scaly skin. To assess the effect of a mutated K1 gene on skin biology and to produce an animal model for EHK, we removed 60 residues from the 2B segment of HK1 and observed the effects of its expression in the epidermis of transgenic mice. Phenotypes of the resultant mice closely resembled those observed in the human disease, first with epidermal blisters, then later with hyperkeratotic lesions. In neonatal mice homozygous for the transgene, the skin was thicker, with an increased labeling index, and the spinous cells showed a collapse of the keratin filament network around the nuclei, suggesting that a critical concentration of the mutant HK1, over the endogenous MK1, was required to disrupt the structural integrity of the spinous cells. Additionally, footpad epithelium, which is devoid of hair follicles, showed blistering in the spinous layer, suggesting that hair follicles can stabilize or protect the epidermis from trauma. Blisters were not evident in adult mice, but instead they showed a thick, scaly hyperkeratotic skin with increased mitosis, resulting in an increased number of corneocytes and granular cells. Irregularly shaped keratohyalin granules were also observed. To date, this is the only transgenic model to show the typical morphology found in the adult form of EHK.
Abstract:
Plant transformation is now a core research tool in plant biology and a practical tool for cultivar improvement. There are verified methods for stable introduction of novel genes into the nuclear genomes of over 120 diverse plant species. This review examines the criteria to verify plant transformation; the biological and practical requirements for transformation systems; the integration of tissue culture, gene transfer, selection, and transgene expression strategies to achieve transformation in recalcitrant species; and other constraints to plant transformation including regulatory environment, public perceptions, intellectual property, and economics. Because the costs of screening populations showing diverse genetic changes can far exceed the costs of transformation, it is important to distinguish absolute and useful transformation efficiencies. The major technical challenge facing plant transformation biology is the development of methods and constructs to produce a high proportion of plants showing predictable transgene expression without collateral genetic damage. This will require answers to a series of biological and technical questions, some of which are defined.
Abstract:
A mixture model incorporating long-term survivors has been adopted in the field of biostatistics where some individuals may never experience the failure event under study. The surviving fractions may be considered as cured. In most applications, the survival times are assumed to be independent. However, when the survival data are obtained from a multi-centre clinical trial, it is conceivable that the environmental conditions and facilities shared within a clinic affect the proportion cured as well as the failure risk for the uncured individuals. This necessitates a long-term survivor mixture model with random effects. In this paper, the long-term survivor mixture model is extended for the analysis of multivariate failure time data using the generalized linear mixed model (GLMM) approach. The proposed model is applied to analyse a numerical data set from a multi-centre clinical trial of carcinoma as an illustration. Some simulation experiments are performed to assess the applicability of the model based on the average biases of the estimates formed. Copyright (C) 2001 John Wiley & Sons, Ltd.
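For orientation, a long-term survivor (cure) mixture model with clinic-level random effects is often written in the form below. This is a generic sketch and not necessarily the exact specification used in the paper: the symbols \pi_{ij}, S_u, h_0, the covariate vectors x_{ij} and z_{ij}, and the random effects u_i and v_i for clinic i are notation assumed here for illustration.

```latex
% Population survival: a cured fraction \pi_{ij} plus the uncured survival S_u
S_{ij}(t) = \pi_{ij} + (1 - \pi_{ij})\, S_u(t \mid z_{ij}, v_i)

% Assumed logistic model for the cured proportion, with clinic effect u_i
\operatorname{logit}(\pi_{ij}) = x_{ij}^{\top} \beta + u_i

% Assumed proportional hazards model for uncured individuals, with clinic effect v_i
h_u(t \mid z_{ij}, v_i) = h_0(t)\, \exp\!\left( z_{ij}^{\top} \gamma + v_i \right)
```

Because S_u(t) tends to 0 as t grows, the population survival curve levels off at \pi_{ij}, which is what makes the surviving fraction interpretable as cured.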
Abstract:
Eukaryotic phenotypic diversity arises from multitasking of a core proteome of limited size. Multitasking is routine in computers, as well as in other sophisticated information systems, and requires multiple inputs and outputs to control and integrate network activity. Higher eukaryotes have a mosaic gene structure with a dual output, mRNA (protein-coding) sequences and introns, which are released from the pre-mRNA by posttranscriptional processing. Introns have been enormously successful as a class of sequences and comprise up to 95% of the primary transcripts of protein-coding genes in mammals. In addition, many other transcripts (perhaps more than half) do not encode proteins at all, but appear both to be developmentally regulated and to have genetic function. We suggest that these RNAs (eRNAs) have evolved to function as endogenous network control molecules which enable direct gene-gene communication and multitasking of eukaryotic genomes. Analysis of a range of complex genetic phenomena in which RNA is involved or implicated, including co-suppression, transgene silencing, RNA interference, imprinting, methylation, and transvection, suggests that a higher-order regulatory system based on RNA signals operates in the higher eukaryotes and involves chromatin remodeling as well as other RNA-DNA, RNA-RNA, and RNA-protein interactions. The evolution of densely connected gene networks would be expected to result in a relatively stable core proteome due to the multiple reuse of components, implying that cellular differentiation and phenotypic variation in the higher eukaryotes result primarily from variation in the control architecture.
Thus, network integration and multitasking using trans-acting RNA molecules produced in parallel with protein-coding sequences may underpin both the evolution of developmentally sophisticated multicellular organisms and the rapid expansion of phenotypic complexity into uncontested environments such as those initiated in the Cambrian radiation and those seen after major extinction events.
Abstract:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
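The search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the distance measure, move set, or cooling schedule, so Levenshtein distance, single-character edits, and geometric cooling are assumed here, and all function names are hypothetical.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def total_distance(candidate, sequences):
    """Objective: sum of distances from the candidate to every input sequence."""
    return sum(edit_distance(candidate, s) for s in sequences)


def mutate(seq, rng, alphabet="ACGT"):
    """Propose a neighbour by one random substitution, insertion or deletion."""
    move = rng.choice(("sub", "ins", "del"))
    if move == "ins" or not seq:
        pos = rng.randrange(len(seq) + 1)
        return seq[:pos] + rng.choice(alphabet) + seq[pos:]
    pos = rng.randrange(len(seq))
    if move == "sub":
        return seq[:pos] + rng.choice(alphabet) + seq[pos + 1:]
    return seq[:pos] + seq[pos + 1:]


def anneal_consensus(sequences, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over candidate consensus sequences."""
    rng = random.Random(seed)
    current = rng.choice(sequences)  # start from one of the inputs
    current_cost = total_distance(current, sequences)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        proposal = mutate(current, rng)
        cost = total_distance(proposal, sequences)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = proposal, cost
    return best, best_cost
```

In this sketch each evaluation of the objective costs one dynamic-programming pass per input sequence, so a single step is linear in the number of sequences and quadratic in sequence length, in line with the scaling stated in the abstract.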
Abstract:
This paper presents a method of evaluating the expected value of a path integral for a general Markov chain on a countable state space. We illustrate the method with reference to several models, including birth-death processes and the birth, death and catastrophe process. (C) 2002 Elsevier Science Inc. All rights reserved.
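As a concrete illustration of what such an expectation looks like, the sketch below handles the simplest case: a continuous-time birth-death process truncated to states 0..n_max, absorbed at 0. It relies on the standard result that y_i, the expected integral of f along the path started in state i, satisfies death(i)*y[i-1] - (birth(i)+death(i))*y[i] + birth(i)*y[i+1] = -f(i) with y[0] = 0. The truncation, the function name, and the tridiagonal solver are assumptions made here; this is not claimed to be the paper's method, which covers general chains on countable state spaces.

```python
def expected_path_integral(birth, death, f, n_max):
    """Return y[0..n_max], where y[i] is the expected integral of f(X_t) over
    the path from state i until absorption at 0.

    birth(i), death(i): transition rates out of state i (death(i) > 0 assumed).
    The chain is truncated at n_max by setting the birth rate there to zero.
    """
    # Unknowns are y[1..n_max]; list index k corresponds to state i = k + 1.
    lower = [death(i) for i in range(1, n_max + 1)]
    upper = [birth(i) if i < n_max else 0.0 for i in range(1, n_max + 1)]
    diag = [-(upper[k] + lower[k]) for k in range(n_max)]
    rhs = [-f(i) for i in range(1, n_max + 1)]

    # Forward elimination (Thomas algorithm for a tridiagonal system).
    for k in range(1, n_max):
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]

    # Back substitution; y[0] stays 0 because state 0 is absorbing.
    y = [0.0] * (n_max + 1)
    y[n_max] = rhs[n_max - 1] / diag[n_max - 1]
    for k in range(n_max - 2, -1, -1):
        y[k + 1] = (rhs[k] - upper[k] * y[k + 2]) / diag[k]
    return y
```

Taking f(i) = 1 gives the expected time to extinction, while f(i) = i gives the expected total "population-time" accumulated before extinction.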
Abstract:
In a typical isolated organ perfusion experiment, a substance is injected upstream of an organ and then collected at some distance downstream. To reach the organ from the injection site, and then from the organ to the collector, a solute passes through catheters, usually tubes with circular cross-sections. Catheters cause distortion to the concentration-time profile of the perfusion. In this paper, we analyse catheter distribution kinetics from a mathematical point of view, develop the function most suitable for modeling this distribution and successfully apply this function to experimental data. (C) 2002 Academic Press.
Abstract:
In this paper we refer to the gene-to-phenotype modeling challenge as the GP problem. Integrating information across levels of organization within a genotype-environment system is a major challenge in computational biology. However, resolving the GP problem is a fundamental requirement if we are to understand and predict phenotypes given knowledge of the genome and model dynamic properties of biological systems. Organisms are consequences of this integration, and it is a major property of biological systems that underlies the responses we observe. We discuss the E(NK) model as a framework for investigation of the GP problem and the prediction of system properties at different levels of organization. We apply this quantitative framework to an investigation of the processes involved in genetic improvement of plants for agriculture. In our analysis, N genes determine the genetic variation for a set of traits that are responsible for plant adaptation to E environment-types within a target population of environments. The N genes can interact in epistatic NK gene-networks through the way that they influence plant growth and development processes within a dynamic crop growth model. We use a sorghum crop growth model, available within the APSIM agricultural production systems simulation model, to integrate the gene-environment interactions that occur during growth and development and to predict genotype-to-phenotype relationships for a given E(NK) model. Directional selection is then applied to the population of genotypes, based on their predicted phenotypes, to simulate the dynamic aspects of genetic improvement by a plant-breeding program. The outcomes of the simulated breeding are evaluated across cycles of selection in terms of the changes in allele frequencies for the N genes and the genotypic and phenotypic values of the populations of genotypes.
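To make the E(NK) idea concrete, the sketch below builds E independent NK landscapes, one per environment type, over the same N genes. It is a toy illustration only: haploid binary alleles, randomly chosen interaction partners, and random contribution tables are all assumptions, whereas the work described above derives phenotypes from a sorghum crop growth model. All names are hypothetical.

```python
import random


def make_nk_landscape(n, k, rng):
    """Random NK landscape: gene i's contribution depends on its own allele
    and on the alleles of k other (randomly chosen) genes."""
    neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One lookup table per gene, with an entry for each of the 2**(k+1) allele patterns.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            idx = genotype[i]
            for j in neighbours[i]:
                idx = (idx << 1) | genotype[j]  # pack the alleles into a table index
            total += tables[i][idx]
        return total / n

    return fitness


def make_enk_model(e, n, k, seed=0):
    """E environment types, each with its own NK landscape over the same N genes."""
    rng = random.Random(seed)
    return [make_nk_landscape(n, k, rng) for _ in range(e)]


def mean_performance(model, genotype):
    """Average genotype value across environment types (equal weights assumed)."""
    return sum(fit(genotype) for fit in model) / len(model)
```

Directional selection could then be simulated by ranking a population of genotypes on `mean_performance` and keeping the top fraction as parents for the next cycle.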
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley & Sons, Ltd.
Abstract:
Recent population studies have demonstrated an association of the red-hair and fair-skin phenotype with variant alleles of the melanocortin-1 receptor (MC1R) which result in amino acid substitutions within the coding region leading to an altered receptor activity. In particular, Arg151Cys, Arg160Trp and Asp294His were the most commonly associated variants seen in the south-east Queensland population, with at least one of these alleles found in 93% of those with red hair. In order to study the individual effects of these variants on melanocyte biology and melanocytic pigmentation, we established a series of human melanocyte strains genotyped for the MC1R receptor which included wild-type consensus, variant heterozygotes, compound heterozygotes and homozygotes for Arg151Cys, Arg160Trp, Val60Leu and Val92Met alleles. These strains ranged from darkly pigmented to amelanotic, with all strains of consensus sequence having dark pigmentation. UV sensitivity was found not to be associated with either MC1R genotype or the level of pigmentation, with a range of sensitivities seen across all genotypes. Ultrastructural analysis demonstrated that while consensus strains contained stage IV melanosomes in their terminal dendrites, Arg151Cys and Arg160Trp homozygote strains contained only stage II melanosomes. This was despite these strains showing expression of tyrosinase and tyrosinase-related protein-1 markers, although at reduced levels, and an ability to convert exogenous 3,4-dihydroxyphenylalanine (DOPA) to melanin.
Abstract:
A number of studies indicated that lineages of animals with high rates of mitochondrial (mt) gene rearrangement might have high rates of mt nucleotide substitution. We chose the hemipteroid assemblage and the Insecta to test the idea that rates of mt gene rearrangement and mt nucleotide substitution are correlated. For this purpose, we sequenced the mt genome of a lepidopsocid from the Psocoptera, the only order of hemipteroid insects for which an entire mtDNA sequence is not available. The mt genome of this lepidopsocid is circular, 16,924 bp long, and contains 37 genes and a putative control region; seven tRNA genes and a protein-coding gene in this genome have changed positions relative to the ancestral arrangement of mt genes of insects. We then compared the relative rates of nucleotide substitution among species from each of the four orders of hemipteroid insects and among the 20 insects whose mt genomes have been sequenced entirely. All comparisons among the hemipteroid insects showed that species with higher rates of gene rearrangement also had significantly higher rates of nucleotide substitution than did species with lower rates of gene rearrangement. In comparisons among the 20 insects, where the mt genomes of the two species differed by more than five breakpoints, the more rearranged species always had a significantly higher rate of nucleotide substitution than the less rearranged species. However, in comparisons where the mt genomes of two species differed by five or fewer breakpoints, the more rearranged species did not always have a significantly higher rate of nucleotide substitution than the less rearranged species. We tested the statistical significance of the correlation between the rates of mt gene rearrangement and mt nucleotide substitution with nine pairs of insects that were phylogenetically independent from one another.
We found that the correlation was positive and statistically significant (R^2 = 0.73, P = 0.01; R_s = 0.67, P < 0.05). We propose that increased rates of nucleotide substitution may lead to increased rates of gene rearrangement in the mt genomes of insects.