6 results for Data Coding.

in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"


Relevance: 30.00%

Abstract:

Thaumastocoris peregrinus is a recently introduced invertebrate pest of non-native Eucalyptus plantations in the Southern Hemisphere. It was first reported from South Africa in 2003 and from Argentina in 2005. Since then, populations have grown explosively and it has attained an almost ubiquitous distribution over several regions in South Africa on 26 Eucalyptus species. Here we address three key questions regarding this invasion, namely whether only one species has been introduced, whether there were single or multiple introductions into South Africa and South America and what the source of the introduction might have been. To answer these questions, bar-coding using mitochondrial DNA (COI) sequence diversity was used to characterise the populations of this insect from Australia, Argentina, Brazil, South Africa and Uruguay. Analyses revealed three cryptic species in Australia, of which only T. peregrinus is represented in South Africa and South America. Thaumastocoris peregrinus populations contained eight haplotypes, with a pairwise nucleotide distance of 0.2-0.9% from seventeen locations in Australia. Three of these haplotypes are shared with populations in South America and South Africa, but the latter regions do not share haplotypes. These data, together with the current distribution of the haplotypes and the known direction of original spread in these regions, suggest that at least three distinct introductions of the insect occurred in South Africa and South America before 2005. The two most common haplotypes in Sydney, one of which was also found in Brisbane, are shared with the non-native regions. Sydney populations of T. peregrinus, which have regularly reached outbreak levels in recent years, might thus have served as the source of these three distinct introductions into other regions of the Southern Hemisphere.
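The pairwise nucleotide distance quoted in the abstract above can be illustrated with a minimal sketch. The abstract does not say which distance measure was used, so the uncorrected p-distance (proportion of differing aligned sites) is an assumption here, and the three toy haplotypes are hypothetical, not data from the study.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical toy haplotypes (NOT sequences from the study).
haplotypes = {
    "hapA": "ACGTACGTAC",
    "hapB": "ACGTACGTAT",
    "hapC": "ACGAACGTAT",
}

# Report the distance for every pair of haplotypes.
for (name_a, seq_a), (name_b, seq_b) in combinations(haplotypes.items(), 2):
    print(f"{name_a} vs {name_b}: {p_distance(seq_a, seq_b):.1%}")
```

Real COI fragments are several hundred bases long, which is why distances as small as 0.2% are possible. Gaps and ambiguous bases are counted as ordinary characters in this sketch.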

Relevance: 30.00%

Abstract:

Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.

Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid or druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively.

Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
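The two figures reported per gene class (fraction of known genes recovered, and precision) can be read as recall and precision. The sketch below shows how both follow from classification counts. Reading "correctly recovered" as recall is an interpretation, and the counts are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of the known positive genes that the classifier recovers."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of the genes predicted positive that are truly positive."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts (NOT from the paper): 100 known positive genes,
# 65 of them predicted positive, plus 33 false alarms.
tp, fn, fp = 65, 35, 33

print(f"recall    = {recall(tp, fn):.2f}")
print(f"precision = {precision(tp, fp):.2f}")
```

The two measures answer different questions: recall asks how many of the known genes were found, while precision asks how many of the predictions can be trusted.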

Relevance: 30.00%

Abstract:

Data mining of the Eucalyptus ESTs genome found four clusters (EGCEST2257E11.g, EGBGRT3213F11.g, and EGCCFB1223H11.g) from the highly conserved 14-3-3 protein family, which modulates a wide variety of cellular processes. Multiple alignments were built from twenty-four 14-3-3 protein sequences retrieved from the GenBank databases and from the four pools of the Eucalyptus genome programs. The alignment showed two highly conserved regions corresponding to the protein phosphorylation motifs and nine highly conserved regions corresponding to the linker regions of the alpha-helix structure, based on the three-dimensional structure of the functional dimer. Amino acid differences within the structural and functional domains of plant 14-3-3 proteins were identified and may explain the functional diversity of the different isoforms. Phylogenetic protein trees were built from Clustal X alignments by the maximum parsimony and neighbor-joining procedures, using PAUP software for phylogenetic analysis.

Relevance: 30.00%

Abstract:

The 3'-terminal 853 nt (and the putative 283 aa) sequences of the VP2-encoding gene from 29 field strains of porcine parvovirus (PPV) were determined and compared both to each other and with other published sequences. Sequences were examined using maximum-parsimony and statistical analyses for nucleotide diversity and sequence variability. Among the nucleotide sequences of the PPV field strains, 26 polymorphic sites were encountered; 22 polymorphic sites were detected in the putative amino acid sequence. Mapping polymorphic sites of protein data onto the three-dimensional (3D) structure of PPV VP2 revealed that almost all substitutions were located on the external surface of the viral capsid. When amino acid substitutions were mapped to the alignment between PPV VP2 sequences and the 3D structure of canine parvovirus (CPV) capsid, many PPV substitutions were observed to map to regions of recognized antigenicity and/or to contain phenotypically important residues for CPV and other parvoviruses. In spite of the high sequence similarity, genetic analysis has shown the existence of at least two virus lineages among the samples. In conclusion, these results highlight the need for close surveillance of PPV genetic drift, with an assessment of its potential ability to modify the antigenic make-up of the virus.
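Counting polymorphic sites, as reported in the abstract above, amounts to finding the alignment columns in which the sequences do not all agree. The following is a minimal sketch of that idea, not the authors' procedure. The function name and the toy alignment are hypothetical, and gaps or ambiguous bases are treated as ordinary characters.

```python
def polymorphic_sites(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns with more than one distinct character."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    # zip(*alignment) yields one tuple of characters per alignment column.
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if len(set(column)) > 1
    ]


# Hypothetical toy alignment (NOT PPV data).
toy_alignment = [
    "ATGCCA",
    "ATGTCA",
    "ACGCCA",
]

sites = polymorphic_sites(toy_alignment)
print(f"{len(sites)} polymorphic sites at columns {sites}")
```

The same function works on an amino acid alignment, which is how separate nucleotide and protein counts like those in the abstract could be obtained from the two alignments.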

Relevance: 30.00%

Abstract:

Despite the wide distribution of transposable elements (TEs) in mammalian genomes, part of their evolutionary significance remains to be discovered. Today there is a substantial amount of evidence showing that TEs are involved in the generation of new exons in different species. In the present study, we searched 22,805 genes and reported the occurrence of TE-cassettes in coding sequences of 542 cow genes using the RepeatMasker program. Despite the significant number (542) of genes with TE insertions in exons, only 14 (2.6%) of them were translated into protein, which we characterized as chimeric genes. From these chimeric genes, only the FAST kinase domains 3 (FASTKD3) gene, present on chromosome BTA 20, is a functional gene and showed evidence of the exaptation event. The genome sequence analysis showed that the last exon coding sequence of bovine FASTKD3 is ∼85% similar to the ART2A retrotransposon sequence. In addition, comparison among FASTKD3 proteins shows that the last exon is very divergent from those of Homo sapiens, Pan troglodytes and Canis familiaris. We suggest that the gene structure of the bovine FASTKD3 gene could have originated by several ectopic recombinations between TE copies. Additionally, the absence of TE sequences in all other species analyzed suggests that the TE insertion is clade-specific, mainly in the ruminant lineage. ©FUNPEC-RP.

Relevance: 30.00%

Abstract:

HLA-E is a non-classical Human Leucocyte Antigen class I gene with immunomodulatory properties. Whereas HLA-E expression usually occurs at low levels, it is widely distributed amongst human tissues and has the ability to bind self and non-self antigens and to interact with NK cells and T lymphocytes, being important for immunosurveillance and also for fighting against infections. HLA-E is usually the most conserved locus among all class I genes. However, most of the previous studies evaluating HLA-E variability sequenced only a few exons or genotyped known polymorphisms. Here we report a strategy to evaluate HLA-E variability by next-generation sequencing (NGS) that might be applied to other HLA loci, and present the HLA-E haplotype diversity considering the segment encoding the entire HLA-E mRNA (including 5'UTR, introns and the 3'UTR) in two African population samples, Susu from Guinea-Conakry and Lobi from Burkina Faso. Our results indicate that (a) the HLA-E gene is indeed conserved, encoding mainly two different protein molecules; (b) Africans do present several unknown HLA-E alleles presenting synonymous mutations; (c) the HLA-E 3'UTR is quite polymorphic and (d) haplotypes in the HLA-E 3'UTR are in close association with HLA-E coding alleles. NGS has proved to be an important tool for data generation for future studies evaluating variability in non-classical MHC genes.