3 results for Canonical structure
at Brock University, Canada
Abstract:
The nucleotide sequence of a genomic DNA fragment previously thought, by genetic criteria, to contain the dihydrofolate reductase gene (DFR1) of Saccharomyces cerevisiae was determined. This DNA fragment of 1784 base pairs contains a large open reading frame from position 800 to 1432, which encodes an enzyme with a predicted molecular weight of 24,229.8 Daltons. Analysis of the amino acid sequence of this protein revealed that the yeast polypeptide contains 211 amino acids, compared to the 186 residues commonly found in the polypeptides of other eukaryotes. The difference in size of the gene product can be attributed mainly to an insert in the yeast gene. Within this region, several consensus sequences required for processing of yeast nuclear and class II mitochondrial introns were identified, but they appear not to be sufficient for RNA splicing. The primary structure of the yeast DHFR protein has considerable sequence homology with analogous polypeptides from other organisms, especially in the consensus residues involved in cofactor and/or inhibitor binding. Analysis of the nucleotide sequence also revealed the presence of a number of canonical sequences identified in yeast as having some function in the regulation of gene expression. These include UAS elements (TGACTC) required for the amino acid general control response, and "TATA" boxes, as well as several consensus sequences thought to be required for transcriptional termination and polyadenylation. Analysis of the codon usage of the yeast DFR1 coding region revealed a codon bias index of 0.0083. This value, very close to zero, suggests that the gene is expressed at a relatively low level under normal physiological conditions. The information concerning the organization of DFR1 was used to construct a variety of fusions of its 5' regulatory region with the coding region of the lacZ gene of E. coli.
Some of these fused genes encoded a fusion product that was expressed in E. coli and/or in yeast under the control of the 5' regulatory elements of DFR1. Further studies with these fusion constructions revealed that the beta-galactosidase activity encoded on multicopy plasmids was stimulated transiently by prior exposure of yeast host cells to UV light. This suggests that the yeast DFR1 gene is induced by UV light and may imply a novel function of the DHFR protein in the cellular responses to DNA damage. Another novel feature of yeast DHFR was revealed during preliminary studies of a diploid strain containing a heterozygous DFR1 null allele. The strain was constructed by insertion of a URA3 gene within the coding region of DFR1. Sporulation of this diploid revealed that meiotic products segregated 2:0 for uracil prototrophy when spore clones were germinated on medium supplemented with 5-formyltetrahydrofolate (folinic acid). This finding suggests that, in addition to its catalytic activity, the DFR1 gene product may play some role in the anabolism of folinic acid. Alternatively, this result may indicate that Ura+ haploid segregants were inviable and suggest that the enzyme has an essential cellular function in this species.
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features in a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification in varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat.
FSALPS significantly outperformed canonical GP and ALPS, as well as some feature selection strategies reported in the related literature on dimensionality reduction.
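The frequency-count idea in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the representation of each GP tree as a flat list of feature indices, the function names, and the additive `floor` that keeps unused features selectable are all assumptions made for illustration.

```python
import random
from collections import Counter


def feature_frequencies(population, n_features):
    """Count how often each feature index occurs as a terminal in the population.

    Assumption: each individual is given as a flat list of the feature
    indices used by its tree; a real GP system would walk the tree instead.
    """
    counts = Counter()
    for terminals in population:
        counts.update(terminals)
    return [counts.get(i, 0) for i in range(n_features)]


def selection_probabilities(freqs, floor=1.0):
    """Translate frequency counts into selection probabilities.

    Assumption: `floor` is a hypothetical smoothing constant so that a
    feature with zero occurrences can still be drawn occasionally.
    """
    weights = [f + floor for f in freqs]
    total = sum(weights)
    return [w / total for w in weights]


def pick_terminal(probs, rng):
    """Draw one feature index for a new terminal node, biased by `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy population of three trees over five candidate features.
population = [[0, 2, 2], [2, 3], [0, 2]]
freqs = feature_frequencies(population, n_features=5)
probs = selection_probabilities(freqs)
rng = random.Random(0)
chosen = pick_terminal(probs, rng)
```

In this toy population feature 2 occurs most often, so it receives the largest probability and is the most likely choice when a new terminal is created. Recomputing the counts each generation is one way the selection bias could track the evolving population.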