944 results for multiple-locus variable-number tandem repeat analysis
Abstract:
Next-generation sequencing techniques such as exome sequencing can successfully detect all genetic variants in a human exome and, together with variant filters, have been useful for identifying disease-causing mutations. Two filters are mainly used to identify mutations: low allele frequency and the computational annotation of the genetic variant. Bioinformatic tools that predict the effect of a given variant may err because of existing bias in databases, and they sometimes show limited agreement with one another. Advances in functional and comparative genomics are needed in order to properly annotate these variants. The goal of this study is twofold. First, we functionally annotate Common Variable Immunodeficiency (CVID) variants with the available bioinformatic methods in order to assess the reliability of these strategies. Secondly, as the development of new methods to reduce the number of candidate genetic variants is an active and necessary field of research, we explore the utility of gene function information at the organism level as a filter for identifying rare disease genes. Recently, it has been proposed that only 10-15% of human genes are essential, and we would therefore expect severe rare diseases to be caused mostly by mutations in them. Our goal is to determine whether or not these rare and severe diseases are caused by deleterious mutations in these essential genes. If this hypothesis were true, using essential genes as a filter would be a useful way to identify disease-causing mutations.
Abstract:
Due to advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increase every day and call for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves, among other purposes, extended climatological analyses, the assimilation of data into numerical weather prediction models, the preparation of inputs to hydrological models, and real-time monitoring and short-term forecasting of weather. In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for mapping meteorological variables in complex orography. With the advent of high-resolution digital elevation models, the field of spatial prediction met new horizons. In fact, by exploiting image processing tools along with physical heuristics, a very large number of terrain features which account for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and upwind sides of mountain slopes. Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is, the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low-dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial auto-correlation in the original space, which complicates the use of classical geostatistics. The challenges explored during the thesis are manifold. First, the complexity of models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Secondly, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of predictions. The resulting maps of average wind speeds find applications within renewable resources assessment and open a route to decrease the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
Abstract:
Gait analysis methods that estimate spatiotemporal measures from two, three or four gyroscopes attached to the lower limbs have been discussed in the literature. The most common approach to reducing the number of sensing units is to simplify the underlying biomechanical gait model. In this study, we propose a novel method based on predicting the movements of the thighs from the movements of the shanks. Datasets from three previous studies were used. Data from the first study (ten healthy subjects and ten with Parkinson's disease) were used to develop and calibrate a system with only two gyroscopes attached to the shanks. Data from two other studies (36 subjects with hip replacement, seven subjects with coxarthrosis, and eight control subjects) were used for comparison with the other methods and for assessment of error compared to a motion capture system. Results show that the stride length estimation error relative to motion capture was similar for the system with four gyroscopes and for our new method based on two gyroscopes (-0.8 ± 6.6 versus 3.8 ± 6.6 cm). An alternative with three sensing units did not show better results (error: -0.2 ± 8.4 cm). Finally, a fourth method, which also used two units but with a simpler gait model, had the highest bias compared to the reference (error: -25.6 ± 7.6 cm). We concluded that it is feasible to estimate movements of the thighs from movements of the shanks, reducing the number of sensing units needed from four to two in the context of ambulatory gait analysis.
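The errors quoted above (for example -0.8 ± 6.6 cm) read like a bias and a spread of the per-stride differences between the gyroscope estimate and the motion capture reference. The abstract does not define the notation, so the sketch below assumes "mean difference ± sample standard deviation"; the function name and the stride values are hypothetical and only illustrate the arithmetic.

```python
from statistics import mean, stdev


def stride_error(estimated_cm, reference_cm):
    """Return (bias, spread) of estimated minus reference stride lengths.

    Assumption: bias is the mean of the paired differences and spread is
    their sample standard deviation. The abstract does not state which
    convention the authors used.
    """
    if len(estimated_cm) != len(reference_cm):
        raise ValueError("estimated and reference lists must have equal length")
    diffs = [e - r for e, r in zip(estimated_cm, reference_cm)]
    return mean(diffs), stdev(diffs)


# Hypothetical stride lengths in cm, not data from the study.
estimated = [120.0, 130.0, 125.0]
reference = [121.0, 128.0, 126.0]
bias, spread = stride_error(estimated, reference)
print(f"error: {bias:.1f} ± {spread:.1f} cm")
```

Reporting bias and spread separately makes it visible when a method is consistent but offset, as with the simpler two-unit model above, whose spread is comparable to the others while its bias is much larger.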
Abstract:
Due to the low workability of slipform concrete mixtures, the science of rheology is not strictly applicable to such concrete. However, the concept of rheological behavior may still be considered useful. A novel workability test method (Vibrating Kelly Ball or VKelly test) that would quantitatively assess the responsiveness of a dry concrete mixture to vibration, as is desired of a mixture suitable for slipform paving, was developed and evaluated. The test method is intended to be cost-effective, portable, and repeatable while reporting the suitability of a mixture for use in slipform paving. The work to evaluate and refine the test was conducted in three phases: (1) assess whether the VKelly test can signal variations in laboratory mixtures with a range of materials and proportions; (2) run the VKelly test in the field at a number of construction sites; (3) validate the VKelly test results using the Box Test developed at Oklahoma State University for slipform paving concrete. The data collected to date indicate that the VKelly test appears to be suitable for assessing a mixture's response to vibration (workability) with low multiple-operator variability. A unique parameter, the VKelly Index, is introduced and defined; it seems to indicate that a mixture is suitable for slipform paving when it falls in the range of 0.8 to 1.2 in./√s.
Abstract:
Quantitative trait loci analysis of natural Arabidopsis thaliana accessions is increasingly exploited for gene isolation. However, to date this has mostly revealed deleterious mutations. Among them, a loss-of-function allele identified the root growth regulator BREVIS RADIX (BRX). Here we present evidence that BRX and the paralogous BRX-LIKE (BRXL) genes are under selective constraint in monocotyledons as well as dicotyledons. Unexpectedly, however, whereas none of the Arabidopsis orthologs except AtBRXL1 could complement brx null mutants when expressed constitutively, nearly all monocotyledon BRXLs tested could. Thus, BRXL proteins seem to be more diversified in dicotyledons than in monocotyledons. This functional diversification was correlated with accelerated rates of sequence divergence in the N-terminal regions. Population genetic analyses of 30 haplotypes are suggestive of an adaptive role of AtBRX and AtBRXL1. In two accessions, Lc-0 and Lov-5, seven amino acids are deleted in the variable region between the highly conserved C-terminal, so-called BRX domains. Genotyping of 42 additional accessions also found this deletion in Kz-1, Pu2-7, and Ws-0. In segregating recombinant inbred lines, the Lc-0 allele (AtBRX(Lc-0)) conferred significantly enhanced root growth. Moreover, when constitutively expressed in the same regulatory context, AtBRX(Lc-0) complemented brx mutants more efficiently than an allele without deletion. The same was observed for AtBRXL1, which compared with AtBRX carries a 13 amino acid deletion that encompasses the deletion found in AtBRX(Lc-0). Thus, the AtBRX(Lc-0) allele seems to contribute to natural variation in root growth vigor and provides a rare example of an experimentally confirmed, hyperactive allelic variant.
Abstract:
Mouse NK cells express MHC class I-specific inhibitory Ly49 receptors. Since these receptors display distinct ligand specificities and are clonally distributed, their expression generates a diverse NK cell receptor repertoire specific for MHC class I molecules. We have previously found that the Dd (or Dk)-specific Ly49A receptor is usually expressed from a single allele. However, a small fraction of short-term NK cell clones expressed both Ly49A alleles, suggesting that the two Ly49A alleles are independently and randomly expressed. Here we show that the genes for two additional Ly49 receptors (Ly49C and Ly49G2) are also expressed in a (predominantly) mono-allelic fashion. Since single NK cells can co-express multiple Ly49 receptors, we also investigated whether mono-allelic expression from within the tightly linked Ly49 gene cluster is coordinate or independent. Our clonal analysis suggests that the expression of alleles of distinct Ly49 genes is not coordinate. Thus Ly49 alleles are apparently independently and randomly chosen for stable expression, a process that directly restricts the number of Ly49 receptors expressed per single NK cell. We propose that the Ly49 receptor repertoire specific for MHC class I is generated by an allele-specific, stochastic gene expression process that acts on the entire Ly49 gene cluster.
Abstract:
In the administration, planning, design, and maintenance of road systems, transportation professionals often need to choose between alternatives, justify decisions, evaluate tradeoffs, determine how much to spend, set priorities, assess how well the network meets traveler needs, and communicate the basis for their actions to others. A variety of technical guidelines, tools, and methods have been developed to help with these activities. Such work aids include design criteria guidelines, design exception analysis methods, needs studies, revenue allocation schemes, regional planning guides, designation of minimum standards, sufficiency ratings, management systems, point-based systems to determine eligibility for paving, functional classification, and bridge ratings. While such tools play valuable roles, they also manifest a number of deficiencies and are poorly integrated. Design guides tell what solutions MAY be used; they aren't oriented towards helping find which one SHOULD be used. Design exception methods help justify deviation from design guide requirements but omit consideration of important factors. Resource distribution is too often based on dividing up what's available rather than helping determine how much should be spent. Point systems serve well as procedural tools but are employed primarily to justify decisions that have already been made. In addition, the tools aren't very scalable: a system-level method of analysis seldom works at the project level and vice versa. In conjunction with the issues cited above, the operation and financing of the road and highway system is often the subject of criticisms that raise fundamental questions: What is the best way to determine how much money should be spent on a city's or a county's road network? Is the size and quality of the rural road system appropriate? Is too much or too little money spent on road work? What parts of the system should be upgraded and in what sequence? Do truckers receive a hidden subsidy from other motorists? Do transportation professionals evaluate road situations from too narrow a perspective? In considering these issues and questions, the author concluded that it would be of value to identify and develop a new method that would overcome the shortcomings of existing methods, be scalable, be capable of being understood by the general public, and utilize a broad viewpoint. After trying out a number of concepts, it appeared that a good approach would be to view the road network as a sub-component of a much larger system that also includes vehicles, people, goods-in-transit, and all the ancillary items needed to make the system function. Highway investment decisions could then be made on the basis of how they affect the total cost of operating the total system. A concept, named the "Total Cost of Transportation" method, was then developed and tested. The concept rests on four key principles: 1) that roads are but one sub-system of a much larger 'Road Based Transportation System', 2) that the size and activity level of the overall system are determined by market forces, 3) that the sum of everything expended, consumed, given up, or permanently reserved in building the system and generating the activity that results from the market forces represents the total cost of transportation, and 4) that the economic purpose of making road improvements is to minimize that total cost. To test the practical value of the theory, a special database and spreadsheet model of Iowa's county road network was developed. This involved creating a physical model to represent the size, characteristics, activity levels, and the rates at which the activities take place, developing a companion economic cost model, then using the two in tandem to explore a variety of issues. Ultimately, the theory and model proved capable of being used in full system, partial system, single segment, project, and general design guide levels of analysis. The method appeared to be capable of remedying many of the defects of existing work methods and of answering society's transportation questions from a new perspective.
Abstract:
Histone H1 in the parasitic protozoan Leishmania is a developmentally regulated protein encoded by the sw3 gene. Here we report that histone H1 variants exist in different Leishmania species and strains of L. major and that they are encoded by polymorphic genes. Amplification of the sw3 gene from the genome of three strains of L. major gave rise to different products in each strain, suggesting the presence of a multicopy gene family. In L. major, these genes were all restricted to a 50-kb BglII fragment found on a chromosomal band of 1.3 Mb (chromosome 27). The detection of RFLPs in this locus demonstrated its heterogeneity within several species and strains of Leishmania. Two different copies of sw3 (sw3.0 and sw3.1) were identified after screening a cosmid library containing L. major strain Friedlin genomic DNA. They were identical in their 5' UTRs and open reading frames, but differed in their 3' UTRs. With respect to the originally cloned copy of sw3 from L. major strain LV39, their open reading frames lacked a repeat unit of 9 amino acids. Immunoblots of L. guyanensis parasites transfected with these cosmids revealed that both copies could give rise to the histone H1 protein. The characterization of this locus will now make possible a detailed analysis of the function of histone H1 in Leishmania, as well as permit the dissection of the molecular mechanisms governing the developmental regulation of the sw3 gene.
Abstract:
The objective of this work was to identify expressed simple sequence repeat (SSR) markers associated with leaf miner resistance in coffee progenies. Identification of SSR markers was accomplished by directed searches on the Brazilian Coffee Expressed Sequence Tags (EST) database. Sequence analysis of 32 selected SSR loci showed that 65% of the repeats are tetra-, 21% tri- and 14% dinucleotides. Also, expressed SSRs are frequently located in the 5'-UTR of the gene transcript. Moreover, most of the genes containing SSRs are associated with defense mechanisms. Polymorphisms were analyzed in progenies segregating for resistance to the leaf miner and corresponding to advanced generations of a Coffea arabica x Coffea racemosa hybrid. The frequency of SSR alleles was 2.1 per locus. However, no polymorphism associated with leaf miner resistance was identified. These results suggest that marker-assisted selection in coffee breeding should be performed on the initial cross, in which genetic variability is still significant.
Abstract:
OBJECTIVES: This study aimed at measuring the lipophilicity and ionization constants of diastereoisomeric dipeptides, interpreting them in terms of conformational behavior, and developing statistical models to predict them. METHODS: A series of 20 dipeptides of general structure NH(2)-L-X-(L or D)-His-OMe was designed and synthesized. Their experimental ionization constants (pK(1), pK(2) and pK(3)) and lipophilicity parameters (log P(N) and log D(7.4)) were measured by potentiometry. Molecular modeling in three media (vacuum, water, and chloroform) was used to explore and sample their conformational space and, for each stored conformer, to calculate the radius of gyration, virtual log P (preferably written as log P(MLP), meaning obtained by the molecular lipophilicity potential (MLP) method) and polar surface area (PSA). Means and ranges were calculated for these properties, as was their sensitivity (i.e., the ratio between property range and number of rotatable bonds). RESULTS: Marked differences between diastereoisomers were seen in their experimental ionization constants and lipophilicity parameters. These differences are explained by molecular flexibility, configuration-dependent differences in intramolecular interactions, and accessibility of functional groups. Multiple linear equations correlated experimental lipophilicity parameters and ionization constants with PSA range and other calculated parameters. CONCLUSION: This study documents the differences in lipophilicity and ionization constants between diastereoisomeric dipeptides. Such configuration-dependent differences are shown to depend markedly on differences in conformational behavior and to be amenable to multiple linear regression. Chirality 24:566-576, 2012. © 2012 Wiley Periodicals, Inc.
Abstract:
Context: It is now clearly shown that genetic factors in association with the environment play a key role in obesity and eating disorders. This project studies the clinical symptoms and molecular abnormalities in patients carrying a strong hereditary predisposition to obesity and eating behavior disorders. We have previously published the association between the 16:29.5-30.1 deletion and a highly penetrant form of morbid obesity and macrocephaly. We have also demonstrated the association between the reciprocal 16:29.5-30.1 duplication and underweight and small head circumference. These two studies demonstrate that gene dosage of one or several genes in this region regulates BMI as well as brain growth. At present, there are no data pointing towards particular candidate genes. We are currently investigating a second non-overlapping recurrent CNV encompassing SH2B1, upstream of the aforementioned rearrangement. SNPs in this gene have been associated with BMI in GWAS studies, and mouse models confirmed this association. Bokuchova et al. have reported an association between deletions encompassing this gene and severe early-onset obesity, as well as insulin resistance. We are currently collecting and analyzing data to fully characterize the phenotype and the transcriptional patterns associated with this rearrangement. Aims: 1. Identify carriers of any CNVs in the greater 16p11.2 region (between 16:28 Mb and 32 Mb) in the EGG consortium. 2. Perform association studies between SNPs in the greater 16p11.2 region (16:28-32 Mb) and anthropometric measures with adjusted "locus-wide significance", to identify or prioritize candidate genes potentially driving the association observed in patients with the CNVs (and thus worthy of further validation and sequencing). 3. Explore associations between GSV genome-wide and brain volume. 4. Explore the relationship between brain volumes (whole brain and regional, for those who underwent brain MRI), head circumference and BMI. 5. Extrapolate this procedure to other regions covered by the Metabochip. Methods: - Examine and collect clinical and molecular information in these patients. - Analyze MRI data in children and adults with BMI > 2 SD, and compare changes to MRI data obtained in patients with monogenic forms of obesity (data from the Lausanne study) and in underweight (BMI < -2 SD) individuals from EGG. - Test whether opposite extremes of the phenotypic distribution may be highly informative. Expected results: This is a highly focused study, pertaining to approximately 1‰ of the human genome. Yet it is clear that, if successful, the lessons learned from this study could be extrapolated to other segments of the genome and would need validation and replication by additional studies. Altogether they will help to further explore the missing heritability and point to etiologic genes and pathways underlying these important health burdens.
Abstract:
We have analyzed purine (R) and pyrimidine (Y) codon patterns in variable and constant regions of HIV-1 gp120 in seven patients infected with different HIV-1 subtypes and naive to antiretroviral therapy. We have calculated the relative frequency of each in-frame codon RNY, YNR, RNR, and YNY (N = any nucleotide) in variable and constant regions of gp120, in the sequence within indels and at indels' flanking sites. Our data show that hypervariable regions V1, V2, V4, and V5 are characterized by the presence of long stretches of RNY codons constituting the majority of the sequence portion within insertions/deletions. In full-length gp120 and within inserted/deleted fragments the number of AVT (V = A, C, G) codons did not exceed 50% of the total RNY codons. RNY strings in variable regions spanned up to 21 codons and were always in frame. In contrast, RNY strings in constant regions were mostly out of frame and their length was limited to five codons. The frequency of the codon RNY was found to be significantly higher in variable regions (p < 0.0001; t-test), within indels, and at indels' flanking sites (p < 0.0001; χ² test). Analysis of the distribution of RNY strings equal to or longer than five codons in the full genome of HXB2 also shows that these sequences are mostly out of frame, unless they contain a potential N-glycosylation site or an asparagine. These data suggest that cryptic repeats of RNY may play a role in the genesis of multiple base insertions and deletions in hypervariable regions of gp120.
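The codon classification described above can be sketched in a few lines. This is only an illustration of the R/Y bookkeeping (R = A or G, Y = C or T/U, N = any nucleotide), not the authors' pipeline; the function names and the toy sequence are hypothetical, and codons containing ambiguous bases are simply skipped, which is an assumption of this sketch.

```python
from collections import Counter

PURINES = set("AG")        # R
PYRIMIDINES = set("CTU")   # Y
CLASSES = ("RNY", "YNR", "RNR", "YNY")


def codon_class(codon):
    """Classify a codon by its first and third base; None if ambiguous."""
    codon = codon.upper()
    if len(codon) != 3 or codon[1] not in "ACGTU":
        return None
    letters = []
    for base in (codon[0], codon[2]):
        if base in PURINES:
            letters.append("R")
        elif base in PYRIMIDINES:
            letters.append("Y")
        else:
            return None
    return f"{letters[0]}N{letters[1]}"


def class_frequencies(seq, frame=0):
    """Relative frequency of each codon class, reading seq in the given frame."""
    counts = Counter()
    for i in range(frame, len(seq) - 2, 3):
        cls = codon_class(seq[i:i + 3])
        if cls is not None:
            counts[cls] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


def longest_rny_run(seq, frame=0):
    """Length, in codons, of the longest uninterrupted run of RNY codons."""
    best = current = 0
    for i in range(frame, len(seq) - 2, 3):
        current = current + 1 if codon_class(seq[i:i + 3]) == "RNY" else 0
        best = max(best, current)
    return best


toy = "AATGCCTTAGGG"  # codons: AAT GCC TTA GGG (hypothetical sequence)
print(class_frequencies(toy))
print(longest_rny_run(toy))
```

Calling the same functions with `frame=1` or `frame=2` gives the out-of-frame counts, which is the comparison the abstract draws between variable and constant regions.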
Abstract:
The "one-gene, one-protein" rule, coined by Beadle and Tatum, has been fundamental to molecular biology. The rule implies that the genetic complexity of an organism depends essentially on its gene number. The discovery, however, that alternative gene splicing and transcription are widespread phenomena dramatically altered our understanding of the genetic complexity of higher eukaryotic organisms; in these, a limited number of genes may potentially encode a much larger number of proteins. Here we investigate yet another phenomenon that may contribute to generate additional protein diversity. Indeed, by relying on both computational and experimental analysis, we estimate that at least 4%-5% of the tandem gene pairs in the human genome can be eventually transcribed into a single RNA sequence encoding a putative chimeric protein. While the functional significance of most of these chimeric transcripts remains to be determined, we provide strong evidence that this phenomenon does not correspond to mere technical artifacts and that it is a common mechanism with the potential of generating hundreds of additional proteins in the human genome.
Abstract:
The objectives of this work were to investigate the genetic variation in 79 soybean (Glycine max) accessions from different regions of the world, to cluster the accessions based on their similarity, and to test the correlation between the two types of markers used. Simple sequence repeat markers present in genomic (SSR) and in expressed regions (EST-SSR) were used. Thirty SSR primer pairs were selected (20 genomic and 10 EST-SSR) based on their distribution on the 20 genetic linkage groups of soybean, on their trinucleotide repeat unit and on their polymorphism information content. All analyzed loci were polymorphic, and 259 alleles were found. The number of alleles per locus varied from 2 to 21, with an average of 8.63. The accessions exhibit a significant number of rare alleles, with genotypes 19, 35, 63 and 65 carrying the greatest number of exclusive alleles. Accessions 75 and 79 were the most similar, and accessions 31 and 35, and 40 and 78, were the most divergent. A low correlation between SSR and EST-SSR data was observed; thus both genomic and expressed microsatellite markers are required for an appropriate analysis of genetic diversity in soybean. The genetic diversity observed was high and allowed the formation of five groups and several subgroups. A moderate relationship between genetic divergence and geographic origin of accessions was observed.
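Two of the quantities mentioned above are easy to check numerically. The sketch below reproduces the reported average (259 alleles over 30 loci) and computes polymorphism information content with the standard Botstein et al. formula, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The abstract does not say which PIC definition the authors used, so that formula is an assumption here, and the allele frequencies in the example are hypothetical.

```python
def mean_alleles_per_locus(total_alleles, n_loci):
    """Average number of alleles per locus."""
    return total_alleles / n_loci


def pic(freqs):
    """Polymorphism information content for one locus.

    freqs: allele frequencies summing to 1.
    Assumes the Botstein-style definition:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - cross


# Figures from the abstract: 259 alleles across 30 loci.
print(round(mean_alleles_per_locus(259, 30), 2))

# Hypothetical locus with two equally frequent alleles.
print(pic([0.5, 0.5]))
```

A biallelic locus with equal frequencies gives the maximum PIC for two alleles, so loci with many alleles at balanced frequencies are the most informative under this definition.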
Abstract:
DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.