58 results for Multiple classification
Abstract:
Background: Lynch syndrome (LS) is an autosomal dominant inherited cancer syndrome characterized by early-onset colorectal and endometrial cancers and other tumours. A significant proportion of DNA variants in LS patients are unclassified. Reports on the pathogenicity of the c.1852_1853AA>GC (p.Lys618Ala) variant of the MLH1 gene are conflicting. In this study, we provide new evidence indicating that this variant has no significant implications for LS. Methods: The following approach was used to assess the clinical significance of the p.Lys618Ala variant: frequency in a control population, case-control comparison, co-occurrence of the p.Lys618Ala variant with a pathogenic mutation, co-segregation with the disease and microsatellite instability in tumours from carriers of the variant. We genotyped p.Lys618Ala in 1034 individuals (373 sporadic colorectal cancer [CRC] patients, 250 index subjects from families suspected of having LS [revised Bethesda guidelines] and 411 controls). Three well-characterized LS families that fulfilled the Amsterdam II Criteria and included members carrying the p.Lys618Ala variant were used to assess co-occurrence and co-segregation. A subset of colorectal tumour DNA samples from 17 patients carrying the p.Lys618Ala variant was screened for microsatellite instability using five mononucleotide markers. Results: Twenty-seven individuals were heterozygous for the p.Lys618Ala variant; nine had sporadic CRC (2.41%), seven were suspected of having hereditary CRC (2.8%) and 11 were controls (2.68%). There were no significant associations in the case-control and case-case studies. The p.Lys618Ala variant co-occurred with pathogenic mutations in two unrelated LS families. In one family, the pathogenic and unclassified variants were in trans; in the other, the pathogenic mutation was detected in the MSH6 gene. In both families, only the deleterious variant co-segregated with the disease. Only two positive cases of microsatellite instability (2/17, 11.8%) were detected in tumours from p.Lys618Ala carriers, indicating that this variant does not play a role in the functional inactivation of MLH1 in CRC patients. Conclusions: The p.Lys618Ala variant should be considered a neutral variant for LS. These findings have implications for the clinical management of CRC probands and their relatives.
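As a rough illustration of the case-control comparison reported above, the sketch below applies Fisher's exact test to the carrier counts given in the abstract (9/373 sporadic CRC cases versus 11/411 controls); the choice of Fisher's exact test is an assumption for illustration, not necessarily the authors' exact statistical procedure.

```python
# A minimal sketch of a case-control comparison using the carrier counts
# reported in the abstract. The use of Fisher's exact test is an assumption
# for illustration only.
from scipy.stats import fisher_exact

cases_carriers, cases_total = 9, 373          # sporadic CRC patients
controls_carriers, controls_total = 11, 411   # population controls

table = [
    [cases_carriers, cases_total - cases_carriers],
    [controls_carriers, controls_total - controls_carriers],
]
odds_ratio, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.3f}")  # no significant association expected
```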
Abstract:
Background: Prolificacy is the most important trait influencing the reproductive efficiency of pig production systems. The low heritability and sex-limited expression of prolificacy have to some extent hindered the improvement of this trait through artificial selection. Moreover, the relative contributions of additive, dominant and epistatic QTL to the genetic variance of pig prolificacy remain to be defined. In this work, we have addressed this issue by performing one-dimensional and bi-dimensional genome scans for number of piglets born alive (NBA) and total number of piglets born (TNB) in a three-generation Iberian by Meishan F2 intercross. Results: The one-dimensional genome scan for NBA and TNB revealed the existence of two genome-wide highly significant QTL located on SSC13 (P < 0.001) and SSC17 (P < 0.01) with effects on both traits. This relative paucity of significant results contrasted very strongly with the wide array of highly significant epistatic QTL that emerged in the bi-dimensional genome-wide scan analysis. As many as 18 epistatic QTL were found: nine for NBA (four at P < 0.01 and five at P < 0.05) and nine for TNB (three at P < 0.01 and six at P < 0.05). These epistatic QTL were distributed across multiple genomic regions, covering 13 of the 18 pig autosomes, and they had small individual effects ranging between 3 and 4% of the phenotypic variance. Different patterns of interactions (a × a, a × d, d × a and d × d) were found amongst the epistatic QTL pairs identified in the current work. Conclusions: The complex inheritance of prolificacy traits in pigs has been evidenced by identifying multiple additive (SSC13 and SSC17), dominant and epistatic QTL in an Iberian × Meishan F2 intercross. Our results demonstrate that a significant fraction of the phenotypic variance of swine prolificacy traits can be attributed to first-order gene-by-gene interactions, emphasizing that the phenotypic effects of alleles may be strongly modulated by the genetic background in which they segregate.
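A minimal sketch of a bi-dimensional (pairwise) interaction scan of the kind described above, run on simulated data: for every pair of markers, a linear model with an interaction term is fitted and the interaction p-value is recorded. The 0/1/2 genotype coding, the marker names and the simple additive-by-additive test are illustrative assumptions rather than the study's full additive/dominant QTL model.

```python
# Sketch of a pairwise epistasis scan on simulated genotypes and phenotypes.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
markers = [f"m{i}" for i in range(5)]
data = pd.DataFrame({m: rng.integers(0, 3, n) for m in markers})  # additive 0/1/2 coding
data["nba"] = rng.normal(10, 2, n)  # simulated number of piglets born alive

results = []
for a, b in itertools.combinations(markers, 2):
    fit = smf.ols(f"nba ~ {a} * {b}", data=data).fit()
    results.append((a, b, fit.pvalues[f"{a}:{b}"]))  # p-value of the interaction term

for a, b, p in sorted(results, key=lambda r: r[2])[:3]:
    print(f"{a} x {b}: interaction p = {p:.3f}")
```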
Abstract:
For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs, and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested are validated by RT-PCR, and four of five sequenced PCR products confirm connectivity unambiguously.
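The final clustering step described above (grouping unclassified TARs into putative novel loci by similar expression and phylogenetic profiles) can be sketched as follows on simulated profiles; the correlation-based distance and the clustering cutoff are illustrative assumptions, not the authors' parameters.

```python
# Sketch: cluster TARs by combined expression and presence/absence profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_tars = 50
expression = rng.normal(size=(n_tars, 12))      # 12 cell lines / conditions (simulated)
phylo = rng.integers(0, 2, size=(n_tars, 17))   # conservation across 17 species (simulated)

profiles = np.hstack([expression, phylo])
dist = pdist(profiles, metric="correlation")    # 1 - Pearson correlation between TARs
tree = linkage(dist, method="average")
clusters = fcluster(tree, t=0.7, criterion="distance")
print(f"{clusters.max()} putative novel loci from {n_tars} TARs")
```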
Abstract:
Genomic plasticity of the human chromosome 8p23.1 region is highly influenced by two groups of complex segmental duplications (SDs), termed REPD and REPP, that mediate different kinds of rearrangements. Part of the difficulty in explaining the wide range of phenotypes associated with 8p23.1 rearrangements is that REPP and REPD are not yet well characterized, probably due to their polymorphic status. Here, we describe a novel primate-specific gene family, named FAM90A (family with sequence similarity 90), found within these SDs. According to the current human reference sequence assembly, the FAM90A family includes 24 members along the 8p23.1 region plus a single member on chromosome 12p13.31, showing copy number variation (CNV) between individuals. These genes can be classified into subfamilies I and II, which differ in their upstream and 5′-untranslated region sequences, but both share the same open reading frame and are ubiquitously expressed. Sequence analysis and comparative fluorescence in situ hybridization studies showed that FAM90A subfamily II underwent a large expansion in the hominoid lineage, whereas subfamily I members were likely generated around the divergence of the orangutan and African great apes by a fusion process. In addition, analysis of the Ka/Ks ratios provides evidence of functional constraint on some FAM90A genes in all species. The characterization of the FAM90A gene family contributes to a better understanding of the structural polymorphism of the human 8p23.1 region and constitutes a good example of how SDs, CNVs and rearrangements within them can promote the formation of new gene sequences with potential functional consequences.
Abstract:
Background: The analysis of the promoter sequence of genes with similar expression patterns is a basic tool to annotate common regulatory elements. Multiple sequence alignments are the basis of most comparative approaches. The characterization of regulatory regions from co-expressed genes at the sequence level, however, does not yield satisfactory results on many occasions, as promoter regions of genes sharing similar expression programs often do not show nucleotide sequence conservation. Results: In a recent approach to circumvent this limitation, we proposed to align the maps of predicted transcription factors (referred to as TF-maps) instead of the nucleotide sequences of two related promoters, taking into account the label of the corresponding factor and the position in the primary sequence. We have now extended the basic algorithm to permit multiple promoter comparisons using the progressive alignment paradigm. In addition, non-collinear conservation blocks can now be identified in the resulting alignments. We have optimized the parameters of the algorithm on a small but well-characterized collection of human-mouse-chicken-zebrafish orthologous gene promoters. Conclusion: Results on this dataset indicate that TF-map alignments are able to detect high-level regulatory conservation at the promoter and 3'UTR gene regions, which cannot be detected by typical sequence alignments. Three examples are introduced here to illustrate the power of multiple TF-map alignments to characterize conserved regulatory elements in the absence of sequence similarity. We consider that this kind of approach can be extremely useful in the future to annotate potential transcription factor binding sites on sets of co-regulated genes from high-throughput expression experiments.
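A toy sketch of the core idea behind TF-map alignment: align two promoters by the ordered labels of their predicted transcription factor binding sites rather than by nucleotide sequence. The scoring scheme and the TF labels below are illustrative assumptions; the actual algorithm also scores site positions and supports progressive multiple alignment.

```python
# Global alignment over TF labels (simplified pairwise TF-map alignment).
def align_tf_maps(map_a, map_b, match=2, mismatch=-1, gap=-1):
    n, m = len(map_a), len(map_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = match if map_a[i - 1] == map_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + hit,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[n][m]

# Hypothetical TF-maps of two orthologous promoters (ordered site labels).
human = ["SP1", "NFKB", "TATA", "AP1"]
mouse = ["SP1", "TATA", "AP1"]
print(align_tf_maps(human, mouse))  # conserved site order despite nucleotide divergence
```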
Abstract:
Automatic classification of makams from symbolic data is a rarely studied topic. In this paper, we first review an n-gram based approach using various representations of the symbolic data. While a high degree of precision can be obtained, confusion arises mainly between makams that use (almost) the same scale and pitch hierarchy but differ in their overall melodic progression, the seyir. To further improve the system, n-gram based classification is first tested on various sections of the piece, to take into account the seyir feature that the melodic progression starts in a certain region of the scale. In a second test, a hierarchical classification structure is designed which uses n-grams and seyir features at different levels to further improve the system.
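A minimal sketch of the n-gram based classification described above: each makam is modelled by bigram counts over a symbolic pitch sequence, and a piece is assigned to the makam whose model gives it the highest log-likelihood. The toy pitch alphabet, add-one smoothing and bigram order are illustrative assumptions; seyir features are not included in this sketch.

```python
# Bigram language-model classifier over symbolic pitch sequences.
import math
from collections import Counter

def bigrams(seq):
    return list(zip(seq, seq[1:]))

def train(pieces_by_makam):
    models = {}
    for makam, pieces in pieces_by_makam.items():
        counts, context, vocab = Counter(), Counter(), set()
        for piece in pieces:
            counts.update(bigrams(piece))
            context.update(piece[:-1])
            vocab.update(piece)
        models[makam] = (counts, context, vocab)
    return models

def log_likelihood(piece, model):
    counts, context, vocab = model
    v = len(vocab) + 1  # add-one smoothing over the vocabulary
    return sum(math.log((counts[bg] + 1) / (context[bg[0]] + v))
               for bg in bigrams(piece))

def classify(piece, models):
    return max(models, key=lambda mk: log_likelihood(piece, models[mk]))

# Hypothetical training data: pitch-symbol sequences for two makams.
training = {
    "hicaz": [list("ABCBDCBA"), list("ABDCBABC")],
    "rast":  [list("GABCDCBG"), list("GABCBAG")],
}
models = train(training)
print(classify(list("ABCBDA"), models))
```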
Abstract:
This paper presents several algorithms for joint estimation of the target number and state in a time-varying scenario. Building on the results presented in [1], which considers estimation of the target number only, we assume that not only the target number but also the target state evolution must be estimated. In this context, we extend the Rao-Blackwellization procedure of [1] to this new scenario to compute the Bayes recursions, thus defining reduced-complexity solutions for the multi-target set estimator. A performance assessment is finally given both in terms of the Circular Position Error Probability, aimed at evaluating the accuracy of the estimated track, and in terms of the Cardinality Error Probability, aimed at evaluating the reliability of the target number estimates.
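The two performance measures named above can be sketched over Monte Carlo runs of a hypothetical estimator as follows; the exact definitions used in the paper may differ, and the simulated estimates, the radius R and the error model are illustrative assumptions.

```python
# Sketch of the two performance measures on simulated estimator output.
import numpy as np

rng = np.random.default_rng(2)
runs = 1000
true_n = 3
true_pos = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

est_n = true_n + rng.choice([-1, 0, 0, 0, 1], size=runs)           # simulated cardinality estimates
est_pos = true_pos + rng.normal(0, 0.5, size=(runs, true_n, 2))    # simulated position estimates

# Cardinality Error Probability: estimated target number differs from the truth.
cardinality_error_prob = np.mean(est_n != true_n)

# Circular Position Error Probability: some estimate falls outside radius R of the truth.
R = 1.0
dist = np.linalg.norm(est_pos - true_pos, axis=2)
circular_error_prob = np.mean(np.any(dist > R, axis=1))

print(f"Cardinality Error Probability: {cardinality_error_prob:.2f}")
print(f"Circular Position Error Probability (R={R}): {circular_error_prob:.2f}")
```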
Abstract:
Subjective language detection is one of the most important challenges in Sentiment Analysis. Because of their weight and frequency in opinionated texts, adjectives are considered a key element in the opinion extraction process. These subjective units are increasingly collected in polarity lexicons, in which they appear annotated with their prior polarity. However, at present no polarity lexicon takes prior polarity variations across domains into account. This paper shows that a majority of adjectives change their prior polarity value depending on the domain. We propose a distinction between domain-dependent and domain-independent adjectives. Moreover, our analysis leads us to propose a further classification related to subjectivity degree: constant, mixed and highly subjective adjectives. Following this classification, polarity values will provide better support for Sentiment Analysis.
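A toy sketch of how domain-dependent adjectives could be detected: estimate an adjective's prior polarity separately in each domain and flag sign changes. The corpora, the frequency-based polarity estimate and the decision rule are illustrative assumptions, not the paper's method.

```python
# Sketch: flag adjectives whose estimated polarity flips between domains.
def polarity(adjective, pos_reviews, neg_reviews):
    # Difference between relative frequency in positive and negative reviews.
    pos = sum(r.count(adjective) for r in pos_reviews) / max(len(pos_reviews), 1)
    neg = sum(r.count(adjective) for r in neg_reviews) / max(len(neg_reviews), 1)
    return pos - neg

# Hypothetical toy corpora for two domains.
hotels_pos = ["small cosy room", "quiet small hotel"]
hotels_neg = ["cold room", "noisy street"]
phones_pos = ["bright screen", "fast phone"]
phones_neg = ["small screen", "small battery"]

for adj in ["small", "fast"]:
    p_hotels = polarity(adj, hotels_pos, hotels_neg)
    p_phones = polarity(adj, phones_pos, phones_neg)
    label = "domain dependent" if p_hotels * p_phones < 0 else "domain independent"
    print(f"{adj}: hotels={p_hotels:+.2f}, phones={p_phones:+.2f} -> {label}")
```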
Abstract:
The work we present here addresses cue-based noun classification in English and Spanish. Its main objective is to automatically acquire lexical semantic information by classifying nouns into previously known lexical classes. This is achieved by using particular aspects of linguistic contexts as cues that identify a specific lexical class. Here we concentrate on the task of identifying such cues and on the theoretical background that allows for an assessment of the complexity of the task. The results show that, despite the a priori complexity of the task, cue-based classification is a useful tool for the automatic acquisition of lexical semantic classes.
Abstract:
The ways in which the dominant cultural majority frames the educational system determine perceptions of its own identity and understandings of the ‘other.’ In this article I take a political approach, examining the management of cultural diversity within Spanish education policies and treating “education as the mirror of society”. The article analyzes Spanish challenges and policy approaches towards the management of immigration-related diversity in education. The main finding is that there is not one approach but several, owing both to the decentralized character of the education system and to the multiplicity of diversity at stake (i.e. language, religion, culture, etc.).
Abstract:
When continuous data are coded to categorical variables, two types of coding are possible: crisp coding in the form of indicator, or dummy, variables with values either 0 or 1; or fuzzy coding where each observation is transformed to a set of "degrees of membership" between 0 and 1, using so-called membership functions. It is well known that the correspondence analysis of crisp coded data, namely multiple correspondence analysis, yields principal inertias (eigenvalues) that considerably underestimate the quality of the solution in a low-dimensional space. Since the crisp data only code the categories to which each individual case belongs, an alternative measure of fit is simply to count how well these categories are predicted by the solution. Another approach is to consider multiple correspondence analysis equivalently as the analysis of the Burt matrix (i.e., the matrix of all two-way cross-tabulations of the categorical variables), and then perform a joint correspondence analysis to fit just the off-diagonal tables of the Burt matrix - the measure of fit is then computed as the quality of explaining these tables only. The correspondence analysis of fuzzy coded data, called "fuzzy multiple correspondence analysis", suffers from the same problem, albeit attenuated. Again, one can count how many correct predictions are made of the categories which have the highest degree of membership. But here one can also defuzzify the results of the analysis to obtain estimated values of the original data, and then calculate a measure of fit in the familiar percentage form, thanks to the resultant orthogonal decomposition of variance. Furthermore, if one thinks of fuzzy multiple correspondence analysis as explaining the two-way associations between variables, a fuzzy Burt matrix can be computed and the same strategy as in the crisp case can be applied to analyse the off-diagonal part of this matrix. In this paper these alternative measures of fit are defined and applied to a data set of continuous meteorological variables, which are coded crisply and fuzzily into three categories. Measuring the fit is further discussed when the data set consists of a mixture of discrete and continuous variables.
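A minimal sketch of the crisp versus fuzzy coding contrasted above, for one continuous variable coded into three categories; the triangular membership functions and their breakpoints are illustrative assumptions.

```python
# Crisp (0/1 indicator) vs. fuzzy (degrees of membership) coding into 3 categories.
import numpy as np

def crisp_code(x, low, high):
    cats = np.zeros(3)
    cats[0 if x <= low else 2 if x >= high else 1] = 1.0
    return cats

def fuzzy_code(x, low, mid, high):
    # Triangular membership functions; memberships sum to 1.
    if x <= low:
        return np.array([1.0, 0.0, 0.0])
    if x >= high:
        return np.array([0.0, 0.0, 1.0])
    if x <= mid:
        m = (x - low) / (mid - low)
        return np.array([1.0 - m, m, 0.0])
    m = (x - mid) / (high - mid)
    return np.array([0.0, 1.0 - m, m])

temperature = 17.5  # hypothetical meteorological observation
print(crisp_code(temperature, low=10, high=25))          # [0. 1. 0.]
print(fuzzy_code(temperature, low=10, mid=20, high=25))  # [0.25 0.75 0.  ]
```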
Abstract:
It is common in econometric applications that several hypothesis tests are carried out at the same time. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. In this paper, we suggest a stepwise multiple testing procedure which asymptotically controls the familywise error rate at a desired level. Compared to related single-step methods, our procedure is more powerful in the sense that it will often reject more false hypotheses. In addition, we advocate the use of studentization when it is feasible. Unlike some stepwise methods, our method implicitly captures the joint dependence structure of the test statistics, which results in an increased ability to detect alternative hypotheses. We prove that our method asymptotically controls the familywise error rate under minimal assumptions. We present our methodology in the context of comparing several strategies to a common benchmark and deciding which strategies actually beat the benchmark. However, our ideas can easily be extended and/or modified to other contexts, such as making inference about the individual regression coefficients in a multiple regression framework. Simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.
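For orientation, the sketch below shows the simplest stepwise procedure of this kind, the classical Holm stepdown, which controls the familywise error rate without modelling dependence; the paper's method improves on such baselines by using (studentized) resampling to capture the joint dependence of the test statistics, which is not reproduced here.

```python
# Holm stepdown: a baseline stepwise procedure controlling the familywise error rate.
import numpy as np

def holm_stepdown(p_values, alpha=0.05):
    p = np.asarray(p_values)
    order = np.argsort(p)            # test the smallest p-value first
    k = len(p)
    reject = np.zeros(k, dtype=bool)
    for step, idx in enumerate(order):
        if p[idx] <= alpha / (k - step):
            reject[idx] = True
        else:
            break                    # stepdown: stop at the first non-rejection
    return reject

# Hypothetical p-values from comparing several strategies to a benchmark.
pvals = [0.001, 0.012, 0.041, 0.20, 0.65]
print(holm_stepdown(pvals))          # [ True  True False False False]
```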
Abstract:
The first generation models of currency crises have often been criticized because they predict that, in the absence of very large triggering shocks, currency attacks should be predictable and lead to small devaluations. This paper shows that these features of first generation models are not robust to the inclusion of private information. In particular, this paper analyzes a generalization of the Krugman-Flood-Garber (KFG) model, which relaxes the assumption that all consumers are perfectly informed about the level of fundamentals. In this environment, the KFG equilibrium of zero devaluation is only one of many possible equilibria. In all the other equilibria, the lack of perfect information delays the attack on the currency past the point at which the shadow exchange rate equals the peg, giving rise to unpredictable and discrete devaluations.
Abstract:
We experimentally examine the effect of unilateral and mutual partner selection in the context of prisoner's dilemmas. Subjects simultaneously play several finitely repeated two-person prisoner's dilemma games. We find that unilateral choice is the best system: it leads to lower defection and fewer singles than mutual choice. Furthermore, with the unilateral choice setup we are able to show that intending defectors are more likely than intending cooperators to try to avoid a match. We compare our results from multiple games with single-game PD experiments and find no difference in aggregate behavior. Hence the multiple game technique is robust and might therefore be an important tool in the future for testing the use of mixed strategies.
Abstract:
Consider the problem of testing k hypotheses simultaneously. In this paper, we discuss finite and large sample theory of stepdown methods that provide control of the familywise error rate (FWE). In order to improve upon the Bonferroni method or Holm's (1979) stepdown method, Westfall and Young (1993) make effective use of resampling to construct stepdown methods that implicitly estimate the dependence structure of the test statistics. However, their methods depend on an assumption called subset pivotality. The goal of this paper is to construct general stepdown methods that do not require such an assumption. In order to accomplish this, we take a close look at what makes stepdown procedures work, and a key component is a monotonicity requirement on critical values. By imposing such monotonicity on estimated critical values (which is not an assumption on the model but an assumption on the method), it is demonstrated that the problem of constructing a valid multiple test procedure which controls the FWE can be reduced to the problem of constructing a single test which controls the usual probability of a Type 1 error. This reduction allows us to draw upon an enormous resampling literature as a general means of test construction.
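The monotonicity requirement on estimated critical values can be sketched as follows: critical values estimated (for example, by resampling) for successively smaller hypothesis sets are forced to be non-increasing before the stepdown is applied. The test statistics and raw critical values below are illustrative numbers, not the output of an actual resampling scheme.

```python
# Sketch: enforce monotone critical values, then apply the stepdown.
import numpy as np

def monotone_stepdown(statistics, raw_critical_values):
    # raw_critical_values[j] is the estimated critical value after removing the
    # j hypotheses with the largest statistics; enforce non-increasing values.
    stats = np.asarray(statistics, dtype=float)
    crit = np.minimum.accumulate(np.asarray(raw_critical_values, dtype=float))
    order = np.argsort(stats)[::-1]          # largest statistic first
    reject = np.zeros(len(stats), dtype=bool)
    for step, idx in enumerate(order):
        if stats[idx] > crit[step]:
            reject[idx] = True
        else:
            break                            # stepdown stops at the first acceptance
    return reject

# Hypothetical test statistics and (possibly non-monotone) estimated critical values.
t_stats = [3.1, 2.4, 1.2, 0.4]
raw_crit = [2.8, 2.9, 2.0, 1.9]              # 2.9 > 2.8 violates monotonicity
print(monotone_stepdown(t_stats, raw_crit))  # [ True False False False]
```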