8 results for annotation sémantique

in Helda - Digital Repository of University of Helsinki


Relevance:

20.00%

Publisher:

Abstract:

Modal cohesion and subordination. The Finnish conditional and jussive moods in comparison to the French subjunctive

This study examines verb moods in subordinate clauses in French and Finnish. The first part of the analysis deals with the syntax and semantics of the French subjunctive, a mood occurring mostly in subordinate positions. The second part investigates Finnish verb moods. Although subordinate positions in Finnish grammar have no special finite verb form, certain uses of Finnish verb moods have been compared to those of subjunctives and conjunctives in other languages. The present study focuses on the subordinate uses of the Finnish conditional and jussive (i.e. the third person singular and plural of the imperative mood). The third part of the analysis discusses the functions of subordinate moods in contexts beyond complex sentences. The data used for the analysis include 1834 complex sentences gathered from newspapers, online discussion groups and blog texts, as well as audio-recorded interviews and conversations. The data thus consist of both written and oral texts as well as standard and non-standard variants. The analysis shows that the French subjunctive codes theoretical modality. The subjunctive does not determine the temporal and modal meaning of the event, but displays the event as virtual. In a complex sentence, the main clause determines the temporal and modal space within which the event coded by the subjunctive clause is interpreted. The subjunctive explicitly indicates that the space constructed in the main clause extends its scope over the subordinate clause. The subjunctive can therefore serve as a means for creating modal cohesion in the discourse. The Finnish conditional shares with the French subjunctive the function of making explicit the modal link between the components of a complex construction, but the two moods differ in their semantics.
The conditional codes future time and can therefore occur only in non-factual or counterfactual contexts, whereas the event expressed by French subjunctive clauses can also be interpreted as realized. Such is the case when, for instance, generic and habitual meaning is involved. The Finnish jussive mood is used in a relatively limited number of subordinate clause types, but in these contexts its modal meaning is strikingly close to that of the French subjunctive. The permissive meaning, typical of the jussive in main clause positions, is modified in complex sentences so that it entails an inter-clausal relation, namely concession. Like the French subjunctive, the jussive codes theoretical modal meaning with no implication of the truth value of the proposition. Finally, the analysis shows that verb moods mark modal cohesion, not only on the syntagmatic level (namely in complex sentences), but also on the paradigmatic axis of discourse in order to create semantic links over entire segments of talk. In this study, the subjunctive thus appears, not as an empty category without function, as it is sometimes described, but as an open form that conveys the temporal and modal meanings emerging from the context.

Relevance:

10.00%

Publisher:

Abstract:

In this paper, I look into a grammatical phenomenon found among speakers of the Cambridgeshire dialect of English. According to my hypothesis, the phenomenon is a new entry into the past BE verb paradigm in the English language. In my paper, I claim that the structure I have found complements the existing two verb forms, was and were, with a third verb form that I have labelled ‘intermediate past BE’. The paper is divided into two parts. In the first section, I introduce the theoretical ground for the study of variation, which is founded on empiricist principles. In variationist linguistics, the main claim is that heterogeneous language use is structured and ordered. Over the last 50 years of modern linguistics, this claim has been controversial. In the 1960s, the generativist movement spearheaded by Noam Chomsky diverted attention away from grammatical theories that are based on empirical observations. The generativists steered away from language diversity, variation and change in favour of generalisations, abstractions and universalist claims. The theoretical part of my paper goes through the main points of the variationist agenda and concludes that abandoning the concept of language variation in linguistics is harmful for both theory and methodology. In the method part of the paper, I present the Helsinki Archive of Regional English Speech (HARES) corpus. It is an audio archive that contains interviews conducted in England in the 1970s and 1980s. The interviews were conducted in accordance with methods generally used in traditional dialectology. The informants are mostly elderly men who have lived in the same region throughout their lives and who left school at an early age. The interviews are actually conversations: the interviewer allowed the informant to pick the topic of conversation to induce a maximally relaxed and comfortable atmosphere and thus allow the most natural dialect variant to emerge in the informant’s speech.
In the paper, the corpus chapter introduces some of the transcription and annotation problems associated with spoken language corpora (especially those containing dialectal speech). Questions surrounding the concept of variation are present in this part of the paper too, as especially transcription work is troubled by the fundamental problem of having to describe the fluctuations of everyday speech in text. In the empirical section of the paper, I use HARES to analyse the speech of four informants, with special focus on the emergence of the intermediate past BE variant. My observations and the subsequent analysis permit me to claim that my hypothesis seems to hold. The intermediate variant occupies almost all contexts where one would expect was or were in the informants’ speech. This means that the new variant is integrated into the speakers’ grammars and exemplifies the kind of variation that is at the heart of this paper.

Relevance:

10.00%

Publisher:

Abstract:

DEVELOPING A TEXTILE ONTOLOGY FOR THE SEMANTIC WEB AND CONNECTING IT TO MUSEUM CATALOGING DATA

The goal of the Semantic Web is to share concept-based information in a versatile way on the Internet. This is achievable using formal data structures called ontologies. The goal of this research is to increase the usability of museum cataloging data in information retrieval. The work is interdisciplinary, involving craft science, terminology science, computer science, and museology. In the first part of the dissertation an ontology of concepts of textiles, garments, and accessories is developed for museum cataloging work. The ontology work was done with the help of thesauri, vocabularies, research reports, and standards. The basis of the ontology development was the Museoalan asiasanasto MASA, a thesaurus for museum cataloging work which has been enriched by other vocabularies. Concepts and terms concerning the research object, as well as the material names of textiles, costumes, and accessories, were focused on. The research method was terminological concept analysis complemented by an ontological view of the Semantic Web. The concept structure was based on the hierarchical generic relation. Attention was also paid to other relations between terms and concepts, and between concepts themselves. Altogether 977 concept classes were created. Issues including how to choose and name concepts for the ontology hierarchy and how deep and broad the hierarchy could be are discussed from the viewpoint of the ontology developer and museum cataloger. The second part of the dissertation analyzes why some of the cataloged terms did not match with the developed textile ontology. This problem is significant because it prevents automatic ontological content integration of the cataloged data on the Semantic Web. The research datasets, i.e. the cataloged museum data on textile collections, came from three museums: Espoo City Museum, Lahti City Museum and The National Museum of Finland.
The data included 1803 textile, costume, and accessory objects. Unmatched object and textile material names were analyzed. In the case of the object names six categories (475 cases), and of the material names eight categories (423 cases), were found where automatic annotation was not possible. The most common explanation was that the cataloged field was filled with a long sentence composed of many terms. Sometimes in the compound term, the object name and material, or the name and the way of usage, were combined. In addition, numeric values in the material name cataloging field prevented annotation, as did the absence of a corresponding concept in the ontology. Ready-made drop-down lists of materials used in one cataloging system facilitated the annotation. In the case of naming objects and materials, one should use terms in basic form without attributes. The developed textile ontology has been applied in two cultural portals, MuseumFinland and Culturesampo, where one can search for and browse information based on cataloged data using integrated ontologies in an interoperable way. The textile ontology is also part of the national FinnONTO ontology infrastructure.

Keywords: annotation, concept, concept analysis, cataloging, museum collection, ontology, Semantic Web, textile collection, textile material
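The reasons this abstract gives for failed automatic annotation (multi-term entries, numeric values in the field, and concepts missing from the ontology) can be illustrated with a minimal sketch. The `annotate` helper and the tiny label set are hypothetical, and the matching logic is only an illustration under those assumptions, not the method used in the dissertation.

```python
import re

# Hypothetical ontology labels, for illustration only.
ONTOLOGY_LABELS = {"wool", "cotton", "scarf"}


def annotate(field_value, ontology_labels=ONTOLOGY_LABELS):
    """Try to link one cataloging field to an ontology label.

    Returns (label, reason). The label is None when no automatic
    annotation is possible; the reason names the failure category.
    """
    text = field_value.strip().lower()
    if text in ontology_labels:
        return text, "matched"
    if re.search(r"\d", text):
        return None, "numeric value in field"
    if len(text.split()) > 1:
        return None, "multi-term entry"
    return None, "no corresponding concept"


for value in ["Wool", "50% wool", "red wool scarf with fringe", "silk"]:
    print(value, "->", annotate(value))
```

A term in basic form without attributes ("Wool") matches directly, which is in line with the abstract's recommendation for naming objects and materials.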

Relevance:

10.00%

Publisher:

Abstract:

Microarrays have a wide range of applications in the biomedical field. From the beginning, arrays have mostly been utilized in cancer research, including classification of tumors into different subgroups and identification of clinical associations. In the microarray format, a collection of small features, such as different oligonucleotides, is attached to a solid support. The advantage of microarray technology is the ability to simultaneously measure changes in the levels of multiple biomolecules. Because many diseases, including cancer, are complex, involving an interplay between various genes and environmental factors, the detection of only a single marker molecule is usually insufficient for determining disease status. Thus, a technique that simultaneously collects information on multiple molecules allows better insights into a complex disease. Since microarrays can be custom-manufactured or obtained from a number of commercial providers, understanding data quality and comparability between different platforms is important to enable the use of the technology in areas beyond basic research. When standardized, integrated array data could ultimately help to offer a complete profile of the disease, illuminating mechanisms and genes behind disorders as well as facilitating disease diagnostics. In the first part of this work, we aimed to elucidate the comparability of gene expression measurements from different oligonucleotide and cDNA microarray platforms. We compared three different gene expression microarrays; one was a commercial oligonucleotide microarray and the others were commercial and custom-made cDNA microarrays. The filtered gene expression data from the commercial platforms correlated better across experiments (r=0.78-0.86) than the expression data between the custom-made and either of the two commercial platforms (r=0.62-0.76).
Although the results from different platforms correlated reasonably well, combining and comparing the measurements was not straightforward. The clone errors on the custom-made array and annotation and technical differences between the platforms introduced variability in the data. In conclusion, the different gene expression microarray platforms provided results sufficiently concordant for the research setting, but the variability represents a challenge for developing diagnostic applications for the microarrays. In the second part of the work, we performed an integrated high-resolution microarray analysis of gene copy number and expression in 38 laryngeal and oral tongue squamous cell carcinoma cell lines and primary tumors. Our aim was to pinpoint genes for which expression was impacted by changes in copy number. The data revealed that amplifications in particular had a clear impact on gene expression. Across the genome, 14-32% of genes in the highly amplified regions (copy number ratio >2.5) had associated overexpression. The impact of decreased copy number on gene underexpression was less clear. Using statistical analysis across the samples, we systematically identified hundreds of genes for which an increased copy number was associated with increased expression. For example, our data implied that FADD and PPFIA1 were frequently overexpressed at the 11q13 amplicon in HNSCC. The 11q13 amplicon, including known oncogenes such as CCND1 and CTTN, is well-characterized in different types of cancer, but the roles of FADD and PPFIA1 remain obscure. Taken together, the integrated microarray analysis revealed a number of known as well as novel target genes in altered regions in HNSCC. The identified genes provide a basis for functional validation and may eventually lead to the identification of novel candidates for targeted therapy in HNSCC.
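The statistic reported above, the share of genes in highly amplified regions that are also overexpressed, can be sketched as follows. Only the copy number ratio cutoff of 2.5 comes from the abstract; the overexpression cutoff of 2.0, the field names, and the gene records are assumptions made for illustration.

```python
def fraction_overexpressed(genes, cn_threshold=2.5, expr_threshold=2.0):
    """Share of highly amplified genes that are also overexpressed.

    cn_threshold follows the abstract (copy number ratio > 2.5).
    expr_threshold is an assumed cutoff; the abstract does not state one.
    """
    amplified = [g for g in genes if g["copy_ratio"] > cn_threshold]
    if not amplified:
        return 0.0
    overexpressed = [g for g in amplified if g["expr_ratio"] > expr_threshold]
    return len(overexpressed) / len(amplified)


# Made-up records, not data from the study.
genes = [
    {"name": "gene_a", "copy_ratio": 3.1, "expr_ratio": 4.0},
    {"name": "gene_b", "copy_ratio": 2.8, "expr_ratio": 1.1},
    {"name": "gene_c", "copy_ratio": 2.6, "expr_ratio": 0.9},
    {"name": "gene_d", "copy_ratio": 2.7, "expr_ratio": 1.5},
    {"name": "gene_e", "copy_ratio": 1.0, "expr_ratio": 3.0},
]

print(fraction_overexpressed(genes))  # 0.25: one of four amplified genes
```

In this toy input `gene_e` is overexpressed but not amplified, so it does not enter the denominator.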

Relevance:

10.00%

Publisher:

Abstract:

The era of large sequencing projects has opened up unprecedented possibilities for investigating more complex aspects of living organisms. Among the high-throughput technologies based on the genomic sequences, DNA microarrays are widely used for many purposes, including the measurement of the relative quantity of messenger RNAs. However, the reliability of microarrays has been strongly doubted, as robust analysis of the complex microarray output data was developed only after the technology had already spread in the community. One objective of this study was to increase the performance of microarrays, as measured by the successful validation of the results by independent techniques. To this end, emphasis has been given to the possibility of selecting candidate genes with remarkable biological significance within a specific experimental design. Along with literature evidence, the re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression is significantly different in different conditions, followed by grouping them in functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique has been effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is related to the possibility of combining independent gene expression data for creating a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some involved in basic activities (named housekeeping genes) are expressed ubiquitously.
Other genes (named tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining the tissue-selective genes is also important, as these genes can cause disease with a phenotype in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of the tissues has also been proved. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. Extending the power of microarray results is possible by inferring the relationships of genes under certain conditions. The transcription of a gene is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of its promoter sequence. In this study, the analysis of promoters from groups of candidate genes has been utilized for predicting gene networks and highlighting modules of transcription factors playing a central role in the regulation of their transcription. Specific modules have been found regulating the expression of genes selectively expressed in the hippocampus, an area of the brain having a central role in major depressive disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain, which is involved in Parkinson's disease.

Relevance:

10.00%

Publisher:

Abstract:

Autoimmune diseases are more common in dogs than in humans and are already threatening the future of some highly predisposed dog breeds. Susceptibility to autoimmune diseases is controlled by environmental and genetic factors, especially the major histocompatibility complex (MHC) gene region. Dogs show physiology, disease presentation and clinical response similar to those of humans, making them an excellent disease model for autoimmune diseases common to both species. The genetic background of canine autoimmune disorders is largely unknown, but recent annotation of the dog genome and subsequent development of new genomic tools offer a unique opportunity to map novel autoimmune genes in various breeds. Many autoimmune disorders show breed-specific enrichment, supporting a strong genetic background. Furthermore, the presence of hundreds of breeds as genetic isolates facilitates gene mapping in complex autoimmune disorders. Identification of novel predisposing genes establishes breeds as models and may reveal novel candidate genes for the corresponding human disorders. Genetic studies will eventually shed light on common biological functions and interactions between genes and the environment. This study aimed to identify genetic risk factors in various autoimmune disorders, including systemic lupus erythematosus (SLE)-related diseases, comprising immune-mediated rheumatic disease (IMRD) and steroid-responsive meningitis arteritis (SMRA) as well as Addison's disease (AD) in Nova Scotia Duck Tolling Retrievers (NSDTRs) and chronic superficial keratitis (CSK) in German Shepherd dogs (GSDs). We used two different approaches to identify genetic risk factors. Firstly, a candidate gene approach was applied to test the potential association of MHC class II, also known as dog leukocyte antigen (DLA) in canine species. Secondly, a genome-wide association study (GWAS) was performed to identify novel risk loci for SLE-related disease and AD in NSDTRs.
We identified DLA risk haplotypes for an IMRD subphenotype of SLE-related disease, AD and CSK, but not in SMRA, and show that the MHC class II gene region is a major genetic risk factor in canine autoimmune diseases. An elevated risk was found for IMRD in dogs that carried the DLA-DRB1*00601/DQA1*005011/DQB1*02001 haplotype (OR = 2.0, 99% CI = 1.03-3.95, p = 0.01) and for ANA-positive IMRD dogs (OR = 2.3, 99% CI = 1.07-5.04, p = 0.007). We also found that the DLA-DRB1*01502/DQA*00601/DQB1*02301 haplotype was significantly associated with AD in NSDTRs (OR = 2.1, CI = 1.0-4.4, P = 0.044) and the DLA-DRB1*01501/DQA1*00601/DQB1*00301 haplotype with CSK in GSDs (OR = 2.67, CI = 1.17-6.44, p = 0.02). In addition, we found that homozygosity for the risk haplotype increases the risk for each disease phenotype and that an overall homozygosity for the DLA region predisposes to CSK and AD. Our results have enabled the development of genetic tests to improve breeding practices by avoiding the production of puppies homozygous for risk haplotypes. We also performed the first successful GWAS for a complex disease in dogs. With fewer than 100 cases and 100 controls, we identified five risk loci for SLE-related disease and AD and found strong candidate genes involved in a novel T-cell activation pathway. We show that an inbred dog population has fewer risk factors, but each of them has a stronger genetic risk. Ongoing studies aim to identify the causative mutations and bring new knowledge to help diagnostics, treatment and understanding of the aetiology of SLE-related diseases.
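For readers unfamiliar with the odds ratios (OR) and confidence intervals (CI) quoted above, the following minimal sketch shows one standard way to compute them from a 2x2 table of haplotype carriers, using the log-odds (Woolf) approximation. The counts are invented for illustration, and the abstract does not say which interval method the study used.

```python
import math


def odds_ratio_ci(cases_with, cases_without, controls_with, controls_without,
                  z=1.96):
    """Odds ratio with a Wald-type confidence interval on the log scale.

    z=1.96 gives an approximate 95% interval; z=2.576 gives about 99%.
    All four counts must be greater than zero.
    """
    odds_ratio = (cases_with * controls_without) / (cases_without * controls_with)
    std_err = math.sqrt(
        1 / cases_with + 1 / cases_without
        + 1 / controls_with + 1 / controls_without
    )
    lower = math.exp(math.log(odds_ratio) - z * std_err)
    upper = math.exp(math.log(odds_ratio) + z * std_err)
    return odds_ratio, lower, upper


# Invented counts: 20 of 30 cases and 10 of 30 controls carry the haplotype.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound stays above 1, as in the haplotype results reported in the abstract, indicates an association at the chosen confidence level.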

Relevance:

10.00%

Publisher:

Abstract:

Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements to study the behaviour of cells. The scientific community has produced an enormous and constantly increasing collection of gene expression data from various human cells, both from healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges and to construct an integrated database of the human transcriptome as well as to demonstrate its usage. Methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded by systematic vocabularies. There was no single existing biomedical ontology or vocabulary suitable for this purpose. Thus, new annotation terminology was developed as a part of this work. The second part was to develop mathematical methods correcting the noise and systematic differences/errors in the data caused by various array generations. Additionally, there was a need to develop suitable computational methods for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes by using the integrated gene expression data.
A method to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database was also developed. As a result of this work, 9783 human gene expression samples measured by Affymetrix microarrays were integrated to form a unique human transcriptome resource, GeneSapiens. This makes it possible to analyse expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Application of this resource to interpret individual gene expression measurements allowed identification of tissue of origin with 92.0% accuracy among 44 healthy tissue types. Systematic analysis of transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types and a genome-wide analysis of kinase gene co-expression networks was done. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to perform analysis for individual patient samples to pinpoint gene- and pathway-specific changes in the test sample in relation to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies indicate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets as well as to facilitate the interpretation of new molecular profiling data from individual patients.

Relevance:

10.00%

Publisher:

Abstract:

We outline the design and creation of a syntactically and morphologically annotated corpus of Finnish for use by the research community. We motivate a definitional, systematic “grammar definition corpus” as a first step in a three-year annotation effort to help create higher-quality, better-documented extensive parsebanks at a later stage. The syntactic representation, consisting of a dependency structure and a basic set of dependency functions, is outlined with examples. Reference is made to double-blind annotation experiments to measure the applicability of the new grammar definition corpus methodology.
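One simple way to score a double-blind annotation experiment over dependency structures is to count how often two annotators agree on each token's head and dependency function. The sketch below assumes each annotation is a list of `(head_index, function)` pairs, one per token. The function labels and the choice of metric are illustrative assumptions; the abstract does not state how agreement was measured.

```python
def annotation_agreement(annotation_a, annotation_b):
    """Compare two dependency annotations of the same sentence.

    Each annotation is a list of (head_index, function) pairs, one per
    token. Returns (head_agreement, full_agreement) as fractions of tokens.
    """
    if len(annotation_a) != len(annotation_b):
        raise ValueError("annotations must cover the same tokens")
    if not annotation_a:
        return 0.0, 0.0
    pairs = list(zip(annotation_a, annotation_b))
    # Head agreement ignores the function label.
    same_head = sum(1 for a, b in pairs if a[0] == b[0])
    # Full agreement requires the same head and the same function.
    same_both = sum(1 for a, b in pairs if a == b)
    total = len(pairs)
    return same_head / total, same_both / total


# Hypothetical annotations of a four-token sentence (head 0 = root).
annotator_1 = [(2, "subj"), (0, "main"), (4, "attr"), (2, "obj")]
annotator_2 = [(2, "subj"), (0, "main"), (4, "det"), (3, "obj")]

print(annotation_agreement(annotator_1, annotator_2))  # (0.75, 0.5)
```

Reporting head agreement separately from full agreement shows whether disagreements stem from the structure itself or only from the choice of function label.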