63 results for Automatic term extraction

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Publisher:

Summary:

The methodology of extracting information from texts has widely been described in the current literature. However, the methodology has been developed mainly for the purposes of other fields than terminology science. In addition, the research has been English language oriented. Therefore, there are no satisfactory language-independent methods for extracting terminological information from texts. The aim of the present study is to form the basis for a further improvement of methods for extraction of terminological information. A further aim is to determine differences in term extraction between subject groups with or without knowledge of the special field in question. The study is based on the theory of terminology, and has mainly a qualitative approach. The research material consists of electronically readable specialized texts in the subject domain of maritime safety. Textbooks, conference papers, research reports and articles from professional journals in Finnish and in Russian are included. The thesis first deals with certain term extraction methods. These are manual term identification and semi-automatic term extraction, the latter of which was carried out by using three commercial computer programs. The results of term extraction were compared and the recall and precision of the methods were evaluated. The latter part of the study is dedicated to the identification of concept relations. Certain linguistic expressions, which some researchers call knowledge probes, were applied to identify concept relations. The results of the present thesis suggest that special field knowledge is an advantage in manual term identification. However, in the candidate term lists the variation between subject groups was not as remarkable as it was between individual subjects. The term extraction software tested here produces candidate term lists which can be useful, but only after some manual work. Therefore, the work emphasizes the need to further develop term extraction software. 
Furthermore, the analyses indicate that there are a certain number of terms which were extracted by all the subjects and the software. These terms we call core terms. As the result of the experiment on linguistic expressions which signal concept relations, a proposal of Finnish and Russian knowledge probes in the field of maritime safety was made. The main finding was that it would be useful to combine the use of knowledge probes with semi-automatic term extraction since knowledge probes usually occur in the vicinity of terms.
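The abstract evaluates extraction methods by recall and precision and defines core terms as those extracted by all subjects and by the software. The following is a minimal sketch of those two calculations, not the thesis's own procedure. It assumes terms are compared as lower-cased strings, and the function names and sample terms are hypothetical.

```python
def normalise(terms):
    """Lower-case and strip terms so that trivial variants compare equal (an assumption)."""
    return {t.strip().lower() for t in terms}


def precision_recall(candidates, reference):
    """Return (precision, recall) of a candidate term list against a reference list."""
    cand = normalise(candidates)
    ref = normalise(reference)
    hits = cand & ref
    precision = len(hits) / len(cand) if cand else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return precision, recall


def core_terms(*term_lists):
    """Terms present in every list, e.g. every subject's list and the software output."""
    sets = [normalise(terms) for terms in term_lists]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample data, not taken from the thesis.
manual = ["life raft", "muster station", "distress signal", "bulkhead"]
software = ["Life raft", "distress signal", "the ship", "bulkhead", "page"]

p, r = precision_recall(software, manual)
print(f"precision={p:.2f} recall={r:.2f}")
print(sorted(core_terms(manual, software)))
```

Here the manually identified list serves as the reference. In practice the choice of reference list is itself a methodological decision.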

Relevance:

30.00%

Publisher:

Summary:

The quantification and characterisation of soil phosphorus (P) is of agricultural and environmental importance and different extraction methods are widely used to assess the bioavailability of P and to characterize soil P reserves. However, the large variety of extractants, pre-treatments and sample preparation procedures complicates the comparison of published results. In order to improve our understanding of the behaviour and cycling of P in soil, it is crucial to know the scientific relevance of the methods used for various purposes. The knowledge of the factors affecting the analytical outcome is a prerequisite for justified interpretation of the results. The aim of this thesis was to study the effects of sample preparation procedures on soil P and to determine the dependence of the recovered P pool on the chemical nature of extractants. Sampling is a critical step in soil testing and sampling strategy is dependent on the land-use history and the purpose of sampling. This study revealed that pre-treatments changed soil properties and air-drying was found to affect soil P, particularly extractable organic P, by disrupting organic matter. This was evidenced by an increase in the water-extractable small-sized (<0.2 µm) P that, at least partly, took place at the expense of the large-sized (>0.2 µm) P. However, freezing induced only insignificant changes and thus, freezing can be taken to be a suitable method for storing soils from the boreal zone that naturally undergo periodic freezing. The results demonstrated that the chemical nature of the extractant affects its sensitivity to detect changes in soil P solubility. Buffered extractants obscured the alterations in P solubility induced by pH changes; however, water extraction, though sensitive to physicochemical changes, can be used to reveal short-term changes in soil P solubility.
As for the organic P, the analysis was found to be sensitive to the sample preparation procedures: filtering may leave a large proportion of extractable organic P undetected, whereas the outcome of centrifugation was found to be affected by the ionic strength of the extractant. Widely used sequential fractionation procedures proved to be able to detect land-use-derived differences in the distribution of P among fractions of different solubilities. However, interpretation of the results from extraction experiments requires better understanding of the biogeochemical function of the recovered P fraction in the P cycle in differently managed soils under dissimilar climatic conditions.

Relevance:

30.00%

Publisher:

Summary:

In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because as the amount of natural language text in digital format grows all the time, the need for novel methods for pinpointing important knowledge from the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns.
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
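The abstract states only that the similarity metric is based on edit distance, without giving its exact form. As an illustration, the sketch below uses the standard Levenshtein distance over token sequences. Normalising by the longer sequence length is an assumption made here, and the function names, the `<ANSWER>` placeholder and the sample contexts are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the distance to [0, 1]; this normalisation is an assumption, not the thesis's."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical answer contexts, with the answer replaced by a placeholder token.
ctx1 = ["was", "born", "in", "<ANSWER>"]
ctx2 = ["was", "born", "on", "<ANSWER>"]
print(edit_distance(ctx1, ctx2), similarity(ctx1, ctx2))
```

A pairwise similarity of this kind is the sort of input that a hierarchical clustering step would typically consume, grouping similar contexts before patterns are generalised from them.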

Relevance:

30.00%

Publisher:

Summary:

The project consisted of two long-term follow-up studies of preterm children addressing the question whether intrauterine growth restriction affects the outcome. Assessment at 5 years of age of 203 children with a birth weight less than 1000 g born in Finland in 1996-1997 showed that 9% of the children had cognitive impairment, 14% cerebral palsy, and 4% needed a hearing aid. The intelligence quotient was lower (p<0.05) than the reference value. Thus, 20% exhibited major, 19% minor disabilities, and 61% had no functional abnormalities. Being small for gestational age (SGA) was associated with sub-optimal growth later. In children born before 27 gestational weeks, the SGA had more neuropsychological disabilities than those appropriate for gestational age (AGA). In another cohort with birth weight less than 1500 g assessed at 5 years of age, echocardiography showed a thickened interventricular septum and a decreased left ventricular end-diastolic diameter in both SGA and AGA born children. They also had a higher systolic blood pressure than the reference. Laser-Doppler flowmetry showed different endothelium-dependent and -independent vasodilation responses in the AGA children compared to those of the controls. SGA was not associated with cardiovascular abnormalities. Auditory event-related potentials (AERPs) were recorded using an oddball paradigm with frequency deviants (standard tone 500 Hz and deviant 750 Hz with 10% probability). At term, the P350 was smaller in SGA and AGA infants than in controls. At 12 months, the automatic change detection peak (mismatch negativity, MMN) was observed in the controls. However, the preterm infants had a difference positivity that correlated with their neurodevelopment scores. At 5 years of age, the P1-deflection, which reflects primary auditory processing, was smaller, and the MMN larger in the preterm than in the control children.
Even with a challenging paradigm or a distraction paradigm, P1 was smaller in the preterm than in the control children. The SGA and AGA children showed similar AERP responses. Prematurity is a major risk factor for abnormal brain development. Preterm children showed signs of cardiovascular abnormality suggesting that prematurity per se may carry a risk for later morbidity. The small positive amplitudes in AERPs suggest persisting altered auditory processing in the preterm infants.

Relevance:

20.00%

Publisher:

Summary:

This thesis discusses the use of sub- and supercritical fluids as the medium in extraction and chromatography. Super- and subcritical extraction was used to separate essential oils from the herbal plant Angelica archangelica. The effect of extraction parameters was studied and sensory analyses of the extracts were done by an expert panel. The results of the sensory analyses were compared to the analytically determined contents of the extracts. Sub- and supercritical fluid chromatography (SFC) was used to separate and purify high-value pharmaceuticals. Chiral SFC was used to separate the enantiomers of racemic mixtures of pharmaceutical compounds. Very low (cryogenic) temperatures were applied to substantially enhance the separation efficiency of chiral SFC. The thermodynamic aspects affecting the resolving ability of chiral stationary phases are briefly reviewed. The process production rate, which is a key factor in industrial chromatography, was optimized by empirical multivariate methods. A general linear model was used to optimize the separation of omega-3 fatty acid ethyl esters from esterified fish oil by using reversed-phase SFC. Chiral separation of racemic mixtures of guaifenesin and ferulic acid dimer ethyl ester was optimized by using the response surface method with three variables at a time. It was found that by optimizing four variables (temperature, load, flow rate and modifier content) the production rate of the chiral resolution of racemic guaifenesin by cryogenic SFC could be increased severalfold compared to published results of a similar application. A novel pressure-compensated design of an industrial high-pressure chromatographic column was introduced, using the technology developed in building the deep-sea submersibles (Mir 1 and 2). A demonstration SFC plant was built and the immunosuppressant drug cyclosporine A was purified to meet the requirements of the US Pharmacopoeia.
A smaller semi-pilot size column with a similar design was used for cryogenic chiral separation of the aromatase inhibitor Finrozole for use in its development phase 2.

Relevance:

20.00%

Publisher:

Summary:

Colorectal cancer (CRC) is a major health concern and demands long-term efforts in developing strategies for screening and prevention. CRC has become a preventable disease as a consequence of a better understanding of colorectal carcinogenesis. However, current therapy is unsatisfactory and necessitates the exploration of other approaches for the prevention and treatment of cancer. Plant-based products have been recognized as preventive with regard to the development of colon cancer. Therefore, the potential chemopreventive use and mechanism of action of Lebanese natural products were evaluated. Towards this aim the antitumor activity of Onopordum cynarocephalum and Centaurea ainetensis has been studied using in vitro and in vivo models. In vitro, both crude extracts were non-cytotoxic to normal intestinal cells and inhibited the proliferation of colon cancer cells in a dose-dependent manner. In vivo, both crude extracts reduced the number of tumors by an average of 65% at weeks 20 (adenomas stage) and 30 (adenocarcinomas stage). The activity of the C. ainetensis extract was attributed to Salograviolide A, a guaianolide-type sesquiterpene lactone, which was isolated and identified through bio-guided fractionation. The mechanism of action of thymoquinone (TQ), the active component of Nigella sativa, was established in colon cancer cells using in vitro models. By the use of N-acetyl cysteine, a radical scavenger, the direct involvement of reactive oxygen species in TQ-induced apoptotic cells was established. The analytical detection of TQ from spiked serum and its protein binding were evaluated. The average recovery of TQ from spiked serum subjected to several extraction procedures was 2.5%, proving the inability of conventional methods to analyze TQ from serum. This has been explained by the extensive binding (>98%) of TQ to serum and major serum components such as bovine serum albumin (BSA) and alpha-1-acid glycoprotein (AGP).
Using mass spectrometry analysis, TQ was confirmed to bind covalently to the free cysteine in positions 34 and 147 of the amino acid sequences of BSA and AGP, respectively. The results of this work make available for future development new plants with anti-cancer activities and enhance the understanding of the pharmaceutical properties of TQ, a prerequisite for its future clinical development.

Relevance:

20.00%

Publisher:

Summary:

When experts construct mental images, they do not rely only on perceptual features; they also access domain-specific knowledge and skills in long-term memory, which enables them to exceed the capacity limitations of the short-term working memory system. The central question of the present dissertation was whether the facilitating effect of long-term memory knowledge on working memory imagery tasks is primarily based on perceptual chunking or whether it relies on higher-level conceptual knowledge. Three domains of expertise were studied: chess, music, and taxi driving. The effects of skill level, stimulus surface features, and the stimulus structure on incremental construction of mental images were investigated. A method was developed to capture the chunking mechanisms that experts use in constructing images: chess pieces, street names, and visual notes were presented in a piecemeal fashion for later recall. Over 150 experts and non-experts participated in a total of 13 experiments, as reported in five publications. The results showed skill effects in all of the studied domains when experts performed memory and problem solving tasks that required mental imagery. Furthermore, only experts' construction of mental images benefited from meaningful stimuli. Manipulation of the stimulus surface features, such as replacing chess pieces with dots, did not significantly affect experts' performance in the imagery tasks. In contrast, the structure of the stimuli had a significant effect on experts' performance in every task domain. For example, taxi drivers recalled more street names from lists that formed a spatially continuous route than from alphabetically organised lists. The results suggest that the mechanisms of conceptual chunking rather than automatic perceptual pattern matching underlie expert performance, even though the tasks of the present studies required perception-like mental representations. 
The results show that experts are able to construct skilled images that surpass working memory capacity, and that their images are conceptually organised and interpreted rather than merely depictive.

Relevance:

20.00%

Publisher:

Summary:

The aim was to analyse the growth and compositional development of the receptive and expressive lexicons between the ages 0;9 and 2;0 in the full-term (FT) and the very-low-birth-weight (VLBW) children who are acquiring Finnish. The associations between the expressive lexicon and grammar at 1;6 and 2;0 in the FT children were also studied. In addition, the language skills of the VLBW children at 2;0 were analysed, as well as the predictive value of early lexicon to the later language performance. Four groups took part in the studies: the longitudinal (N = 35) and cross-sectional (N = 146) samples of the FT children, and the longitudinal (N = 32) and cross-sectional (N = 66) samples of VLBW children. The data was gathered by applying the structured parental rating method (the Finnish version of the Communicative Development Inventory), through analysis of the children's spontaneous speech and by administering a formal test (Reynell Developmental Language Scales). The FT children acquired their receptive lexicons earlier, at a faster rate and with larger individual variation than their expressive lexicons. The acquisition rate of the expressive lexicon increased from slow to faster in most children (91%). Highly parallel developmental paths for lexical semantic categories were detected in the receptive and expressive lexicons of the Finnish children when they were analysed in relation to the growth of the lexicon size, as described in the literature for children acquiring other languages. The emergence of grammar was closely associated with expressive lexical growth. The VLBW children acquired their receptive lexicons at a slower rate and had weaker language skills at 2;0 than the full-term children. The compositional development of both lexicons happened at a slower rate in the VLBW children when compared to the FT controls.
However, when the compositional development was analysed in relation to the growth of lexicon size, this development occurred qualitatively in a nearly parallel manner in the VLBW children as in the FT children. Early receptive and expressive lexicon sizes were significantly associated with later language skills in both groups. The effect of the background variables (gender, length of the mother's basic education, birth weight) on the language development in the FT and the VLBW children differed. The results provide new information on early language acquisition by the Finnish FT and VLBW children. The results support the view that the early acquisition of the semantic lexical categories is related to lexicon growth. The current findings also suggest that the early grammatical acquisition is closely related to the growth of expressive vocabulary size. The language development of the VLBW children should be followed in clinical work.

Relevance:

20.00%

Publisher:

Summary:

DEVELOPING A TEXTILE ONTOLOGY FOR THE SEMANTIC WEB AND CONNECTING IT TO MUSEUM CATALOGING DATA The goal of the Semantic Web is to share concept-based information in a versatile way on the Internet. This is achievable using formal data structures called ontologies. The goal of this research is to increase the usability of museum cataloging data in information retrieval. The work is interdisciplinary, involving craft science, terminology science, computer science, and museology. In the first part of the dissertation an ontology of concepts of textiles, garments, and accessories is developed for museum cataloging work. The ontology work was done with the help of thesauri, vocabularies, research reports, and standards. The basis of the ontology development was the Museoalan asiasanasto MASA, a thesaurus for museum cataloging work which has been enriched by other vocabularies. Concepts and terms concerning the research object, as well as the material names of textiles, costumes, and accessories, were focused on. The research method was terminological concept analysis complemented by an ontological view of the Semantic Web. The concept structure was based on the hierarchical generic relation. Attention was also paid to other relations between terms and concepts, and between concepts themselves. Altogether 977 concept classes were created. Issues including how to choose and name concepts for the ontology hierarchy and how deep and broad the hierarchy could be are discussed from the viewpoint of the ontology developer and museum cataloger. The second part of the dissertation analyzes why some of the cataloged terms did not match with the developed textile ontology. This problem is significant because it prevents automatic ontological content integration of the cataloged data on the Semantic Web. The research datasets, i.e. the cataloged museum data on textile collections, came from three museums: Espoo City Museum, Lahti City Museum and The National Museum of Finland.
The data included 1803 textile, costume, and accessory objects. Unmatched object and textile material names were analyzed. In the case of the object names six categories (475 cases), and of the material names eight categories (423 cases), were found where automatic annotation was not possible. The most common explanation was that the cataloged field was filled with a long sentence comprised of many terms. Sometimes in the compound term, the object name and material, or the name and the way of usage, were combined. As well, numeric values in the material name cataloging field prevented annotation and so did the absence of a corresponding concept in the ontology. Ready-made drop-down lists of materials used in one cataloging system facilitated the annotation. In the case of naming objects and materials, one should use terms in basic form without attributes. The developed textile ontology has been applied in two cultural portals, MuseumFinland and Culturesampo, where one can search for and browse information based on cataloged data using integrated ontologies in an interoperable way. The textile ontology is also part of the national FinnONTO ontology infrastructure. Keywords: annotation, concept, concept analysis, cataloging, museum collection, ontology, Semantic Web, textile collection, textile material

Relevance:

20.00%

Publisher:

Summary:

The aim of this thesis was to study the basic relationships between thinning and fertilisation, tree growth rate and wood properties of Norway spruce (Picea abies (L.) Karst.) throughout a stand rotation. The material consisted of a total of 109 trees from both long-term thinning (Heinola, 61°10'N, 26°01'E; Punkaharju, 61°49'N, 29°19'E) and fertilisation-thinning experiments (Parikkala, 61°36'N, 29°22'E; Suonenjoki, 62°45'N, 27°00'E) in Finland. Wood properties, i.e., radial increment, wood density, latewood proportion, tracheid length, cell wall thickness and lumen diameter, as well as relative lignin content, were measured in detail from the pith to the bark, as well as from the stem base towards the stem apex. Intensive thinning and fertilisation treatments of Norway spruce stands increased (8%–64%) the radial increment of studied trees at breast height (1.3 m). At the same time, a faster growth rate slightly decreased average wood density (2%–7%), tracheid length (0%–9%) and cell wall thickness (1%–17%). The faster growth resulted in only small changes (0%–9%) in lumen diameter and relative lignin content (1%–2%; lignin content was 25.4%–26%). However, the random variation in wood properties was large both between and within trees and annual rings. The results of this thesis indicate that the prevailing thinning and fertilisation treatments of Norway spruce stands in Fennoscandia may significantly enhance the radial increment of individual trees, and cause only small or no detrimental changes in wood and tracheid properties.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies (L.) Karst.) forests dominate in Finnish Lapland. The need to study the effect of both soil factors and site preparation on the performance of planted Scots pine has increased due to the problems encountered in reforestation, especially on mesic and moist, formerly spruce-dominated sites. The present thesis examines soil hydrological properties and conditions, and the effect of site preparation on them, on 10 pine- and 10 spruce-dominated upland forest sites. Finally, the effects of both the site preparation and reforestation methods, and soil hydrology on the long-term performance of planted Scots pine are summarized. The results showed that pine and spruce sites differ significantly in their soil physical properties. Under field capacity or wetter soil moisture conditions, planted pines presumably suffer from excessive soil water and poor soil aeration on most of the originally spruce sites, but not on the pine sites. The results also suggested that site preparation affects the soil-water regime and thus prerequisites for forest growth over two decades after site preparation. High variation in the survival and mean height of planted pine was found. The study suggested that on spruce sites, pine survival is the lowest on sites that dry out slowly after rainfall events, and that height growth is the fastest on soils that reach favourable aeration conditions for root growth soon after saturation, and/or where the average air-filled porosity near field capacity is large enough for good root growth. Survival, but not mean height, can be enhanced by employing intensive site preparation methods on spruce sites. On coarser-textured pine sites, site preparation methods do not affect survival, but methods affecting soil fertility, such as prescribed burning and ploughing, seem to enhance the height growth of planted Scots pines over several decades.
The use of soil water content in situ as the sole criterion for sites suitable for pine reforestation was tested and found to be a relatively uncertain parameter. The thesis identified new potential soil variables, which should be tested using other data in the future.