12 results for latent semantic analysis

in Helda - Digital Repository of University of Helsinki


Relevance:

80.00%

Publisher:

Abstract:

Title of the Master's thesis: Análisis de la preposición hacia y establecimiento de sus equivalentes en finés (trans. Analysis of the Spanish preposition hacia and the finding of its equivalents in Finnish) Abstract: The aim of this Master's thesis is to provide a detailed analysis of the Spanish preposition hacia from a cognitive perspective and to establish its equivalents in Finnish. In doing so, my purpose is to demonstrate the suitability of both cognitive perspectives and Contrastive Linguistics for semantic analysis. This thesis is divided into five chapters. The first chapter presents and critically reviews the monolingual lexicographic treatment and semantic analysis of the Spanish preposition hacia in major reference works. This chapter reveals the inadequacies and omissions present in all the given definitions and shows that these problems are only the uppermost layer of an ontological (and therefore methodological) problem in the treatment of prepositions. The second chapter presents the methodological and theoretical perspective adopted in this thesis for the monolingual analysis and definition of the Spanish preposition hacia, following mainly the guidelines established by G. Lakoff (1987) and by R. Langacker (2008) in his Cognitive Grammar. Together with these, and within the same paradigm, recent analytical and methodological contributions to the treatment of polysemy in language are discussed critically (cf. Tyler and Evans 2003). In the third chapter, in accordance with the requirements regarding the use of empirical data from corpora, I set out an original monolingual analysis of the Spanish preposition hacia, observing the principles and methodology spelled out in the second chapter. 
The main objective of this chapter is to build a full-fledged semantic representation of the polysemy of this preposition in order to understand its meanings and relate them to Finnish (and possibly other languages). The fourth chapter, in accordance with the results of chapter 3, examines, describes, and establishes the corresponding Finnish equivalents for this preposition. The results obtained in this chapter are also contrasted with the current bilingual lexicographical definitions found in the most important dictionaries and grammars. Finally, in the fifth chapter, the results of this work are discussed critically. Some observations are given regarding both the ontological and theoretical assumptions and the methodological perspective adopted. I also present some notes toward a general methodology for the semantic analysis of Spanish prepositions, to be carried out in further investigations. The aim of this work, which we characterize as a comparative-analytical task, is to provide a detailed analysis of the Spanish preposition hacia from a cognitive perspective, in terms of and through the establishment of its equivalents in Finnish. In this way, we seek to demonstrate the adequacy of a cognitive perspective both for examining and for establishing and articulating the series of equivalents that a particle, in our case a preposition, finds in another language. Thus, in contrast to canonical definitions that warn of the impossibility of a complete characterization of the set of uses of a preposition, the construction of a definition that is viable at both a hierarchical and a descriptive level is shown to be possible through the application of an adequate theoretical-analytical methodology. This thesis is divided into five chapters. 
The first chapter comprises a presentation and critical review of the monolingual lexicographic and analytical treatment that the preposition hacia has received in the major reference works, where it is observed that the inadequacies and omissions present in all of the definitions analyzed represent only the uppermost stage of a problem of an ontological and, therefore, methodological nature in the treatment of prepositions. The second chapter presents the theoretical and methodological perspective adopted in this thesis for the monolingual analysis and definition of the preposition hacia, guided by the proposals made by G. Lakoff as well as by the foundations established by R. Langacker in his cognitive proposal for a new grammar. Jointly and complementarily, and within the same paradigm, we employ, critically discuss, and develop different analytical-methodological contributions for the treatment of polysemy in locative linguistic units. In the third chapter, in accordance with the requirements regarding the use of empirical data obtained from text corpora, an original monolingual analysis of the preposition hacia is presented, in observance of the principles and methodology set out in the second chapter, with the main objective of constructing a semantic representation of the polysemy of the preposition that comprises and articulates the prototypical senses specified for it. The fourth chapter, in accordance with the results of our monolingual analysis of the preposition, examines, describes, and establishes the corresponding equivalents in Finnish for hacia; the results obtained are also contrasted in this chapter with the current bilingual lexicographic definitions. 
The fifth and final chapter of this thesis gathers some observations on the ontological and theoretical-methodological postulates of the perspective adopted, as well as some notes toward the construction of a general methodology for the semantic analysis of prepositions.

Relevance:

80.00%

Publisher:

Abstract:

The aims of this dissertation were 1) to investigate associations of weight status of adolescents with leisure activities, and computer and cell phone use, and 2) to investigate environmental and genetic influences on body mass index (BMI) during adolescence. Finnish twins born in 1983–1987 responded to postal questionnaires at the ages of 11-12 (5184 participants), 14 (4643 participants), and 17 years (4168 participants). Information was obtained on weight and height, leisure activities including television viewing, video viewing, computer games, listening to music, board games, musical instrument playing, reading, arts, crafts, socializing, clubs, sports, and outdoor activities, as well as computer and cell phone use. Activity patterns were studied using latent class analysis. The relationship between leisure activities and weight status was investigated using logistic and linear regression. Genetic and environmental effects on BMI were studied using twin modeling. Of individual leisure activities, sports were associated with decreased overweight risk among boys in both cross-sectional and longitudinal analyses, but among girls only cross-sectionally. Many sedentary leisure activities, such as video viewing (boys/girls), arts (boys), listening to music (boys), crafts (girls), and board games (girls), had positive associations with being overweight. Computer use was associated with a higher prevalence of overweight in cross-sectional analyses. However, musical instrument playing, commonly considered as a sedentary activity, was associated with a decreased overweight risk among boys. Four patterns of leisure activities were found: ‘Active and sociable’, ‘Active but less sociable’, ‘Passive but sociable’, and ‘Passive and solitary’. The prevalence of overweight was generally highest among the ‘Passive and solitary’ adolescents. Overall, leisure activity patterns did not predict overweight risk later in adolescence. 
An exception was 14-year-old ‘Passive and solitary’ girls, who had the greatest risk of becoming overweight by 17 years of age. Heritability of BMI was high (0.58-0.83). Common environmental factors shared by family members affected BMI at 11-12 and 14 years, but their effect had disappeared by 17 years of age. Additive genetic factors explained 90-96% of the BMI stability across adolescence. Genetic correlations across adolescence were high, which suggests similar genetic effects on BMI throughout adolescence, while unique environmental effects on BMI appeared to vary. These findings suggest that family-based interventions hold promise for obesity prevention into early and middle adolescence, but that later in adolescence obesity prevention should focus on individuals. A useful target could be adolescents' leisure time, and our findings highlight the importance of versatility in leisure activities.

Relevance:

80.00%

Publisher:

Abstract:

This dissertation is an onomastic study of variation in women's name phrases in official documents in Finland during the period 1780–1930. The aim is to discuss, from a socio-onomastic perspective, both the changeover from patronymics to inherited family names and the use of surnames after marriage (i.e. whether women adopted their husbands' family names or retained their maiden names) before new laws in this area entered into force in Finland in the early 20th century. In 1920, a law on family names that required fixed names put an end to the use of the patronymic as a person's only surname. After 1929, it was no longer possible for a married woman to retain her maiden name. Methodologically, to explain this development from a socio-onomastic perspective, I have based my study on a syntactic-semantic analysis of the actual name phrases. To present the extensive material, I have devised a scheme that divides the 115 different types of name phrases into 13 main categories. The analysis of the material for Helsinki is based on frequency calculations of the different types of name phrases every thirtieth year, as well as on describing variation in the structure and semantic content of the name phrases, e.g. social variation in the use of titles and epithets. In addition, by applying a biographic-genealogical method, I have conducted two case studies of the usage of women's name phrases in two chosen families. The study is based on parish registers from the period 1780–1929, estate inventory documents from the period 1780–1928, registration forms for liberty of trade from the period 1880–1908, family announcements in newspapers from the period 1829–1888, gravestones from the period 1796–1929, and diaries from the periods 1799–1801 and 1818–1820, providing a corpus of 5,950 name phrases. The syntactic-semantic analysis has revealed the overall picture of the various ways of denoting women in official documents. 
In Helsinki, towards the end of the 19th century, the use of inherited family names seems to have been almost fully developed in official contexts. In the late 19th century, a patronymic still appears as the only surname of some working-class women, whereas in the early 20th century patronymics were entered in the parish register only as a kind of middle name. At the beginning of the 19th century, most married women were still registered under their maiden names, with a few exceptions among the bourgeoisie and upper class. The comparative analysis of name phrases in diaries, however, indicates that the use of the husband's family name by married women was a much earlier phenomenon in private contexts than in official documents. Keywords: socio-onomastics, syntactic-semantic analysis, name phrase, patronymic, maiden name, husband's family name

Relevance:

30.00%

Publisher:

Abstract:

Alzheimer's disease (AD) is characterized by an impairment of the semantic memory responsible for processing meaning-related knowledge. This study was aimed at examining how Finnish-speaking healthy elderly subjects (n = 30) and mildly (n = 20) and moderately (n = 20) demented AD patients utilize semantic knowledge to perform a semantic fluency task, a method of studying semantic memory. In this task subjects are typically given 60 seconds to generate words belonging to the semantic category of animals. Successful task performance requires fast retrieval of subcategory exemplars in clusters (e.g., farm animals: 'cow', 'horse', 'sheep') and switching between subcategories (e.g., pets, water animals, birds, rodents). In this study, the scope of the task was extended to cover various noun and verb categories. The results indicated that, compared with normal controls, both mildly and moderately demented AD patients showed reduced word production, limited clustering and switching, narrowed semantic space, and an increase in errors, particularly perseverations. However, the size of the clusters, the proportion of clustered words, and the frequency and prototypicality of words remained relatively similar across the subject groups. Although the moderately demented patients showed a poorer overall performance than the mildly demented patients in the individual categories, the error analysis appeared unaffected by the severity of AD. The results indicate a semantically rather coherent performance but less specific, effective, and flexible functioning of the semantic memory in mild and moderate AD patients. The findings are discussed in relation to recent theories of word production and semantic representation. Keywords: semantic fluency, clustering, switching, semantic category, nouns, verbs, Alzheimer's disease
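
The clustering and switching measures described in this abstract can be illustrated with a small sketch. It is not the study's scoring procedure: it assumes a cluster is any maximal run of consecutive words from the same subcategory, a switch is a transition between two such runs, and a perseveration is a repeated word. The function name `score_fluency` and the word-to-subcategory mapping are hypothetical.

```python
from itertools import groupby


def score_fluency(words, subcategory):
    """Score one semantic-fluency response list (simplified, illustrative scoring).

    words: the words in the order they were produced.
    subcategory: hypothetical mapping from word to subcategory label.
    """
    seen = set()
    perseverations = 0
    labels = []
    for i, word in enumerate(words):
        if word in seen:
            perseverations += 1  # a repeated word counts as a perseveration
        seen.add(word)
        # Unlabeled words get a unique label so they never merge into a cluster.
        labels.append(subcategory.get(word, ("unlabeled", i)))

    # A cluster is a maximal run of consecutive words sharing a subcategory.
    cluster_sizes = [len(list(run)) for _, run in groupby(labels)]
    clusters = len(cluster_sizes)
    return {
        "words": len(words),
        "unique_words": len(seen),
        "perseverations": perseverations,
        "clusters": clusters,
        "switches": max(clusters - 1, 0),
        "mean_cluster_size": sum(cluster_sizes) / clusters if clusters else 0.0,
    }
```

Published scoring schemes differ in how they count cluster size and handle overlapping subcategories, so the numbers from this sketch are only comparable with each other.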

Relevance:

30.00%

Publisher:

Abstract:

In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data: a conditional and a full likelihood approach for a disease with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of the general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is, when these subjects never get the disease. A statistical model for a variable age at onset with long-term survivors is proposed for studies of familial aggregation, using a latent variable approach, as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation of the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work contains five studies with different statistical models, they all concern data obtained from nationwide registries of T1D and the genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects and birth cohort information of the Finnish population. 
Finally, substantial familial variation in the susceptibility to T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling in exploring risk factors for complex diseases.

Relevance:

30.00%

Publisher:

Abstract:

Extracellular matrix (ECM) is a complex network of various proteins and proteoglycans which provides tissues with structural strength and resilience. By harvesting signaling molecules such as growth factors, the ECM has the capacity to control cellular functions including proliferation, differentiation and cell survival. Latent transforming growth factor β (TGF-β) binding proteins (LTBPs) associate with fibrillar structures of the ECM and mediate the efficient secretion and ECM deposition of latent TGF-β. The current work was conducted to determine the regulatory regions of the LTBP-3 and -4 genes to gain insight into their tissue-specific expression, which also has an impact on TGF-β biology. Furthermore, the current research aimed at defining the ECM targeting of the N-terminal variants of LTBP-4 (LTBP-4S and -4L), which is required to understand their functions in tissues and to gain insight into conditions in which TGF-β is activated. To characterize the regulatory regions of the LTBP-3 and -4 genes, in silico and functional promoter analysis techniques were employed. It was found that the expression of LTBP-4S and -4L is under the control of two independent promoters. This finding was in accordance with the observed expression patterns of LTBP-4S and -4L in human tissues. All promoter regions characterized in this study were TATA-less, GC-rich and highly conserved between human and mouse. Putative binding sites for the Sp1 and GATA families of transcription factors were recognized in all of these regulatory regions. It is possible that these transcription factors control the basal expression of the LTBP-3 and -4 genes. A Smad binding element was found within the LTBP-3 and -4S promoter regions, but it was not present in the LTBP-4L promoter. Although this element, which is important for TGF-β signaling, was present in the LTBP-4S promoter, TGF-β did not induce its transcriptional activity. LTBP-3 promoter activity and mRNA expression, in contrast, were stimulated by TGF-β1 in osteosarcoma cells. 
It was found that the stimulatory effect of TGF-β was mediated by the Smad and Erk MAPK signaling pathways. The current work explored the ECM targeting of LTBP-4S and identified binding partners of this protein. It was found that the N-terminal end of LTBP-4S possesses fibronectin (FN) binding sites which are critical for its ECM targeting. FN-deficient fibroblasts incorporated LTBP-4S into their ECM only after addition of exogenous FN. Furthermore, LTBP-4S was found to have heparin binding regions, of which the C-terminal binding site mediated fibroblast adhesion. Soluble heparin prevented the ECM association of LTBP-4S in fibroblast cultures. In the current work it was observed that there are significant differences in the secretion, processing and ECM targeting of LTBP-4S and -4L. Interestingly, most of the secreted LTBP-4L was associated with latent TGF-β1, whereas LTBP-4S was mainly secreted in a free form from CHO cells. This thesis provides information on the transcriptional regulation of the LTBP-3 and -4 genes, which is required for a deeper understanding of their tissue-specific functions. Further, the current work elucidates the structural variability of LTBPs, which appears to have an impact on the secretion and ECM targeting of TGF-β. These findings may advance the understanding of the abnormal activation of TGF-β that is associated with connective tissue disorders and cancer.

Relevance:

30.00%

Publisher:

Abstract:

Latent transforming growth factor-beta (TGF-beta) binding proteins (LTBPs) -1, -3 and -4 are ECM components whose major function is to augment the secretion and matrix targeting of TGF-beta, a multipotent cytokine. LTBP-2 does not bind small latent TGF-beta but has suggested functions as a structural protein in ECM microfibrils. In the current work we focused on analyzing possible adhesive functions of LTBP-2 as well as on characterizing the kinetics and regulation of LTBP-2 secretion and ECM deposition. We also explored the role of TGF-beta binding LTBPs in endothelial cells activated to mimic angiogenesis as well as in malignant mesothelioma. We found that, unlike most adherent cells, several melanoma cell lines efficiently adhered to purified recombinant LTBP-2. Further characterization revealed that the adhesion was mediated by alpha3beta1 and alpha6beta1 integrins. Heparin also inhibited the melanoma cell adhesion, suggesting a role for heparan sulphate proteoglycans. LTBP-2 was also identified as a haptotactic substrate for melanoma cell migration. We used cultured human embryonic lung fibroblasts to analyze the temporal and spatial association of LTBP-2 with the ECM. We found that LTBP-2 was efficiently assembled into the ECM only in confluent cultures, following the deposition of fibronectin (FN) and fibrillin-1. In early, subconfluent cultures it remained primarily in soluble form after secretion. LTBP-2 colocalized transiently with FN and fibrillin-1. Silencing of fibrillin-1 expression by lentiviral shRNAs profoundly disrupted the deposition of LTBP-2, indicating that the ECM association of LTBP-2 depends on a pre-formed fibrillin-1 network. Considering the established role of TGF-beta as a regulator of angiogenesis, we induced morphological activation of endothelial cells by phorbol 12-myristate 13-acetate (PMA) and followed the fate of LTBP-1 in the endothelial ECM. 
This resulted in profound proteolytic processing of LTBP-1 and release of latent TGF-beta complexes from the ECM. The processing was coupled with increased activation of MT-MMPs and specific upregulation of MT1-MMP. The major role of MT1-MMP in the proteolysis of LTBP-1 was confirmed by suppressing its expression with lentivirally delivered short-hairpin RNAs as well as by various metalloproteinase inhibitors. TGF-beta can promote tumorigenesis of malignant mesothelioma (MM), which is an aggressive tumor of the pleura with poor prognosis. TGF-beta activity was analyzed in a panel of MM tumors by immunohistochemical staining of phosphorylated Smad-2 (P-Smad2). The tumor cells were strongly positive for P-Smad2 whereas LTBP-1 immunoreactivity was abundant in the stroma, and there was a negative correlation between LTBP-1 and P-Smad2 staining. In addition, high P-Smad2 immunoreactivity correlated with shorter survival of patients. mRNA analysis revealed that TGF-beta1 was the most highly expressed isoform in both normal human pleura and MM tissue. LTBP-1 and LTBP-3 were both abundantly expressed. LTBP-1 was the predominant isoform in established MM cell lines whereas the expression of LTBP-3 was high in control cells. Suppression of LTBP-3 expression by siRNAs resulted in increased TGF-beta activity in MM cell lines accompanied by decreased proliferation. Our results suggest that decreased expression of LTBP-3 in MM could alter the targeting of TGF-beta to the ECM and lead to its increased activation. The current work emphasizes the coordinated process of the assembly and appropriate targeting of LTBPs with distinct adhesive or cytokine-harboring properties into the ECM. The hierarchical assembly may have implications for the modulation of signaling events during morphogenesis and tissue remodeling.

Relevance:

30.00%

Publisher:

Abstract:

Pectobacterium atrosepticum is a Gram-negative bacterium that causes blackleg and soft rot of potato. The optimum temperature of P. atrosepticum is fairly low, and the bacterium is common in temperate regions. Blackleg spreads mainly via seed potatoes and is therefore a problem especially in seed potato production. The genome of P. atrosepticum strain SCRI1043 has been published, and the strain is studied as a model organism for understanding the pathogenesis of soft rot and blackleg. This opportunistic pathogen can live in the host plant for months in a latent state without causing visible symptoms. Under favorable conditions, the bacteria begin to divide and to produce enzymes that degrade plant tissues. The rotting plant mass provides nutrients for bacterial growth and enables colonization of the host plant. The role of cell wall-degrading enzymes in pathogenesis is well known, but little is known about the symptomless phase and the early stages of the disease. The bacterial genome encodes many toxins, adhesins, hemolysins, and other proteins that may play a role in pathogenesis. In this work, proteomics and microarray analysis were used to study the secreted proteins and gene expression of P. atrosepticum. Proteins that are secreted out of the bacterium are likely to function in pathogenesis because they are in direct contact with the host plant. The analyses were carried out under conditions resembling the plant intercellular space: low pH, low nutrient levels, and low temperature. The effect of the presence of the host plant on protein production and gene expression was studied by adding potato extract to the growth medium. The study identified many P. atrosepticum proteins that were already known and potentially related to pathogenesis. Potato extract increased the expression of genes encoding a recently identified protein secretion pathway (type VI secretion, T6SS). 
In addition, the bacterium was found to secrete several T6SS-related proteins into growth medium supplemented with potato extract. The significance of T6SS for bacteria is still unclear, and contradictory results have been published on its effect on pathogenesis. Understanding soft rot and blackleg at the molecular level lays the foundation for applied research aimed at controlling these diseases. This study adds to the knowledge of plant-pathogen interaction and may in the future be used, for example, in diagnostics, in breeding resistant potato cultivars, or in improving cultivation and storage conditions.

Relevance:

30.00%

Publisher:

Abstract:

Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. 
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
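
The class-wise comparison described in this abstract (one similarity per semantic class, combined as a weighted sum) can be sketched as follows. This is an illustrative reading, not the thesis's implementation: it assumes plain cosine similarity for every class, whereas the abstract allows each class its own measure (for example an ontology-based one for locations). The function names, class names and weights are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_wise_similarity(doc_a, doc_b, weights):
    """Weighted combination of per-class similarities.

    doc_a, doc_b: dicts mapping a semantic class name to a Counter of terms.
    weights: dict mapping a class name to a non-negative weight (assumed values).
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for cls, weight in weights.items():
        # A class missing from a document contributes zero similarity.
        score += weight * cosine(doc_a.get(cls, Counter()), doc_b.get(cls, Counter()))
    return score / total
```

Dividing by the sum of the weights keeps the score in the range 0 to 1, so thresholds for "same event" stay comparable when the weights are retuned.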

Relevance:

30.00%

Publisher:

Abstract:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
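
Frequent sets, which both proposed methods build on, can be illustrated with a naive counting sketch. It is not the Comprehensive or Queryable Log Compression algorithm: it only assumes that each log entry is a set of field=value pairs and that a set is frequent when it occurs in at least `min_support` entries. The function name, field names and `max_size` limit are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_sets(entries, min_support, max_size=2):
    """Find field=value combinations that occur in at least min_support entries.

    entries: list of dicts, one per log entry (field name -> value).
    max_size: largest combination size to enumerate (kept small; naive search).
    """
    counts = Counter()
    for entry in entries:
        items = sorted(entry.items())  # canonical order so equal sets get one key
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}
```

A summary built this way would report each frequent combination once with its count and list only the entries that match none of them. Exhaustive enumeration grows quickly with the number of fields, so a real system would prune candidates level by level.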

Relevance:

30.00%

Publisher:

Abstract:

A straightforward computation of the list of the words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: how semantically similar to the head word are the tail words, that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
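
The information radius mentioned in this abstract can be sketched for two nouns represented as probability distributions over dependency contexts. The sketch uses one common definition, D(p || m) + D(q || m) with m the average of p and q and logarithms to base 2, which is twice the Jensen-Shannon divergence; whether the thesis uses exactly this normalization is an assumption. The context keys in the usage note are hypothetical.

```python
import math


def kl_to_mean(p, m):
    """Kullback-Leibler divergence D(p || m) in bits; m must cover p's support."""
    return sum(pv * math.log2(pv / m[k]) for k, pv in p.items() if pv > 0)


def information_radius(p, q):
    """Information radius between two probability mass functions given as dicts.

    Returns 0 for identical distributions and 2 for distributions with
    disjoint support (under the base-2, unhalved definition assumed here).
    """
    keys = p.keys() | q.keys()
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return kl_to_mean(p, m) + kl_to_mean(q, m)
```

For instance, a noun's distribution might be keyed by (relation, word) pairs such as `("object_of", "read")`, with values given by relative frequencies. A lower radius then means the two nouns occur in more similar contexts, so tail words would be ranked by ascending radius.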