11 results for Similarity measure

in Helda - Digital Repository of University of Helsinki


Relevance:

60.00%

Publisher:

Abstract:

Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. 
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experiment results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
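The class-wise comparison described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity for every semantic class and hand-picked class weights; the abstract specifies neither, and the names `cosine` and `classwise_similarity` are hypothetical. In the thesis each class may have its own measure (for example, one based on a geographical taxonomy for locations).

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classwise_similarity(doc_a, doc_b, weights):
    """Weighted combination of per-class similarities.

    doc_a, doc_b: {class_name: {term: weight}}
    weights:      {class_name: non-negative weight} (assumed, not from the source)
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for cls, w in weights.items():
        # Each class is compared separately; a missing class counts as empty.
        score += w * cosine(doc_a.get(cls, {}), doc_b.get(cls, {}))
    return score / total


doc_a = {"terms": {"earthquake": 1.0, "rescue": 1.0}, "locations": {"turkey": 1.0}}
doc_b = {"terms": {"earthquake": 1.0, "aid": 1.0}, "locations": {"turkey": 1.0}}
weights = {"terms": 0.5, "locations": 0.5}
print(classwise_similarity(doc_a, doc_b, weights))  # approximately 0.75
```

Here the general terms overlap only partly (cosine 0.5) while the locations match fully (cosine 1.0), so the equally weighted combination is about 0.75.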

Relevance:

20.00%

Publisher:

Abstract:

The study of social phenomena in the World Wide Web has been rather fragmentary, and there is no coherent, research-based theory about sense of community in Web environment. Sense of community means the part of one's self-concept that has to do with perceiving oneself as belonging to, and feeling affinity to, a certain social grouping. The present study aimed to find evidence for sense of community in Web environment, and specifically to find out what the most critical psychological factors of sense of community would be. Based on known characteristics of real-life communities and sense of community, and a few occasional studies of Web-communities, it was hypothesized that the following factors would be the most critical ones and that they could be grouped as prerequisites, facilitators and consequences of sense of community: awareness and social presence (prerequisites), criteria for membership and borders, common purpose, social interaction and reciprocity, norms and conformity, common history (facilitators), trust and accountability (consequences). In addition to critical factors, the present study aimed to find out if this kind of grouping would be valid. Furthermore, the effect of Web-community members' background variables on sense of community was of interest. In order to answer the questions, an online questionnaire was created and tested. It included propositions that reflect factors that precede, facilitate and follow the sense of community in Web environment. A factor analysis was calculated to find out the critical factors, and analyses of variance were calculated to see if the grouping into prerequisites, facilitators and consequences was right and how the background variables would affect the sense of community in Web environment. The results indicated that the psychological structure of sense of community in Web environment could not be presented with critical variables grouped as prerequisites, facilitators and consequences.
Most factors did facilitate the sense of community, but based on this data it could not be argued that some of the factors chronologically precede sense of community and some follow it. Instead, the factor analysis revealed that the most critical factors in sense of community in Web environment are 1) reciprocal involvement, 2) basic trust in others, 3) similarity and common purpose of members, and 4) shared history of members. The most influential background variables were the member's own participation activity (indicated by reading and writing messages) and the phase in the membership life cycle (from visitor to leader). The more the member participated and the further in the membership life cycle he was, the more he felt sense of community. There are many descriptions of sense of community, but the present study was one of the first to actually measure the phenomenon in Web environment. It gained well-documented, valid results based on large data, proving that sense of community in Web environment is possible, and clarifying its psychological structure, thus enhancing the understanding of sense of community in Web environment. Keywords: sense of community, Web-community, psychology of the Internet

Relevance:

20.00%

Publisher:

Abstract:

This study addresses three important issues in tree bucking optimization in the context of cut-to-length harvesting. (1) Would the fit between the log demand and log output distributions be better if the price and/or demand matrices controlling the bucking decisions on modern cut-to-length harvesters were adjusted to the unique conditions of each individual stand? (2) In what ways can we generate stand and product specific price and demand matrices? (3) What alternatives do we have to measure the fit between the log demand and log output distributions, and what would be an ideal goodness-of-fit measure? Three iterative search systems were developed for seeking stand-specific price and demand matrix sets: (1) A fuzzy logic control system for calibrating the price matrix of one log product for one stand at a time (the stand-level one-product approach); (2) a genetic algorithm system for adjusting the price matrices of one log product in parallel for several stands (the forest-level one-product approach); and (3) a genetic algorithm system for dividing the overall demand matrix of each of the several log products into stand-specific sub-demands simultaneously for several stands and products (the forest-level multi-product approach). The stem material used for testing the performance of the stand-specific price and demand matrices against that of the reference matrices was comprised of 9 155 Norway spruce (Picea abies (L.) Karst.) sawlog stems gathered by harvesters from 15 mature spruce-dominated stands in southern Finland. The reference price and demand matrices were either direct copies or slightly modified versions of those used by two Finnish sawmilling companies. Two types of stand-specific bucking matrices were compiled for each log product. One was from the harvester-collected stem profiles and the other was from the pre-harvest inventory data. 
Four goodness-of-fit measures were analyzed for their appropriateness in determining the similarity between the log demand and log output distributions: (1) the apportionment degree (index), (2) the chi-square statistic, (3) Laspeyres quantity index, and (4) the price-weighted apportionment degree. The study confirmed that any improvement in the fit between the log demand and log output distributions can only be realized at the expense of log volumes produced. Stand-level pre-control of price matrices was found to be advantageous, provided the control is done with perfect stem data. Forest-level pre-control of price matrices resulted in no improvement in the cumulative apportionment degree. Cutting stands under the control of stand-specific demand matrices yielded a better total fit between the demand and output matrices at the forest level than was obtained by cutting each stand with non-stand-specific reference matrices. The theoretical and experimental analyses suggest that none of the three alternative goodness-of-fit measures clearly outperforms the traditional apportionment degree measure. Keywords: harvesting, tree bucking optimization, simulation, fuzzy control, genetic algorithms, goodness-of-fit
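Two of the goodness-of-fit measures named above can be sketched in a few lines. The abstract does not give the formulas, so the definitions below are assumptions: the apportionment degree is taken in its commonly cited form, 100 × (1 − ½ Σ|dᵢ − oᵢ|) over relative class frequencies, and the chi-square statistic uses the demand distribution rescaled to the output total as the expected counts. The function names are hypothetical.

```python
def apportionment_degree(demand, output):
    """Apportionment degree (%) between demand and output log distributions.

    demand, output: counts per log class (e.g. length-diameter cells),
    listed in the same order. Assumed formula, not quoted from the source.
    """
    d_total = sum(demand)
    o_total = sum(output)
    diff = sum(abs(d / d_total - o / o_total) for d, o in zip(demand, output))
    return 100.0 * (1.0 - diff / 2.0)


def chi_square(demand, output):
    """Pearson chi-square with demand rescaled to the output total as expectation."""
    scale = sum(output) / sum(demand)
    return sum(
        (o - d * scale) ** 2 / (d * scale)
        for d, o in zip(demand, output)
        if d > 0  # classes with zero demand are skipped in this sketch
    )


demand = [50, 50]
output = [100, 0]
print(apportionment_degree(demand, output))  # 50.0
print(chi_square(demand, output))            # 100.0
```

A perfect match gives an apportionment degree of 100 and a chi-square of 0; the example shows the output concentrated in one class against an evenly split demand.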

Relevance:

20.00%

Publisher:

Abstract:

A straightforward computation of the list of the words (the 'tail words' of the list) that are distributionally most similar to a given word (the 'head word' of the list) leads to the question: How semantically similar to the head word are the tail words; that is, how similar are their meanings to its meaning? And can we do better? The experiment was done on the nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
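As an illustration of the information radius mentioned in this abstract, here is a minimal sketch for two probability mass functions. It assumes the convention IRad(p, q) = D(p‖m) + D(q‖m) with m = (p + q)/2 and base-2 logarithms; some texts include a factor of ½ (the Jensen-Shannon divergence), and the abstract does not say which form the thesis uses. The function names are hypothetical.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def information_radius(p, q):
    """Information radius of two probability mass functions over the same events."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so the divergences are always finite.
    return kl_divergence(p, m) + kl_divergence(q, m)


print(information_radius([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(information_radius([1.0, 0.0], [0.0, 1.0]))  # 2.0
```

Under this convention the value is symmetric and ranges from 0 for identical distributions to 2 bits for distributions with no shared events, so a smaller value means greater distributional similarity.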

Relevance:

20.00%

Publisher:

Abstract:

Acute heart failure (AHF) is a complex syndrome associated with exceptionally high mortality. Still, characteristics and prognostic factors of contemporary AHF patients have been inadequately studied. Kidney function has emerged as a very powerful prognostic risk factor in cardiovascular disease. This is believed to be the consequence of an interaction between the heart and kidneys, also termed the cardiorenal syndrome, the mechanisms of which are not fully understood. Renal insufficiency is common in heart failure and of particular interest for predicting outcome in AHF. Cystatin C (CysC) is a marker of glomerular filtration rate with properties making it a prospective alternative to the currently used measure creatinine for assessment of renal function. The aim of this thesis is to characterize a representative cohort of patients hospitalized for AHF and to identify risk factors for poor outcome in AHF. In particular, the role of CysC as a marker of renal function is evaluated, including examination of the value of CysC as a predictor of mortality in AHF. The FINN-AKVA (Finnish Acute Heart Failure) study is a national prospective multicenter study conducted to investigate the clinical presentation, aetiology and treatment of, as well as concomitant diseases and outcome in, AHF. Patients hospitalized for AHF were enrolled in the FINN-AKVA study, and mortality was followed for 12 months. The mean age of patients with AHF is 75 years and they frequently have both cardiovascular and non-cardiovascular co-morbidities. The mortality after hospitalization for AHF is high, rising to 27% by 12 months. The present study shows that renal dysfunction is very common in AHF. CysC detects impaired renal function in forty percent of patients. Renal function, measured by CysC, is one of the strongest predictors of mortality independently of other prognostic risk markers, such as age, gender, co-morbidities and systolic blood pressure on admission. 
Moreover, in patients with normal creatinine values, elevated CysC is associated with a marked increase in mortality. Acute kidney injury, defined as an increase in CysC within 48 hours of hospital admission, occurs in a significant proportion of patients and is associated with increased short- and mid-term mortality. The results suggest that CysC can be used for risk stratification in AHF. Markers of inflammation are elevated both in heart failure and in chronic kidney disease, and inflammation is one of the mechanisms thought to mediate heart-kidney interactions in the cardiorenal syndrome. Inflammatory cytokines such as interleukin-6 (IL-6) and tumor necrosis factor-alpha (TNF-α) correlate very differently to markers of cardiac stress and renal function. In particular, TNF-α showed a robust correlation to CysC, but was not associated with levels of NT-proBNP, a marker of hemodynamic cardiac stress. Compared to CysC, the inflammatory markers were not strongly related to mortality in AHF. In conclusion, patients with AHF are elderly with multiple co-morbidities, and renal dysfunction is very common. CysC demonstrates good diagnostic properties both in identifying impaired renal function and acute kidney injury in patients with AHF. CysC, as a measure of renal function, is also a powerful prognostic marker in AHF. CysC shows promise as a marker for assessment of kidney function and risk stratification in patients hospitalized for AHF.

Relevance:

20.00%

Publisher:

Abstract:

Based on the Aristotelian criterion referred to as 'abductio', Peirce suggests a method of hypothetical inference, which operates in a different way than the deductive and inductive methods. “Abduction is nothing but guessing” (Peirce, 7.219). This principle is of extreme value for the study of our understanding of mathematical self-similarity in both of its typical presentations: relative or absolute. For the first case, abduction incarnates the quantitative/qualitative relationships of a self-similar object or process; for the second case, abduction makes understandable the statistical treatment of self-similarity, 'guessing' the continuity of geometric features to infinity through the use of a systematic stereotype (for instance, the assumption that the general shape of the Sierpiński triangle continues identically into its particular shapes). The metaphor coined by Peirce, of an exact map containing within itself the same exact map (a map of itself), is not only the most important precedent of Mandelbrot’s problem of measuring the boundaries of a continuous irregular surface with a logarithmic ruler, but also remains a useful abstraction for the conceptualisation of relative and absolute self-similarity, and its mechanisms of implementation. It is useful, also, for explaining some of the most basic geometric ontologies as mental constructions: in the notion of infinite convergence of points in the corners of a triangle, or the intuition for defining two parallel straight lines as two lines in a plane that 'never' intersect.

Relevance:

20.00%

Publisher:

Abstract:

FTIR spectroscopy (Fourier transform infrared spectroscopy) is a fast analytical method. In Fourier instruments, the use of an interferometer makes it possible to measure the entire infrared frequency range in a few seconds. Using an FTIR spectrometer equipped with an ATR accessory requires hardly any sample preparation, so the method is also easy to use. The ATR accessory also makes it possible to analyze many different kinds of samples. An infrared spectrum can also be measured from samples for which traditional sample preparation methods cannot be used. The information obtained by FTIR spectroscopy is often combined with statistical multivariate analyses. With cluster analysis, the information obtained from spectra can be grouped on the basis of similarity. In hierarchical cluster analysis, the similarity between objects is determined by calculating the distance between them. Principal component analysis is used to reduce the dimensionality of the data and to create new, uncorrelated principal components. The principal components should retain as much as possible of the variation in the original data. The application possibilities of FTIR spectroscopy and multivariate methods have been studied extensively. In the food industry, for example, its suitability for quality control has been studied. The method has also been used to identify the chemical compositions of volatile oils and to detect chemotypes of oil plants. This study evaluated the use of the method in classifying extract samples of milk parsley (suoputki). FTIR spectra of extract samples from different plant parts of milk parsley were compared with FTIR spectra measured from selected pure compounds. The characteristic absorption bands of the pure compounds were identified from their FTIR spectra. The wavenumber ranges of the intense bands in the furanocoumarin spectra were selected for the multivariate analyses. Multivariate analyses were also carried out on the fingerprint region of the IR spectrum, in the wavenumber range 1785-725 cm⁻¹.
The aim was to classify the extract samples according to their collection site and coumarin content. Grouping by collection site was observed, which in the analyses based on the band wavenumber ranges was explained mainly by coumarin contents. In these analyses the extract samples mostly grouped and separated according to their total coumarin contents. Grouping by collection site was also observed in the analyses of the wavenumber range 1785-725 cm⁻¹, but this was not explained by coumarin contents. These groupings were possibly influenced by similar concentrations of other compounds in the samples. Other wavenumber ranges were also used in the analyses, but the results hardly differed from the earlier ones. Multivariate analyses of second-derivative spectra from the fingerprint region did not noticeably change the results either. In further studies, the method used here can be developed further, for example by examining narrower, carefully selected wavenumber ranges of the second-derivative spectra in the multivariate analyses.

Relevance:

20.00%

Publisher:

Abstract:

Ecology and evolutionary biology is the study of life on this planet. One of the many methods applied to answering the great diversity of questions regarding the lives and characteristics of individual organisms is the utilization of mathematical models. Such models are used in a wide variety of ways. Some help us to reason, functioning as aids to, or substitutes for, our own fallible logic, thus making argumentation and thinking clearer. Models which help our reasoning can lead to conceptual clarification; by expressing ideas in algebraic terms, the relationships between different concepts become clearer. Other mathematical models are used to better understand yet more complicated models, or to develop mathematical tools for their analysis. Though helping us to reason and being used as tools in the craftsmanship of science, many models do not tell us much about the real biological phenomena we are, at least initially, interested in. The main reason for this is that any mathematical model is a simplification of the real world, reducing the complexity and variety of interactions and idiosyncrasies of individual organisms. What such models can tell us, however, both is and has been very valuable throughout the history of ecology and evolution. Minimally, a model simplifying the complex world can tell us that in principle, the patterns produced in a model could also be produced in the real world. We can never know how different a simplified mathematical representation is from the real world, but the similarity that models strive for gives us confidence that their results could apply. This thesis deals with a variety of different models, used for different purposes. One model deals with how one can measure and analyse invasions, the expanding phase of invasive species. Earlier analyses claim to have shown that such invasions can be a regulated phenomenon, that is, that higher invasion speeds at a given point in time will lead to a reduction in speed.
Two simple mathematical models show that analysis of this particular measure of invasion speed need not be evidence of regulation. In the context of dispersal evolution, two models acting as proof-of-principle are presented. Parent-offspring conflict emerges when there are different evolutionary optima for adaptive behavior for parents and offspring. We show that the evolution of dispersal distances can entail such a conflict, and that under parental control of dispersal (as, for example, in higher plants) wider dispersal kernels are optimal. We also show that dispersal homeostasis can be optimal; in a setting where dispersal decisions (to leave or stay in a natal patch) are made, strategies that divide their seeds or eggs into fractions that disperse or not, as opposed to randomizing for each seed, can prevail. We also present a model of the evolution of bet-hedging strategies: evolutionary adaptations that occur despite their fitness, on average, being lower than that of a competing strategy. Such strategies can win in the long run because they have a reduced variance in fitness coupled with a reduction in mean fitness, and fitness is of a multiplicative nature across generations, and therefore sensitive to variability. This model is used for conceptual clarification: by developing a population genetic model with uncertain fitness and expressing genotypic variance in fitness as a product of individual-level variance and correlations between individuals of a genotype, we arrive at expressions that intuitively reflect two of the main categorizations of bet-hedging strategies: conservative vs diversifying and within- vs between-generation bet hedging. In addition, this model shows that these divisions are in fact false dichotomies.
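The bet-hedging argument in this abstract rests on fitness multiplying across generations, so long-run success follows the geometric rather than the arithmetic mean. The sketch below illustrates that standard point with made-up fitness values; it is not the population genetic model of the thesis, and the strategy names and numbers are hypothetical.

```python
from math import exp, log


def arithmetic_mean(values):
    """Ordinary average of per-generation fitness values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean, computed via logs; it governs multiplicative growth."""
    return exp(sum(log(v) for v in values) / len(values))


def growth_after(values, cycles):
    """Total multiplicative growth after repeating the fitness sequence `cycles` times."""
    total = 1.0
    for _ in range(cycles):
        for v in values:
            total *= v
    return total


# Hypothetical per-generation fitness in alternating good and bad years.
risky = [2.0, 0.5]    # higher mean fitness, high variance
hedger = [1.1, 1.1]   # lower mean fitness, no variance

print(arithmetic_mean(risky) > arithmetic_mean(hedger))  # True
print(geometric_mean(risky) < geometric_mean(hedger))    # True
print(growth_after(risky, 10) < growth_after(hedger, 10))  # True
```

The risky strategy has the higher arithmetic mean (1.25 against 1.1) but only breaks even over each good-bad cycle, while the hedging strategy grows every generation and therefore wins in the long run.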

Relevance:

20.00%

Publisher:

Abstract:

Self-similarity, a concept taken from mathematics, is gradually becoming a keyword in musicology. Although a polysemic term, self-similarity often refers to the multi-scalar feature repetition in a set of relationships, and it is commonly valued as an indication of musical coherence and consistency. This investigation provides a theory of musical meaning formation in the context of intersemiosis, that is, the translation of meaning from one cognitive domain to another cognitive domain (e.g. from mathematics to music, or to speech or graphic forms). From this perspective, the degree of coherence of a musical system relies on a synecdochic intersemiosis: a system of related signs within other comparable and correlated systems. This research analyzes the modalities of such correlations, exploring their general and particular traits, and their operational bounds. Looking forward in this direction, the notion of analogy is used as a rich concept through its two definitions quoted in the Classical literature: proportion and paradigm, enormously valuable in establishing measurement, likeness and affinity criteria. Using quantitative and qualitative methods, evidence is presented to justify a parallel study of different modalities of musical self-similarity. For this purpose, original arguments by Benoît B. Mandelbrot are revised, alongside a systematic critique of the literature on the subject. Furthermore, connecting Charles S. Peirce's synechism with Mandelbrot's fractality is one of the main developments of the present study. This study provides elements for explaining Bolognesi's (1983) conjecture, which states that the most primitive, intuitive and basic musical device is self-reference, extending its functions and operations to self-similar surfaces.
In this sense, this research suggests that, with various modalities of self-similarity, synecdochic intersemiosis acts as a system of systems in coordination with greater or lesser development of structural consistency, and with a greater or lesser contextual dependence.